CN100534001C - Sound collecting method and sound collecting device - Google Patents

Sound collecting method and sound collecting device

Info

Publication number
CN100534001C
CN100534001C, CNB2004800001742A, CN200480000174A
Authority
CN
China
Prior art keywords
signal
covariance matrix
sound source
sound
filter coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB2004800001742A
Other languages
Chinese (zh)
Other versions
CN1698395A
Inventor
小林和则
古家贤一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Publication of CN1698395A
Application granted
Publication of CN100534001C
Anticipated expiration
Legal status: Expired - Lifetime

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

When an utterance period is detected by a state deciding section 14, the positions of the sound sources (9_1 to 9_K) are determined by a sound source position detecting section 15. A covariance matrix of the picked-up signals is calculated for each sound source by a covariance matrix calculating section 17 and stored in a covariance matrix storing section 18 in correspondence with that sound source. The levels of the sounds picked up from the sound sources are estimated from the covariance matrices by a picked-up sound level estimating section 19. Filter coefficients are determined from the estimated picked-up sound levels and the covariance matrices by a filter coefficient calculating section 21 so that the output level becomes a predetermined value. The determined filter coefficients are set in filters (12_1 to 12_M). The signals received by the microphones are filtered by these filters, and the filtering results are added together by an adder 13 and output as a transmit signal. Thus, a transmit signal of the desired level can be generated irrespective of the positions of the sound sources.

Description

Sound collecting method and sound collecting device
Technical field
The present invention relates to a sound collecting method and a sound collecting apparatus, and more particularly to a sound collecting method and a sound collecting apparatus that pick up speech from a plurality of speech sound sources and adjust its volume before output.
Background art
For example, in a teleconference with participants at different remote locations, if the speech of a plurality of participants seated at different positions at each site is picked up with only one microphone, the received signal levels differ greatly because the participants sit at different distances from the microphone and speak at different volumes. As a result, the speech reproduced at the remote receiving end differs greatly in volume from participant to participant, and in some cases one participant can hardly be heard while another is loud.
Fig. 20 is a block diagram showing the basic structure of a conventional sound collecting apparatus disclosed in, for example, Japanese Patent Application Kokai Publication No. 8-250944. The conventional sound collecting apparatus is made up of a microphone 41, a power calculating section 42, a gain setting section 43, and an amplifier 44. The power calculating section 42 calculates the long-term average power P_ave of the signal received by the microphone 41; the long-term average power can be obtained by squaring the signal and integrating the squared output over time. The gain setting section 43 then sets a gain (magnification factor) G based on the long-term average power P_ave of the received signal calculated by the power calculating section 42 and a preset desired transmission level P_opt. The gain G can be calculated, for example, by the following equation (1):

$$G = \left(\frac{P_{opt}}{P_{ave}}\right)^{1/2} \qquad (1)$$

The amplifier 44 amplifies the signal received by the microphone by the gain G thus set, and outputs the amplified signal.
Through the above processing, the output signal power is automatically adjusted toward the desired transmission level P_opt regardless of the input volume. With this conventional sound collecting method, however, the gain is determined based on the long-term average power, so setting an appropriate gain takes several to several tens of seconds. Accordingly, when a plurality of speakers are present and their voices are picked up by the microphone at different levels, there arises the problem that each time one speaker gives way to another, the setting of an appropriate gain is delayed, and in the meantime speech is reproduced at an inappropriate volume.
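For illustration only — the following sketch is not part of the patent, and the smoothing constant, target power, and initialization are assumed values — the conventional gain control of equation (1) can be written in Python roughly as:

```python
import numpy as np

def conventional_agc(x, fs, p_opt=1e-2, avg_seconds=10.0):
    """Sketch of the conventional single-microphone AGC of Fig. 20.

    x: received signal as a float array, fs: sampling frequency in Hz.
    The long-term average power P_ave is tracked by exponential
    smoothing (a stand-in for integration over time), and the gain
    G = sqrt(P_opt / P_ave) of equation (1) is applied per sample.
    """
    alpha = 1.0 / (avg_seconds * fs)        # smoothing constant (assumed)
    p_ave = np.mean(x[:fs] ** 2) + 1e-12    # crude initial power estimate
    y = np.empty_like(x)
    for n, sample in enumerate(x):
        p_ave = (1 - alpha) * p_ave + alpha * sample ** 2
        y[n] = np.sqrt(p_opt / p_ave) * sample   # equation (1)
    return y
```

Because p_ave moves slowly by design, the gain takes many seconds to settle after a change of speaker, which is precisely the delay problem described above.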
An object of the present invention is to provide a sound collecting apparatus and a sound collecting method that automatically adjust the volume of each voice to an appropriate value even when a plurality of speakers are present and their voices are picked up by microphones at different levels, and a program for implementing the method.
Summary of the invention
According to the present invention, a sound collecting method for picking up sound from each sound source with microphones of a plurality of channels comprises:
(a) a state deciding step including an utterance deciding step of deciding an utterance period from signals received by the microphones of said plurality of channels;
(b) a sound source position detecting step of detecting, when an utterance period is decided in said utterance deciding step, the position of each sound source from the received signals;
(c) a frequency domain converting step of converting the received signals to frequency domain signals;
(d) a covariance matrix calculating step of calculating a covariance matrix of the frequency domain received signals;
(e) a covariance matrix storing step of storing the covariance matrix for each sound source based on the detection result of said sound source position detecting step;
(f) a filter coefficient calculating step of calculating filter coefficients of said plurality of channels based on the stored covariance matrices and a predetermined output level;
(g) a filtering step of filtering the received signals of said plurality of channels by the filter coefficients of the respective channels; and
(h) an adding step of adding together the filtering results of said plurality of channels and providing the added result as a transmit signal.
According to the present invention, a sound collecting apparatus which picks up sound from each sound source with microphones of a plurality of channels placed in an acoustic space comprises:
a state deciding section including an utterance deciding section for deciding an utterance period from signals received by the microphones of said plurality of channels;
a sound source position detecting section for detecting the position of each sound source from the received signals when an utterance period is decided by said utterance deciding section;
a frequency domain converting section for converting the received signals to frequency domain signals;
a covariance matrix calculating section for calculating a covariance matrix of the frequency domain received signals of said plurality of channels;
a covariance matrix storing section for storing the covariance matrix for each sound source based on the result of detection by said sound source position detecting section;
a filter coefficient calculating section for calculating, by use of the stored covariance matrices, filter coefficients of said plurality of channels such that the transmit signal level of each sound source becomes a desired level;
filters of said plurality of channels for filtering the signals received by said microphones by use of the filter coefficients of the respective channels; and
an adder for adding together the outputs of the filters of said plurality of channels and providing the added output as a transmit signal.
According to a second aspect of the present invention, a sound collecting method, in which speech from at least one sound source is picked up by a microphone of at least one channel in an acoustic space and a received signal is reproduced in that space by a loudspeaker, comprises:
(a) a state deciding step of deciding an utterance period and a receiving period from the picked-up signal obtained by the microphone of said at least one channel and said received signal;
(b) a frequency domain converting step of converting the picked-up signal and the received signal to frequency domain signals;
(c) a covariance matrix calculating step of calculating a covariance matrix from the frequency domain picked-up signal and received signal in the utterance period, and calculating a covariance matrix in the receiving period;
(d) a covariance matrix storing step of storing the covariance matrices for the utterance period and the receiving period, respectively;
(e) a filter coefficient calculating step of calculating a filter coefficient for the picked-up signal of said at least one channel and a filter coefficient for the received signal, based on the covariance matrices stored for the utterance period and the receiving period, such that an echo, which is the component of the received signal contained in the picked-up signal, is cancelled;
(f) a filtering step of filtering the picked-up signal and the received signal by the filter coefficient for the picked-up signal of said at least one channel and the filter coefficient for the received signal; and
(g) an adding step of adding together the filtered signals and providing the added output as a transmit signal.
A sound collecting apparatus according to the second aspect of the present invention comprises:
a microphone of at least one channel for picking up speech from a sound source and outputting a picked-up signal;
a loudspeaker for reproducing a received signal;
a state deciding section for deciding an utterance period and a receiving period from the picked-up signal and the received signal;
a frequency domain converting section for converting the picked-up signal and the received signal to frequency domain signals;
a covariance matrix calculating section for calculating covariance matrices of the picked-up and received signals for the utterance period and the receiving period, respectively;
a covariance matrix storing section for storing the covariance matrices for the utterance period and the receiving period, respectively;
a filter coefficient calculating section for calculating, based on the stored covariance matrices, a filter coefficient for the picked-up signal of said at least one channel and a filter coefficient for the received signal so as to cancel the echo in the picked-up signal;
a picked-up signal filter and a received signal filter, in which the calculated filter coefficients are set, for filtering the picked-up signal and the received signal, respectively; and
an adder for adding together the outputs of the picked-up signal filter and the received signal filter and providing the added signal as a transmit signal.
According to the present invention, even when a plurality of speakers are present and their voices are picked up by the microphones at different levels, the directivity of the microphones can be correctly controlled toward each speaker, and the volume of each voice is automatically adjusted to an appropriate value.
Description of drawings
Fig. 1 is a block diagram illustrating a sound collecting apparatus according to a first embodiment of the present invention.
Fig. 2 is a block diagram showing an example of the structure of the state deciding section 14 in Fig. 1.
Fig. 3 is a block diagram showing an example of the structure of the sound source position detecting section 15 in Fig. 1.
Fig. 4 is a block diagram showing an example of the structure of the filter coefficient calculating section 21 in Fig. 1.
Fig. 5 is a flowchart showing a first example of the sound collecting method using the apparatus of Fig. 1.
Fig. 6 is a flowchart showing a second example of the sound collecting method using the apparatus of Fig. 1.
Fig. 7 is a flowchart showing a third example of the sound collecting method using the apparatus of Fig. 1.
Fig. 8 is a block diagram illustrating a sound collecting apparatus according to a second embodiment of the present invention.
Fig. 9 is a block diagram showing an example of the structure of the state deciding section 14 in Fig. 8.
Fig. 10 is a block diagram illustrating a sound collecting apparatus according to a third embodiment of the present invention.
Fig. 11 is a block diagram showing an example of the structure of the state deciding section 14 in Fig. 10.
Fig. 12 is a block diagram illustrating a sound collecting apparatus according to a fourth embodiment of the present invention.
Fig. 13 is a block diagram illustrating a sound collecting apparatus according to a fifth embodiment of the present invention.
Fig. 14 is a block diagram showing an example of the structure of the weighting factor setting section 21H in Fig. 13.
Fig. 15 is a block diagram showing another example of the structure of the weighting factor setting section 21H in Fig. 13.
Fig. 16 is a block diagram showing an example of the structure of the whitening section 21J in Fig. 13.
Fig. 17 is a block diagram showing an example of the covariance matrix storing section 18 used when each embodiment is provided with a covariance matrix averaging function.
Fig. 18A shows simulated speech waveforms of speakers A and B before processing in the first embodiment.
Fig. 18B shows simulated speech waveforms of speakers A and B after processing in the first embodiment.
Fig. 19 shows simulated received and transmitted speech waveforms, demonstrating echo and noise cancellation according to the third embodiment.
Fig. 20 is a block diagram illustrating a conventional sound collecting apparatus.
Embodiment
First embodiment
Fig. 1 is a block diagram of a sound collecting apparatus according to the first embodiment of the present invention.
The sound collecting apparatus of this embodiment comprises microphones 11_1 to 11_M of M channels placed in an acoustic space, filters 12_1 to 12_M, an adder 13, a state deciding section 14, a sound source position detecting section 15, a frequency domain converting section 16, a covariance matrix calculating section 17, a covariance matrix storing section 18, a picked-up sound level estimating section 19, and a filter coefficient calculating section 21.
In this embodiment, the positions of speech sound sources 9_1 to 9_K in the acoustic space are detected, covariance matrices of the picked-up signals are calculated in the frequency domain and stored for each speech sound source, and these covariance matrices are used to calculate filter coefficients. The filter coefficients are used to filter the signals picked up by the microphones, whereby the signal from each speech sound source is controlled to a fixed volume. Although not specifically shown, it is assumed in this embodiment that the output signals of the microphones 11_1 to 11_M are digital signals converted from the signals picked up by the microphones by analog-to-digital converters at a predetermined sampling frequency. This assumption also applies to the other embodiments of the present invention.
First, the state deciding section 14 detects an utterance period from the signals received by the microphones 11_1 to 11_M. For example, as shown in Fig. 2, the state deciding section 14 adds together all the signals received by the microphones 11_1 to 11_M in an adding section 14A, and the added output is applied to a short-term average power calculating section 14B and a long-term average power calculating section 14C to obtain a short-term average power P_avS (over a range of, say, about 0.1 to 1 second) and a long-term average power P_avL (over a range of, say, about 1 to 100 seconds). The ratio between the short-term and long-term average powers, R_P = P_avS / P_avL, is then calculated in a dividing section 14D, and an utterance deciding section 14E compares the power ratio R_P with a predetermined utterance threshold value R_thU; if the power ratio exceeds the threshold value, the period is decided to be an utterance period.
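As a hedged sketch of this power-ratio test (the frame counts and the threshold are assumed values; the patent specifies only the 0.1-1 s and 1-100 s averaging ranges):

```python
import numpy as np

def detect_utterance(frames, r_thu=2.0, short_n=8, long_n=512):
    """Frame-by-frame utterance decision on the summed microphone signal.

    frames: iterable of 1-D numpy arrays (one frame of the added
    microphone signals each).  short_n / long_n set the effective
    short-term and long-term averaging lengths in frames (assumed).
    Returns a list of booleans, True where R_P exceeds R_thU.
    """
    p_avs = p_avl = 1e-12
    a_s, a_l = 1.0 / short_n, 1.0 / long_n
    decisions = []
    for frame in frames:
        p = np.mean(frame ** 2)
        p_avs = (1 - a_s) * p_avs + a_s * p    # short-term average power
        p_avl = (1 - a_l) * p_avl + a_l * p    # long-term average power
        decisions.append(p_avs / p_avl > r_thu)  # R_P = P_avS / P_avL
    return decisions
```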
When the state deciding section 14 decides that the period is an utterance period, the sound source position detecting section 15 estimates the position of the sound source. A method usable for estimating the sound source position is, for example, the cross-correlation method.
Let M (an integer equal to or greater than 2) denote the number of microphones, and let τ_ij denote the measured value of the delay time difference between the signals picked up by the i-th and j-th microphones 11_i and 11_j. The measured value of the delay time difference between picked-up signals can be obtained by calculating the cross-correlation between the signals and detecting its maximum peak position. Now, let the sound pickup position of the m-th microphone (m = 1, ..., M) be denoted by (x_m, y_m, z_m) and the estimated sound source position by q̂ = (X̂, Ŷ, Ẑ). The estimated value τ̂_ij of the delay time difference between the picked-up signals can then be obtained from these positions, as expressed by equation (2):

$$\hat{\tau}_{ij} = \frac{1}{c}\sqrt{(x_i-\hat{X})^2+(y_i-\hat{Y})^2+(z_i-\hat{Z})^2} - \frac{1}{c}\sqrt{(x_j-\hat{X})^2+(y_j-\hat{Y})^2+(z_j-\hat{Z})^2} \qquad (2)$$

where c is the speed of sound.
Next, the measured value τ_ij and the estimated value τ̂_ij of the delay time difference between the picked-up signals are each multiplied by the speed of sound c for conversion to distance values, which serve as the measured value d_ij and the estimated value d̂_ij of the difference between the distances from the respective sound pickup positions to the speech sound source. The mean square error e(q) of these values is given by equation (3):

$$e(q) = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\bigl|d_{ij}-\hat{d}_{ij}\bigr|^2 = \sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\bigl|d_{ij}-r_i+r_j\bigr|^2 \qquad (3)$$

where q = (X̂, Ŷ, Ẑ), and r_i and r_j denote the distances between the estimated sound source position q and the microphones 11_i and 11_j, respectively.
The estimated sound source position can be obtained by solving for the position that minimizes the mean square error e(q) of equation (3), that is, the position that minimizes the error between the measured and estimated values of the delay time differences between the picked-up signals. Since equation (3) forms nonlinear simultaneous equations that are difficult to solve analytically, the estimated sound source position is obtained by numerical analysis with successive correction.
To obtain the estimated sound source position q̂ = (X̂, Ŷ, Ẑ) that minimizes equation (3), the gradient of equation (3) at a given point is calculated, and the estimated sound source position is then corrected in the direction that reduces the error until the gradient approaches zero. The estimated sound source position is therefore corrected by repeatedly computing the following equation (4) for u = 0, 1, ...:

$$q^{(u+1)} = q^{(u)} - \alpha\,\mathrm{grad}\;e(q)\big|_{q=q^{(u)}} \qquad (4)$$

where α is a correction step size set to a value α > 0, q^(u) denotes q after the u-th correction, and q^(0) = (X̂_0, Ŷ_0, Ẑ_0) is an arbitrary predetermined initial value at u = 0. grad denotes the gradient, expressed by the following equations (5) to (10):

$$\mathrm{grad}\;e(q) = \left(\frac{\partial e(q)}{\partial \hat{X}}, \frac{\partial e(q)}{\partial \hat{Y}}, \frac{\partial e(q)}{\partial \hat{Z}}\right) \qquad (5)$$

$$\frac{\partial e(q)}{\partial \hat{X}} = 2\sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\{d_{ij}-r_i+r_j\}\times\left\{\frac{x_i-\hat{X}}{r_i}-\frac{x_j-\hat{X}}{r_j}\right\} \qquad (6)$$

$$\frac{\partial e(q)}{\partial \hat{Y}} = 2\sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\{d_{ij}-r_i+r_j\}\times\left\{\frac{y_i-\hat{Y}}{r_i}-\frac{y_j-\hat{Y}}{r_j}\right\} \qquad (7)$$

$$\frac{\partial e(q)}{\partial \hat{Z}} = 2\sum_{i=1}^{M-1}\sum_{j=i+1}^{M}\{d_{ij}-r_i+r_j\}\times\left\{\frac{z_i-\hat{Z}}{r_i}-\frac{z_j-\hat{Z}}{r_j}\right\} \qquad (8)$$

$$r_i = \sqrt{(x_i-\hat{X})^2+(y_i-\hat{Y})^2+(z_i-\hat{Z})^2} \qquad (9)$$

$$r_j = \sqrt{(x_j-\hat{X})^2+(y_j-\hat{Y})^2+(z_j-\hat{Z})^2} \qquad (10)$$
As described above, by repeatedly computing equation (4), the estimated sound source position can be obtained as the position that minimizes the error.
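The correction loop of equations (3) to (10) might look as follows in Python (an illustrative sketch under assumed values of the step size α, the threshold e_th, and the iteration cap; the patent fixes none of these):

```python
import numpy as np

def localize_source(mic_pos, d, alpha=0.1, e_th=1e-6, max_iter=1000):
    """Iterative source position estimate per equations (3)-(10).

    mic_pos: (M, 3) array of microphone coordinates (x_m, y_m, z_m).
    d: dict with d[(i, j)] = measured distance difference d_ij = c*tau_ij
       for i < j.  Returns the estimated position q = (X, Y, Z).
    """
    M = len(mic_pos)
    q = np.zeros(3)                                  # initial value q(0)
    for _ in range(max_iter):
        r = np.linalg.norm(mic_pos - q, axis=1)      # equations (9), (10)
        grad = np.zeros(3)
        for i in range(M - 1):
            for j in range(i + 1, M):
                err = d[(i, j)] - r[i] + r[j]
                # equations (6)-(8), all three coordinates at once
                grad += 2 * err * ((mic_pos[i] - q) / r[i]
                                   - (mic_pos[j] - q) / r[j])
        if np.all(np.abs(grad) < e_th):              # stop when grad ~ 0
            break
        q = q - alpha * grad                         # equation (4)
    return q
```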
Fig. 3 illustrates the functional structure of the sound source position detecting section 15 in block form. In this example, the sound source position detecting section 15 comprises a delay time difference measuring section 15A, a multiplier 15B, a distance calculating section 15C, a mean square error calculating section 15D, a gradient calculating section 15E, a decision section 15F, and an estimated position updating section 15G.
Upon utterance from one speech sound source 9_k, the delay time difference measuring section 15A measures, by the cross-correlation scheme, the delay time difference for each pair (i, j), where i = 1, 2, ..., M-1 and j = i+1, i+2, ..., M, based on the signals received by the microphones 11_i and 11_j. The multiplier 15B multiplies each measured delay time difference τ_ij by the speed of sound c to obtain the distance difference d_ij between the sound source and the microphones 11_i and 11_j. The distance calculating section 15C calculates, by equations (9) and (10), the distances r_i and r_j between the estimated sound source position q̂ = (X̂, Ŷ, Ẑ) fed back from the estimated position updating section 15G and the microphones 11_i and 11_j. In the first round, however, the estimated position updating section 15G provides an arbitrary initial value q^(0) = (X̂_0, Ŷ_0, Ẑ_0) to the distance calculating section 15C as the initial estimated sound source position. The mean square error calculating section 15D uses d_ij, r_i and r_j for all the pairs (i, j) mentioned above to calculate the mean square error by equation (3). The gradient calculating section 15E uses the current estimated sound source position and d_ij, r_i, r_j to calculate the gradient grad e(q) of the mean square error e(q) by equations (6), (7) and (8).

The decision section 15F compares each element of the gradient grad e(q) of the mean square error with a predetermined threshold value e_th to decide whether every element is smaller than e_th, and if so, outputs the estimated sound source position q̂ = (X̂, Ŷ, Ẑ) at that time. If not every element is smaller than e_th, the estimated position updating section 15G updates the estimated position by equation (4), using the gradient grad e(q) and the current estimated position q = (X̂, Ŷ, Ẑ), and provides the updated estimated position q^(u+1) to the distance calculating section 15C. The distance calculating section 15C uses the updated estimated position and d_ij to calculate r_i and r_j in the same manner as before. Thereafter, the mean square error calculating section 15D updates e(q), the gradient calculating section 15E calculates the updated gradient grad e(q), and the decision section 15F decides whether each element of the updated gradient is smaller than the threshold value e_th.

In this way, the update of the estimated position q̂ is repeated until every element of the gradient grad e(q) of the mean square error becomes sufficiently small (smaller than e_th), whereby the position of the sound source 9_k is estimated. The positions of the other sound sources are estimated in the same way.
The frequency domain converting section 16 converts the signal picked up by each microphone to a frequency domain signal. For example, when the sampling frequency of the picked-up signal is 16 kHz, the samples of the signal picked up by each microphone 11_m (m = 1, ..., M) are processed by a fast Fourier transform (FFT) in frames of 256 samples to obtain the same number of frequency domain signal samples X_m(ω).
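A minimal sketch of this conversion, assuming non-overlapping rectangular frames (the patent does not specify windowing or overlap):

```python
import numpy as np

def to_frequency_domain(x, frame_len=256):
    """Sketch of the frequency domain converting section 16.

    x: (M, N) array holding the M microphone signals (16 kHz assumed).
    Returns an array of shape (num_frames, M, frame_len) containing the
    FFT samples X_m(omega) of each 256-sample frame of each channel.
    """
    M, N = x.shape
    num_frames = N // frame_len
    frames = x[:, :num_frames * frame_len].reshape(M, num_frames, frame_len)
    return np.fft.fft(frames, axis=-1).transpose(1, 0, 2)
```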
Next, the covariance matrix calculating section 17 calculates the covariances of the picked-up microphone signals and generates a covariance matrix. Let X_1(ω) to X_M(ω) denote the frequency domain converted versions, produced by the frequency domain converting section 16, of the signals picked up by the microphones for each sound source 9_k. The M × M covariance matrix R_XX(ω) of these signals is generally expressed by the following equation (11):

$$R_{XX}(\omega) = \begin{pmatrix} X_1(\omega)\\ \vdots\\ X_M(\omega) \end{pmatrix}\begin{pmatrix} X_1(\omega)^* & \cdots & X_M(\omega)^* \end{pmatrix} \qquad (11)$$

where * denotes the complex conjugate.

Next, based on the detection result of the sound source position detecting section 15, the covariance matrix storing section 18 stores the covariance matrix R_XX(ω) for each sound source 9_k as an M × M covariance matrix R_SkSk(ω).
Let A_k(ω) = (a_k1(ω), ..., a_kM(ω)) denote a weighted mixing vector for the M-channel picked-up signals of each sound source 9_k. The picked-up sound level estimating section 19 calculates the picked-up sound level of each sound source 9_k by the following equation (12), using the covariance matrix R_SkSk(ω) of the signals picked up from that sound source stored in the covariance matrix storing section 18:

$$P_{Sk} = \frac{1}{W}\sum_{\omega=0}^{W} A_k(\omega)^H R_{SkSk}(\omega) A_k(\omega) \qquad (12)$$

In the above, the weighted mixing vector is expressed as a vector A_k(ω) = (a_k1(ω), ..., a_kM(ω)) having controllable frequency characteristics, but if no frequency characteristic control is needed, the elements of the vector A_k may be predetermined constants a_k1, a_k2, ..., a_kM. For example, the elements of the weighted mixing vector A_k for each sound source 9_k are given larger values as the microphones corresponding to the elements are located closer to the sound source 9_k. In the extreme case, the element corresponding to the microphone 11_m nearest the sound source 9_k may be set to 1 and the other elements to 0, as in A_k = (0, ..., 0, a_km = 1, 0, ..., 0). In the following description, for the sake of simplicity, a_k1(ω), ..., a_kM(ω) are written simply as a_k1, ..., a_kM.

In equation (12), H denotes the complex conjugate transpose, and A_k(ω)^H R_SkSk(ω) A_k(ω) can be expanded as the following equation (13):

$$\begin{aligned}
A_k(\omega)^H R_{SkSk}(\omega) A_k(\omega)
&= a_{k1}^*\bigl(a_{k1}X_1(\omega)X_1(\omega)^* + a_{k2}X_2(\omega)X_1(\omega)^* + \cdots + a_{kM}X_M(\omega)X_1(\omega)^*\bigr)\\
&\quad + a_{k2}^*\bigl(a_{k1}X_1(\omega)X_2(\omega)^* + a_{k2}X_2(\omega)X_2(\omega)^* + \cdots + a_{kM}X_M(\omega)X_2(\omega)^*\bigr)\\
&\qquad\vdots\\
&\quad + a_{kM}^*\bigl(a_{k1}X_1(\omega)X_M(\omega)^* + a_{k2}X_2(\omega)X_M(\omega)^* + \cdots + a_{kM}X_M(\omega)X_M(\omega)^*\bigr)\\
&= \Omega(\omega) \qquad (13)
\end{aligned}$$

Equation (12) means that the average picked-up signal power P_Sk is calculated by adding together the power spectrum sample values Ω(ω), given by equation (13), over the band 0 to W (in number of samples) of the frequency domain signal generated by the frequency domain converting section 16, and then dividing the sum by W.
For example, suppose that the microphone 11_1 is the closest to the sound source 9_1. The value of the weighting factor a_k1 is then determined such that the signal picked up by the microphone 11_1 (the first channel) is assigned the largest weight, and the values of the weighting factors a_k2, a_k3, ..., a_kM for the signals picked up in the other channels are all made smaller than a_k1. With such weighting, it is possible to increase the S/N of the signal picked up from the sound source 9_1, or to reduce the influence of room reverberation, compared with the case where no such weighting is performed. That is, the optimum values of the weighting factors of the weighted mixing vector for each sound source 9_k are predetermined experimentally from the directivities and layout of the microphones and the layout of the sound sources so as, for example, to increase the S/N of the output speech signal corresponding to the sound source 9_k and to reduce room reverberation. According to the present invention, however, even if the same weighting is used for all channels, the signal picked up from each sound source can still be controlled to the desired level.
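Equations (11) to (13) translate directly into a few lines of numpy; the sketch below is illustrative only (a frequency-independent mixing vector is assumed, which the text above permits):

```python
import numpy as np

def covariance_matrices(X):
    """Per-frequency covariance matrices R_XX(omega), equation (11).

    X: (M, W) array whose row m holds X_m(omega) for one frame.
    Returns R of shape (W, M, M) with R[w] = x(w) x(w)^H.
    """
    x = X.T[:, :, None]                      # (W, M, 1) column vectors
    return x @ x.conj().transpose(0, 2, 1)   # outer product per bin

def picked_up_level(R, a):
    """Picked-up sound level P_Sk, equation (12).

    R: (W, M, M) stored covariance matrices R_SkSk(omega) of source k.
    a: (M,) weighted mixing vector A_k, here frequency independent.
    """
    # A^H R A for every frequency bin (equation (13)), then band average.
    return np.real(np.einsum('m,wmn,n->w', a.conj(), R, a)).mean()
```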
Next, the filter coefficient calculating section 21 calculates filter coefficients for picking up the speech from each sound source at the desired volume. First, let H_1(ω) to H_M(ω) denote the frequency domain converted forms of the filter coefficients of the filters 12_1 to 12_M, each connected to a microphone. Then let H(ω) denote the matrix formed by these filter coefficients, given by the following equation (14):

$$H(\omega) = \begin{pmatrix} H_1(\omega)\\ \vdots\\ H_M(\omega) \end{pmatrix} \qquad (14)$$
Further, let X_Sk,1(ω) to X_Sk,M(ω) denote the frequency domain converted signals of the signals picked up by the respective microphones during utterance from the k-th sound source 9_k.

In this case, the condition that the filter coefficient matrix H(ω) needs to satisfy is that when the picked-up microphone signals are filtered by the filter coefficient matrix H(ω) and the filtered signals are all added together, the signal component from each sound source has the desired level P_opt. Accordingly, the following equation (15) is the desired condition, according to which the sum of the filtered picked-up signals of the sound source 9_k is identical with the signal obtained by multiplying the signals picked up from the microphones 11_1 to 11_M by the weighted mixing vector A_k(ω) and the desired gain:

$$\begin{pmatrix} X_{Sk,1}(\omega) & \cdots & X_{Sk,M}(\omega) \end{pmatrix} H(\omega) = \sqrt{\frac{P_{opt}}{P_{Sk}}}\begin{pmatrix} X_{Sk,1}(\omega) & \cdots & X_{Sk,M}(\omega) \end{pmatrix} A_k(\omega) \qquad (15)$$

where k = 1, ..., K, K being the number of sound sources.
Then, solving the condition equation (15) by the least squares method to obtain the filter coefficient matrix H(ω) gives the following equation (16):

$$H(\omega) = \left\{\sum_{k=1}^{K} C_{Sk} R_{SkSk}(\omega)\right\}^{-1} \sum_{k=1}^{K} C_{Sk}\sqrt{\frac{P_{opt}}{P_{Sk}}} R_{SkSk}(\omega) A_k(\omega) \qquad (16)$$

where C_Sk is a weighting factor that imposes a sensitivity constraint on the k-th sound source position. The sensitivity constraint referred to here flattens the frequency characteristics of the sound collecting apparatus as seen from the sound source position. Increasing this value strengthens the sensitivity constraint on the sound source of interest, providing flat frequency characteristics for it, but degrades the frequency characteristics of sound pickup for the other sound source positions. Accordingly, C_Sk is usually set to a value within the range of 0.1 to 10 so as to impose relatively well-balanced constraints on all the sound sources.
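One frequency bin of equation (16) can be sketched as below (the function and argument names are ours, not the patent's); solving the linear system instead of forming the explicit inverse is an ordinary numerical precaution:

```python
import numpy as np

def filter_coefficients(R_list, P_list, A_list, p_opt, C_list):
    """Filter coefficient vector H(omega) of equation (16), one bin.

    R_list: (M, M) covariance matrices R_SkSk(omega), one per source.
    P_list: picked-up sound levels P_Sk from equation (12).
    A_list: (M,) weighted mixing vectors A_k(omega).
    C_list: sensitivity-constraint weights C_Sk (typically 0.1 to 10).
    Returns H(omega) = (H_1(omega), ..., H_M(omega)).
    """
    M = R_list[0].shape[0]
    lhs = np.zeros((M, M), dtype=complex)    # sum of C_Sk R_SkSk
    rhs = np.zeros(M, dtype=complex)
    for R, P, A, C in zip(R_list, P_list, A_list, C_list):
        lhs += C * R
        rhs += C * np.sqrt(p_opt / P) * (R @ A)
    return np.linalg.solve(lhs, rhs)         # {...}^-1 {...}
```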
Fig. 4 illustrates in block form the functional structure of the filter coefficient calculating section 21 that calculates the filter coefficients expressed by equation (16). In this example, the covariance matrices R_S1S1 to R_SKSK, corresponding to the sound sources 9_1 to 9_K and provided from the covariance matrix storing section 18, are applied to multipliers 21A1 to 21AK, where they are multiplied by the weighting factors C_S1 to C_SK set by a weighting factor setting section 21H, respectively. The picked-up sound levels P_S1 to P_SK of the sound sources 9_1 to 9_K, estimated by the picked-up sound level estimating section 19, are provided to square-root ratio calculating sections 21B1 to 21BK, where the square roots of their ratios to the predetermined desired output level P_opt, (P_opt/P_S1)^{1/2} to (P_opt/P_SK)^{1/2}, are calculated, and the calculated values are provided to multipliers 21C1 to 21CK for multiplication by the results from the multipliers 21A1 to 21AK, respectively. The outputs of the multipliers 21C1 to 21CK are supplied to multipliers 21D1 to 21DK, where they are further multiplied by the weighted mixing vectors A_1(ω) to A_K(ω), and the sum matrix of the multiplied results is calculated by an adder 21E. On the other hand, the sum matrix of the results from the multipliers 21A1 to 21AK is calculated by an adder 21F, and an inverse matrix multiplier 21G multiplies the inverse of the matrix calculated by the adder 21F by the output of the adder 21E to calculate the filter coefficients H(ω).
Next, the filter coefficients H_1(ω), H_2(ω), ..., H_M(ω) calculated by the filter coefficient calculating section 21 are set in the filters 12_1 to 12_M, which filter the signals picked up by the microphones 11_1 to 11_M, respectively. The filtered signals are all added together by the adder 13, and the added output is provided as the output signal.
A description will be given below of three examples of the use of the sound collecting apparatus according to the present invention.
In the first method, as shown in Fig. 5, the sound source count K is initialized to K = 0 in step S1. In the following step S2, the state deciding section 14 periodically checks for utterance, and upon detecting an utterance, the sound source position detecting section 15 detects the sound source position in step S3. In step S4 it is decided whether the detected sound source position matches any one of the previously detected sound source positions; if a matching position exists, the covariance matrix R_XX(ω) corresponding to that sound source position is recalculated in the covariance matrix calculating section 17 in step S5, and in step S6 the covariance matrix in the corresponding storage area of the covariance matrix storing section 18 is updated with the recalculated covariance matrix.
If no matching position is found among the previously detected sound source positions in step S4, K is incremented by 1 in step S7, then in step S8 the covariance matrix R_XX(ω) corresponding to that sound source position is newly calculated in the covariance matrix calculating section 17, and in step S9 the covariance matrix is stored in a new area of the covariance matrix storing section 18.
Then, in step S10 the picked-up signal level is estimated in the picked-up sound level estimating section 19 from the stored covariance matrices; in step S11 the estimated picked-up sound levels and the covariance matrices are used by the filter coefficient calculating section 21 to calculate the filter coefficients H_1(ω) to H_M(ω); and in step S12 the filter coefficients set in the filters 12_1 to 12_M are updated with the newly calculated values.
In the second method, as shown in Fig. 6, a maximum sound source count K_max is preset, and the sound source count K is initialized to 0 in step S1. The subsequent steps S2 to S6 are the same as in the case of Fig. 5; that is, the microphone output signals are checked for utterance, and upon detection of an utterance, the sound source position is detected, then it is decided whether the detected sound source position matches any one of the previously detected positions, and if a matching position exists, the covariance matrix corresponding to that sound source position is calculated and stored in the corresponding storage area as an updated matrix.
If no matching position is found among the previously detected sound source positions in step S4, K is incremented by 1 in step S7, and a check is made in step S8 to decide whether K exceeds the maximum value K_max. If K does not exceed K_max, the covariance matrix of the detected position is calculated in step S9 and stored in a new area in step S10. If K is found to exceed K_max in step S8, K = K_max is set in step S11, then in step S12 the oldest updated covariance matrix stored in the covariance matrix storing section 18 is deleted, and a new covariance matrix is calculated by the covariance matrix calculating section 17 in step S13 and stored in that area in step S14. The subsequent steps S15, S16 and S17 are the same as steps S10, S11 and S12 in Fig. 5; that is, the picked-up sound level is estimated for each sound source from the covariance matrices, and the filter coefficients are calculated and set in the filters 12_1 to 12_M. This method is advantageous over the method of Fig. 5 in that the storage area of the covariance matrix storing section 18 can be reduced by limiting the maximum number of sound sources K to K_max.
In the first and second methods, as described above, each detection of speech is invariably accompanied by calculation and storage of the covariance matrix and updating of the filter coefficients, but the third method described below does not update the filter coefficients when the sound source position of the detected utterance matches any one of the previously detected sound source positions. Fig. 7 illustrates the procedure of the third method. The sound source count K is initialized to 0 in step S1, then the state deciding section 14 periodically checks for utterance in step S2, and upon detecting an utterance, the sound source position detecting section 15 detects the position of the detected utterance in step S3. In step S4 it is decided whether the detected sound source position matches any one of the previously detected sound source positions, and if a matching position exists, the procedure returns to step S2 without any updating. If the detected position matches none of the previously detected sound source positions in step S4, that is, if the sound source 9_k has moved to a position different from the previous one, or if a new sound source has been added, K is incremented by 1 in step S5; then in step S6 the covariance matrix R_SkSk(ω) corresponding to this sound source is newly calculated in the covariance matrix calculating section 17 and stored in the corresponding new area MA_k of the covariance matrix storing section 18 in step S7. The covariance matrices are then used by the picked-up sound level estimating section 19 to estimate the picked-up sound levels in step S8; next, all the covariance matrices and the estimated picked-up sound levels are used by the filter coefficient calculating section 21 to calculate updated filter coefficients in step S9; and the updated filter coefficients are set in the filters 12_1 to 12_M in step S10, after which the procedure returns to step S2.
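The step that distinguishes the third method, the position-match test that skips all recalculation for known sources, might be sketched as follows (the matching radius and the helper compute_R are assumed for illustration; the patent speaks only of positions matching):

```python
import numpy as np

def third_method_step(q, sources, compute_R, tol=0.3):
    """One pass of the Fig. 7 matching logic (illustrative sketch).

    q: newly detected source position (3,).  sources: list of dicts with
    keys 'pos' and 'R'.  compute_R: callable returning the covariance
    matrices for the current utterance.  tol: matching radius in meters
    (assumed).  Returns True if filter coefficients must be recalculated.
    """
    for s in sources:
        if np.linalg.norm(s['pos'] - q) < tol:
            return False             # known source: no update (S4 -> S2)
    sources.append({'pos': q, 'R': compute_R()})   # steps S5-S7
    return True                       # steps S8-S10: recalculate and set
```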
As described above, according to the present invention, sound source positions are estimated from the signals picked up by a plurality of microphones, the covariance matrix of the picked-up signals is calculated for each sound source, filter coefficients for adjusting the volume at each sound source position are calculated, and the filter coefficients are used to filter the signals picked up by the microphones, whereby an output signal whose volume is adjusted for each speaker position can be obtained.
While the embodiment of Fig. 1 has been described with reference to the case where the sound source position detecting section 15 estimates the coordinate position of each sound source 9_k, it is also possible to calculate the sound source direction instead, that is, the angular position of each sound source with respect to the arrangement of the microphones 11_1 to 11_M. A method for estimating the sound source direction is proposed, for example, in Tanaka, Kaneda, and Kojima, "Performance Evaluation of a Sound Source Direction Estimating Method under Room Reverberation," Journal of the Society of Acoustic Engineers of Japan, Vol. 50, No. 7, 1994, pp. 540-548. In short, it suffices to calculate and store the covariance matrix of the picked-up signals for each sound source.
Second embodiment
Fig. 8 is a functional block diagram of the sound collecting apparatus according to a second embodiment of the present invention.
The sound collecting apparatus of this embodiment comprises microphones 11_1 to 11_M, filters 12_1 to 12_M, an adder 13, a state deciding section 14, a sound source position detecting section 15, a frequency domain converting section 16, a covariance matrix calculating section 17, a covariance matrix storing section 18, a picked-up sound level estimating section 19, and a filter coefficient calculating section 21.
This embodiment adds noise attenuation to the picked-up sound level adjustment of the sound collecting apparatus of the first embodiment.
First, the state deciding section 14 detects utterance periods and noise periods from the power of the signals received by the microphones 11_1 to 11_M. As shown in Fig. 9, the state deciding section 14, as in the first embodiment, calculates the short-term average power P_avS and the long-term average power P_avL of the added microphone signals in the short-term average power calculating section 14B and the long-term average power calculating section 14C; the ratio between the short-term and long-term average powers, R_P = P_avS / P_avL, is then calculated in the dividing section 14D, and the ratio is compared with the utterance threshold value R_thU in the utterance deciding section 14E; if the power ratio exceeds the threshold value, it is decided that an utterance period exists. A noise deciding section 14F compares the power ratio R_P with a noise threshold value R_thN, and if the power ratio is smaller than the threshold value, it is decided that a noise period exists.
When the decision result by the utterance deciding section 14E indicates an utterance period, the sound source position detecting section 15 detects the sound source position in the same manner as described in the first embodiment.
Next, the frequency domain converting section 16 converts the signals picked up by the microphones 11_1 to 11_M in the utterance period of each sound source 9_k and in the noise period to frequency domain signals, and provides them to the covariance matrix calculating section 17. The covariance matrix calculating section 17 calculates, in the same manner as in the first embodiment, the covariance matrix R_SkSk(ω) of the frequency domain picked-up signals for the sound source 9_k. In addition, the covariance matrix calculating section calculates the covariance matrix R_NN(ω) of the frequency domain picked-up signals in the noise period.
Based on the detection result of the sound source position detecting section 15 and the decision result of the state deciding section 14, the covariance matrix storing section 18 stores the covariance matrices R_SkSk(ω) of the utterance periods for the respective sound sources 9_1, ..., 9_K in areas MA_1, ..., MA_K, and the covariance matrix R_NN(ω) of the noise period in an area MA_{K+1}.
The picked-up sound level estimating section 19 estimates the picked-up sound level P_Sk for each sound source in the same manner as in the first embodiment.
Next, the filter coefficient calculating section 21 calculates filter coefficients for picking up sound from each sound source 9_k at the desired volume and for attenuating noise. First, the condition for noise attenuation is formulated. Let X_N,1(ω) to X_N,M(ω) denote the frequency domain converted signals of the microphone signals picked up in the noise period. If the picked-up microphone signals X_N,1(ω) to X_N,M(ω) of the noise period become zero after passing through the filters 12_1 to 12_M and the adder 13, the noise is attenuated; accordingly, the condition for noise attenuation is given by the following equation (17):

$$\bigl(X_{N,1}(\omega), \ldots, X_{N,M}(\omega)\bigr)\,H(\omega) = 0 \qquad (17)$$
By simultaneously satisfying equation (17) and the picked-up sound level adjusting equation (15) mentioned in the first embodiment, it is possible to achieve picked-up sound level adjustment and noise attenuation at the same time.
Then, solving the condition equations (15) and (17) by the least squares method to obtain the filter coefficient matrix H(ω) gives the following equation (18):

$$H(\omega) = \left\{\sum_{k=1}^{K} C_{Sk} R_{SkSk}(\omega) + C_N R_{NN}(\omega)\right\}^{-1} \sum_{k=1}^{K} C_{Sk}\sqrt{\frac{P_{opt}}{P_{Sk}}} R_{SkSk}(\omega) A_k(\omega) \qquad (18)$$
C_N is a weighting constant for the noise attenuation rate; increasing this constant increases the noise attenuation rate. However, since an increase in C_N weakens the sensitivity constraints on the sound source positions and increases the degradation of the frequency characteristics of the picked-up speech signal, C_N is usually set to a suitable value in the range of 0.1 to 10. The other symbols have the same meanings as in the first embodiment.
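Equation (18) differs from equation (16) only by one extra term inside the inverted bracket; re-using the earlier sketch (again illustrative, with our own names):

```python
import numpy as np

def filter_coefficients_nr(R_list, P_list, A_list, p_opt, C_list,
                           R_nn, c_n):
    """Equation (18): equation (16) plus the noise term C_N R_NN.

    R_nn: (M, M) covariance matrix of the noise period.
    c_n: weighting constant C_N (typically 0.1 to 10).
    Other arguments are as in the equation (16) sketch above.
    """
    M = R_list[0].shape[0]
    lhs = c_n * R_nn.astype(complex)         # C_N R_NN enters the bracket
    rhs = np.zeros(M, dtype=complex)
    for R, P, A, C in zip(R_list, P_list, A_list, C_list):
        lhs = lhs + C * R
        rhs += C * np.sqrt(p_opt / P) * (R @ A)
    return np.linalg.solve(lhs, rhs)
```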
Next, the filter coefficients calculated by equation (18) are set in the filters 12_1 to 12_M and used to filter the picked-up microphone signals. The filtered signals are added together by the adder 13, and the added signal is provided as the output signal.
As described above, the second embodiment permits noise attenuation in addition to the picked-up sound level adjustment achieved in the first embodiment.
The other parts of this embodiment are the same as in the first embodiment and hence are not described again.
Third embodiment
Fig. 10 is a functional block diagram of the sound collecting apparatus according to a third embodiment of the present invention.
The sound collecting apparatus of this embodiment comprises a loudspeaker 22, microphones 11_1 to 11_M, filters 12_1 to 12_M and 23, an adder 13, a state deciding section 14, a sound source position detecting section 15, a frequency domain converting section 16, a covariance matrix calculating section 17, a covariance matrix storing section 18, a picked-up sound level estimating section 19, and a filter coefficient calculating section 21.
This embodiment adds, to the sound collecting apparatus of the second embodiment, the loudspeaker 22 for reproducing the signal received from a participant at a remote location and the filter 23 for filtering the received signal. Functionally, in addition to the picked-up sound level adjustment and the noise attenuation of the second embodiment, it cancels the echo, that is, the component of the loudspeaker-reproduced signal that is picked up by the microphones 11_1 to 11_M.
As shown in Fig. 11, the state deciding section 14 comprises, in addition to the structure of the state deciding section 14 shown in Fig. 9: a short-term average power calculating section 14B' and a long-term average power calculating section 14C' for calculating the short-term average power P'_avS and the long-term average power P'_avL of the received signal, respectively; a dividing section 14D' for calculating their ratio R'_P = P'_avS / P'_avL; a receiving deciding section 14G which compares the ratio R'_P with a predetermined received signal threshold value R_thR and, if the former is greater than the latter, decides that the state is a receiving period; and a state determining section 14H which determines the state based on the results decided by the utterance deciding section 14E, the noise deciding section 14F and the receiving deciding section 14G. When the receiving deciding section 14G decides a receiving period, the state determining section 14H determines that the state is a receiving period regardless of the decision results of the utterance deciding section 14E and the noise deciding section 14F; otherwise, the state determining section determines, as in the case of Fig. 9, whether the state is an utterance or noise period according to the decisions of the utterance deciding section 14E and the noise deciding section 14F.
When the result decided by the state deciding section 14 is an utterance period, the sound source position detecting section 15 detects the position of the sound source in the same manner as described in the first embodiment.
Next, the frequency domain converting section 16 converts the picked-up microphone signals and the received signal to frequency domain signals X_1(ω), ..., X_M(ω) and Z(ω), and the covariance matrix calculating section 17 calculates the covariance matrix of the frequency domain picked-up signals and received signal. The (M+1) × (M+1) covariance matrix R_XX(ω) of the frequency domain converted picked-up signals X_1(ω) to X_M(ω) and the frequency domain converted received signal Z(ω) is calculated by the following equation (19):

$$R_{XX}(\omega) = \begin{pmatrix} Z(\omega)\\ X_1(\omega)\\ \vdots\\ X_M(\omega) \end{pmatrix}\begin{pmatrix} Z(\omega)^* & X_1(\omega)^* & \cdots & X_M(\omega)^* \end{pmatrix} \qquad (19)$$

where * denotes the complex conjugate.
Next, based on the detection result of the sound source position detecting section 15 and the decision result of the state deciding section 14, the covariance matrix storing section 18 stores the covariance matrix R_XX(ω) as the covariance matrix R_SkSk(ω) of the picked-up signals and received signal for each sound source 9_k in the utterance period, as the covariance matrix R_NN(ω) of the picked-up signals and received signal in the noise period, and as the covariance matrix R_EE(ω) of the picked-up signals and received signal in the receiving period, in areas MA_1, ..., MA_K, MA_{K+1}, MA_{K+2}, respectively.
The picked-up sound level estimating section 19 calculates the picked-up sound level P_Sk of each sound source 9_k by the following equation (20), based on the covariance matrices R_S1S1, ..., R_SKSK of the respective sound sources and predetermined weighted mixing vectors A_1(ω), ..., A_K(ω), each containing M+1 elements:

$$P_{Sk} = \frac{1}{W}\sum_{\omega=0}^{W} A_k(\omega)^H R_{SkSk}(\omega) A_k(\omega) \qquad (20)$$
Next, the filter coefficient calculating section 21 calculates filter coefficients for picking up the speech uttered from each sound source at the desired volume. Let H_1(ω) to H_M(ω) denote the frequency domain converted forms of the filter coefficients of the filters 12_1 to 12_M connected to the respective microphones, and let F(ω) denote the frequency domain converted form of the filter coefficient of the filter 23 for filtering the received signal. Then let H(ω) denote the matrix formed by these filter coefficients, given by the following equation (21):

$$H(\omega) = \begin{pmatrix} F(\omega)\\ H_1(\omega)\\ \vdots\\ H_M(\omega) \end{pmatrix} \qquad (21)$$
Further, let X_E,1(ω) to X_E,M(ω) denote the frequency domain converted picked-up microphone signals in the receiving period and Z_E(ω) the frequency domain converted received signal in that period; let X_N,1(ω) to X_N,M(ω) denote the frequency domain converted picked-up microphone signals in the noise period and Z_N(ω) the frequency domain converted received signal in that period; and let X_Sk,1(ω) to X_Sk,M(ω) denote the frequency domain converted picked-up microphone signals in the utterance period of the k-th sound source 9_k and Z_Sk(ω) the frequency domain converted received signal in that period.
In this case, the condition that the filter coefficient matrix H(ω) needs to satisfy is that when the picked-up microphone signals and the received signal are each filtered by the filter coefficient matrix H(ω) and the filtered signals are all added together, the echo and noise signals are cancelled and only the transmit speech signal is sent at the desired level.
Accordingly, for the signals of the receiving period and the noise period, equations (22) and (23) are the desired conditions under which the signal after filtering and addition is zero:

$$\bigl(Z_E(\omega)\; X_{E,1}(\omega) \cdots X_{E,M}(\omega)\bigr)\,H(\omega) = 0 \qquad (22)$$

$$\bigl(Z_N(\omega)\; X_{N,1}(\omega) \cdots X_{N,M}(\omega)\bigr)\,H(\omega) = 0 \qquad (23)$$
For the signals of the utterance period, the following equation (24) is the desired condition, under which the signal after filtering and addition equals the signal obtained by multiplying the picked-up microphone signals and the received signal by the weighted mixing vector A_k(ω), formed by M+1 predetermined elements, and the desired gain:

$$\begin{pmatrix} Z_{Sk}(\omega) & X_{Sk,1}(\omega) & \cdots & X_{Sk,M}(\omega) \end{pmatrix} H(\omega) = \sqrt{\frac{P_{opt}}{P_{Sk}}}\begin{pmatrix} Z_{Sk}(\omega) & X_{Sk,1}(\omega) & \cdots & X_{Sk,M}(\omega) \end{pmatrix} A_k(\omega) \qquad (24)$$

The first element a_0(ω) of the weighted mixing vector A_k(ω) = (a_0(ω), a_k1(ω), ..., a_kM(ω)) represents the weighting factor for the received signal; usually, it is set to a_0(ω) = 0.
Then, solving the conditions formed by equations (22) to (24) by the least squares method to obtain the filter coefficient matrix H(ω) gives the following equation (25):

$$H(\omega) = \left\{\sum_{k=1}^{K} C_{Sk} R_{SkSk}(\omega) + C_N R_{NN}(\omega) + C_E R_{EE}(\omega)\right\}^{-1} \sum_{k=1}^{K} C_{Sk}\sqrt{\frac{P_{opt}}{P_{Sk}}} R_{SkSk}(\omega) A_k(\omega) \qquad (25)$$
C_E is a weighting constant for increasing the echo return loss enhancement; the larger this value, the greater the echo return loss enhancement. However, an increase in C_E accelerates the degradation of the frequency characteristics of the picked-up signal and lowers the noise attenuation characteristic. Accordingly, C_E is usually set to a suitable value in the range of 0.1 to 10.0. The other symbols have the same meanings as in the second embodiment.
In this manner, the filter coefficients can be determined so as to adjust the volume, attenuate noise, and cancel the echo.
Next, the filter coefficients obtained by equation (25) are set in the filters 12_1 to 12_M and 23, which filter the picked-up microphone signals and the received signal, respectively. The filtered signals are all added together by the adder 13, and the added signal from the adder is output as the transmit signal. The other parts are the same as in the second embodiment and hence are not described again.
As described above, the third embodiment achieves echo cancellation in addition to the picked-up sound level adjustment and noise attenuation achieved in the second embodiment. While the third embodiment has been described as adding an echo cancelling capability to the second embodiment, the echo cancelling capability may also be added to the first embodiment. In that case, the noise deciding section 14F is removed from the detailed structure, shown in Fig. 11, of the state deciding section 14 of Fig. 10, and the covariance matrix calculating section 17 of Fig. 10 does not calculate the covariance matrix R_NN(ω) of the noise period. Accordingly, as is evident from the foregoing description, the calculation of the filter coefficients in the filter coefficient calculating section 21 can be done by the following equation (26):

$$H(\omega) = \left\{\sum_{k=1}^{K} C_{Sk} R_{SkSk}(\omega) + C_E R_{EE}(\omega)\right\}^{-1} \sum_{k=1}^{K} C_{Sk}\sqrt{\frac{P_{opt}}{P_{Sk}}} R_{SkSk}(\omega) A_k(\omega) \qquad (26)$$
Fourth embodiment
While the third embodiment of Fig. 10 has been described as adding the echo cancelling capability to the picked-up sound level adjustment and noise attenuation capabilities of the second embodiment, it may also be configured as a sound collecting apparatus having only the noise attenuation and echo cancelling capabilities. An example of such a structure is shown in Fig. 12.
As shown in Fig. 12, this embodiment has a structure in which the sound source position detecting section 15 and the picked-up sound level estimating section 19 are removed from the structure of Fig. 10, and the covariance matrix calculating section 17 calculates the covariance matrix R_SS(ω) of the transmit signal, the covariance matrix R_EE(ω) of the received signal, and the covariance matrix R_NN(ω) of the noise signal, which are stored in storage areas MA_S, MA_E and MA_N of the covariance matrix storing section 18, respectively. The echo cancelling capability can be implemented with at least one microphone, but an example using M microphones is shown here.
The state deciding section 14, as in the embodiment of Fig. 10, decides the utterance period, the receiving period and the noise period from the signals picked up by the microphones 11_1 to 11_M and the received signal; its concrete structure and operation are the same as those of the corresponding part described with reference to Fig. 11. The picked-up signals and the received signal are converted by the frequency domain converting section 16 to frequency domain picked-up signals X_1(ω) to X_M(ω) and a frequency domain received signal Z(ω), which are provided to the covariance matrix calculating section 17.
Then, the covariance matrix calculating portion 17 generates the covariance matrix of the frequency-domain collected signals and the received signal. The covariance matrix R_XX(ω) of the frequency-domain collected signals X_1(ω) to X_M(ω) and the frequency-domain received signal Z(ω) is calculated by the following equation (27).
$$R_{XX}(\omega)=\begin{pmatrix}Z(\omega)\\X_1(\omega)\\\vdots\\X_M(\omega)\end{pmatrix}\bigl(Z(\omega)^{*}\;X_1(\omega)^{*}\;\cdots\;X_M(\omega)^{*}\bigr)\qquad(27)$$
where * denotes the complex conjugate.
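As a sketch of equation (27) (hypothetical code, not from the patent; averaging the per-frame outer products over the frames of a section is an assumption made explicit here), the covariance matrix for one frequency bin could be estimated as follows:

```python
import numpy as np

def covariance_bin(Z_frames, X_frames):
    """Estimate R_XX(w) for one frequency bin per eq. (27).

    Z_frames : (T,) complex DFT values of the received signal over T frames
    X_frames : (T, M) complex DFT values of the M microphone signals
    Returns the (M+1, M+1) covariance matrix averaged over the frames.
    """
    # Stack [Z, X_1, ..., X_M] into one length-(M+1) vector per frame.
    v = np.concatenate([Z_frames[:, None], X_frames], axis=1)
    # Outer product v v^* per frame, averaged over the section.
    return (v[:, :, None] * v[:, None, :].conj()).mean(axis=0)
```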
Then, based on the judgment result of the state judging portion 14, the covariance matrix storage portion 18 stores the covariance matrix R_XX(ω) as the covariance matrix R_SS(ω) of the collected signals and the received signal in the speech section, as the covariance matrix R_NN(ω) of the collected signals and the received signal in the noise section, and as the covariance matrix R_EE(ω) of the collected signals and the received signal in the reception section, in the areas MA_S, MA_N, and MA_E, respectively.
Then, the filter coefficient calculating portion 21 calculates the filter coefficients so as to acquire the speech uttered from the sound source while cancelling the echo and the noise. Let H_1(ω) to H_M(ω) denote the frequency-domain forms of the filter coefficients of the filters 12_1 to 12_M connected to the microphones, and let F(ω) denote the frequency-domain form of the filter coefficient of the filter 23 for filtering the received signal. Further, let H(ω) denote the matrix composed of these filter coefficients, given by the following equation (28).
$$H(\omega)=\begin{pmatrix}F(\omega)\\H_1(\omega)\\\vdots\\H_M(\omega)\end{pmatrix}\qquad(28)$$
In addition, let X_{E,1}(ω) to X_{E,M}(ω) denote the frequency-domain collected signals of the microphones in the reception section and Z_E(ω) the frequency-domain received signal in that section; let X_{N,1}(ω) to X_{N,M}(ω) denote the frequency-domain collected signals in the noise section and Z_N(ω) the frequency-domain received signal in that section; and let X_{Sk,1}(ω) to X_{Sk,M}(ω) denote the frequency-domain collected signals in the speech section and Z_S(ω) the frequency-domain received signal in the speech section.
In this case, the condition that the filter coefficient matrix H(ω) needs to satisfy is that, when the microphone-collected signals and the received signal are each filtered with the filter coefficient matrix H(ω) and the filtered signals are all added together, the echo and the noise signal are cancelled and only the transmission speech signal is transmitted at the desired level.
Therefore, for the signals in the reception section and the noise section, the ideal condition is that the sum of the filtered signals be zero, as expressed by equations (29) and (30).
$$\bigl(Z_E(\omega)\;X_{E,1}(\omega)\;\cdots\;X_{E,M}(\omega)\bigr)H(\omega)=0\qquad(29)$$
$$\bigl(Z_N(\omega)\;X_{N,1}(\omega)\;\cdots\;X_{N,M}(\omega)\bigr)H(\omega)=0\qquad(30)$$
For the signal in the speech section, the ideal condition, expressed by the following equation, is that the sum of the filtered signals be equal to the signal obtained by multiplying the microphone-collected signals and the received signal by a weighted mixing vector A_k(ω) composed of M+1 predetermined elements.
$$\bigl(Z_S(\omega)\;X_{S_k,1}(\omega)\;\cdots\;X_{S_k,M}(\omega)\bigr)H(\omega)=\sqrt{\frac{P_{opt}}{P_{S_k}}}\,\bigl(Z_S(\omega)\;X_{S_k,1}(\omega)\;\cdots\;X_{S_k,M}(\omega)\bigr)A_k(\omega)\qquad(31)$$
The first element a_0(ω) of the weighted mixing vector A_k(ω) = (a_0(ω), a_{k1}(ω), ..., a_{kM}(ω)) represents the weighting factor for the received signal; usually it is set to a_0(ω) = 0.
Then, solving the conditional expressions (29) to (31) for the filter coefficient matrix H(ω) by the least squares method gives the following equation:
$$H(\omega)=\{R_{SS}(\omega)+C_N R_{NN}(\omega)+C_E R_{EE}(\omega)\}^{-1}R_{SS}(\omega)A(\omega)\qquad(32)$$
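One way to see where equation (32) comes from (a sketch of the least squares step, which the text leaves implicit): treat (29), (30), and (31) as soft constraints weighted by C_E, C_N, and 1, respectively, write x(ω) = (Z(ω), X_1(ω), ..., X_M(ω))^T for the stacked signal vector in each section, and minimize the expected squared error

$$J(H)=E\bigl[|x_S^{T}H-x_S^{T}A|^{2}\bigr]+C_N\,E\bigl[|x_N^{T}H|^{2}\bigr]+C_E\,E\bigl[|x_E^{T}H|^{2}\bigr].$$

Setting the gradient of J with respect to H to zero and identifying the expectations of the outer products with the covariance matrices stored via equation (27) (up to the conjugation convention) gives

$$\{R_{SS}(\omega)+C_N R_{NN}(\omega)+C_E R_{EE}(\omega)\}\,H(\omega)=R_{SS}(\omega)A(\omega),$$

which is equation (32).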
C_E is a weighting constant for enhancing the echo return loss; the larger this value, the greater the enhancement of the echo return loss. However, increasing C_E hastens the degradation of the frequency characteristics of the collected signal and lowers the noise attenuation. Therefore, C_E is usually set to a suitable value in the range 0.1 to 10.0. The other symbols have the same meanings as in the second embodiment.
In this way, the filter coefficients can be determined so as to adjust the volume and reduce the noise.
Then, the filter coefficients obtained by equation (32) are set in the filters 12_1 to 12_M and 23, which filter the microphone-collected signals and the received signal, respectively. The filtered signals are all added together by the adder 13, and the added output from the adder is provided as the transmission signal. The other parts are identical to the second embodiment of the present invention and will not be described again.
As described above, the fourth embodiment of the present invention achieves echo cancellation in addition to the noise attenuation effect.
The 5th embodiment
Fig. 13 illustrates the fifth embodiment. In the fifth embodiment, the sound source positions are detected in the speech section of the fourth embodiment of Fig. 12; a covariance matrix is calculated and stored for each sound source, and a covariance matrix of the noise is calculated and stored for the noise section. These stored covariance matrices are then used to calculate the filter coefficients so as to cancel the noise and the echo. The microphone-collected signals and the received signal are filtered with these filter coefficients, yielding a transmission signal from which the noise and the echo have been cancelled.
The structure of the fifth embodiment is the same as that of the third embodiment, except that the collected sound level estimating portion 19 of Fig. 10 is deleted.
The state judging portion 14 detects the speech section, the reception section, and the noise section as in the third embodiment. When the judgment result of the state judging portion 14 is the speech section, the sound source position detecting portion 15 estimates the position of each sound source 9_k. The sound source position estimation method is the same as that used in the first embodiment of Fig. 1 and will not be described again.
Then, the collected signals and the received signal are converted into frequency-domain signals by the frequency domain converting portion 16 and provided to the covariance matrix calculating portion 17.
The covariance matrix calculating portion 17 calculates covariance matrices R_{S1S1}(ω) to R_{SKSK}(ω) of the collected signals and the received signal for each sound source 9_k, a covariance matrix R_EE(ω) for the reception section, and a covariance matrix R_NN(ω) for the noise section. Based on the judgment result of the state judging portion 14 and the position detection result of the sound source position detecting portion 15, the covariance matrix storage portion 18 stores the covariance matrices R_{S1S1}(ω) to R_{SKSK}(ω), R_EE(ω), and R_NN(ω) in the corresponding areas MA_1 to MA_K, MA_{K+1}, and MA_{K+2}, respectively.
To transmit the collected speech, the filter coefficient calculating portion 21 calculates the filter coefficients so as to cancel the echo and the noise. As in the third embodiment, solving the conditional expressions for the filter coefficient matrix H(ω) by the least squares method gives the following equation:
$$H(\omega)=\left\{\sum_{k=1}^{K}C_{S_k}R_{S_kS_k}(\omega)+C_N R_{NN}(\omega)+C_E R_{EE}(\omega)\right\}^{-1}\sum_{k=1}^{K}C_{S_k}\sqrt{\frac{P_{opt}}{P_{S_k}}}\,R_{S_kS_k}(\omega)A_k(\omega)\qquad(33)$$
In the above, C_S1 to C_SK are weighting constants for the sensitivity constraints on the respective sound sources, C_E is a weighting constant for enhancing the echo return loss, and C_N is a weighting constant for the noise attenuation rate.
The filter coefficients thus obtained are set in the filters 12_1 to 12_M and 23, which filter the microphone-collected voice signals and the received signal, respectively. The other parts are identical to the second embodiment of the invention and will not be described again. The fifth embodiment allows generation of a transmission signal from which the echo and the noise have been cancelled, as in the third embodiment. Furthermore, according to the fifth embodiment, sensitivity constraints can be imposed on a plurality of sound sources, so the sensitivity to sound sources that uttered speech in the past can be maintained. This is useful because, even when a sound source moves, the sensitivity to the sound source under its earlier speaking condition is maintained, so the speech quality of the initial part of an utterance does not deteriorate.
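For illustration only, the following exercises the filter_coeffs sketch given after equation (26) with a nonzero C_N, as in equation (33); the random positive semidefinite matrices merely stand in for stored covariance matrices, and all values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_cov(n, frames=8):
    """Random Hermitian positive semidefinite matrix used as test data."""
    X = rng.standard_normal((n, frames)) + 1j * rng.standard_normal((n, frames))
    return X @ X.conj().T / frames

n = 4  # M = 3 microphones plus the received signal
H = filter_coeffs(R_Sk=[rand_cov(n), rand_cov(n)], R_EE=rand_cov(n),
                  A_k=[np.eye(n)[1], np.eye(n)[2]],  # a_0 = 0: no received-signal mix
                  C_Sk=[1.0, 0.5], P_Sk=[2.0, 0.5], P_opt=1.0,
                  C_E=1.0, C_N=1.0, R_NN=rand_cov(n))
print(H.shape)  # (4,): F(w), H_1(w), H_2(w), H_3(w)
```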
The 6th embodiment
A sound acquisition device according to the sixth embodiment of the invention will now be described.
In the sound acquisition device of this embodiment, the weighting factors C_S1 to C_SK of the sensitivity constraints on the sound sources 9_k in the sound acquisition devices of the first to third and fifth embodiments are changed with time (timewise).
The time-varying weighting factors C_S1 to C_SK of the sensitivity constraints on the sound sources 9_1 to 9_K are set smaller and smaller in order of past utterance. A first method reduces the weighting factor C_Sk of each already-detected sound source position as the time elapsed since its detection increases, relative to the most recently detected sound source position. A second method sets the weighting factors C_Sk smaller and smaller in the order in which the K sound source positions were detected.
Fig. 14 illustrates in block form the functional structure of a weighting factor setting portion 21H that implements the first method described above. The weighting factor setting portion 21H comprises: a clock 21H1 that outputs the time; a time storage portion 21H2 that stores, upon each detection of a sound source position, the detection time t_k with the sound source number k as the address; and a weighting factor determining portion 21H3. Based on the detection times stored in the time storage portion 21H2, the weighting factor determining portion 21H3 assigns a predetermined value C_S as the weighting factor C_Sk to the sound source of the currently detected number k(t), and assigns to each other sound source of number k ≠ k(t) the value q^(t−t_k) C_S as its weighting factor, according to the time t − t_k elapsed since its detection time t_k. The value q is predetermined in the range 0 < q ≤ 1. In this way, the sensitivity constraint weighting factors C_S1 to C_SK of the respective sound sources are determined and provided to 21A1 to 21AK.
Fig. 15 illustrates in block form the functional structure of a weighting factor setting portion 21H that implements the second method described above. In this example, it comprises the clock 21H1, the time storage portion 21H2, an order deciding portion 21H4, and a weighting factor determining portion 21H5. From the times stored in the time storage portion 21H2, the order deciding portion 21H4 decides the order k(1), ..., k(K) (newest first) in which the positions of the sound sources 9_1 to 9_K were detected. The weighting factor determining portion 21H5 assigns the predetermined value C_S as the weighting factor C_Sk(1) to the most recently detected sound source 9_k(1). For the other sound sources, the weighting factor determining portion computes C_Sk(t+1) ← qC_Sk(t) for t = 1, 2, ..., K−1 to obtain the weighting factors C_Sk(2), ..., C_Sk(K). These weighting factors C_Sk(1) to C_Sk(K) are rearranged according to the order k(1), ..., k(K) and then output as the weighting factors C_S1, ..., C_SK. The value of q is predetermined in the range 0 < q < 1.
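The two weighting rules can be summarized in a short sketch (hypothetical code, not from the patent; C_S, q, and the bookkeeping of the detection times t_k are assumptions consistent with the description above):

```python
import numpy as np

def weights_first_method(t_now, t_detect, C_S=1.0, q=0.9):
    """First method: decay each source's weight by q per unit of time
    elapsed since its detection time t_k (0 < q <= 1); the source
    detected at t_now keeps the full weight C_S."""
    return C_S * q ** (t_now - np.asarray(t_detect, dtype=float))

def weights_second_method(t_detect, C_S=1.0, q=0.9):
    """Second method: the most recently detected source gets C_S, the
    next most recent q*C_S, then q^2*C_S, ... (0 < q < 1)."""
    order = np.argsort(t_detect)[::-1]               # newest first: k(1), ..., k(K)
    w = np.empty(len(t_detect))
    w[order] = C_S * q ** np.arange(len(t_detect))   # rearrange back to C_S1..C_SK
    return w
```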
By changing the sensitivity constraint weighting for each sound source as described above, the sensitivity constraints on sound source positions where speech occurred in the past can be relaxed. Compared with the first to third embodiments, the device of this embodiment therefore effectively reduces the number of sound sources subject to sensitivity constraints, enhancing the collected sound level adjustment and the noise and echo cancellation performance.
The other parts are identical to those of the first to third and fifth embodiments of the present invention and will not be described again.
The 7th embodiment
A sound acquisition device according to the seventh embodiment of the invention will now be described.
The sound acquisition device of the seventh embodiment is characterized in that the filter coefficient calculating portion 21 of the sound acquisition devices of the first to sixth embodiments whitens the covariance matrices R_XX(ω). Fig. 16 illustrates the functional structure of a typical whitening portion, one of the portions 21J1 to 21JK indicated by broken lines in the filter coefficient calculating portion 21 shown in Fig. 4. The whitening portion 21J comprises a diagonal matrix calculating portion 21JA, a weighting portion 21JB, an inverse calculating portion 21JC, and a multiplier 21JD. The diagonal matrix calculating portion 21JA generates a diagonal matrix diag(R_XX(ω)) from each provided covariance matrix R_XX(ω). The weighting portion 21JB assigns a weight to the diagonal matrix by calculating the following expression, based on a predetermined arbitrary matrix D of M or M+1 rows.
$$D^{T}\,\mathrm{diag}(R_{XX}(\omega))\,D\qquad(34)$$
The inverse calculating portion 21JC calculates the inverse of expression (34):
$$1/\{D^{T}\,\mathrm{diag}(R_{XX}(\omega))\,D\}\qquad(35)$$
In the above, the superscript T denotes the matrix transpose. The multiplier 21JD multiplies each input covariance matrix R_XX(ω) by the calculation result of the inverse calculating portion 21JC to obtain the whitened covariance matrix.
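A sketch of the whitening step of expressions (34) and (35) (hypothetical code; taking D as an all-ones vector, which makes the weight the reciprocal of the summed diagonal power, is just one possible choice of the arbitrary matrix D):

```python
import numpy as np

def whiten(R, D=None):
    """Whiten a covariance matrix R_XX(w) per eqs. (34)-(35):
    multiply R by the scalar weight 1 / (D^T diag(R) D)."""
    if D is None:
        D = np.ones(R.shape[0])   # assumed choice of the arbitrary D
    d = np.real(np.diag(R))       # diagonal elements diag(R_XX(w))
    return R / (D @ (d * D))      # R * 1/{D^T diag(R) D}
```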
With the covariance matrices whitened in this way, the filter coefficients obtained in the filter coefficient calculating portion 21 no longer vary with changes in the spectra of the transmission signal, the collected signals, and the noise signal. As a result, the collected sound level adjustment capability and the echo and noise cancellation capabilities do not fluctuate with spectral changes; this makes stable collected sound level adjustment and stable echo and noise cancellation possible.
The other parts are identical to those of the first to fourth embodiments of the present invention and will not be described again.
The 8th embodiment
A sound acquisition device according to the eighth embodiment of the invention will now be described.
The sound acquisition device of the eighth embodiment is characterized in that the covariance matrix storage portion 18 of the sound acquisition devices of the first to seventh embodiments averages the stored covariance matrix with the covariance matrix newly calculated by the covariance matrix calculating portion 17, and saves the averaged covariance matrix as the current covariance matrix.
The covariance matrices are averaged by, for example, the following method. Let R_XX,old(ω) denote the stored covariance matrix and R_XX,new(ω) the covariance matrix newly calculated by the covariance matrix calculating portion 17; the following equation is used to calculate the averaged covariance matrix R_XX(ω).
$$R_{XX}(\omega)=(1-p)\,R_{XX,new}(\omega)+p\,R_{XX,old}(\omega)\qquad(36)$$
where p is a constant that determines the averaging time constant, with 0 ≤ p < 1.
Fig. 17 illustrates the functional structure of an averaging portion 18A provided in the covariance matrix storage portion 18. The averaging portion 18A comprises a multiplier 18A1, an adder 18A2, and a multiplier 18A3. The covariance matrix R_SkSk(ω) calculated by the covariance matrix calculating portion 17 for the sound source 9_k is provided, as the new covariance matrix R_SkSk,new(ω), to the multiplier 18A1, where it is multiplied by (1 − p); the output of the multiplier is applied to the adder 18A2. Meanwhile, the covariance matrix corresponding to the sound source 9_k is read out of the storage area 18B as the old covariance matrix R_SkSk,old(ω) and provided to the multiplier 18A3, where it is multiplied by the constant p. The multiplied output is added by the adder 18A2 to the output (1 − p) R_SkSk,new(ω) of the multiplier 18A1, and the covariance matrix R_SkSk(ω) thus obtained is written back to the storage area corresponding to the sound source 9_k.
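A minimal sketch of the update of equation (36) (hypothetical code; the dictionary keyed by sound source is an assumption standing in for the storage areas of Fig. 17):

```python
def update_covariance(store, key, R_new, p=0.9):
    """Eq. (36): R = (1 - p) * R_new + p * R_old, written back to the
    storage area for this sound source (0 <= p < 1)."""
    R_old = store.get(key)
    store[key] = R_new if R_old is None else (1 - p) * R_new + p * R_old
    return store[key]
```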
By averaging the covariance matrices and storing the averaged covariance matrices in this way, the influence of circuit noise or similar disturbances can be reduced compared with the case without averaging, so more accurate covariance matrices are provided; this makes it possible to determine filter coefficients with improved collected sound level adjustment, noise cancellation, or echo cancellation performance.
The other parts are identical to those of the first to fifth embodiments of the present invention and will not be described again.
Incidentally, the present invention may be implemented by hardware dedicated to it; alternatively, a program for implementing the present invention may be recorded on a computer-readable recording medium and read into a computer for execution. Computer-readable recording media include floppy disks, magneto-optical discs, CD-ROMs, DVD-ROMs, non-volatile semiconductor memories, and internal or external hard disk storage devices. Computer-readable recording media also include media that hold the program dynamically for a short time (transmission media or transmission waves), as when the program is transmitted over the Internet, and media that hold the program for a fixed time, such as the volatile memory of a computer system serving as a server in that case.
Effects of the invention
Next, to verify the effect of the first embodiment of the sound acquisition device according to the present invention, Figs. 18A and 18B show simulation results obtained with microphones placed at the corners of a 20 cm × 20 cm square region. The simulation conditions were: number of microphones: 4; signal-to-noise ratio: 20 dB; room reverberation time: 300 ms; number of speakers: 2 (speaker A at a position 50 cm from the center of the square region, in a direction at right angles to one of its sides, and speaker B at 200 cm from the center of the square region, in a direction at 90° from speaker A). Fig. 18A shows the waveform of the microphone-received signal when speakers A and B spoke alternately under these conditions. Comparing the speech waveforms of speakers A and B shows that the waveform of speaker B is smaller in amplitude. Fig. 18B shows the waveform after processing by the present invention. The speech waveforms of speakers A and B are almost equal in amplitude, which demonstrates the effect of the collected sound level adjustment.
Fig. 19 shows simulation results obtained with the third embodiment shown in Fig. 10. The simulation conditions were: number of microphones M: 4; signal-to-noise ratio of the transmission signal before processing: 20 dB; transmission-signal-to-echo ratio: −10 dB; room reverberation time: 300 ms. Fig. 19 shows the transmission signal levels obtained when transmission and reception were alternately repeated under these conditions. Row A shows the transmission signal level before processing, and row B shows the transmission signal level after processing by the third embodiment. The results show that the third embodiment attenuates the echo by about 40 dB and the noise signal by about 15 dB, confirming that the embodiments of the present invention are effective.
As described above, according to the first embodiment of the invention, by detecting the sound source positions from the signals collected by a plurality of microphones, calculating filter coefficients for each sound source position based on the covariance matrices of the speech section, filtering the microphone-collected signals with the filter coefficients, and adding the filtered signals together, a transmission signal whose volume is adjusted for each sound source position can be obtained.
According to the second embodiment of the invention, by determining the filter coefficients in the first embodiment using the covariance matrices of the speech section and of the noise section, both noise cancellation and collected sound level adjustment can be achieved.
According to the third embodiment of the invention, by determining the filter coefficients in the first or second embodiment using the covariance matrices of the speech section plus those of the reception section, echo cancellation can also be achieved.
According to the fourth embodiment of the invention, by determining the filter coefficients using the covariance matrices of the speech section and of the reception section, the received signal can be reproduced by the loudspeaker while the echo is cancelled.
According to the fifth embodiment of the invention, by determining the filter coefficients in the fourth embodiment using the covariance matrices of the speech and reception sections plus the covariance matrix of the noise section, the noise can be cancelled as well.
According to the sixth embodiment of the invention, by assigning smaller weighting factors to the covariance matrices of earlier utterances when calculating the filter coefficients in the first, second, third, and fifth embodiments, the collected sound level adjustment, noise cancellation, and echo cancellation performance can be further enhanced.
According to the seventh embodiment of the invention, by whitening the covariance matrices when calculating the filter coefficients in the first to sixth embodiments, collected sound level adjustment, noise cancellation, and echo cancellation that are insensitive to spectral changes of the signals can be achieved.
According to the eighth embodiment of the invention, when a covariance matrix is stored in the first to seventh embodiments, it is averaged with the matrix already stored in the corresponding area and the weighted-average covariance matrix is stored; more accurate covariance matrices can thus be obtained, from which filter coefficients with enhanced collected sound level adjustment, noise attenuation, and echo cancellation performance can be determined.

Claims (19)

1. A sound acquisition method of collecting voice signals from respective sound sources by microphones of a plurality of channels in an acoustic space, comprising:
(a) a state judging step including a speech judging step of judging, from the signals collected by the microphones of said plurality of channels, whether they belong to a speech section;
(b) a sound source position detecting step of detecting the position of each said sound source from said collected signals when said speech judging step judges a speech section;
(c) a frequency domain converting step of converting said collected signals to frequency-domain signals;
(d) a covariance matrix calculating step of calculating the covariance matrix of said frequency-domain collected signals during said speech section;
(e) a covariance matrix storing step of storing said covariance matrix for each sound source during said speech section, based on the detection result of said sound source position detecting step;
(f) a filter coefficient calculating step of calculating the filter coefficients of said plurality of channels during said speech section, based on said stored covariance matrices and a predetermined output level;
(g) a filtering step of filtering the signals collected in said plurality of channels by the filter coefficients of said plurality of channels, respectively; and
(h) an adding step of adding together the filtering results of said plurality of channels and providing the added output as a transmission signal.
2. The sound acquisition method of claim 1, further comprising, between the covariance matrix storing step and the filter coefficient calculating step, a collected sound level estimating step of estimating the collected sound level of each said sound source based on the covariance matrix stored for each said sound source; wherein said filter coefficient calculating step includes a step of calculating the filter coefficients of said plurality of channels, based on said covariance matrices stored for each said sound source and said estimated collected sound levels, so that the output level becomes a predetermined output level.
3. The sound acquisition method of claim 2, wherein: said state judging step includes a noise judging step of judging, from said collected signals of said plurality of channels, whether they belong to a noise section;
said covariance matrix calculating step further includes a step of calculating, after the noise section is judged, the covariance matrix of the signals collected in said noise section as a noise covariance matrix;
said covariance matrix storing step also stores said covariance matrix of said noise section for each sound source; and
said filter coefficient calculating step calculates the filter coefficients of said plurality of channels from the covariance matrices of each said sound source in the speech section and the covariance matrix stored for said noise section, so that the collected signal level of each said sound source becomes a predetermined output level and the noise is attenuated.
4. The sound acquisition method of claim 3, wherein a loudspeaker for reproducing a voice signal from a received signal is placed in said acoustic space, and wherein: said state judging step includes a reception judging step of judging a reception section from said received signal transferred to said loudspeaker;
said frequency domain converting step includes a step of converting said received signal transferred to said loudspeaker to a frequency-domain signal;
said covariance matrix calculating step calculates said covariance matrices in said speech section and said reception section from said frequency-domain signals of said collected signals of said plurality of channels and said frequency-domain signal of said received signal transferred to said loudspeaker;
said covariance matrix storing step stores said covariance matrix for each sound source in the speech section and said covariance matrix in said reception section; and
said filter coefficient calculating step calculates said filter coefficients of said plurality of channels based on the stored covariance matrices of each said sound source in said speech section and the stored covariance matrix in said reception section, so that the collected sound level of each said sound source becomes a predetermined output level and the noise is attenuated.
5. The sound acquisition method of any one of claims 1 to 4, wherein: the number of said sound sources is K, which is equal to or greater than 2; and said filter coefficient calculating step calculates said filter coefficients after assigning weights C_S1 to C_SK of sensitivity constraints on the K sound sources to the covariance matrices corresponding to the K sound sources, the weights assigned to said sound sources decreasing gradually in order of utterance of said sound sources.
6. The sound acquisition method of claim 4, wherein said plurality of channels is M channels, M being equal to or greater than 2, and said filter coefficient calculating step calculates said filter coefficients for each collected signal and the received signal after whitening each covariance matrix R_XX(ω) by multiplying said covariance matrix by a weight 1/{D^T diag(R_XX(ω)) D} formed from the diagonal elements diag(R_XX(ω)) of said covariance matrix and an arbitrary predetermined matrix D of M or M+1 rows, where D^T represents the transposed matrix of D.
7. The sound acquisition method of any one of claims 1 to 4, wherein said covariance matrix storing step averages the previously stored covariance matrix and the covariance matrix newly calculated by said covariance matrix calculating step, and stores the averaged covariance matrix as the current covariance matrix.
8. A sound acquisition device which collects voice signals from respective sound sources by microphones of a plurality of channels placed in an acoustic space, comprising:
a state judging portion including a speech judging portion for judging, from the signals collected by the microphones of said plurality of channels, whether they belong to a speech section;
a sound source position detecting portion for detecting the position of each said sound source from said collected signals after the speech section is judged by said speech judging portion;
a frequency domain converting portion for converting the collected signals to frequency-domain signals;
a covariance matrix calculating portion for calculating the covariance matrices of said frequency-domain signals of the collected signals of said plurality of channels;
a covariance matrix storage portion for storing said covariance matrices for each said sound source based on the detection result of said sound source position detecting portion;
a filter coefficient calculating portion for calculating the filter coefficients of said plurality of channels by using the stored covariance matrices so that the transmission signal level of each said sound source becomes a desired level;
filters of said plurality of channels for filtering the signals collected from said microphones by the filter coefficients of said plurality of channels, respectively; and
an adder for adding together the outputs of said filters of said plurality of channels and providing the added output as a transmission signal.
9. The sound acquisition device of claim 8, further comprising:
a collected sound level estimating portion for estimating the collected sound level of each said sound source from the covariance matrix stored for each said sound source; and wherein the filter coefficient calculating portion assigns weights to the covariance matrices corresponding to each said sound source according to said estimated collected sound levels, and then calculates said filter coefficients of said plurality of channels so that the transmission signal level of each said sound source becomes a predetermined output level.
10. A sound acquisition method of collecting a voice signal from at least one sound source by a microphone of at least one channel in an acoustic space in which a loudspeaker reproduces a voice signal from a received signal, the method comprising:
(a) a state judging step of judging a speech section and a reception section from the signal collected by said microphone of said at least one channel and said received signal;
(b) a frequency domain converting step of converting said collected signal and said received signal to frequency-domain signals, respectively;
(c) a covariance matrix calculating step of calculating covariance matrices in said speech section and in said reception section from the frequency-domain signal of said collected signal and the frequency-domain signal of said received signal;
(d) a covariance matrix storing step of storing said covariance matrices for said speech section and said reception section, respectively;
(e) a filter coefficient calculating step of calculating, based on the covariance matrices stored for the speech section and the reception section, the filter coefficient for the collected signal of said at least one channel and the filter coefficient for said received signal so that the received-signal component contained in said collected signal is cancelled;
(f) a filtering step of filtering said received signal and said collected signal with the filter coefficient for said received signal and the filter coefficient for said collected signal of said at least one channel; and
(g) an adding step of adding together the filtered outputs produced by filtering said received signal and said collected signal, and providing the added output as a transmission signal.
11. The sound acquisition method of claim 10, wherein: said state judging step includes a step of judging a noise section from said collected signal and said received signal; said covariance matrix calculating step includes a step of calculating a covariance matrix in said noise section; said covariance matrix storing step includes a step of storing said covariance matrix of said noise section; and said filter coefficient calculating step calculates said filter coefficient for said collected signal of said at least one channel and said filter coefficient for said received signal, based on the covariance matrices stored for said speech section, said reception section, and said noise section, so that the echo and the noise are cancelled.
12. The sound acquisition method of claim 10, wherein said microphone is provided in each of said at least one channel to collect voice signals from a plurality of sound sources, the method further comprising a sound source position detecting step of detecting the sound source positions from the signals collected by the microphones of said at least one channel when said speech section is judged by said state judging step; and wherein said covariance matrix storing step stores said covariance matrices corresponding to said detected sound source positions and to said reception section, based on the judgment result of said state judging step and said detected sound source positions.
13. The sound acquisition method of claim 12, wherein said filter coefficient calculating step calculates said filter coefficients after assigning weights C_S1 to C_SK of sensitivity constraints on K sound source positions to the covariance matrices corresponding to the respective sound sources, the weights assigned to said sound source positions decreasing gradually in order of utterance of said sound sources.
14. The sound acquisition method of any one of claims 10 to 13, wherein said at least one channel is M channels, M being equal to or greater than 2, and said filter coefficient calculating step calculates said filter coefficients for said collected signals and said received signal after whitening each covariance matrix R_XX(ω) by multiplying said covariance matrix by a weight 1/{D^T diag(R_XX(ω)) D} formed from the diagonal elements diag(R_XX(ω)) of each covariance matrix R_XX(ω) and an arbitrary predetermined matrix D of M or M+1 rows, where D^T represents the transposed matrix of D.
15. The sound acquisition method of any one of claims 10 to 13, wherein said covariance matrix storing step averages the previously stored covariance matrix and the covariance matrix newly calculated by said covariance matrix calculating step, and stores the averaged covariance matrix as the current covariance matrix.
16. A sound acquisition device comprising:
a microphone of at least one channel for collecting a voice signal from a sound source and outputting the collected signal;
a loudspeaker for reproducing a received voice signal;
a state judging portion for judging a speech section and a reception section from said collected signal and said received signal;
a frequency domain converting portion for converting said collected signal and said received signal to frequency-domain signals, respectively;
a covariance matrix calculating portion for calculating, for said speech section and said reception section respectively, the covariance matrices of the frequency-domain signal of said collected signal and the frequency-domain signal of said received signal;
a covariance matrix storage portion for storing said covariance matrices for said speech section and said reception section, respectively;
a filter coefficient calculating portion for calculating, based on said stored covariance matrices, a filter coefficient for the collected signal of said at least one channel and a filter coefficient for said received signal so as to cancel the echo component of said received signal contained in said collected signal;
a collected signal filter and a received signal filter, in which the filter coefficients for said collected signal and said received signal are set, for filtering said collected signal and said received signal, respectively; and
an adder for adding together the outputs of said collected signal filter and said received signal filter and providing the added signal as a transmission signal.
17. The sound acquisition device of claim 16, wherein said at least one channel is a plurality of channels, said microphone and said collected signal filter are provided in each of the plurality of channels, and said adder adds together the outputs of said collected signal filters of said at least one channel and the output of said received signal filter and provides the added output as the transmission signal.
18. The sound acquisition device of claim 17, wherein: said state judging portion includes a noise judging portion for judging a noise section from said collected signals and said received signal; said covariance matrix storage portion also stores said covariance matrix for said noise section; and said filter coefficient calculating portion calculates the filter coefficients of said at least one channel based on said stored covariance matrices so that the echo of said received signal and the noise are cancelled, and sets the calculated filter coefficients in said filters of said at least one channel.
19. The sound acquisition device of claim 18, further comprising a sound source position detecting portion for detecting the positions of K sound sources from the signals collected by said at least one channel; and wherein said covariance matrix calculating portion calculates a covariance matrix for each sound source in said speech section; said covariance matrix storage portion stores said covariance matrix corresponding to each sound source for said speech section; and said filter coefficient calculating portion calculates said filter coefficients after assigning weights C_S1 to C_SK of sensitivity constraints on each said sound source to the covariance matrices corresponding to each said sound source, the weights assigned to said sound sources decreasing gradually in order of utterance of said sound sources.
CNB2004800001742A 2003-02-07 2004-02-06 Sound collecting method and sound collecting device Expired - Lifetime CN100534001C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2003030676 2003-02-07
JP030676/2003 2003-02-07
JP058626/2003 2003-03-05

Publications (2)

Publication Number Publication Date
CN1698395A CN1698395A (en) 2005-11-16
CN100534001C true CN100534001C (en) 2009-08-26

Family

ID=35350229

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004800001742A Expired - Lifetime CN100534001C (en) 2003-02-07 2004-02-06 Sound collecting method and sound collecting device

Country Status (1)

Country Link
CN (1) CN100534001C (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009126561A1 (en) * 2008-04-07 2009-10-15 Dolby Laboratories Licensing Corporation Surround sound generation from a microphone array
GB2493327B (en) 2011-07-05 2018-06-06 Skype Processing audio signals
GB2495128B (en) 2011-09-30 2018-04-04 Skype Processing signals
GB2495129B (en) 2011-09-30 2017-07-19 Skype Processing signals
GB2495472B (en) 2011-09-30 2019-07-03 Skype Processing audio signals
GB2496660B (en) 2011-11-18 2014-06-04 Skype Processing audio signals
GB201120392D0 (en) 2011-11-25 2012-01-11 Skype Ltd Processing signals
GB2497343B (en) 2011-12-08 2014-11-26 Skype Processing audio signals
CN103428607A (en) * 2012-05-25 2013-12-04 华为技术有限公司 Audio signal playing system and electronic device
DK3190587T3 (en) * 2012-08-24 2019-01-21 Oticon As Noise estimation for noise reduction and echo suppression in personal communication
JP6834971B2 (en) * 2015-10-26 2021-02-24 ソニー株式会社 Signal processing equipment, signal processing methods, and programs
US10104472B2 (en) * 2016-03-21 2018-10-16 Fortemedia, Inc. Acoustic capture devices and methods thereof
CN111405416B (en) * 2020-03-20 2022-06-24 北京达佳互联信息技术有限公司 Stereo recording method, electronic device and storage medium
CN115497438B (en) * 2022-11-14 2023-02-17 厦门视诚科技有限公司 Device and method for rapidly solving digital volume approximate value in audio recording or playing

Also Published As

Publication number Publication date
CN1698395A (en) 2005-11-16

Similar Documents

Publication Publication Date Title
CN100534001C (en) Sound collecting method and sound collecting device
EP1592282B1 (en) Teleconferencing method and system
CN101682809B (en) Sound discrimination method and apparatus
CN104186001B (en) Designed using the audio Compensatory Control device for the variable set for supporting loudspeaker
CN100512509C (en) Method for designing digital audio precompensation filter and system thereof
CN102810325B (en) Being automatically adjusted of velocity correlation balance control system
US8355510B2 (en) Reduced latency low frequency equalization system
US4480333A (en) Method and apparatus for active sound control
US8120993B2 (en) Acoustic treatment apparatus and method thereof
US10008993B2 (en) Amplifier current consumption control
CN102947685A (en) Method and apparatus for reducing the effect of environmental noise on listeners
KR20030066609A (en) Method for apparatus for audio matrix decoding
JP2001309483A (en) Sound pickup method and sound pickup device
JP2004349806A (en) Multichannel acoustic echo canceling method, apparatus thereof, program thereof, and recording medium thereof
JP3069535B2 (en) Sound reproduction device
JP4119328B2 (en) Sound collection method, apparatus thereof, program thereof, and recording medium thereof.
JP2003533109A (en) Receiving system for multi-sensor antenna
CN1294556C (en) Voice matching system for audio transducers
JP2003250193A (en) Echo elimination method, device for executing the method, program and recording medium therefor
JPWO2009025023A1 (en) Sound image localization prediction apparatus, sound image localization control system, sound image localization prediction method, and sound image localization control method
JP4298466B2 (en) Sound collection method, apparatus, program, and recording medium
JP3451022B2 (en) Method and apparatus for improving clarity of loud sound
JP4306815B2 (en) Stereophonic sound processor using linear prediction coefficients
JPH04295727A (en) Impulse-response measuring method
CN117292698B (en) Processing method and device for vehicle-mounted audio data and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20090826
