CN1672464A - Audio channel spatial translation - Google Patents

Publication number
CN1672464A
Authority
CN
China
Prior art keywords
input signal
input
level
correlation
scaling factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA03817877XA
Other languages
Chinese (zh)
Other versions
CN1672464B (en)
Inventor
Mark Franklin Davis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corp
Priority claimed from PCT/US2003/024570 (WO2004019656A2)
Publication of CN1672464A
Application granted
Publication of CN1672464B
Anticipated expiration
Expired - Lifetime

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04S3/02: Systems employing more than two channels, e.g. quadraphonic, of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
    • H04S5/00: Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H04S5/02: Pseudo-stereo systems of the pseudo four-channel type, e.g. in which rear channel signals are derived from two-channel stereo signals

Abstract

Using an M:N variable matrix, M audio input signals, each associated with a direction, are translated to N audio output signals, each associated with a direction, wherein N is larger than M, M is two or more and N is a positive integer equal to three or more. The variable matrix is controlled in response to measures of: (1) the relative levels of the input signals, and (2) the cross-correlation of the input signals so that a soundfield generated by the output signals has a compact sound image in the nominal ongoing primary direction of the input signals when the input signals are highly correlated, the image spreading from compact to broad as the correlation decreases and progressively splitting into multiple compact sound images, each in a direction associated with an input signal, as the correlation continues to decrease to highly uncorrelated.

Description

Audio channel spatial translation
Technical field
The present invention relates to audio signal processing. More specifically, it relates to translating M audio input channels representing a soundfield into N audio output channels representing the same soundfield, wherein each channel is a single audio stream representing audio arriving from a direction, M and N are positive integers, M is at least 2, N is at least 3, and N is greater than M. A spatial translator in which N is greater than M is usually characterized as a "decoder".
Background art
Although humans have only two ears, we perceive sound as a three-dimensional entity, relying on a number of localization cues such as head-related transfer functions (HRTFs) and head motion. Full-fidelity sound reproduction therefore requires retaining and reproducing the full three-dimensional soundfield, or at least its perceptual cues. Unfortunately, sound recording technology is not oriented toward capturing a three-dimensional soundfield, nor a two-dimensional plane of sound, nor even a one-dimensional line of sound. Current sound recording technology is oriented strictly toward capturing, preserving and presenting zero-dimensional, discrete channels of audio.
Since Edison's invention of sound recording, most of the effort to improve fidelity has focused on overcoming the imperfections of his original analog modulated-groove cylinder/disc media. These imperfections include limited and uneven frequency response, noise, distortion, wow, flutter, speed accuracy, wear, dirt and copying generation loss. Although there were many piecemeal attempts at isolated improvements, including electronic amplification, magnetic recording, noise reduction and players costing more than some automobiles, the traditional problems of individual channel quality were arguably not finally resolved until the development of digital recording in general and the introduction of the audio compact disc in particular. Since then, apart from some effort to extend the quality of digital recording further to 24 bits/96 kHz sampling, the main efforts in audio reproduction research have concentrated on reducing the amount of data needed to maintain individual channel quality, mostly by using perceptual coders, and on increasing spatial fidelity. The latter problem is the subject of this document.
Efforts to improve spatial fidelity have proceeded along two lines: trying to convey the perceptual cues of a full soundfield, and trying to convey an approximation of the actual original soundfield. Examples of systems using the former approach include binaural recording and virtual surround systems based on two loudspeakers. Such systems exhibit a number of unfortunate imperfections, especially in reliably localizing sounds in some directions and in requiring the use of headphones or a single fixed listening position.
For presenting spatial sound to multiple listeners, whether in a living room or in a commercial venue such as a movie theatre, the only viable alternative has been to try to approximate the actual original soundfield. Given the discrete-channel nature of sound recording, it is not surprising that most efforts to date have involved what might be termed conservative increases in the number of presentation channels. Representative systems include the panned-mono three-speaker film soundtracks of the early 1950s, conventional stereo, the quadraphonic systems of the 1960s, five-channel discrete magnetic soundtracks on 70 mm film, Dolby Surround using a matrix in the 1970s, AC-3 5.1-channel surround of the 1990s and, recently, Surround EX 6.1-channel surround. "Dolby", "Pro Logic" and "Surround EX" are trademarks of Dolby Laboratories Licensing Corporation. To one degree or another, these systems provide enhanced spatial reproduction compared with monophonic presentation. However, mixing a larger number of channels imposes greater time and cost burdens on content producers, and the resulting perception is typically one of a few scattered, discrete channels rather than a continuous soundfield. Aspects of Dolby Pro Logic decoding are described in U.S. Patent 4,799,260, which is incorporated herein by reference in its entirety. Details of AC-3 are set forth in "Digital Audio Compression Standard (AC-3)", Document A/52 of the Advanced Television Systems Committee (ATSC), published December 20, 1995 (available on the Internet at www.atsc.org/standards/A52/a 52/doc). See also the errata sheet of July 22, 1999 (available on the Internet at www.dolby.com/tech/ATSC err.pdf).
Once the soundfield is characterized, it is possible in principle for a decoder to derive the optimal signal feed for any output loudspeaker. The channels supplied to such a decoder are referred to variously in this document as "cardinal", "transmitted" and "input" channels, and any output channel whose location does not correspond to the position of one of the input channels is referred to as an "intermediate" channel. An output channel may also have a location coincident with the position of an input channel.
Summary of the invention
According to a first aspect of the invention, a process for translating M audio input signals, each associated with a direction, to N audio output signals, each associated with a direction, wherein N is larger than M, M is two or more and N is a positive integer equal to three or more, comprises: providing an M:N variable matrix, applying the M audio input signals to the variable matrix, deriving the N audio output signals from the variable matrix, and controlling the variable matrix in response to the input signals so that a soundfield generated by the output signals has a compact sound image in the nominal ongoing primary direction of the input signals when the input signals are highly correlated, the image spreading from compact to broad as the correlation decreases and progressively splitting into multiple compact sound images, each in a direction associated with an input signal, as the correlation continues to decrease to highly uncorrelated.
According to this first aspect of the invention, the variable matrix may be controlled in response to measures of (1) the relative levels of the input signals and (2) the cross-correlation of the input signals. In that case, for a measure of cross-correlation with values in a first range, bounded by a maximum value and a reference value, the soundfield may have a compact sound image when the measure is the maximum value and a broadly spread image when the measure is the reference value. For a measure of cross-correlation with values in a second range, bounded by the reference value and a minimum value, the soundfield may have the broadly spread image when the measure is the reference value and multiple compact sound images, each in a direction associated with an input signal, when the measure is the minimum value.
According to a further aspect of the invention, a process for translating M audio input signals, each associated with a direction, to N audio output signals, each associated with a direction, wherein N is larger than M and M is three or more, comprises: providing a plurality of m:n variable matrices, where m is a subset of M and n is a subset of N, deriving a respective subset of the N audio output signals from each variable matrix, controlling each variable matrix in response to the subset of input signals applied to it so that a soundfield generated by the respective subset of output signals derived from that matrix has a compact sound image in the nominal ongoing primary direction of the subset of input signals applied to the matrix when those input signals are highly correlated, the image spreading from compact to broad as the correlation decreases and progressively splitting into multiple compact sound images, each in a direction associated with an input signal applied to the matrix, as the correlation continues to decrease to highly uncorrelated, and deriving the N audio output signals from the subsets of N audio output channels.
According to this further aspect of the invention, the variable matrices may also be controlled in response to information that compensates for the effect of one or more other variable matrices receiving the same input signal. In addition, deriving the N audio output signals from the subsets of N audio output channels may include compensating for multiple variable matrices producing the same output signal. According to this aspect, each variable matrix may be controlled in response to measures of (a) the relative levels of the input signals applied to it and (b) the cross-correlation of those input signals.
According to yet a further aspect of the invention, a process for translating M audio input signals, each associated with a direction, to N audio output signals, each associated with a direction, wherein N is larger than M and M is three or more, comprises: providing an M:N variable matrix responsive to scale factors that control the matrix coefficients or the matrix outputs, applying the M audio input signals to the variable matrix, providing a plurality of m:n variable matrix scale factor generators, where m is a subset of M and n is a subset of N, applying a respective subset of the M audio input signals to each scale factor generator, deriving from each scale factor generator a set of variable matrix scale factors for a respective subset of the N audio output signals, controlling each scale factor generator in response to the subset of input signals applied to it so that, when the scale factors it generates are applied to the M:N variable matrix, a soundfield generated by the respective subset of output signals has a compact sound image in the nominal ongoing primary direction of the subset of input signals that produced the applied scale factors when those input signals are highly correlated, the image spreading from compact to broad as the correlation decreases and progressively splitting into multiple compact sound images, each in a direction associated with an input signal that produced the applied scale factors, as the correlation continues to decrease to highly uncorrelated, and deriving the N audio output signals from the variable matrix.
According to this aspect of the invention, the variable matrix scale factor generators may also be controlled in response to information that compensates for the effect of one or more other scale factor generators receiving the same input signal. In addition, deriving the N audio output signals from the variable matrix may include compensating for multiple scale factor generators producing scale factors for the same output signal. According to this aspect, each scale factor generator may be controlled in response to measures of (a) the relative levels of the input signals applied to it and (b) the cross-correlation of those input signals.
According to the invention, M audio input channels representing a soundfield are translated into N audio output channels representing the same soundfield, wherein each channel is a single audio stream representing audio arriving from a direction, M and N are positive integers, M is at least 2, N is at least 3, and N is greater than M. Each input and output channel has an associated direction (for example, azimuth, elevation and, optionally, distance, to allow for closer or more distant virtual or projected channels). One or more sets of output channels are generated, each set having one or more output channels. Each set is usually associated with two or more spatially adjacent input channels, and each output channel in a set is generated by determining a measure of the cross-correlation of the two or more input channels and a measure of their level interrelationships. The measure of cross-correlation is preferably a measure of the zero-time-offset cross-correlation, which is the ratio of the common energy level to the geometric mean of the input signal energy levels. The common energy level is preferably the smoothed or averaged common energy level, and the input signal energy levels are preferably the smoothed or averaged input signal energy levels.
In one aspect of the invention, multiple sets of output channels may be associated with more than two input channels, and a process may determine the correlation of the input channels associated with each set of output channels in a hierarchical order. Each set is ranked according to the number of input channels associated with its output channel or channels, the greatest number of input channels having the highest rank, and the sets are processed in order of their rank. Further according to an aspect of the invention, the processing takes into account the results of processing higher-order sets.
The playback or decoding aspects of the invention assume that each of the M audio input channels representing audio arriving from a direction was generated by a passive-matrix, nearest-neighbor, amplitude-panned encoding of each source direction (that is, a source direction is assumed to map primarily to the nearest input channel or channels), without requiring additional side-chain information (the use of side-chain or auxiliary information is optional), which makes the invention compatible with existing mixing techniques, consoles and formats. Although such source signals may be generated by explicitly using a passive encoding matrix, most conventional recording techniques inherently generate them (thus constituting an "effective encoding matrix"). The playback or decoding aspects of the invention are also largely compatible with natively recorded source signals, such as might be made with five real directional microphones, because, allowing for some possible time delay, sounds arriving from intermediate directions tend to map principally to the nearest microphones (in a horizontal array, specifically to the nearest pair of microphones).
A decoder or decoding process according to aspects of the invention may be implemented as a lattice of coupled processing modules or modular functions (hereinafter "modules" or "decoding modules"), each of which generates one or more output channels (or, alternatively, control signals usable to generate one or more output channels), typically from the two or more spatially closest adjacent input channels associated with that decoding module. The output channels typically represent relative proportions of the audio signals in the closest spatially adjacent input channels associated with the particular decoding module. As explained in more detail below, the decoding modules are loosely coupled to one another in the sense that modules share inputs and there is a hierarchy of decoding modules. Modules are ranked in the hierarchy according to the number of input channels with which they are associated (the module or modules with the greatest number of associated input channels rank highest). A supervisor function presides over the modules so that common input signals are shared equitably among them and so that higher-order decoding modules can affect the output of lower-order modules.
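The ranking of modules by their number of associated input channels can be sketched as a simple sort. This is only an illustrative sketch: the `Module` type, the module names and the channel labels (`L`, `C`, `R`, `Ls`) are hypothetical and do not come from the patent text, and the supervisor's actual sharing logic is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical decoding module: a name plus the input channels it uses."""
    name: str
    inputs: frozenset


def processing_order(modules):
    """Return modules ranked by input count, most inputs (highest rank) first.

    sorted() is stable, so modules with the same number of inputs keep
    their original relative order.
    """
    return sorted(modules, key=lambda m: len(m.inputs), reverse=True)


modules = [
    Module("L-C", frozenset({"L", "C"})),
    Module("L-C-Ls", frozenset({"L", "C", "Ls"})),
    Module("C-R", frozenset({"C", "R"})),
]

print([m.name for m in processing_order(modules)])
# ['L-C-Ls', 'L-C', 'C-R']
```

Processing the three-input module first lets its results be taken into account when the two-input modules that share its inputs are processed.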
Each decoding module may, in effect, include a matrix so that it generates output signals directly, or each decoding module may generate control signals that are used, together with the control signals generated by other decoding modules, to vary the coefficients of a variable matrix or the scale factors applied to the inputs or outputs of a fixed matrix, in order to generate all of the output signals.
Decoding modules emulate the operation of the human ear in an attempt to provide perceptually transparent reproduction. Signal translation according to the invention, of which decoding modules and modular functions are an aspect, may be applied either to wideband signals or to each frequency band of a multiband processor and, depending on the implementation, may be performed once per sample or once per block of samples. A multiband embodiment may employ either a filter bank, such as a discrete critical-band filter bank or a filter bank whose band structure is compatible with an associated decoder, or a transform configuration, such as an FFT (Fast Fourier Transform) or MDCT (Modified Discrete Cosine Transform) linear filter bank.
Another aspect of the invention is that the number of loudspeakers receiving the N output channels can be reduced to a practical number by judicious reliance on virtual imaging, that is, the creation of perceived sound images at positions in space other than where a loudspeaker is located. Although the most common use of virtual imaging is the stereo reproduction of an image part way between two loudspeakers by panning a monophonic signal between the channels, virtual imaging as contemplated by an aspect of the invention may include rendering phantom projected images that give the auditory impression of being beyond the walls of a room or inside the walls of a room. Virtual imaging is not considered a viable technique for group presentation with a sparse number of channels, because it requires the listener to be equidistant, or nearly equidistant, from the two loudspeakers. In a movie theatre, for example, the left and right front loudspeakers are too far apart to give most of the audience a useful phantom center image, so, given the importance of the center channel as the source of much of the dialogue, a physical center loudspeaker is used instead.
As the density of loudspeakers increases, a point is reached at which virtual imaging between any pair of loudspeakers becomes viable for most of the audience, at least to the extent that pans are smooth; with enough loudspeakers, the gaps between them are no longer perceived.
Signal distribution
As mentioned above, the measure of cross-correlation determines, within a module, the ratio of dominant energy (common signal components) to non-dominant energy (non-common signal components) and the degree to which the non-dominant signal components are spread among the module's output channels. This is best understood by considering the signal distribution to the output channels of a two-input module under different signal conditions. Unless otherwise noted, the principles set forth extend directly to higher-order modules.
The problem with signal distribution is that there is often very little information from which to recover the original signal amplitude distribution, much less the signals themselves. The basic information available is the signal level at each module input and the averaged cross-product of the input signals, the common energy level. The zero-time-offset cross-correlation is the ratio of the common energy level to the geometric mean of the input signal energy levels.
The significance of cross-correlation is that it functions as a measure of the net amplitude of the signal components common to all inputs. If a single signal is panned anywhere between the inputs of a module (an "interior" or "intermediate" signal), all inputs will have the same waveform, although possibly with different amplitudes, and under these conditions the correlation will be 1.0. At the other extreme, if all the input signals are independent, meaning that there is no common signal component, the correlation will be 0. Correlation values between 0 and 1.0 can be considered to correspond to intermediate balances between a single common signal component and independent signal components at the inputs. Consequently, any input signal condition may be divided into a common signal, the "dominant" signal, and the input signal components left over after subtracting the common signal, which make up an "all the rest" signal component (the "non-dominant" or residue signal energy). As noted above, the common or "dominant" signal amplitude is not necessarily higher than the residue or non-dominant signal levels.
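The zero-time-offset cross-correlation defined above can be sketched for one block of samples as follows. This is a minimal illustration rather than the patented implementation: the function name `zero_lag_xcor` and the test signals are hypothetical, and the smoothing or averaging over time that the description prefers is omitted.

```python
import math


def zero_lag_xcor(a, b):
    """Common energy divided by the geometric mean of the input energies.

    Sums are used instead of averages; the shared 1/n factor cancels
    in the ratio.
    """
    common = sum(x * y for x, y in zip(a, b))   # cross-product (common energy)
    energy_a = sum(x * x for x in a)            # energy of first input
    energy_b = sum(y * y for y in b)            # energy of second input
    denom = math.sqrt(energy_a * energy_b)      # geometric mean of energies
    return common / denom if denom > 0.0 else 0.0


# Same waveform at different amplitudes (a single panned signal).
sig = [math.sin(2 * math.pi * k / 16) for k in range(64)]
panned = [0.25 * s for s in sig]

# A waveform with no component in common with sig over this block.
other = [math.cos(2 * math.pi * k / 16) for k in range(64)]

print(zero_lag_xcor(sig, panned))  # close to 1.0
print(zero_lag_xcor(sig, other))   # close to 0.0
```

The first case corresponds to a single interior signal panned between the inputs; the second stands in for independent inputs with no common component.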
For example, consider an arc of five channels (L (left), MidL (mid-left), C (center), MidR (mid-right), R (right)) mapped to a single Lt/Rt (left total and right total) pair, from which it is desired to recover the original five channels. If all five channels carry independent signals of equal amplitude, then Lt and Rt will be equal in amplitude, with an intermediate value of common energy corresponding to a cross-correlation value between zero and one (because Lt and Rt are not independent signals). The same levels can be obtained with suitably chosen levels of L, C and R and no signal from MidL or MidR. Thus a two-input, five-output module might feed only the output channel corresponding to the dominant direction (C in this example) and the output channels corresponding to the input signal residues (L, R) after removing the C energy from the Lt and Rt inputs, giving no signal to the MidL and MidR output channels. Such a result is undesirable: turning a channel off unnecessarily is almost always a bad choice, because small perturbations in the signal conditions will cause the "off" channel to toggle between on and off, producing an annoying chattering sound ("chattering" is a channel rapidly turning on and off), especially when the "off" channel is listened to in isolation.
Consequently, when multiple output signal distributions are possible for a given set of module input signal values, the conservative approach from the standpoint of individual channel quality is to spread the non-dominant signal components as evenly as possible among the module's output channels, consistent with the signal conditions. An aspect of the invention is to spread the available signal energy evenly, subject to the signal conditions, according to a three-way split rather than a two-way split of "dominant" versus "all the rest". Preferably, the three-way split comprises dominant (common) signal components, fill (even-spread) signal components, and the input signal component residue. Unfortunately, there is only enough information to make a two-way split (dominant signal components and all other signal components). One suitable approach for realizing a three-way split is described here: for correlation values above a particular value, the two-way split uses the dominant and the spread non-dominant signal components; for correlation values below that value, the two-way split uses the spread non-dominant signal components and the residue. The common signal energy is split between "dominant" and "even-spread". The "even-spread" component includes both "common" and "residue" signal components, so "spreading" involves a mixture of common (correlated) and residue (uncorrelated) signal components.
Before processing, for a given input/output channel configuration of a given module, a correlation value is calculated that corresponds to all output channels receiving the same signal amplitude. This correlation value may be referred to as the "random_xcor" value. For a single, centered intermediate output channel and two input channels, the random_xcor value may be calculated as 0.333. For three equally spaced intermediate channels and two input channels, the random_xcor value may be calculated as 0.483. Although these values have been found to give satisfactory results, they are not critical; values of about 0.3 and 0.5, respectively, may be used instead. In other words, for a module with M inputs and N outputs, there is a particular degree of correlation of the M inputs that can be taken to represent equal energy in all N outputs. It can be arrived at by considering the M inputs as if they had been derived with a passive N-to-M matrix receiving N independent signals of equal energy, although the actual inputs may of course be derived by other means. This threshold correlation value is "random_xcor", and it may represent the boundary between two regimes of operation.
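The two quoted threshold values can be reproduced with a short calculation. The sketch below assumes a specific passive matrix that the text does not spell out: output positions spaced uniformly across a two-input arc and a constant-power (cosine/sine) pan law. Both the pan law and the function name `random_xcor` as a callable are assumptions made for illustration.

```python
import math


def random_xcor(num_outputs):
    """Correlation of two inputs fed by equal-energy independent sources.

    Assumes sources at uniformly spaced pan angles from 0 to 90 degrees
    and cosine/sine pan gains (an assumption, not stated in the source).
    Because the sources are independent, cross terms between different
    sources vanish and only per-source gain products remain.
    """
    if num_outputs < 2:
        raise ValueError("need at least the two endpoint outputs")
    angles = [0.5 * math.pi * k / (num_outputs - 1) for k in range(num_outputs)]
    left = [math.cos(t) for t in angles]    # gain of each source into input 1
    right = [math.sin(t) for t in angles]   # gain of each source into input 2
    common = sum(l * r for l, r in zip(left, right))
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    return common / math.sqrt(energy_left * energy_right)


print(round(random_xcor(3), 3))  # 0.333: one centered intermediate channel
print(round(random_xcor(5), 3))  # 0.483: three intermediate channels
```

Under these assumptions the calculation matches the 0.333 and 0.483 figures given in the text; a different pan law would give somewhat different thresholds, which is consistent with the remark that the values are not critical.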
Then, during processing, if the cross-correlation value of a module is greater than or equal to the random_xcor value, it is scaled to a range of 1.0 to 0:
scaled_xcor = (correlation - random_xcor) / (1 - random_xcor)
The "scaled_xcor" value represents the amount of dominant signal above the even-spread level. Whatever is left over may be distributed equally among the other output channels of the module.
However, an additional factor should be taken into account. As the nominal ongoing primary direction of the input signals moves further off center, either the amount of spread energy should be progressively reduced if equal distribution to all output channels is maintained or, alternatively, the amount of spread energy should be maintained while the energy distributed to the output channels is reduced in relation to the "off-centeredness" of the dominant energy; in other words, the energy is tapered along the output channels. In the latter case, additional processing complexity may be required to keep the output energy equal to the input energy.
On the other hand, if the current correlation value is less than the random_xcor value, the dominant energy is considered to be zero, the evenly spread energy is progressively reduced, and the residue signal is allowed to accumulate progressively at the inputs. When the correlation is zero, there is no interior signal, only independent input signals that map directly to output channels.
The operation of this respect of the present invention can be further as explained below:
a) When the actual correlation is greater than random_xcor, there is enough common energy to consider that a dominant signal exists, to be steered (panned) between two adjacent outputs (or, of course, fed to a single output if its direction happens to coincide with that of one output); the energy assigned to this signal is subtracted from the inputs to give the remainder, which is distributed (preferably evenly) among all the outputs.
b) When the actual correlation is exactly random_xcor, the input energy (which may be regarded as all residual energy) is distributed evenly among all the outputs (this is the definition of random_xcor).
c) When the actual correlation is less than random_xcor, there is not enough common energy for a dominant signal, so the input energy is distributed among the outputs in proportions that depend on how much less. It is as if the correlated portion were treated as residual energy, to be distributed evenly among all the outputs, and the uncorrelated portion as a number of dominant signals to be sent to the outputs that correspond to the input directions. In the extreme case of zero correlation, each input is fed to only one output position (generally one of the outputs, although it could be a panned position between two outputs).
Thus, there is a continuum from full correlation, with a single signal panned between two outputs according to the relative energies of the inputs, through random_xcor, with the inputs distributed evenly among all the outputs, to zero correlation, with M inputs fed independently to M output positions.
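The three cases above can be summarized as a simple classification of the measured correlation against the random-correlation value. The following is a minimal sketch; the function name, the regime labels and the comparison tolerance are all hypothetical and serve only to make the three cases concrete.

```python
def correlation_regime(correlation, random_xcor, tol=1e-9):
    """Name the regime for a measured correlation (labels are illustrative)."""
    if correlation > random_xcor + tol:
        # case a): a dominant signal exists; the remainder is spread evenly
        return "dominant"
    if correlation >= random_xcor - tol:
        # case b): all input energy is spread evenly over the outputs
        return "even_spread"
    # case c): part spread evenly, part sent toward the input directions
    return "endpoint_mix"


for value in (0.9, 0.4, 0.0):
    print(value, correlation_regime(value, 0.4))
```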
Interaction compensation
As mentioned above, channel translation according to one aspect of the invention may be considered to involve a lattice of "modules". Because multiple modules may share a given input channel, interactions between modules are possible and may degrade performance unless some compensation is applied. Although it is generally not possible to separate the signals at an input channel according to which module they "go with", estimating the amount of the input signal used by each connected module can improve the resulting correlation and direction estimates, leading to improved overall performance.
As mentioned above, there are two types of module interaction: those involving modules at a common or lower hierarchy level (i.e., modules with a like number of inputs or fewer inputs), referred to as "neighbors", and those involving modules at a higher hierarchy level than a given module (having more inputs) but sharing one or more common inputs, referred to as "higher-order neighbors".
Consider first neighbor compensation at a common hierarchy level. To understand the problems caused by neighbor interaction, consider an isolated two-input module with identical L/R (left and right) input signals, A. This corresponds to a single dominant (common) signal midway between the inputs. The common energy is A² and the correlation is 1.0. Assume a second two-input module with a common signal, B, at its L/R inputs, a common energy of B², and also a correlation of 1.0. If the two modules are connected at a common input, the signal at that common input will be A+B. Assuming that signals A and B are independent, the averaged product AB will be zero, so the common energy of the first module will be A(A+B) = A² + AB = A², and the common energy of the second module will be B(A+B) = B² + AB = B². Thus, as long as neighboring modules process independent signals, the common energy is unaffected by neighboring modules. This is generally a valid assumption. If the signals are not independent, but are identical or at least substantially share a common signal component, the system will react in a manner consistent with the response of the human ear; namely, the common input will be larger, causing the resulting audio image to pull toward the common input. In this case, the L/R input amplitude ratio of each module is offset, because the signal amplitude at the common input (A+B) is higher than the signal amplitude at either outer input, which biases the direction estimate toward the common input. The correlation values of both modules are now somewhat less than 1.0, because the waveforms at each pair of inputs differ. Because the correlation value determines both the degree of spreading of the non-common signal components and the ratio of dominant (common signal component) energy to non-dominant (non-common signal component) energy, the uncompensated common-input signal causes the non-common signal distribution of each module to be spread.
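The claim that the averaged product AB vanishes for independent signals, so that A(A+B) averages to A², can be checked numerically. The sketch below is an illustration only: it assumes that A and B are independent white Gaussian noise sequences, and the helper name and sample count are hypothetical.

```python
import random


def mean_product(x, y):
    """Time-averaged product of two equal-length sample sequences."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


rng = random.Random(0)                       # fixed seed so the run is repeatable
n = 20000
a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # signal A (assumed: white noise)
b = [rng.gauss(0.0, 1.0) for _ in range(n)]  # signal B, independent of A
shared = [x + y for x, y in zip(a, b)]       # the common input carries A+B

energy_a = mean_product(a, a)           # A^2
common_first = mean_product(a, shared)  # A(A+B) = A^2 + AB
cross_term = mean_product(a, b)         # AB, close to zero when independent

print(energy_a, common_first, cross_term)
```

With correlated signals the cross term would no longer average out, which is the situation the compensation is meant to address.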
To compensate, a measure of the "common input level" attributable to each input of each module is estimated, and each module is then informed of the total amount of such common input level energy of all neighboring modules at the same hierarchy level at each of its inputs. Two ways of calculating the measure of common input level attributable to each input of a module are described here: one is based on the common energy of the module's inputs (described generally in the next paragraph); the other, which is more accurate but requires more computational resources, is based on the total energy of the module's interior outputs (described below in connection with the arrangement of Fig. 6A).
According to the first way of calculating the measure of common input level attributable to each input of a module, analysis of a module's input signals does not allow the common input level at each input to be solved for directly, only a proportion of the overall common energy, which is the geometric mean of the common input energy levels. Because the common input energy level at each input cannot exceed the total energy level at that input, which is measured and known, the overall common energy is apportioned into estimated common input levels in proportion to the observed input levels, subject to the limitation described below. Once the ensemble of common input levels has been calculated for all the modules in the lattice (whether the measure of common input level is based on the first or the second way of calculation), each module is informed of the total of the common input levels of all neighboring modules at each of its inputs, a quantity referred to as the "neighbor level" of the module at each of its inputs. The module then subtracts the neighbor level from the input level at each of its inputs to derive a compensated input level, which is used to calculate the correlation and the direction (the nominal principal direction of the input signals).
For the example cited above, the neighbor levels are initially zero, so, because the common input has more signal than either endpoint input, the first module claims a common input energy level at that input in excess of A², and the second module claims a common input level at the same input in excess of B². Since both claims exceed the energy level available at that input, the claims are limited to about A² and B², respectively. Because no other modules are connected to the common input, each common input level corresponds to the neighbor level of the other module. Consequently, the compensated input energy level seen by the first module is
(A² + B²) − B² = A²
and the compensated input energy level seen by the second module is
(A² + B²) − A² = B²
These, however, are just the levels that would have been observed with the modules isolated. Consequently, the resulting correlation values will be 1.0 and the dominant directions will be centered, at the proper amplitudes, as desired. Nevertheless, the recovered signals themselves will not be completely isolated: the output of the first module will contain some B signal component, and vice versa. This is a limitation of a matrix system, and if the processing is performed on a multiband basis, the mixed signal components will be at similar frequencies, rendering the distinction between them somewhat moot. In more complex situations the compensation usually will not be as precise, but experience with the system indicates that, in practice, the compensation mitigates most of the effects of neighbor module interaction.
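The subtraction in the two-module example can be sketched as follows. This is a minimal illustration, not the specification's procedure: the function name is hypothetical, the energies 4.0 and 9.0 are arbitrary example values for A² and B², and clamping the result at zero is an assumption of the sketch.

```python
def compensated_input_energy(total_energy, neighbor_common_levels):
    """Subtract the neighbors' common input levels from the energy at one input.

    The neighbor level is the total of the common input levels claimed by all
    neighboring modules at this input. Clamping at zero is an assumption.
    """
    neighbor_level = sum(neighbor_common_levels)
    return max(0.0, total_energy - neighbor_level)


a2, b2 = 4.0, 9.0        # example energies standing in for A^2 and B^2
shared_input = a2 + b2   # independent signals: the energies add at the shared input
first_module = compensated_input_energy(shared_input, [b2])
second_module = compensated_input_energy(shared_input, [a2])
print(first_module, second_module)
```

Each module ends up seeing only the energy of its own signal at the shared input, as in the two equations above.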
Having established the principles and signals used in neighbor level compensation, extension to higher-order neighbor level compensation is fairly straightforward. This applies to situations in which two or more modules at different hierarchy levels share more than one input channel in common. For example, there might be a three-input module sharing two inputs with a two-input module. A signal component common to all three inputs will also be common to both inputs of the two-input module and, without compensation, will be rendered at a different position by each module. More generally, there may be a signal component common to all three inputs and a second component common only to the inputs of the two-input module, and their effects should be separated as far as possible for proper rendering of the output sound field. Consequently, the effects of the three-input common signal, as embodied in the common input levels described above, should be subtracted from the inputs before the two-input calculation can be performed properly. In fact, the higher-order common signal elements should be subtracted not only from the input levels of the lower-level module but also from its observed measure of common energy level before proceeding with the lower-level calculation. This differs from the effect of the common input levels of modules at the same hierarchy level, which do not affect the measure of common energy level of a neighboring module. Thus, higher-order neighbor levels should be accounted for and applied separately from same-order neighbor levels. At the same time that higher-order neighbor levels are passed down to modules lower in the hierarchy, the remaining common levels of the lower-level modules should also be passed up the hierarchy because, as mentioned above, lower-level modules act like ordinary neighbors to higher-level modules. Some of these quantities are interdependent and difficult to solve for simultaneously. To avoid performing complex, resource-intensive simultaneous solutions, previously calculated values may be passed to the relevant modules. The potential interdependence of module common input levels at different hierarchy levels can be resolved either by using previous values, as above, or by performing the calculations in a repeating sequence (i.e., a loop) from the highest hierarchy level to the lowest. Alternatively, a simultaneous-equation solution is also possible, although it may involve non-trivial computational overhead.
Although the interaction compensation techniques described deliver only approximately correct values for complex signal distributions, they are believed to provide an improvement over a lattice arrangement that fails to consider module interactions.
Description of the drawings
Fig. 1 is a top plan view showing schematically an idealized decoding arrangement in the manner of a test arrangement employing a horizontal array of sixteen channels around the walls of a room, a six-channel array disposed in a ring above the horizontal array, and a single overhead channel.
Fig. 2 is a functional block diagram providing an overview of a multiband transform embodiment in which a plurality of modules operate with a central watch-dog to implement the example of Fig. 1.
Fig. 3 is a functional block diagram useful in understanding the manner in which a watch-dog, such as watch-dog 201 of Fig. 2, may determine an endpoint scaling factor.
Figs. 4A-4C show a functional block diagram of a module according to one aspect of the invention.
Fig. 5 is a schematic view showing an idealized arrangement of a three-input module fed by a triangle of input channels, three interior output channels and a dominant direction. The view is useful in understanding the distribution of dominant signal components.
Figs. 6A and 6B are functional block diagrams showing, respectively, a suitable arrangement for (1) generating the total estimated energy for each input of a module in response to the total energy at each input, and (2) generating an excess endpoint energy scaling factor component for each endpoint of the module in response to a measure of cross-correlation of the input signals.
Fig. 7 is a functional block diagram showing a preferred function of the "sum and/or greater of" block 367 of Fig. 4C.
Fig. 8 is an idealized representation of the manner in which an aspect of the invention generates scaling factor components in response to a measure of cross-correlation.
Figs. 9A and 9B through Figs. 16A and 16B are a series of idealized representations illustrating the output scaling factors of a module resulting from various examples of input signal conditions.
Detailed description of the embodiments
In order to test aspects of the present invention, an arrangement was deployed having a horizontal array of 5 loudspeakers on each wall of a room (one loudspeaker in each corner, with three loudspeakers evenly spaced between each pair of corners), 16 loudspeakers in total allowing for the shared corner loudspeakers, plus a ring of 6 loudspeakers above a centrally located listener at a vertical angle of about 45 degrees, plus a single loudspeaker directly above the listener, for a total of 23 loudspeakers, plus a subwoofer/LFE (low-frequency effects) channel, for a total of 24 loudspeakers, all fed from a personal computer set up for 24-channel playback. Although by current parlance this system might be referred to as a 23.1-channel system, for simplicity it is referred to here as a 24-channel system.
Fig. 1 is a top plan view showing schematically an idealized decoding arrangement in the manner of the test arrangement just described. Five wide-range horizontal input channels are shown as squares 1', 3', 5', 9' and 13' on the outer circle. A vertical channel, which may be derived from the five wide-range input channels by correlation or by generated reverberation, or may be supplied separately (as in Fig. 2), is shown as the dashed square 23' in the center. The twenty-three wide-range output channels are shown as filled circles numbered 1-23. The sixteen output channels on the outer circle lie in a horizontal plane; the six output channels on the inner circle are 45 degrees above the horizontal plane. Output channel 23 is directly above one or more listeners. Five two-input decoding modules are indicated by arrows 24-28 around the outer circle, each connected between a pair of horizontal input channels. Five additional two-input vertical decoding modules are indicated by arrows 29-33, connecting the vertical channel to each of the horizontal input channels. Output channel 21, the elevated center rear channel, is derived by a three-input decoding module 34, illustrated by arrows between output channel 21 and input channels 9, 13 and 23. Thus, three-input module 34 is at a higher hierarchy level than its two-input, lower-hierarchy neighbor modules 27, 32 and 33. In this example, each module is associated with a respective pair or trio of spatially closest input channels. Every module in this example has at least three neighbors at the same level. For example, modules 25, 28 and 29 are neighbors of module 24.
Although the arrangement of Fig. 1 employs five modules (24-28), each having two inputs, and five inputs (1', 3', 5', 9' and 13') to derive sixteen horizontal outputs (1-16) representing locations around the four walls of a room, similar results may be obtained with a minimum of three inputs and three modules (each having two inputs, each module sharing one input with another module).
By employing multiple modules in which each module has its output channels on an arc or a line (as in the example of Figs. 1 and 2), the decoding ambiguities encountered in prior art decoders, in which correlations less than zero are decoded as indicating rearward directions, may be avoided.
Although input and output channels may be characterized by their physical position, or at least by their direction, characterizing them with a matrix is useful because it provides a well-defined signal relationship. Each matrix element (row i, column j) is a transfer function relating input channel i to output channel j. Matrix elements are usually signed multiplicative coefficients, but they may also include phase or delay terms (in principle, any filter) and may be functions of frequency (in discrete-frequency terms, a different matrix at each frequency). This is straightforward in the case of dynamic scaling factors applied to the outputs of a fixed matrix, but it also lends itself to variable matrixing, either by having a separate scaling factor for each matrix element or, for matrix elements more elaborate than simple scalar scaling factors, by making the matrix elements themselves variable, for example a variable delay.
There is some flexibility in mapping physical positions to matrix elements. In principle, embodiments of aspects of the invention can handle mapping an input channel to any number of output channels, and vice versa, but the most common situation is to assume that signals are mapped only to the nearest output channels via simple scaling factors whose squares sum to 1.0 in order to preserve energy. Such mappings are often done with sine/cosine panning functions.
For example, with two input channels and three interior output channels on a line between them, plus two endpoint output channels coincident with the input positions (i.e., an M:N module in which M is 2 and N is 5), one may assume that the span represents 90 degrees of arc (the range over which sine or cosine goes from 0 to 1, or vice versa), so that the channels are 90 degrees / 4 intervals = 22.5 degrees apart, giving the following channel matrix coefficients (cos(angle), sin(angle)):
Lout coeffs = cos(0), sin(0) = (1, 0)
MidLout coeffs = cos(22.5), sin(22.5) = (.92, .38)
Cout coeffs = cos(45), sin(45) = (.71, .71)
MidRout coeffs = cos(67.5), sin(67.5) = (.38, .92)
Rout coeffs = cos(90), sin(90) = (0, 1)
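The coefficient pairs listed above follow directly from evenly spaced angles over a 90 degree span. The sketch below reproduces them for any number of outputs; the function name is hypothetical, and at least two outputs are assumed.

```python
import math


def pan_coefficients(n_outputs):
    """Return (cos, sin) pairs for n_outputs evenly spaced over 90 degrees.

    Assumes n_outputs >= 2 (two endpoints plus any interior outputs).
    """
    step = 90.0 / (n_outputs - 1)
    return [
        (math.cos(math.radians(i * step)), math.sin(math.radians(i * step)))
        for i in range(n_outputs)
    ]


names = ["Lout", "MidLout", "Cout", "MidRout", "Rout"]
for name, (left, right) in zip(names, pan_coefficients(5)):
    print(f"{name}: ({left:.2f}, {right:.2f})")
```

Because cos² + sin² = 1 at every angle, the squares of each pair sum to 1.0, which is the energy-preserving property mentioned earlier.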
Thus, for the case of a matrix with fixed coefficients and a variable gain controlled by a scaling factor at each matrix output, the signal output at each of the five output channels is (where "SF" is a scaling factor for the particular output identified by its subscript):
Lout = Lt (SF_L)
MidLout = ((.92)Lt + (.38)Rt)(SF_MidL)
Cout = ((.71)Lt + (.71)Rt)(SF_C)
MidRout = ((.38)Lt + (.92)Rt)(SF_MidR)
Rout = Rt (SF_R)
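As an illustration of a fixed matrix followed by a variable gain on each output, the sketch below applies the sine/cosine coefficients given earlier to one pair of input samples and then multiplies each output by its scaling factor. The function name, the dictionary layout and the sample values are hypothetical.

```python
# (Lt weight, Rt weight) per output, using the rounded sine/cosine coefficients
COEFFS = {
    "Lout": (1.0, 0.0),
    "MidLout": (0.92, 0.38),
    "Cout": (0.71, 0.71),
    "MidRout": (0.38, 0.92),
    "Rout": (0.0, 1.0),
}


def decode_sample(lt, rt, scale_factors):
    """Apply the fixed matrix, then the per-output scaling factor."""
    return {
        name: (c_left * lt + c_right * rt) * scale_factors[name]
        for name, (c_left, c_right) in COEFFS.items()
    }


unity = {name: 1.0 for name in COEFFS}
print(decode_sample(1.0, 1.0, unity))
```

Setting a scaling factor to zero silences that output without changing the matrix itself.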
In general, given an array of input channels, one may conceptually join the nearest inputs with straight lines, which represent potential decoder modules. (They are "potential" because a module is not needed if no output channel has to be derived from it.) For typical arrangements, any output channel on a line between two input channels can be derived from a two-input module (if the sources and transmission channels are in a common plane, any one source appears in at most two input channels, in which case there is no advantage in employing more than two inputs). An output channel at the same position as an input channel is an endpoint channel, possibly of more than one module. An output channel that is not on a line or at the same position as an input channel (for example, inside or outside a triangle formed by three input channels) requires a module having more than two inputs.
Decoder modules with more than two inputs are useful when a common signal occupies more than two input channels. This may occur, for example, when the source channels and the input channels are not in a plane, so that a source channel may be mapped to more than two input channels. It occurs in the example of Fig. 1 when mapping 24 channels (16 horizontal ring channels, 6 elevated ring channels, 1 vertical channel, plus LFE) to 6.1 channels (including a composite vertical channel). In that case, the center rear channel in the elevated ring is not on a line between two source channels; it lies in the middle of the triangle formed by the Ls (13), Rs (9) and top (23) channels, so a three-input module is required to extract it. One way to map elevated channels to a horizontal array is to map each elevated channel to more than two input channels. This allows the 24 channels of the Fig. 1 example to be mapped to a conventional 5.1-channel array. In that alternative, a plurality of three-input modules may extract the elevated channels, and the remaining signal components may be processed by two-input modules to extract the main horizontal ring of channels.
In general, it is not necessary to check for all possible combinations of signal commonality among the input channels. With a planar channel array (for example, channels representing horizontally arrayed directions), it is usually adequate to perform pairwise similarity comparisons of spatially adjacent channels. For channels arranged in a canopy or on the surface of a sphere, signal commonality may extend to three or more channels. The use and detection of signal commonality may also be employed to convey additional signal information. For example, a vertical signal component may be represented by mapping it to all five full-range channels of a horizontal five-channel array.
Decisions about which input channel combinations to analyze for commonality, along with a default input/output mapping matrix, need be made only once for each input/output channel translator or translator function arrangement, when configuring the translator or translator function. The "initial mapping" (before processing) derives a passive "master" matrix that relates the input/output channel configuration to the spatial orientation of the channels. As one alternative, the processor or processing portion of the invention may generate time-varying scaling factors, one per output channel, that modify either the output signal levels of what would otherwise be a simple, passive matrix, or the matrix coefficients themselves. The scaling factors are in turn derived from a combination of (a) dominant, (b) evenly spread (fill) and (c) residual (endpoint) signal components, as described below.
A master matrix is useful in configuring an arrangement of modules such as that shown in the example of Fig. 1 and described further below in connection with Fig. 2. By examining the master matrix, one may deduce, for example, how many decoder modules are needed, how they are connected, how many input and output channels each has, and the matrix coefficients relating the inputs and outputs of each module. These coefficients may be taken from the master matrix; only the non-zero values are needed unless an input channel is also an output channel (i.e., an endpoint channel).
Each module preferably has a "local" matrix, which is that portion of the master matrix applicable to the particular module. In the case of a multi-module arrangement, such as the example of Figs. 1 and 2, the module may use the local matrix to produce scaling factors (or matrix coefficients) for controlling the master matrix, as described below in connection with Fig. 2 and Figs. 4A-4C, or to produce a subset of the output signals, which are assembled by a central process such as the watch-dog described in connection with Fig. 2. In the latter case, the watch-dog compensates for multiple versions of the same output signal, produced by modules having a common output signal, in a manner analogous to that in which watch-dog 201 of Fig. 2 determines a final scaling factor to replace the preliminary scaling factors produced by modules that generate preliminary scaling factors for the same output channel.
In the case of multiple modules that produce scaling factors rather than output signals, the modules may continually obtain the matrix information relevant to themselves from the master matrix via the watch-dog rather than having a local matrix. However, less computational overhead is required if each module has its own local matrix. In the case of a single, stand-alone module, the module has a local matrix, which is the only matrix required (in effect, the local matrix is the master matrix), and that local matrix is used to produce the output signals.
Unless otherwise indicated, embodiments of the invention having multiple modules are described with reference to the alternative in which the modules produce scaling factors.
Any decoder module output channel having only one non-zero coefficient in the module's local matrix (that coefficient is 1.0, because the squares of the coefficients sum to 1.0) is an endpoint channel. Output channels having more than one non-zero coefficient are interior output channels. Consider a simple example. If output channels O1 and O2 are both derived from input channels I1 and I2 (but with different coefficient values), then a two-input module is needed, connected between I1 and I2 and producing outputs O1 and O2, possibly among other modules. In a more complex case, suppose there are 5 inputs and 16 outputs, and one of the decoder modules has inputs I1 and I2 and feeds outputs O1 and O2 such that:
O1 = A·I1 + B·I2 + 0·I3 + 0·I4 + 0·I5
(note that there is no contribution from input channels I3, I4 or I5), and
O2 = C·I1 + D·I2 + 0·I3 + 0·I4 + 0·I5
(note that there is no contribution from input channels I3, I4 or I5).
Then the decoder may have two inputs (I1 and I2) and two outputs, and the scaling factors relating them are:
O1 = A·I1 + B·I2, and
O2 = C·I1 + D·I2
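The way a module's inputs and its endpoint outputs can be read off a matrix, as in the O1/O2 example above, can be sketched as follows. Everything here is an assumption made for illustration: the function names, the coefficient values, the extra endpoint output E1, and the choice to store the coefficients as one row per output.

```python
def local_inputs(matrix_rows, output_names):
    """Return the sorted input indices that feed any of the given outputs."""
    used = set()
    for name in output_names:
        used.update(i for i, c in enumerate(matrix_rows[name]) if c != 0)
    return sorted(used)


def is_endpoint(row):
    """An output with exactly one non-zero coefficient is an endpoint channel."""
    return sum(1 for c in row if c != 0) == 1


# One row per output (an assumption of this sketch); columns are inputs I1..I5.
matrix_rows = {
    "O1": [0.92, 0.38, 0, 0, 0],
    "O2": [0.38, 0.92, 0, 0, 0],
    "E1": [1.0, 0, 0, 0, 0],
}

print(local_inputs(matrix_rows, ["O1", "O2"]))  # indices of I1 and I2
print(is_endpoint(matrix_rows["E1"]), is_endpoint(matrix_rows["O1"]))
```

The local matrix of the module is then just the non-zero columns of the selected rows.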
Whether it is a master matrix or, in the case of a single, stand-alone module, a local matrix, the matrix elements may provide functions other than multiplication. For example, as noted above, a matrix element may include a filter function, such as a phase or delay term, and/or a filter that is a function of frequency. One example of filtering that might be applied is a matrix of pure delays, which can render phantom projected images. In practice, such a master or local matrix may be divided into two functions, one applying the coefficients to derive the output channels and the other applying the filter function.
Fig. 2 is a functional block diagram providing an overview of a multiband transform embodiment implementing the example of Fig. 1. A PCM audio input, for example, having multiple interleaved audio signal channels is applied to a watch-dog or supervisory function 201 (hereinafter "watch-dog 201") that includes a de-interleaver, which recovers a separate stream for each of the six audio signal channels (1', 3', 5', 9', 13' and 23') carried by the interleaved input and applies each stream to a time-domain to frequency-domain transform or transform function (hereinafter "forward transform"). Alternatively, the audio channels may be received in separate streams, in which case no de-interleaver is required.
As noted above, signal translation according to the present invention may be applied either to wideband signals or to each frequency band of a multiband processor, which may employ either a filter bank, such as a discrete critical-band filter bank or a filter bank having a band structure compatible with an associated decoder, or a transform arrangement, such as an FFT (fast Fourier transform) or MDCT (modified discrete cosine transform) linear filter bank. Fig. 2, Figs. 4A-4C and other figures are described in the context of a multiband transform arrangement.
For simplicity, Figs. 1, 2 and other figures do not show an optional LFE input channel (a potential seventh input channel in Figs. 1 and 2) and output channel (a potential 24th output channel in Figs. 1 and 2). The LFE channel may generally be treated in the same manner as the other input and output channels, but with its own scaling factor fixed at "1" and its own matrix coefficient also fixed at "1". Where the source channels have no LFE channel but the output channels do (for example, a 2:5.1 upmix), an LFE channel may be derived by applying a low-pass filter (for example, a fifth-order Butterworth filter with a 120 Hz corner frequency) to the sum of the channels or, to avoid cancellation when the channels are added, by employing a phase-corrected sum of the channels. Where the input has an LFE channel but the output does not, the LFE channel may be added to one or more of the output channels.
Continuing with the description of Fig. 2, modules 24-34 receive appropriate ones of the six inputs 1', 3', 5', 9', 13' and 23' in the manner shown in Fig. 1. Each module produces a preliminary scaling factor ("PSF") output for each of the audio output channels associated with it, as shown in Fig. 1. Thus, for example, module 24 receives inputs 1' and 3' and produces preliminary scaling factor outputs PSF1, PSF2 and PSF3. Alternatively, as noted above, each module may produce a preliminary set of audio outputs for each of the audio output channels associated with it. Each module may also communicate with watch-dog 201, as described further below. Information sent from watch-dog 201 to the modules may include neighbor level information and higher-order neighbor level information, if any. Information sent to the watch-dog from each module may include the total estimated interior energy. The modules may be considered part of the control signal generating portion of the overall system of Fig. 2.
A watch-dog, such as watch-dog 201 of Fig. 2, may perform a number of diverse functions. A watch-dog may, for example, determine whether more than one module is in use and, if not, it need not perform any functions relating to neighbor levels. During initialization, the watch-dog may inform the or each module of the number of inputs and outputs it has, the matrix coefficients relating them, and the sampling rate of the signal. As already mentioned, it may read the blocks of interleaved PCM samples and de-interleave them into separate channels. It may also apply unlimiting action in the time domain, for example in response to additional information indicating that the source signal was amplitude-limited and the degree of that limiting. If the system is operating in a multiband mode, the watch-dog may apply windowing and a filter bank (for example, FFT, MDCT, etc.) to each channel (so that multiple modules do not perform redundant transforms, which would substantially increase the processing overhead) and pass the streams of transform values to each module for processing. Each module passes back to the watch-dog a two-dimensional array of scaling factors: one scaling factor for every transform bin of each subband of each output channel (when in a multiband transform arrangement; otherwise one scaling factor per output channel) or, alternatively, a two-dimensional array of output signals: an ensemble of complex transform bins for each subband of each output channel (when in a multiband transform arrangement; otherwise one output signal per output channel). The watch-dog may smooth the scaling factors and apply them to the signal-path matrix (matrix 203, described below) to produce (in a multiband transform arrangement) the output channel complex spectra. Alternatively, when the modules produce output signals, the watch-dog may derive the output channels (output channel complex spectra, in a multiband transform arrangement), compensating for local matrices that produce the same output signal. The watch-dog may then perform, for each output channel, an inverse transform plus windowing and overlap-add in the case of MDCT, interleave the output samples to form a composite multichannel output stream (or, optionally, omit interleaving so as to provide multiple output streams), and send the output stream to an output file, a sound card or another final destination.
Although various functions can be performed by one or more watch-dogs, as described here, those of ordinary skill in the art will appreciate that some or all of these functions can be performed in the modules themselves rather than by a watch-dog common to all or some of the modules. For example, if there is only a single module, there is no distinction between module functions and watch-dog functions. Although, in the case of multiple modules, a common watch-dog can reduce the total processing power required by eliminating or reducing redundant processing tasks, eliminating the common watch-dog or simplifying it allows modules to be added to one another easily, for example to upgrade to more output channels.
Returning to the description of Fig. 2, the six inputs 1', 3', 5', 9', 13' and 23' are also applied to a variable matrix or variable matrix function 203 (hereinafter "matrix 203"). Matrix 203 can be considered part of the signal path of the Fig. 2 system. Matrix 203 also receives from the watch-dog 201, as inputs, a set of final scaling factors SF1 to SF23, one for each of the 23 output channels of the Fig. 1 example. The final scaling factors can be considered the output of the control path of the Fig. 2 system. As described further below, the watch-dog 201 preferably passes the preliminary scaling factor of each "inner" output channel to the matrix as its final scaling factor, but determines the final scaling factor of each endpoint output channel in response to the information it receives from all the modules. An "inner" output channel lies between the two or more "endpoint" output channels of each module. Alternatively, if all modules produce output signals rather than scaling factors, matrix 203 is not needed; the watch-dog itself produces the output signals.
In the example of Fig. 1, the endpoint output channels are assumed to coincide with the input channel positions, although they need not, as described elsewhere. Thus, output channels 2, 4, 6-8, 10-12, 14-16, 17, 18, 19, 20, 21 and 22 are inner output channels. Inner output channel 21 lies between, or is bracketed by, three input channels (input channels 9', 13' and 23'), whereas each of the other inner channels lies between (is bracketed by) two input channels. Because the endpoint output channels shared between or among modules (that is, output channels 1, 3, 5, 9, 13 and 23) have multiple preliminary scaling factors, the watch-dog 201 determines the final endpoint scaling factors (SF1, SF3, and so on) among the scaling factors SF1 to SF23. The final inner output scaling factors (SF2, SF4, SF6, and so on) are the same as the preliminary scaling factors.
Fig. 3 is a functional block diagram useful in understanding the manner in which a watch-dog, such as the watch-dog 201 of Fig. 2, can determine an endpoint scaling factor. The watch-dog does not sum all the outputs of the modules sharing an input to obtain an endpoint scaling factor. Instead, it additively combines, as in combiner 301, the total estimated inner energy attributed to an input, such as input 9', by each module sharing that input (input 9' is shared by modules 26 and 27 of Fig. 2). This sum represents the total energy level claimed at that input by the inner outputs of all connected modules. The sum is then subtracted, as in combiner 303, from the smoothed input energy level at that input (for example, the output of smoother 325 or 327 of Fig. 4B, described below) of any one of the modules sharing the input (module 26 or module 27 in this example). Although the levels may differ slightly from module to module, because each module adjusts its time constants independently of the others, it suffices to select the smoothed input of any one of the modules at the common input. The difference at the output of combiner 303 is the desired output signal energy level at that input, which is not allowed to fall below zero. The desired output signal level is divided by the smoothed input level at that input, as in divider 305, and the square root is taken, as in block 307, to obtain the final scaling factor for that output (SF9 in this example). Note that the watch-dog derives a single final scaling factor for each such shared input, regardless of how many modules share the input. An arrangement for determining the total estimated energy of the inner outputs attributable to each of a module's inputs is described below in connection with Fig. 6A.
Because these levels are energy levels (second-order quantities) rather than amplitudes (first-order quantities), a square root operation is applied after the division in order to obtain the final scaling factor (scaling factors are associated with first-order quantities). The addition of the inner levels and their subtraction from the total input level are all performed in a pure energy sense, because the inner outputs of different modules are assumed to be independent (uncorrelated). If this assumption is incorrect in some special case, the calculation may yield more residual signal at the input than there should be, which may cause a slight spatial distortion in the reproduced sound field (for example, a slight pulling of other nearby inner images toward the input), but in the same circumstances the human ear probably reacts similarly. The inner output channel scaling factors, such as PSF6 to PSF8 of module 26, are passed through the watch-dog as final scaling factors (they are unmodified). For simplicity, Fig. 3 shows the generation of only one final endpoint scaling factor. The other final endpoint scaling factors can be derived in a similar manner.
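The endpoint calculation of Fig. 3 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `endpoint_scale_factor`, its argument names, and the handling of a silent input are assumptions, not part of the described embodiment.

```python
import math

def endpoint_scale_factor(smoothed_input_energy, inner_energy_estimates):
    """Final scaling factor for an endpoint output that coincides with a shared input.

    smoothed_input_energy: smoothed energy at the shared input (from any one sharing module).
    inner_energy_estimates: one value per sharing module, the energy that module's
        inner outputs claim from this input.
    """
    if smoothed_input_energy <= 0.0:
        return 0.0  # assumption: a silent input gets zero gain
    # Combiner 301: sum the energy claimed by the inner outputs of all sharing modules.
    claimed = sum(inner_energy_estimates)
    # Combiner 303: subtract from the smoothed input energy, never going below zero.
    wanted = max(smoothed_input_energy - claimed, 0.0)
    # Divider 305 and block 307: energy ratio, then square root to get an amplitude gain.
    return math.sqrt(wanted / smoothed_input_energy)

print(endpoint_scale_factor(1.0, [0.25, 0.5]))
```

With an input energy of 1.0 and inner outputs claiming 0.75 in total, a quarter of the energy remains at the endpoint, so the amplitude gain is the square root of 0.25.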
Returning to the description of Fig. 2, as noted above, the variability in the variable matrix 203 can be complex (all coefficients variable) or simple (coefficients vary in groups, such as by being applied to the inputs or outputs of a fixed matrix). Although either approach can be used to produce substantially the same results, the simpler approach, a fixed matrix followed by a variable gain for each output (the gain of each output being controlled by a scaling factor), has been found to produce satisfactory results and is used in the embodiments described here. Although a variable matrix in which every matrix coefficient is variable can be used, it has the disadvantage of having more variables and requiring more processing power.
The watch-dog 201 also optionally performs time-domain smoothing of the final scaling factors before they are applied to the variable matrix 203. In a variable matrix system, output channels are never "turned off"; instead, the coefficients are arranged to reinforce some signals and cancel others. A fixed-matrix, variable-gain system such as the described embodiments of the invention, however, does turn channels on and off, and is more susceptible to undesirable "chattering" artifacts. This can occur despite the two-stage smoothing described below (for example, smoothers 319/325, and so on). For example, when a scaling factor is close to zero, transitions to and from zero can cause audible chattering, because only a small change is needed to go from 'small' to 'none' and then from 'none' back to 'small'.
The optional smoothing performed by the watch-dog 201 preferably smooths the output scaling factors with a variable time constant that depends on the absolute difference ("abs-diff") between the newly derived instantaneous scaling factor value and the running smoothed value of the scaling factor. For example, if the abs-diff is greater than 0.4 (and, of course, no greater than 1.0), little or no smoothing is applied; a small additional amount of smoothing is applied for abs-diff values between 0.2 and 0.4; and below 0.2 the time constant is a continuous inverse function of the abs-diff. Although these values are not critical, they have been found to reduce audible chattering artifacts. Optionally, in a multiband version of a module, the scaling factor smoother time constants can also scale with frequency as well as with time, in the manner of the frequency smoothers 413, 415 and 417 of Fig. 4A, described below.
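One possible reading of this abs-diff rule is sketched below in Python. The thresholds 0.4 and 0.2 come from the text; the coefficient values (1.0, 0.5, the linear ramp and the 0.05 floor) and both function names are assumptions chosen only to illustrate the shape of the rule.

```python
def smoothing_coefficient(abs_diff):
    """Map abs-diff (0..1) to a one-pole update coefficient; 1.0 means no smoothing."""
    if abs_diff > 0.4:
        return 1.0   # large jump: little or no smoothing
    if abs_diff >= 0.2:
        return 0.5   # assumed value: a small additional amount of smoothing
    # Below 0.2 the time constant is inversely related to abs-diff, so the update
    # coefficient (roughly the reciprocal of the time constant) shrinks with it.
    # The 0.05 floor is an assumption that keeps the smoother from freezing.
    return max(0.05, 0.5 * abs_diff / 0.2)

def smooth_scale_factor(previous_smoothed, new_value):
    """One update of the variable time constant scaling factor smoother."""
    abs_diff = abs(new_value - previous_smoothed)
    coeff = smoothing_coefficient(abs_diff)
    return previous_smoothed + coeff * (new_value - previous_smoothed)

print(smooth_scale_factor(0.0, 1.0))   # large change is followed immediately
print(smooth_scale_factor(0.0, 0.1))   # small change is followed slowly
```

The ramp is chosen so that the coefficient is continuous at an abs-diff of 0.2; any other continuous inverse relation would fit the description equally well.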
As noted above, the variable matrix 203 is preferably a fixed decoding matrix with variable scaling factors (gains) at the matrix outputs. Each matrix output channel can have (fixed) matrix coefficients that would be the encode downmix coefficients for that channel had there been an encoder with discrete inputs (instead of mixing source channels directly to the downmixed array, which avoids the need for a discrete encoder). For each output channel, the squares of the coefficients preferably sum to 1.0. The matrix coefficients are fixed once it is known where the output channels are (as discussed above in connection with the "master" matrix), whereas the scaling factors, which control the output gain of each channel, are dynamic.
The inputs comprising frequency-domain conversion receivers applied to the modules 24-34 of Fig. 2 can be grouped into frequency subbands by each module after the initial quantities of energy and common energy are calculated at the receiver level, as explained further below. Thus, for each frequency subband there is a preliminary scaling factor (PSF in Fig. 2) and a final scaling factor (SF in Fig. 2). Each of the frequency-domain output channels 1-23 produced by matrix 203 comprises a set of conversion receivers (subband-sized groups of conversion receivers are treated by the same scaling factor). The sets of frequency-domain conversion receivers are converted to a set of PCM output channels 1-23, respectively, by a frequency-to-time-domain transform or transform function 205 (hereinafter "inverse transform"), which can be a function of the watch-dog 201 but is shown separately for clarity. The watch-dog 201 can interleave the resulting PCM channels 1-23 to provide a single interleaved PCM output stream, or leave the PCM output channels as separate streams.
Figs. 4A-4C show a functional block diagram of a module according to one aspect of the invention. The module receives two or more input signal streams from a watch-dog, such as the watch-dog 201 of Fig. 2. Each input comprises an ensemble of complex-valued frequency-domain conversion receivers (bins). Each input, 1 through m, is applied to a function or device that calculates the energy of each receiver (such as function or device 401 for input 1 and function or device 403 for input m), the energy being the sum of the squares of the real and imaginary values of each conversion receiver (to simplify the drawing, paths are shown for only two inputs, 1 and m). Each input is also applied to a function or device 405 that calculates the common energy of each receiver across the module's input channels. In an FFT embodiment, the common energy can be calculated by taking the cross product of the input samples (in the case of two inputs, L and R, for example, the real part of the complex product of the complex L receiver value and the complex conjugate of the complex R receiver value). An embodiment using real values need only cross-multiply the real values of each input. For more than two inputs, the special cross-multiplication technique described below can be used: if all the signs are the same, the product is given a positive sign; otherwise it is given a negative sign and is scaled by the ratio of the number of possible positive results (always two: the signs are all positive or all negative) to the number of possible negative results.
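For a single complex transform bin, the two quantities just described can be written as follows. This is a minimal Python sketch; the function names `bin_energy` and `common_energy_pair` are illustrative and do not come from the source.

```python
def bin_energy(z):
    """Energy of one complex transform bin: sum of squared real and imaginary parts."""
    return z.real * z.real + z.imag * z.imag

def common_energy_pair(l, r):
    """Common energy of two bins: real part of L times the complex conjugate of R."""
    return (l * r.conjugate()).real

left = 1 + 2j
right = 1 + 2j
print(bin_energy(left))                 # energy of the left bin
print(common_energy_pair(left, right))  # identical bins share all of their energy
print(common_energy_pair(1 + 0j, 1j))   # bins in quadrature share none
```

For identical bins the common energy equals the bin energy, and for bins 90 degrees apart in phase it is zero, which is the behavior expected of a cross-product measure.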
Pairwise calculation of common energy
For example, suppose that the input channel pair A/B contains a common signal X together with individual, uncorrelated signals Y and Z:
$$A = 0.707X + Y$$
$$B = 0.707X + Z$$
Here the scaling factor $0.707 = \sqrt{0.5}$ provides an energy-preserving mapping to the nearest input channels.
$$\mathrm{RMSEnergy}(A) = \int A^2 \, dt = \overline{A^2} = \overline{(0.707X + Y)^2} = \overline{0.5X^2 + 1.414XY + Y^2}$$
$$= 0.5\,\overline{X^2} + 1.414\,\overline{XY} + \overline{Y^2}$$
Because X and Y are uncorrelated,
$$\overline{XY} = 0$$
So:
$$\overline{A^2} = 0.5\,\overline{X^2} + \overline{Y^2}$$
That is, because X and Y are uncorrelated, the total energy of input channel A is the sum of the energies of its X and Y components.
Similarly:
$$\overline{B^2} = 0.5\,\overline{X^2} + \overline{Z^2}$$
Because X, Y and Z are mutually uncorrelated, the average cross product of A and B is:
$$\overline{AB} = 0.5\,\overline{X^2}$$
Thus, in the case of an output signal shared equally by two adjacent input channels, which may also contain independent, uncorrelated signals, the average cross product of the signals equals the energy of the common signal component in each channel. If the common signal is not shared equally, that is, if it is panned toward one input, the average cross product is the geometric mean of the energies of the common component in A and in B, from which the individual channel common energy estimates can be derived by normalizing by the square root of the ratio of the channel amplitudes. The actual time averages are calculated in the subsequent smoothing stages, as described below.
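The pairwise result can be checked numerically. The Python sketch below assumes Gaussian noise for X, Y and Z and uses a plain sample mean in place of the smoothers; the function names, sample count and seed are arbitrary choices for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(n=20000, seed=1):
    """Return (average cross product of A and B, 0.5 times average of X squared)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # common signal
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # uncorrelated signal in A only
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # uncorrelated signal in B only
    a = [0.707 * xi + yi for xi, yi in zip(x, y)]
    b = [0.707 * xi + zi for xi, zi in zip(x, z)]
    cross = mean([ai * bi for ai, bi in zip(a, b)])
    common = 0.5 * mean([xi * xi for xi in x])
    return cross, common

cross, common = simulate()
print(cross, common)
```

The two printed values agree only approximately, because the cross terms involving Y and Z average toward zero without vanishing exactly over a finite number of samples.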
Higher-order calculation of common energy
To derive the common energy of a decoder module with three or more inputs, it is necessary to form the average cross product of all the input signals. Simply performing pairwise processing of the inputs fails to distinguish between separate output signals shared by each pair of inputs and a signal common to all the inputs.
For example, consider three input channels A, B and C, made up of uncorrelated signals W, Y and Z and a common signal X:
A=X+W
B=X+Y
C=X+Z
If the average cross product is calculated, all the terms involving W, Y and Z cancel, as in the second-order calculation, leaving the average of $X^3$:
$$\overline{ABC} = \overline{X^3}$$
Unfortunately, if X is a zero-mean time signal, as expected, then the average of its cube is also zero. Unlike the average of X^2, which is positive for any nonzero value of X, X^3 has the same sign as X, so the positive and negative contributions tend to cancel. Clearly the same holds for any odd power of X, corresponding to an odd number of module inputs, but even exponents greater than 2 can also lead to erroneous results; for example, a four-input combination with components (X, X, -X, -X) has the same product and average as (X, X, X, X).
This problem can be resolved by using a variant of the averaged-product technique. Before averaging, the sign of each product is removed by taking the absolute value of the product. The signs of the individual terms of the product are then examined. If they all have the same sign, the absolute value of the product is applied to the averager. If any of the signs differs from the others, the negative of the absolute value of the product is averaged. Because the number of possible same-sign combinations can differ from the number of possible different-sign combinations, a weighting factor equal to the ratio of the number of same-sign combinations to the number of different-sign combinations is applied to the negated absolute-value products to compensate. For example, a three-input module has two ways for the signs to be the same, out of eight possibilities, leaving six possible ways for the signs to differ, which gives a scaling factor of 2/6 = 1/3. This compensation causes the integrated or summed product to grow in the positive direction if and only if there is a signal component common to all the inputs of a decoder module.
However, for the averages of modules of different orders to be comparable, they must all have the same dimensions. A conventional second-order correlation involves the average of products of two inputs and therefore has the dimensions of energy or power. Thus, the terms to be averaged in higher-order correlations must be modified to have the dimensions of power as well. For a k-th order correlation, the absolute value of each product must therefore be raised to the power 2/k before it is averaged.
Of course, regardless of the order, the individual input energies of a module, if needed, can be calculated as the average of the square of the corresponding input signal; there is no need first to raise it to the k-th power and then reduce it to a second-order quantity.
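The sign-corrected, dimension-corrected average described in this section can be sketched as follows. This Python sketch works on real-valued samples and uses a plain mean in place of the smoothers; the function name `common_energy` and the list-of-lists input layout are assumptions made for illustration.

```python
def common_energy(samples_per_input):
    """Common energy estimate for k inputs, each given as a list of real samples.

    For every time step the product of the k samples is formed, its absolute value
    is raised to the power 2/k (so that it has the dimensions of energy), and it is
    averaged with a positive sign when all k samples share a sign, or with a
    negative sign and a weight of 2 / (2**k - 2) otherwise.
    """
    k = len(samples_per_input)
    # Two same-sign combinations out of 2**k; the remainder are mixed-sign.
    weight = 2.0 / (2 ** k - 2)
    total = 0.0
    count = 0
    for values in zip(*samples_per_input):
        product = 1.0
        for v in values:
            product *= v
        magnitude = abs(product) ** (2.0 / k)
        same_sign = all(v > 0 for v in values) or all(v < 0 for v in values)
        total += magnitude if same_sign else -weight * magnitude
        count += 1
    return total / count

shared = [1.0, -2.0, 3.0]
print(common_energy([shared, shared, shared]))  # close to the mean of x squared
print(common_energy([[1.0], [1.0], [-1.0]]))    # mixed signs give a negative value
```

For k = 2 the weight is 1, so the function reduces to the ordinary average cross product; for k = 3 it is the 1/3 factor given in the text.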
Returning to the description of Fig. 4A, the squared conversion receiver outputs can be grouped into subbands by respective functions or devices 407, 409 and 411. The subbands can, for example, approximate the critical bands of the human ear. The remainder of the module of the Figs. 4A-4C embodiment operates separately and independently on each subband. To simplify the drawing, the operation on only one subband is shown.
Each subband from blocks 407, 409 and 411 is applied to a frequency smoother or frequency smoothing function 413, 415 and 417, respectively (hereinafter "frequency smoother"). The purpose of the frequency smoothers is described below. Each frequency-smoothed subband from a frequency smoother is applied to a respective optional "fast" smoother or smoothing function 419, 421 and 423 (hereinafter "fast smoother") that provides time-domain smoothing. Although they are preferred, the fast smoothers can be omitted when their time constant is close to the block length of the forward transform that generates the input receivers (for example, the forward transform in the watch-dog 201 of Fig. 2). The fast smoothers are "fast" relative to the "slow" variable time constant smoothers or smoothing functions 425, 427 and 429 (hereinafter "slow smoothers"), which receive the respective outputs of the fast smoothers. Examples of fast and slow smoother time constant values are given below.
Thus, whether the fast smoothing is provided inherently by the operation of the forward transform or by a fast smoother, a two-stage smoothing action is preferred in which the second, slower stage is variable. A single stage of smoothing may nevertheless provide acceptable results.
The time constants of the slow smoothers are preferably synchronized with one another within a module. This can be accomplished, for example, by applying the same control information to each slow smoother and configuring each slow smoother to respond in the same way to the applied control information. The derivation of the information for controlling the slow smoothers is described below.
Preferably, each pair of smoothers is connected in series, in the manner of the pairs 419/425, 421/427 and 423/429 shown in Figs. 4A and 4B, in which a fast smoother feeds a slow smoother. An advantage of the series arrangement is that the second stage is resistant to short, fast signal peaks at the input of the pair. However, similar results can be obtained by configuring the pairs of smoothers in parallel. In a parallel arrangement, for example, the resistance of the second stage to short, fast signal peaks that the series arrangement provides can be handled in the logic of the time constant controller.
Each stage of the two-stage smoothers can be implemented by a single-pole lowpass filter (a "leaky integrator"), such as an RC lowpass filter (in an analog embodiment) or, equivalently, a first-order lowpass filter (in a digital embodiment). For example, in a digital embodiment, each first-order filter can be implemented as a "biquad" filter, a general second-order IIR filter, in which some of the coefficients are set to zero so that the filter functions as a first-order filter. Alternatively, the two smoothers can be combined into a single second-order biquad stage, although it is simpler to calculate the coefficient values of the second (variable) stage if it is separate from the first (fixed) stage.
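A series pair of first-order smoothers can be sketched in Python as below. The class name `OnePoleSmoother`, the update rate of 100 blocks per second and the time constants of 10 ms and 100 ms are assumptions picked for the demonstration; they are not values taken from the described embodiment.

```python
import math

class OnePoleSmoother:
    """First-order lowpass ("leaky integrator") updated once per block."""

    def __init__(self, time_constant_s, update_rate_hz):
        self.update_rate_hz = update_rate_hz
        self.state = 0.0
        self.set_time_constant(time_constant_s)

    def set_time_constant(self, time_constant_s):
        # Standard mapping from a time constant to a one-pole update coefficient.
        self.coeff = 1.0 - math.exp(-1.0 / (time_constant_s * self.update_rate_hz))

    def process(self, value):
        self.state += self.coeff * (value - self.state)
        return self.state

def two_stage(values, fast, slow):
    """Series arrangement: the fast smoother feeds the slow smoother."""
    return [slow.process(fast.process(v)) for v in values]

rate = 100.0  # assumed block rate in Hz
steady = two_stage([1.0] * 200, OnePoleSmoother(0.01, rate), OnePoleSmoother(0.1, rate))
spike = two_stage([1.0] + [0.0] * 50, OnePoleSmoother(0.01, rate), OnePoleSmoother(0.1, rate))
print(steady[-1])   # settles near the steady input level
print(max(spike))   # a one-block peak is strongly attenuated
```

Keeping the slow stage as a separate object makes it straightforward to change its time constant at run time through `set_time_constant`, which is the role the variable second stage plays in the text.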
Note that in the embodiments of Figs. 4A, 4B and 4C all signal levels are expressed as energy (squared) levels unless an amplitude is required, in which case a square root is taken. Because the smoothing is applied to the energy levels of the applied signals, the smoothers are RMS-detecting rather than average-detecting (average-detecting smoothers are fed linear amplitudes). Because the signals applied to the smoothers are squared levels, the smoothers respond to sudden increases in signal level more quickly than average-detecting smoothers would, because the squaring function magnifies the increase in level.
Thus, the two-stage smoothers provide a time average for each subband of each input channel's energy (slow smoother 425 provides the average for the first channel and slow smoother 427 provides the average for the m-th channel) and an average for each subband of the input channels' common energy (provided by slow smoother 429).
The average energy outputs of the slow smoothers (425, 427, 429) are applied to combiners 431, 433 and 435, respectively, in which (1) the adjacent energy levels (if any), for example from the watch-dog 201 of Fig. 2, are subtracted from the smoothed energy level of each input channel, and (2) the higher-order adjacent energy levels (if any), for example from the watch-dog 201 of Fig. 2, are subtracted from the average energy output of each slow smoother. For example, each module that receives input 3' (Figs. 1 and 2) has two adjacent modules and receives adjacent energy level information that compensates for the effect of those two adjacent modules. However, none of those modules is a "higher-order" module (that is, all the modules sharing input channel 3' are two-input modules). In contrast, module 28 (Figs. 1 and 2) is an example of a module that has a higher-order module sharing one of its inputs. Thus, in module 28, for example, the average energy output of the slow smoother for input 13' receives higher-order adjacent level compensation.
The resulting "adjacent-compensated" energy levels for each subband of each of the module's inputs are applied to a function or device 437, which calculates the nominal ongoing primary direction of those energy levels. The direction indication can be calculated as the vector sum of the energy-weighted inputs. For a two-input module, this simplifies to the L/R ratio of the smoothed, adjacent-compensated input signal energy levels.
For example, assume a planar surround array in which the positions of the channels are given as 2-tuples representing x, y coordinates, for the case of two inputs. The listener is assumed to be at the center, (0, 0). In normalized spatial coordinates, the left front channel is at (1, 1) and the right front channel is at (-1, 1). If the left input amplitude (Lt) is 4 and the right input amplitude (Rt) is 3, then, using these amplitudes as weighting factors, the nominal ongoing primary direction is:
(4*(1, 1) + 3*(-1, 1))/(4 + 3) = (0.143, 1), or slightly to the left of center on a horizontal line connecting left and right.
Alternatively, once a master matrix has been defined, the spatial direction can be expressed in matrix coordinates rather than in physical coordinates. In that case, the input amplitudes, normalized so that their squares sum to 1, are the effective matrix coordinates of the direction. In the above example, the "direction" is (0.8, 0.6). In other words, the nominal ongoing primary direction is the square root of the adjacent-compensated smoothed input energy levels, normalized so that their squares sum to 1. Block 337 produces the same number of outputs indicating the spatial direction as the module has inputs (two in this example).
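Both forms of the direction calculation can be reproduced with a short Python sketch. It assumes the left front channel at (1, 1) and the right front channel at (-1, 1), the placement that reproduces the stated result of (0.143, 1); the function names are illustrative only.

```python
import math

def primary_direction(positions, amplitudes):
    """Amplitude-weighted average of channel positions (physical coordinates)."""
    total = sum(amplitudes)
    x = sum(a * p[0] for p, a in zip(positions, amplitudes)) / total
    y = sum(a * p[1] for p, a in zip(positions, amplitudes)) / total
    return (x, y)

def matrix_direction(amplitudes):
    """Input amplitudes normalized so that their squares sum to 1 (matrix coordinates)."""
    norm = math.sqrt(sum(a * a for a in amplitudes))
    return tuple(a / norm for a in amplitudes)

left_front = (1, 1)     # assumed normalized position
right_front = (-1, 1)   # assumed normalized position
print(primary_direction([left_front, right_front], [4, 3]))
print(matrix_direction([4, 3]))
```

With amplitudes 4 and 3 the physical direction has an x coordinate of 1/7, about 0.143, and the matrix coordinates are 4/5 and 3/5.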
The adjacent-compensated smoothed energy levels for each subband of each of the module's inputs, which are applied to the direction-determining function or device 337, are also applied to a function or device 339 that calculates the adjacent-compensated cross-correlation ("adjacent-compensated_xcor"). Block 339 also receives as an input the averaged common energy of the module's inputs for each subband from the slow variable smoother 329, the output of which has been compensated in combiner 335 by the higher-order adjacent energy levels, if any. In block 339, the adjacent-compensated cross-correlation is calculated as the higher-order-compensated smoothed common energy divided by the M-th root of the product of the adjacent-compensated smoothed energy levels of the module's input channels, where M is the number of inputs, so as to derive a true mathematical correlation value in the range 1.0 to -1.0. Preferably, values from 0 to -1.0 are taken as 0. The adjacent-compensated_xcor provides an estimate of the cross-correlation that would exist in the absence of other modules.
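The calculation attributed to block 339 can be sketched as follows in Python. The function name, the zero returned for a silent input, and the upper clamp at 1.0 are assumptions; the text itself only states that negative values are taken as 0.

```python
def adjacent_compensated_xcor(common_energy, input_energies):
    """Common energy divided by the M-th root of the product of the M input energies."""
    m = len(input_energies)
    product = 1.0
    for energy in input_energies:
        product *= energy
    if product <= 0.0:
        return 0.0  # assumption: no correlation is reported when an input is silent
    xcor = common_energy / product ** (1.0 / m)
    # Negative values are taken as 0; the upper clamp is an added safeguard.
    return min(max(xcor, 0.0), 1.0)

# Two inputs, each with 0.5 units of common energy plus 1.0 of uncorrelated energy.
print(adjacent_compensated_xcor(0.5, [1.5, 1.5]))
```

For two inputs the M-th root is a square root, so the expression is the familiar normalized cross-correlation.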
The adjacent-compensated_xcor from block 339 is then applied to a weighting device or function 341, which weights the adjacent-compensated_xcor with the adjacent-compensated direction information to produce a direction-weighted adjacent-compensated cross-correlation ("direction-weighted_xcor"). The weighting increases as the nominal ongoing primary direction departs from a centered condition. In other words, unequal input amplitudes (and hence unequal energies) cause a proportional increase in the direction-weighted_xcor. The direction-weighted_xcor provides an estimate of image compactness. Thus, in the case of a two-input module having, for example, left L and right R inputs, the weighting increases as the direction departs from the center toward either the left or the right (that is, the weighting is the same in either direction for the same degree of departure from the center). For example, in the case of a two-input module, the adjacent-compensated_xcor value is weighted by an L/R or R/L ratio, so that a non-uniform signal distribution urges the direction-weighted_xcor toward 1.0. For such a two-input module,
when R is greater than or equal to L,
direction-weighted_xcor = 1 - ((1 - adjacent-compensated_xcor) * (L/R))
and,
when R is less than L,
direction-weighted_xcor = 1 - ((1 - adjacent-compensated_xcor) * (R/L))
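For the two-input case, the weighting can be written compactly by noting that both branches use the ratio of the smaller level to the larger one. The Python sketch below does this; the function name and the treatment of two silent inputs as centered are assumptions.

```python
def direction_weighted_xcor(xcor, left, right):
    """Two-input direction weighting of an adjacent-compensated cross-correlation."""
    larger = max(left, right)
    # L/R when R >= L, R/L otherwise; two silent inputs are treated as centered.
    ratio = min(left, right) / larger if larger > 0 else 1.0
    return 1.0 - (1.0 - xcor) * ratio

print(direction_weighted_xcor(0.2, 1.0, 1.0))  # centered: value is unchanged
print(direction_weighted_xcor(0.2, 1.0, 2.0))  # off center: pushed toward 1.0
print(direction_weighted_xcor(0.2, 0.0, 1.0))  # fully to one side: exactly 1.0
```

The weighting is symmetric, so swapping the left and right levels gives the same result.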
For a module with more than two inputs, calculating the direction-weighted_xcor from the adjacent-compensated_xcor requires, for example, replacing the ratio L/R or R/L above with an "evenness" measure that varies between 1.0 and 0. For example, to calculate the evenness measure for any number of inputs, the input signal levels are normalized by the total input power, yielding normalized input levels whose sum in the energy (squared) sense is 1.0. Each normalized input level is divided by the corresponding normalized input level of a signal centered in the array. The smallest ratio becomes the evenness measure. Thus, for example, for a three-input module with one input at zero level, the evenness measure is zero and the direction-weighted_xcor equals 1. (In that case the signal lies on the border of the three-input module, on a line between two of its inputs, and a two-input module (lower in the hierarchy) decides where on that line the nominal ongoing primary direction lies and how wide along that line the output signal should be spread.)
Returning to the description of Fig. 4B, the direction-weighted_xcor is further weighted by applying it to a function or device 443, which applies a "random_xcor" weighting to produce an "effective_xcor". The effective_xcor provides an estimate of the shape of the input signal distribution.
The random_xcor is the average cross product of the input amplitudes divided by the square root of the average input energies. The value of random_xcor can be calculated by assuming that the output channels were originally the module's input channels and calculating the xcor value that results when all those channels carry independent signals of equal level and are passively downmixed. According to this approach, for the example of a three-output module with two inputs, random_xcor is calculated to be 0.333, and for a five-output module with two inputs (three inner outputs), random_xcor is calculated to be 0.483. The random_xcor value need be calculated only once for each module. Although these random_xcor values have been found to provide satisfactory results, the values are not critical and other values can be used at the discretion of the system designer. A change in the value of random_xcor affects the dividing line between the two regimes of operation of the signal distribution system, as described below. The exact location of that dividing line is not critical.
The random_xcor weighting performed by function or device 343 can be considered a renormalization of the direction-weighted_xcor value so as to obtain an effective_xcor:
effective_xcor = (direction-weighted_xcor - random_xcor)/(1 - random_xcor), if direction-weighted_xcor is greater than or equal to random_xcor;
otherwise, effective_xcor = 0
As the direction-weighted_xcor decreases below 1.0, the random_xcor weighting accelerates the decrease, so that the effective_xcor equals 0 when the direction-weighted_xcor equals random_xcor. Because the outputs of a module represent directions along an arc or a line, effective_xcor values less than zero are treated as zero.
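The renormalization is simple enough to state directly in Python. The function name is illustrative; the value 0.333 used in the example is the random_xcor given in the text for a three-output module with two inputs.

```python
def effective_xcor(direction_weighted, random_xcor):
    """Renormalize so that random_xcor maps to 0 and 1.0 stays at 1.0."""
    if direction_weighted < random_xcor:
        return 0.0  # values that would be negative are treated as zero
    return (direction_weighted - random_xcor) / (1.0 - random_xcor)

RANDOM_XCOR = 0.333  # three-output, two-input module
print(effective_xcor(1.0, RANDOM_XCOR))     # fully correlated input
print(effective_xcor(0.6665, RANDOM_XCOR))  # halfway between random_xcor and 1.0
print(effective_xcor(0.2, RANDOM_XCOR))     # below random_xcor
```

Because random_xcor depends only on the module configuration, it can be computed once and passed in as a constant.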
The information for controlling the slow smoothers 325, 327 and 329 is derived from the non-adjacent-compensated slow- and fast-smoothed input channel energies and the slow- and fast-smoothed input channel common energy. In particular, a function or device 345 calculates a fast non-adjacent-compensated cross-correlation in response to the fast-smoothed input channel energies and the fast-smoothed input channel common energy. A function or device 347 calculates a fast non-adjacent-compensated direction (a ratio or vector, as discussed above in connection with block 337) in response to the fast-smoothed input channel energies. A function or device 349 calculates a slow non-adjacent-compensated cross-correlation in response to the slow-smoothed input channel energies and the slow-smoothed input channel common energy. A function or device 351 calculates a slow non-adjacent-compensated direction (a ratio or vector, as discussed above) in response to the slow-smoothed input channel energies. The fast non-adjacent-compensated cross-correlation, the fast non-adjacent-compensated direction, the slow non-adjacent-compensated cross-correlation and the slow non-adjacent-compensated direction, together with the direction-weighted_xcor from block 341, are applied to a device or function 353, which provides the information for controlling the variable slow smoothers 325, 327 and 329 so as to adjust their time constants (hereinafter "adjust time constants"). Preferably, the same control information is applied to each variable slow smoother. Unlike the other quantities fed to the time constant selection block, which compare a fast measure with a slow measure, the direction-weighted_xcor is preferably used without reference to any fast value, such that if the absolute value of the direction-weighted_xcor is greater than a threshold, it can cause adjust time constants 353 to select a faster time constant. The principles of operation of "adjust time constants" 353 are set forth below.
Generally, in a dynamic audio system, it is desirable to use slow time constants as much as possible, resting at a quiescent value, in order to minimize audible disruption of the reproduced sound field, unless a "new event" occurs in the audio signal. In that case, it is desirable for the control signal to change rapidly to a new quiescent value and then remain at that value until another "new event" occurs. Typically, audio processing systems have equated a change in amplitude with a "new event". However, when dealing with cross products or cross-correlation, newness and amplitude do not always equate: a new event may cause a decrease in the cross-correlation. By sensing changes in the parameters relevant to the operation of the module, namely the measures of cross-correlation and direction, the time constants of a module may speed up and rapidly assume a new control state as required.
The consequences of improper dynamic behaviour include image drift, chattering (a channel rapidly turning on and off), pumping (unnatural changes in level) and, in a multiband embodiment, chirping (chattering and pumping on a band-by-band basis). Some of these effects are particularly critical to the quality of isolated channels.
Embodiments such as those of Fig. 1 and 2 employ a lattice of decoding modules. Such a configuration gives rise to two classes of dynamics problems: inter-module dynamics and intra-module dynamics. In addition, each of the several ways of implementing the audio processing (for example, wideband, multiband using an FFT or MDCT linear filter bank, or a discrete filter bank, critical band or otherwise) requires its own optimization of dynamic behaviour.
The basic decoding process within each module depends on a measure of the energy ratios of the input signals and a measure of the cross-correlation of the input signals (in particular, the direction-weighted correlation, direction-weighted_xcor, described above; the output of block 341 in Fig. 4B), which together control the signal distribution among the outputs of the module. Deriving these basic quantities requires smoothing, which in the time domain requires computing a time-weighted average of the instantaneous values of those quantities. The range of required time constants is quite large: from very short (for example, 1 millisecond) for fast transient changes in signal conditions, to very long (for example, 150 milliseconds) for low values of correlation, where the instantaneous variation is likely to be much greater than the true average value.
A common method of implementing variable time constant behaviour is, in analog terms, the use of a "speed-up" diode. When the instantaneous level exceeds the average level by a threshold amount, the diode conducts, resulting in a shorter effective time constant. A drawback of this technique is that a momentary peak in an otherwise steady-state input may cause a large change in the smoothed level, which then decays very slowly, giving unnatural emphasis to isolated peaks that would otherwise have little audible consequence.
The correlation calculation described in connection with the embodiment of Fig. 4A-4C makes the use of speed-up diodes (or their DSP equivalent) problematic. For example, all the smoothers within a particular module preferably have synchronized time constants, so that their smoothed levels are comparable. A global (ganged) time constant switching mechanism is therefore preferred. In addition, a rapid change in signal conditions is not necessarily associated with an increase in the common energy level. Using a speed-up diode for this level is likely to produce biased, inaccurate estimates of correlation. Therefore, embodiments of aspects of the present invention preferably use two-stage smoothing without a diode-equivalent speed-up. Estimates of correlation and direction may be derived from at least both the first and second stages of the smoothers in order to set the time constant of the second stage.
For each pair of smoothers (for example, 319/325), the time constant of the first stage, the fixed fast stage, may be set to a fixed value, such as 1 millisecond. The time constant of the second stage, the variable slow stage, may be selectable, for example, among 10 milliseconds (fast), 30 milliseconds (medium) and 150 milliseconds (slow). Although these time constants have been found to give satisfactory results, their values are not critical, and other values may be used at the discretion of the system designer. In addition, the second-stage time constant may be continuously variable rather than discrete. Selection of the time constant may be based not only on the signal conditions described above but also on a hysteresis mechanism using a "fast flag". This mechanism ensures that once a genuine fast transition is encountered, the system remains in fast mode, avoiding use of the medium time constant, until the signal conditions re-enable the slow time constant. This may help to ensure rapid adaptation to new signal conditions.
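The two-stage arrangement described above can be sketched as follows. This is a minimal illustration only: the text does not state the filter form, so a one-pole exponential smoother is assumed, and the names `one_pole_coeff` and `TwoStageSmoother` are hypothetical. The time constants are the example values given above.

```python
import math


def one_pole_coeff(time_constant_s, rate_hz):
    """Feedback coefficient of a one-pole smoother (assumed filter form)."""
    return math.exp(-1.0 / (time_constant_s * rate_hz))


class TwoStageSmoother:
    """Fixed fast first stage feeding a variable slow second stage."""

    # Example second-stage time constants from the text, in seconds.
    SLOW_STAGE_TC = {"fast": 0.010, "medium": 0.030, "slow": 0.150}

    def __init__(self, rate_hz, fast_tc_s=0.001):
        self.rate_hz = rate_hz
        self.a_fast = one_pole_coeff(fast_tc_s, rate_hz)
        self.fast_level = 0.0
        self.slow_level = 0.0

    def step(self, x, mode="slow"):
        """Advance both stages by one sample; return (fast, slow) levels."""
        a_slow = one_pole_coeff(self.SLOW_STAGE_TC[mode], self.rate_hz)
        self.fast_level = self.a_fast * self.fast_level + (1.0 - self.a_fast) * x
        # The second stage smooths the output of the first stage.
        self.slow_level = a_slow * self.slow_level + (1.0 - a_slow) * self.fast_level
        return self.fast_level, self.slow_level
```

Passing the same `mode` to every smoother of a module on each step would give the ganged behaviour that the text prefers.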
Selection of which of the three possible second-stage time constants to use may be accomplished by "adjust time constants" 353 in accordance with the following rules for the two-input case:
If the absolute value of direction-weighted_xcor is less than a first reference value (for example, 0.5), and the absolute difference between the fast non-neighbor-compensated_xcor and the slow non-neighbor-compensated_xcor is less than the same first reference value, and the absolute difference between the fast and slow direction ratios (each of which has a range of +1 to -1) is less than the same first reference value, then the slow second-stage time constant is used, and the fast flag is set to true, enabling subsequent selection of the medium time constant.
Otherwise, if the fast flag is true, the absolute difference between the fast and slow non-neighbor-compensated_xcor is greater than the first reference value and less than a second reference value (for example, 0.75), the absolute difference between the fast and slow instantaneous L/R ratios is greater than the first reference value and less than the second reference value, and the absolute value of direction-weighted_xcor is greater than the first reference value and less than the second reference value, then the medium second-stage time constant is selected.
Otherwise, the fast second-stage time constant is used, and the fast flag is set to false, disabling subsequent use of the medium time constant until the slow time constant is selected again.
In other words, the slow time constant is selected when all three conditions are less than the first reference value; the medium time constant is selected when all the conditions lie between the first reference value and the second reference value and the previous condition was the slow time constant; and the fast time constant is selected when any of the conditions is greater than the second reference value.
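The three rules above can be written as a small selection function. This is a sketch under stated assumptions: the function and argument names are hypothetical, the reference values default to the example values 0.5 and 0.75, and the caller is assumed to carry the returned fast flag into the next call.

```python
def select_time_constant(dw_xcor, fast_xcor, slow_xcor,
                         fast_dir, slow_dir, fast_flag,
                         ref1=0.5, ref2=0.75):
    """Return (mode, new_fast_flag) for the second-stage smoothers.

    dw_xcor              direction-weighted_xcor
    fast_xcor, slow_xcor fast and slow non-neighbor-compensated_xcor
    fast_dir, slow_dir   fast and slow direction ratios, each in [-1, +1]
    fast_flag            hysteresis flag from the previous call
    """
    conditions = (
        abs(dw_xcor),
        abs(fast_xcor - slow_xcor),
        abs(fast_dir - slow_dir),
    )
    # Rule 1: everything is quiet, so use the slow time constant and
    # re-enable later selection of the medium time constant.
    if all(c < ref1 for c in conditions):
        return "slow", True
    # Rule 2: moderate change, allowed only while the flag is still true.
    if fast_flag and all(ref1 < c < ref2 for c in conditions):
        return "medium", fast_flag
    # Rule 3: anything else selects fast and disables medium until the
    # slow time constant has been selected again.
    return "fast", False
```

For example, `select_time_constant(0.6, 0.9, 0.3, 0.7, 0.1, True)` falls under the second rule, whereas the same call with the flag set to false falls through to the fast time constant.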
Although the rules and reference values just set forth have been found to give satisfactory results, they are not critical, and variations of these rules, or other rules that take the fast and slow cross-correlation and the fast and slow direction into account, may be used at the discretion of the system designer. Other changes may also be made. For example, it may be simpler but equally effective to use diode-speed-up type processing with ganged operation, so that if any smoother in a module is in fast mode, all the other smoothers are also switched to fast mode. It may also be desirable to have separate smoothers for time constant determination and for signal distribution, with the smoothers used for time constant determination kept at fixed time constants and only the signal distribution time constants varied.
Because the smoothed signal levels require several milliseconds to adapt even in fast mode, a time delay may be built into the system to allow the control signals to adapt before they are applied to the signal path. In a wideband embodiment, this delay may be realized as a discrete delay (for example, 5 milliseconds) in the signal path. In multiband (transform) versions, the delay is a natural consequence of block processing, and if a block is analyzed before the signal path matrixing of that block is performed, no explicit delay may be required.
Multiband embodiments of aspects of the present invention may use the same time constants and rules as the wideband versions, except that the sampling rate of the smoothers may be set to the signal sampling rate divided by the block size (that is, the block rate), so that the coefficients used in the smoothers are adjusted appropriately.
In multiband embodiments, the time constants are preferably scaled inversely with frequency for frequencies below 400 Hz. In the wideband version this is not possible, because there are no separate smoothers at different frequencies. As partial compensation, a gentle bandpass/pre-emphasis filter may therefore be applied to the input signal to the control path in order to emphasize the middle and upper-middle frequencies. This filter may have, for example, a two-pole highpass characteristic with a corner frequency of 200 Hz, plus a two-pole lowpass characteristic with a corner frequency of 8000 Hz, plus a pre-emphasis network that applies 6 dB of boost from 400 Hz to 800 Hz and a further 6 dB of boost from 1600 Hz to 3200 Hz. Although such a filter has been found to be suitable, the filter characteristics are not critical, and other parameters may be used at the discretion of the system designer.
In addition to time-domain smoothing, multiband versions of aspects of the present invention preferably also apply frequency-domain smoothing, as described above in connection with Fig. 4A (frequency smoothers 413, 415 and 417). For each block, the non-neighbor-compensated energy levels may be averaged with a sliding frequency window, adjusted to approximate a 1/3-octave (critical band) bandwidth, before being applied to the subsequent time-domain processing described above. Because transform-based filter banks inherently have linear frequency resolution, the width of this window (in number of transform coefficients) increases with increasing frequency, and at low frequencies (below about 400 Hz) it is usually only one transform coefficient wide. The total smoothing applied in multiband processing therefore relies more on time-domain smoothing at low frequencies and more on frequency-domain smoothing at higher frequencies, where a fast time response is sometimes more likely to be necessary.
Returning to the description of Fig. 4C, the preliminary scale factors (shown as "PSF" in Fig. 2), which ultimately affect the dominant/fill/endpoint signal distribution, may be produced by a combination of devices or functions 455, 457 and 459, which calculate the "dominant" scale factor components, the "fill" scale factor components and the "excess endpoint energy" scale factor components respectively, the respective normalizers or normalizer functions 361, 363 and 365, and a device or function 367 that takes the greater of the dominant and fill scale factor components and/or the additive combination of the fill and excess endpoint energy scale factor components. If the module is one of a plurality of modules, the preliminary scale factors may be sent to a supervisor, such as the supervisor 201 of Fig. 2. Each preliminary scale factor may have a range from 0 to 1.
Dominant scale factor components
In addition to effective_xcor, device or function 355 ("calculate dominant scale factor components") receives the neighbor-compensated direction information from block 337 and information about the local matrix coefficients from local matrix 369, so that it can determine the N nearest output channels (where N = the number of inputs) that can be applied to a weighted sum to yield the coordinates of the nominal ongoing primary direction, and can apply the "dominant" scale factor components to those output channels to yield the dominant coordinates. If the nominal ongoing primary direction happens to coincide with an output direction, the output of block 355 is a single scale factor component (per subband). Otherwise, the output is a plurality of scale factor components (one per input, per subband) that bracket the nominal ongoing primary direction and are applied in appropriate proportions so as to pan or map the dominant signal to the correct virtual location in a power-preserving sense (that is, for N = 2, the squares of the two assigned dominant-channel scale factor components should sum to effective_xcor).
For a two-input module, all the output channels lie on a line or an arc, so there is a natural ordering (from "left" to "right"), and it is readily apparent which channels are adjacent to one another. For the hypothetical case discussed above, in which the module has two input channels and five output channels with the sin/cos coefficients shown, the nominal ongoing primary direction may be assumed to be (0.8, 0.6), which lies between the Middle Left (ML) channel (.92, .38) and the Center (C) channel (.71, .71). This bracketing pair can be found by looking for two consecutive channels such that the L coefficient of the first is greater than the L coordinate of the nominal ongoing primary direction and the channel to its right has an L coefficient less than the dominant L coordinate.
The dominant scale factor components are apportioned to the two nearest channels in a constant-power sense. To do so, a system of two equations in two unknowns is solved. The unknowns are the dominant scale factor component of the channel to the left of the dominant direction (SFL) and the corresponding scale factor component of the channel to the right of the nominal ongoing primary direction (SFR); the equations are solved for SFL and SFR.
first dominant coordinate = SFL * left channel matrix value 1 + SFR * right channel matrix value 1
second dominant coordinate = SFL * left channel matrix value 2 + SFR * right channel matrix value 2
Note that "left channel" and "right channel" here refer to the channels that bracket the nominal ongoing primary direction, not to the L and R input channels of the module.
The solution is to calculate the anti-dominant level of each channel, normalize the results so that their squares sum to 1.0, and use them as the dominant distribution scale factor components (SFL, SFR), each applied to the other channel. In other words, for a signal with coordinates C, D, the anti-dominant value of an output channel with coefficients A, B is the absolute value of AD - BC. For the numerical example under consideration:
Antidom(ML channel) = abs(.92*.6 - .38*.8) = .248
Antidom(C channel) = abs(.71*.6 - .71*.8) = .142
(where "abs" denotes taking the absolute value)
Normalizing these two numbers so that their squares sum to 1.0 yields the values .8678 and .4969 respectively. Swapping these values to the opposite channels, the dominant scale factor components are as follows (note that, before direction weighting, the value of the dominant scale factor is the square root of effective_xcor):
ML dom sf = .4969 * sqrt(effective_xcor)
C dom sf = .8678 * sqrt(effective_xcor)
(the dominant signal is closer to C out than to ML out).
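The numerical example above can be reproduced with a few lines of code. This is a sketch rather than the patented implementation: the function name is hypothetical, and returning zeros when both anti-dominant values vanish is an assumption made only to avoid dividing by zero.

```python
import math


def dominant_scale_factors(left_coeffs, right_coeffs, direction, effective_xcor):
    """Apportion the dominant signal between the two bracketing channels.

    left_coeffs, right_coeffs: (A, B) local matrix coefficients of the
    channels to the left and right of the primary direction.
    direction: (C, D) coordinates of the primary direction.
    Returns (left scale factor component, right scale factor component).
    """
    def antidom(coeffs):
        a, b = coeffs
        c, d = direction
        return abs(a * d - b * c)

    anti_left = antidom(left_coeffs)
    anti_right = antidom(right_coeffs)
    norm = math.hypot(anti_left, anti_right)
    if norm == 0.0:
        # Degenerate case (assumption): nothing to apportion.
        return 0.0, 0.0
    gain = math.sqrt(effective_xcor)
    # Each channel takes the normalized anti-dominant value of the OTHER channel.
    return (anti_right / norm) * gain, (anti_left / norm) * gain


# ML = (.92, .38), C = (.71, .71), primary direction (0.8, 0.6)
ml_sf, c_sf = dominant_scale_factors((0.92, 0.38), (0.71, 0.71), (0.8, 0.6), 1.0)
```

With effective_xcor set to 1.0, `ml_sf` and `c_sf` come out at approximately .4969 and .8678, matching the figures in the text, and their squares sum to effective_xcor.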
The use of one channel's normalized antidom component as the other channel's dominant scale factor component may be better understood by considering what happens when the nominal ongoing primary direction happens to point exactly at one of the two chosen channels. Suppose that the coefficients of one channel are [A, B], the coefficients of the other channel are [C, D], and the coordinates of the nominal ongoing primary direction are [A, B] (pointing at the first channel). Then:
Antidom(first channel) = abs(AB - BA)
Antidom(second channel) = abs(CB - DA)
Note that the first antidom value is zero. When the two antidom values are normalized so that their squares sum to 1.0, the second antidom value is 1.0. After swapping, the first channel receives a dominant scale factor component of 1.0 (times the square root of effective_xcor) and the second channel receives 0.0, as desired.
When this approach is extended to modules with more than two inputs, the natural ordering that occurs when the channels lie on a line or an arc no longer exists. Once again, block 337 of Fig. 4B, for example, calculates the coordinates of the nominal ongoing primary direction by taking the input amplitudes after neighbor compensation and normalizing them so that their squares sum to 1. Block 455 of Fig. 4B, for example, then identifies the N nearest channels (where N = the number of inputs) that can be applied to a weighted sum to yield the dominant coordinates. (Note: distance or nearness may be calculated as the sum of the squared coordinate differences, as if the coordinates were (x, y, z) spatial coordinates.) Consequently, because the channels must be weighted and summed to yield the nominal ongoing primary direction, the N nearest channels are not always the ones picked.
For example, suppose that a three-input module is fed by a triangle of channels: Ls, Rs and Top, as in Fig. 5. Suppose there are three interior output channels close together near the base of the triangle, with module local matrix coefficients [.71, .69, .01], [.70, .70, .01] and [.69, .71, .01] respectively. Suppose the nominal ongoing primary direction is slightly below the center of the triangle, with coordinates [.6, .6, .53]. (Note: the middle of the triangle has coordinates [.5, .5, .707].) The three channels nearest to the nominal ongoing primary direction are the three interior channels at the base of the triangle, but they cannot be summed to the dominant coordinates using scale factors between 0 and 1. Instead, two channels from the base and the Top endpoint channel are chosen to distribute the dominant signal, and the three equations are solved for the three weighting factors in order to complete the dominant calculation and proceed to the fill and endpoint calculations.
In the examples of Fig. 1 and 2 there is only one three-input module, and it is used to derive only a single interior channel, which simplifies the calculation.
Fill scale factor components
In addition to effective_xcor, device or function 357 ("calculate fill scale factor components") receives random_xcor, direction-weighted_xcor from block 341, "EQUIAMPL" (defined and explained below), and information about the local matrix coefficients from the local matrix (for the case in which the same fill scale factor component is not applied to all outputs, as explained below in connection with Fig. 14B). The output of block 457 is a scale factor component for each module output (per subband).
As explained above, effective_xcor is zero when direction-weighted_xcor is less than or equal to random_xcor. When direction-weighted_xcor is greater than or equal to random_xcor, the fill scale factor component for all output channels is:
fill scale factor component = sqrt(1 - effective_xcor) * EQUIAMPL
Thus, when direction-weighted_xcor = random_xcor, effective_xcor is 0, so (1 - effective_xcor) is 1.0 and the fill amplitude scale factor component equals EQUIAMPL (which ensures that output power = input power under this condition). This point is the maximum value reached by the fill scale factor components.
When direction-weighted_xcor is less than random_xcor, the dominant scale factor component or components are zero, and the fill scale factor components decrease to zero as direction-weighted_xcor approaches zero:
fill scale factor component = sqrt(direction-weighted_xcor / random_xcor) * EQUIAMPL
Thus, at the boundary, where direction-weighted_xcor = random_xcor, the preliminary fill scale factor component is again equal to EQUIAMPL, which ensures continuity with the result of the preceding equation for the case in which direction-weighted_xcor is greater than random_xcor.
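The two branches of the fill calculation can be sketched as a single function. The function name is hypothetical, EQUIAMPL is passed in as a parameter, and the handling of a non-positive random_xcor or a negative direction-weighted_xcor is an assumption added only so that the sketch never divides by zero or takes the square root of a negative number.

```python
import math


def fill_scale_factor(dw_xcor, random_xcor, effective_xcor, equiampl):
    """Fill scale factor component applied to the module outputs."""
    if dw_xcor >= random_xcor:
        # At or above the random level: fill shrinks as the dominant
        # signal takes over.
        return math.sqrt(1.0 - effective_xcor) * equiampl
    if random_xcor <= 0.0:
        # Guard (assumption): cannot be reached for sensible inputs.
        return 0.0
    # Below the random level: fill falls to zero with dw_xcor.
    return math.sqrt(max(dw_xcor, 0.0) / random_xcor) * equiampl
```

At the boundary `dw_xcor == random_xcor` (where effective_xcor is 0) the first branch returns EQUIAMPL, and the second branch approaches the same value from below, which is the continuity property noted in the text.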
Associated with each decoder module is not only a value of random_xcor but also a value of "EQUIAMPL", which is the scale factor value that all the scale factors should have if the signals are distributed equally while preserving power, namely:
EQUIAMPL = sqrt(number of decoder module input channels / number of decoder module output channels)
For example, for a two-input module with three outputs:
EQUIAMPL = sqrt(2/3) = .8165
where "sqrt()" denotes "the square root of ()".
For a two-input module with four outputs:
EQUIAMPL = sqrt(2/4) = .7071
For a two-input module with five outputs:
EQUIAMPL = sqrt(2/5) = .6325
Although these EQUIAMPL values have been found to give satisfactory results, they are not critical, and other values may be used at the discretion of the system designer. A change in the value of EQUIAMPL affects the level of the output channels in the "fill" condition (intermediate correlation of the input signals) relative to the level of the output channels in the "dominant" condition (maximum correlation of the input signals) and in the "all endpoints" condition (minimum correlation of the input signals).
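The EQUIAMPL values quoted above follow directly from the definition, as this short check shows (the function name is hypothetical):

```python
import math


def equiampl(num_inputs, num_outputs):
    """Equal-distribution, power-preserving scale factor for one module."""
    return math.sqrt(num_inputs / num_outputs)


# Two-input modules with three, four and five outputs.
values = [round(equiampl(2, n), 4) for n in (3, 4, 5)]
```

Here `values` holds .8165, .7071 and .6325. Squaring EQUIAMPL and multiplying by the number of outputs gives back the number of inputs, which is the power-preservation property the text describes.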
Endpoint scale factor components
In addition to neighbor-compensated_xcor (from block 439 of Fig. 4B), device or function 359 ("calculate excess endpoint energy scale factor components") receives the smoothed non-neighbor-compensated energy of each of the 1st through mth inputs (from blocks 325 and 327) and, optionally, information about the local matrix coefficients from the local matrix (for the case in which one or both of the endpoint outputs of the module do not coincide with an input, so that the module applies the excess endpoint energy to the two outputs whose directions are closest to the direction of the input, as discussed below). If the directions coincide with the input directions, the output of block 359 is one scale factor component for each endpoint output; otherwise it is two scale factor components, one for each of the outputs nearest to that endpoint, as explained below.
However, the excess endpoint energy scale factor components produced by block 359 are not the only "endpoint" scale factor components. There are three other sources of endpoint scale factor components (two in the case of a single, stand-alone module):
First, within the preliminary scale factor calculation of a particular module, the endpoints are possible candidates for the dominant signal scale factor components of block 335 (and normalizer 361).
Second, in the "fill" calculation of block 357 (and normalizer 363) of Fig. 4C, the endpoints are treated as possible fill candidates together with all the interior channels. Any non-zero fill scale factor component may be applied to all outputs, including the endpoints and the chosen dominant outputs.
Third, if there is a lattice of multiple modules, a supervisor (such as the supervisor 201 in the Fig. 3 example) performs a final, fourth, assignment of the "endpoint" channels, as described above in connection with Fig. 2 and 3.
In order for block 459 to calculate the "excess endpoint energy" scale factor components, the total energy at all the interior outputs is reflected back to the inputs of the module, in accordance with neighbor-compensated_xcor, to estimate how much of the energy of the interior outputs is contributed by each input (the "interior energy at input n"). That energy is then used to calculate the excess endpoint energy scale factor component at each module output that coincides with an input (that is, an endpoint).
Reflecting the interior energy back to the inputs is also needed in order to provide the information that a supervisor, such as the supervisor 201 of Fig. 2, requires to calculate neighbor levels and higher-order neighbor levels. Fig. 6A and 6B show one way of calculating the interior energy contribution at each module input and of determining the excess endpoint scale factor component for each endpoint output.
Fig. 6A and 6B are functional block diagrams showing, respectively, a suitable arrangement within a module, such as any one of modules 24-34 of Fig. 2, for (1) generating the total estimated interior energy for each input of the module, 1 through m, in response to the total energy at each input, 1 through m, and (2) generating an excess endpoint energy scale factor component for each endpoint of the module in response to neighbor-compensated_xcor (see Fig. 4B, the output of block 439). The total estimated interior energy for each input of a module (Fig. 6A) is required by the supervisor in a multiple-module arrangement and, in any case, by the module itself in order to produce the excess endpoint energy scale factor components.
Using the scale factor components derived in blocks 455 and 457 of Fig. 4C, together with other information, the arrangement of Fig. 6A calculates the total estimated energy at each interior output (but not at the endpoint outputs). Using the calculated interior output energy levels, it multiplies each output level by the matrix coefficient relating that output to each input ["m" inputs, "m" multipliers], which gives the energy contribution of that input to that output. For each input, it sums the energy contributions to all the interior output channels to obtain the total interior energy contribution of that input. The total interior energy contribution of each input is reported to the supervisor and is used by the module to calculate the excess endpoint energy scale factor component for each endpoint output.
Referring to Fig. 6A in detail, the smoothed total energy level for each module input (preferably not neighbor-compensated) is applied to a set of multipliers, one multiplier for each interior output of the module. For simplicity of presentation, Fig. 6A shows two inputs, "1" and "m", and two interior outputs, "X" and "Z". The smoothed total energy level for each module input is multiplied by the matrix coefficient (of the module's local matrix) that relates that particular input to one of the module's interior outputs (note that a matrix coefficient is its own inverse, because the squares of the matrix coefficients sum to 1). This multiplication is performed for every combination of input and interior output. Thus, as shown in Fig. 6A, the smoothed total energy level at input 1 (which may be obtained, for example, at the output of the slow smoother 425 of Fig. 4B) is applied to a multiplier 601, which multiplies that energy level by the matrix coefficient relating interior output X to input 1, providing a scaled output energy level component X1 at output X. Similarly, multipliers 603, 605 and 607 provide scaled energy level components Xm, Z1 and Zm.
The energy level components for each interior output (X1 and Xm; Z1 and Zm) are summed in combiners 611 and 613 in an amplitude/power manner in accordance with neighbor-compensated_xcor. If the inputs to a combiner are in phase, which is indicated by a neighbor-weighted cross-correlation of 1.0, their linear amplitudes add. If the inputs are uncorrelated, which is indicated by a neighbor-weighted cross-correlation of zero, their energy levels add. If the cross-correlation lies between 1 and 0, the sum is partly an amplitude sum and partly a power sum. To sum the inputs to each combiner correctly, both the amplitude sum and the power sum are calculated, and they are weighted by neighbor-compensated_xcor and (1 - neighbor-weighted_xcor) respectively. To form the weighted sum, either the square root of the power sum is taken to obtain an equivalent amplitude, or the linear amplitude sum is squared to obtain its power level, before the weighted sum is computed. For example, taking the latter approach (a weighted sum of powers), if the amplitude levels are 3 and 4 and neighbor-weighted_xcor is 0.7, the amplitude sum is 3 + 4 = 7, or a power level of 49, and the power sum is 9 + 16 = 25. The weighted sum is therefore 0.7*49 + (1 - 0.7)*25 = 41.8 (a power or energy level) or, taking the square root, 6.47.
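The amplitude/power combining rule and its numerical example can be checked with a short sketch. The function name is hypothetical, and the sketch follows the "weighted sum of powers" approach, taking amplitudes as input and returning a power level.

```python
import math


def combine_amplitude_power(amplitudes, xcor):
    """Blend an in-phase (amplitude) sum with an uncorrelated (power) sum.

    amplitudes: amplitude levels of the components being combined.
    xcor: cross-correlation weight in [0, 1]; 1 means fully in phase.
    Returns the combined power (energy) level.
    """
    amplitude_sum_power = sum(amplitudes) ** 2         # (3 + 4)^2 = 49
    power_sum = sum(a * a for a in amplitudes)         # 9 + 16 = 25
    return xcor * amplitude_sum_power + (1.0 - xcor) * power_sum


power = combine_amplitude_power([3.0, 4.0], 0.7)
amplitude = math.sqrt(power)
```

For the example in the text, `power` is approximately 41.8 and `amplitude` approximately 6.47. With `xcor` equal to 1 the function reduces to the squared amplitude sum, and with `xcor` equal to 0 it reduces to the plain power sum.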
In multipliers 613 and 615, the summed products (X1 + Xm; Z1 + Zm) are multiplied by the scale factor components for the respective outputs X and Z to produce the total energy level at each interior output, which may be denoted X' and Z'. The scale factor component for each interior output is obtained from block 467 (Fig. 4C). Note that the "excess endpoint energy scale factor components" from block 459 (Fig. 4C) do not affect the interior outputs and are not involved in the calculations performed by the arrangement of Fig. 6A.
The total energy levels at the interior outputs, X' and Z', are reflected back to the respective inputs of the module by multiplying each energy level by the matrix coefficient (of the module's local matrix) that relates the particular output to each of the module's inputs. This multiplication is performed for every combination of interior output and input. Thus, as shown in Fig. 6A, the total energy level X' at interior output X is applied to a multiplier 617, which multiplies the energy level by the matrix coefficient relating interior output X to input 1 (this coefficient is the same as its inverse, as noted above), providing a scaled energy level component X1' at input 1.
It should be noted that when a second-order value, such as the total energy level X', is weighted by a first-order value, such as a matrix coefficient, a second-order weighting is required. This is equivalent to taking the square root of the energy to obtain an amplitude, multiplying that amplitude by the matrix coefficient, and squaring the result to return to an energy value.
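The equivalence stated above can be made concrete in a few lines. The function name is hypothetical; the point is only that weighting an energy by an amplitude-domain coefficient amounts to multiplying by the square of that coefficient.

```python
import math


def weight_energy_by_coefficient(energy, coeff):
    """Apply a first-order (amplitude) coefficient to a second-order (energy) value."""
    amplitude = math.sqrt(energy)      # energy -> amplitude
    scaled = amplitude * coeff         # weight in the amplitude domain
    return scaled * scaled             # amplitude -> energy


reflected = weight_energy_by_coefficient(16.0, 0.5)
```

Here `reflected` equals 16.0 * 0.5**2, that is, 4.0.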
Similarly, multipliers 619, 621 and 623 provide the scaled energy levels Xm', Z1' and Zm'. The energy components relating to each input are summed in combiners 625 and 627 in an amplitude/power manner according to adjacent compensation_xcor, as described above in connection with combiners 611 and 613. The outputs of combiners 625 and 627 represent the total estimated internal energy for inputs 1 and m, respectively. In the case of a multi-module grid, this information is sent to the watch-dog, such as the watch-dog 201 of Fig. 2, so that the watch-dog can calculate adjacent levels. The watch-dog requests, from all the modules connected to each input, their total internal energy contributions to that input; it then informs each module, for each of its inputs, of the sum of the total internal energy contributions from all the other modules connected to that input. The result is the adjacent level for that input of the module. The generation of adjacent level information is described further below.
The module also needs the total estimated internal energy contributed by each of inputs 1 and m in order to calculate the excessive end point energy scaling factor component for each end point output. Fig. 6B shows how this scaling factor component information may be calculated. For simplicity, only the calculation for one end point is shown; a similar calculation is performed for each end point output. In a combiner or combining function 629, the total estimated internal energy contributed by an input, input 1 in this example, is subtracted from the smoothed total input energy of the same input (for example, the same smoothed total energy level of input 1 that is obtained at the output of slow smoother 425 of Fig. 4B and applied to a multiplier 601). In a divider or dividing function 631, the result of the subtraction is divided by the smoothed total energy level of the same input 1. In a square-root calculator or square root function 633, the square root of the quotient is taken. Note that the operation of divider or dividing function 631 (and of the other dividers described herein) should include a test for a zero denominator. In that case, the quotient may be set to zero.
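The Fig. 6B calculation described above (subtract, divide, take the square root, with a zero-denominator test) can be sketched as follows. The function name and the clamping of a negative quotient to zero are assumptions made for this sketch; the text itself only specifies the zero-denominator test.

```python
import math


def excess_endpoint_component(smoothed_total_energy, estimated_internal_energy):
    """Sketch of combiner 629, divider 631 and square root function 633."""
    # Zero-denominator test: the text allows the quotient to be set to zero.
    if smoothed_total_energy == 0.0:
        return 0.0
    # Combiner 629 (subtraction) followed by divider 631.
    quotient = (smoothed_total_energy - estimated_internal_energy) / smoothed_total_energy
    # Assumption, not stated in the text: clamp a negative quotient to zero
    # so that the square root is always defined.
    return math.sqrt(max(quotient, 0.0))


# Illustrative values only: total input energy 8, estimated internal energy 6.
component = excess_endpoint_component(8.0, 6.0)
```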
If there is only a single, stand-alone module, the preliminary end point scaling factor components are thus determined by virtue of the advantage, filling and excessive end point energy scaling factor components having been determined.
Thus, all output channels, including the end points, have been assigned scaling factors, and one may proceed to use them to perform the signal path matrixing. However, if there is a multi-module grid, each module has assigned an end point scaling factor to each input feeding it, so that each input connected to more than one module has multiple scaling factor assignments, one from each connected module. In that case, the watch-dog (such as the watch-dog 201 of the Fig. 2 example) performs the final, fourth, assignment, that of the "end point" channels, as described above in connection with Figs. 2 and 3. The watch-dog determines final end point scaling factors, which override all the end point scaling factors assigned by the individual modules.
In practical arrangements, it is not certain that there is actually an output channel direction corresponding to an end point position, although this is often the case. If there is no physical end point channel, but there is at least one physical channel beyond the end point, the end point energy is panned to the physical channels nearest the end point, as if it were an advantage signal component. In a horizontal array, these are the two channels nearest the end point position, preferably using a constant-energy distribution (the squares of the two scaling factors sum to 1.0). In other words, when a sound direction does not correspond to the position of a real channel, even if that direction is that of an end point signal, it is preferably panned to the nearest available pair of real channels, because otherwise, if the sound moves slowly, it would jump abruptly from one output channel to another. Thus, when there is no physical end point channel, it is not appropriate to pan an end point signal to the single channel nearest the end point position unless there is no physical channel beyond the end point, in which case there is no choice other than to pan to the one channel nearest the end point position.
Another way to implement such panning is for the watch-dog, such as the watch-dog 201 of Fig. 2, to generate "final" scaling factors on the assumption that each input also has a corresponding output channel (that is, each corresponding input and output are coincident and represent the same position). Then, if there is no actual output channel directly corresponding to an input channel, an output matrix, such as the variable matrix 203 of Fig. 2, can map that output channel to one or more appropriate output channels.
As mentioned above, the outputs of each "calculate scaling factor component" device or function 455, 457 and 459 are applied to respective normalization devices or functions 461, 463 and 465. Such normalizers are needed because the scaling factor components calculated by squares 455, 457 and 459 are based on adjacent-compensated levels, whereas the final signal path matrixing (in the master matrix, in the case of multiple modules, or in the local matrix, in the case of a stand-alone module) involves non-adjacent-compensated levels (the input signals applied to the matrix are not adjacent-compensated). Typically, the normalizers reduce the values of the scaling factor components.
One suitable way to implement the normalizers is as follows. Each normalizer receives the adjacent-compensated smoothed input energy for each module input (as from combiners 331 and 333), the non-adjacent-compensated smoothed input energy for each module input (as from squares 325 and 327), local matrix coefficient information from the local matrix, and the respective outputs of squares 355, 357 and 359. Each normalizer calculates a desired output level for each output channel and an actual output level for each output channel, the latter assuming a scaling factor of 1. It then divides the desired output level calculated for each output channel by the actual output level calculated for that channel and takes the square root of the quotient to provide a potential preliminary scaling factor, which is applied to "summation and/or the greater of" 367. Consider the following example.
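Before the numbers are worked through by hand, the normalizer calculation just described can be sketched in Python. This is a sketch under stated assumptions, not the patent's implementation: the function and parameter names are hypothetical, pure energy summation is assumed for simplicity, and a zero actual output level is mapped to a scaling factor of zero in line with the zero-denominator test recommended for dividers elsewhere in the text. The values used are those of the example in the text.

```python
import math


def normalize_scale_factor(initial_sf, compensated_energy, uncompensated_energy,
                           squared_coeffs):
    """Return the normalized scaling factor for one output channel.

    initial_sf: scaling factor chosen from adjacent-compensated levels.
    compensated_energy: adjacent-compensated smoothed energy per input.
    uncompensated_energy: non-adjacent-compensated smoothed energy per input.
    squared_coeffs: squared local matrix coefficients for this output.
    """
    # Desired output level: squared scaling factor applied to compensated levels.
    desired = initial_sf ** 2 * sum(
        e * c for e, c in zip(compensated_energy, squared_coeffs))
    # Actual output level, assuming a scaling factor of 1.
    actual = sum(e * c for e, c in zip(uncompensated_energy, squared_coeffs))
    if actual == 0.0:  # zero-denominator test (assumed handling)
        return 0.0
    return math.sqrt(desired / actual)


final_sf = normalize_scale_factor(0.5, [3.0, 4.0], [6.0, 8.0], [0.5, 0.5])
```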
Assume that the smoothed non-adjacent-compensated input energy levels of a two-input module are 6 and 8, and that the corresponding adjacent-compensated energy levels are 3 and 4. Assume also that the matrix coefficients of the center internal output channel are (.71, .71) or, squared, (0.5, 0.5). If the module selects an initial scaling factor of 0.5 for this channel (based on the adjacent-compensated levels), or 0.25 squared, the desired output level of this channel is (assuming, for simplicity, pure energy summation and using the adjacent-compensated levels):
.25*(3*.5+4*.5)=0.875.
Because the actual input levels are 6 and 8, if the above scaling factor (0.25 squared) were used for the final signal path matrixing, the output level would be:
.25*(6*.5+8*.5)=1.75
instead of the desired output level of 0.875. The normalizer adjusts the scaling factor so that the desired output level is obtained when the non-adjacent-compensated levels are used.
Actual output, assuming SF=1: (6*.5+8*.5)=7.
(desired output level)/(actual output assuming SF=1)=0.875/7.0=0.125=final scaling factor squared
Final scaling factor for this output channel=sqrt(0.125)=0.354, instead of the initially calculated value of 0.5.
The "summation and/or the greater of" 367 preferably sums the corresponding filling and end point scaling factor components for each output channel in each subband, and selects the greater of the advantage and filling scaling factor components for each output channel in each subband. The preferred form of the function "summation and/or the greater of" 367 may be characterized as shown in Fig. 7. That is, the advantage scaling factor components and the filling scaling factor components are applied to a device or function 701 that selects the greater of the scaling factor components for each output ("the greater of" 701) and applies them to an additive combiner or combining function 703, which sums the scaling factor components from the greater of 701 with the excessive end point energy scaling factor components for each output. Alternatively, acceptable results may be obtained when "summation and/or the greater of" 467: (1) sums in both zone 1 and zone 2, (2) takes the greater in both zone 1 and zone 2, or (3) selects the greater in zone 1 and sums in zone 2.
Fig. 8 is an idealized representation of the manner in which an aspect of the present invention generates scaling factor components in response to a measure of cross-correlation. The figure is particularly useful with reference to the examples of Figs. 9A and 9B through 16A and 16B. As mentioned above, the generation of scaling factor components may be considered to have two zones or regimes of operation: a first zone, zone 1, bounded by "all advantage" and "evenly filled", in which the available scaling factor components are a combination of advantage and filling scaling factor components; and a second zone, zone 2, bounded by "evenly filled" and "all end points", in which the available scaling factor components are a combination of filling and excessive end point energy scaling factor components. The "all advantage" boundary condition occurs when weighted direction_xcor is 1. Zone 1 (advantage plus filling) extends from that boundary to the point at which weighted direction_xcor equals random_xcor, the "evenly filled" condition. The "all end points" boundary condition occurs when weighted direction_xcor is zero. Zone 2 (filling plus end points) extends from the "evenly filled" boundary condition to the "all end points" boundary condition. The "evenly filled" boundary point may be considered to lie in either zone 1 or zone 2. As noted below, the exact boundary point is not critical.
As shown in the example of Fig. 8, as the value of the advantage scaling factor component decreases, the value of the filling scaling factor component increases, reaching a maximum as the advantage scaling factor component reaches zero; from that point, as the value of the filling scaling factor component decreases, the value of the excessive end point energy scaling factor component increases. When the result is applied through a suitable matrix to the input signals received by the module, it yields an output signal distribution that provides a compact sound image when the input signals are highly correlated, that spreads (widens) from compact to broad as the correlation decreases, and that progressively splits or bows outward from broad into multiple sound images, each at an end point, as the correlation continues to decrease to highly uncorrelated.
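The two-zone behaviour of the idealized Fig. 8 diagram can be illustrated with the following Python sketch. It assumes a straight-line variation between the boundary conditions, which the text describes as non-critical, and it returns relative weights only; it is not the set of equations the patent uses to compute the scaling factor components. The function name and the normalization of each weight to the range 0 to 1 are assumptions.

```python
def zone_weights(weighted_direction_xcor, random_xcor):
    """Return relative (advantage, filling, end_point) weights, each in [0, 1].

    Assumes 0 < random_xcor < 1 and 0 <= weighted_direction_xcor <= 1.
    """
    if not 0.0 < random_xcor < 1.0:
        raise ValueError("random_xcor must lie strictly between 0 and 1")
    if weighted_direction_xcor >= random_xcor:
        # Zone 1: "all advantage" (xcor = 1) down to "evenly filled".
        advantage = (weighted_direction_xcor - random_xcor) / (1.0 - random_xcor)
        return advantage, 1.0 - advantage, 0.0
    # Zone 2: "evenly filled" down to "all end points" (xcor = 0).
    filling = weighted_direction_xcor / random_xcor
    return 0.0, filling, 1.0 - filling


# Illustrative value only: random_xcor = 0.5.
all_advantage = zone_weights(1.0, 0.5)
evenly_filled = zone_weights(0.5, 0.5)
all_end_points = zone_weights(0.0, 0.5)
```

In this sketch the filling weight peaks exactly where weighted direction_xcor equals random_xcor, matching the "evenly filled" boundary between the two zones.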
Although it is desirable that there be a single spatially compact sound image (in the nominal ongoing primary direction of the input signals) for the fully correlated case, and multiple spatially compact sound images (each at an end point) for the fully uncorrelated case, the spatially spread sound image between these extremes may be realized in ways other than that shown in the example of Fig. 8. For example, it is not critical that the filling scaling factor component reach its maximum when random_xcor = weighted direction_xcor, nor that the values of the three scaling factor components vary linearly as shown. Modifications of the relationships of Fig. 8 (and of the equations underlying the figure, as expressed herein), and other relationships between a suitable measure of cross-correlation and scaling factor values that produce a compact-advantage to broadly-spread to compact-end-point signal distribution as the measure of cross-correlation goes from highly correlated to highly uncorrelated, are also contemplated by the present invention. For example, instead of obtaining such a distribution by employing a two-zone approach as described above, such results may be obtained by a mathematical approach, such as one employing a pseudo-inverse-based equation solution.
Output scaling factor example
A series of idealized representations, Figs. 9A and 9B through 16A and 16B, illustrate the output scaling factors of a module for various examples of input signal conditions. For simplicity, a single, stand-alone module is assumed, so that the scaling factors it produces for the variable matrix are the final scaling factors. The module and its associated variable matrix have two input channels (such as left L and right R) that coincide with two end point output channels (which may also be designated L and R). In this series of examples there are three internal output channels (such as left-middle LM, center C and right-middle RM).
The meanings of "all advantage", "combined advantage and filling", "evenly filled", "mixed filling and end points" and "all end points" are further illustrated by the examples of Figs. 9A and 9B through 16A and 16B. In each pair of figures (for example, Figs. 9A and 9B), the "A" figure shows the energy levels of the two inputs, left L and right R, and the "B" figure shows the scaling factor components for the five outputs: left L, left-middle LM, center C, right-middle RM and right R. The figures are not drawn to scale.
In Fig. 9 A, the intake level equates, shown in two vertical arrows.In addition, weighted direction _ xcor (with effectively _ xcor) be 1.0 (relevant fully).In this example, only there is the scaling factor of a non-zero, is depicted as the single vertical arrows that is positioned at C, and this scaling factor is used the output to inside middle sound channel C, to produce advantage signal compact on the space as Fig. 9 B.In this example, in the middle of output is positioned at (L/R=1), therefore, consistent with the inside output channels C of centre by chance.If there is no corresponding to output channels is then used this advantage signal in the proper ratio and is given nearest output channels, so that move this advantage signal to correct virtual positions between them.For example, if there is no in the middle of output channels C, in the left side in the LM and the right side RM output channels will have the non-zero scaling factor be applied to LM and RM output so that this advantage signal is equal to.(all advantage signals) do not exist and fills and the end point signal component under relevant fully situation.Therefore, the normalization advantage scaling factor component that produced of the preliminary scaling factor and the square 361 that are produced of square 467 (Fig. 4 C) is identical.
In Fig. 10A, the input energy levels are equal, but weighted direction_xcor is less than 1.0 and greater than random_xcor. The scaling factor components are therefore those of zone 1: a combination of advantage and filling scaling factor components. The greater of the normalized advantage scaling factor component (from square 361) and the normalized filling scaling factor component (from square 363) is applied to each output channel (by square 367), so that, as shown in Fig. 10B, the advantage scaling factor remains at the center output channel C but is smaller, and filling scaling factors appear at each of the other output channels, L, LM, RM and R (including the end points L and R).
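The selection of the greater of the advantage and filling components for each output, followed by summation with the excessive end point energy component (the preferred form characterized in Fig. 7), can be sketched as follows. The function name and the numeric values are purely illustrative assumptions; they are not taken from Fig. 10B, which is not drawn to scale.

```python
def combine_components(advantage, filling, end_point):
    """Per output: take the greater of advantage and filling, then add end point."""
    return [max(a, f) + e for a, f, e in zip(advantage, filling, end_point)]


# Hypothetical zone 1 components for the outputs L, LM, C, RM, R.
advantage = [0.0, 0.0, 0.8, 0.0, 0.0]   # compact advantage signal at C
filling = [0.2, 0.2, 0.2, 0.2, 0.2]     # evenly distributed filling
end_point = [0.0, 0.0, 0.0, 0.0, 0.0]   # no end point components in zone 1

preliminary = combine_components(advantage, filling, end_point)
```

In this illustration the center output keeps the advantage value and every other output receives the filling value, which is the pattern the text describes for Fig. 10B.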
In Fig. 11A, the input energy levels remain equal, but weighted direction_xcor = random_xcor. The scaling factors of Fig. 11B are therefore those of the boundary condition between zones 1 and 2, the evenly filled condition, in which there are no advantage or end point scaling factors, only filling scaling factors having the same value at each output (hence "evenly filled"), as indicated by the identical arrows at each output. The filling scaling factor levels reach their maximum in this example. As discussed below, the filling scaling factors may instead be applied non-uniformly, such as in a tapered manner, depending on the input signal conditions.
In Fig. 12A, the input energy levels remain equal, but weighted direction_xcor is less than random_xcor and greater than zero (zone 2). Thus, as shown in Fig. 12B, there are filling and end point scaling factors, but no advantage scaling factor.
In Fig. 13A, the input energy levels remain equal, but weighted direction_xcor is zero. The scaling factors shown in Fig. 13B are therefore those of the all end points boundary condition. There are no internal output scaling factors, only end point scaling factors.
In the examples of Figs. 9A/9B through 13A/13B, because the energy levels of the two inputs are equal, weighted direction_xcor (such as produced by square 441 of Fig. 4B) is the same as adjacent compensation_xcor (such as produced by square 439 of Fig. 4B). In Fig. 14A, however, the input energy levels are unequal (L is greater than R). Although adjacent compensation_xcor in this example equals random_xcor, the resulting scaling factors, shown in Fig. 14B, are not filling scaling factors applied uniformly to all channels as in the example of Figs. 11A and 11B. Instead, the unequal input energy levels cause a proportional increase in weighted direction_xcor (proportional to the degree to which the nominal ongoing primary direction departs from its central position), so that it becomes greater than adjacent compensation_xcor, thereby causing the scaling factors to be weighted more toward all advantage (as illustrated in Fig. 8). This is a desired result, because strongly L-weighted or R-weighted signals should not have a broad width; they should have a compact width near the L or R channel end point. The resulting output, shown in Fig. 14B, is a non-zero advantage scaling factor located closer to the L output than to the R output (in this case, the adjacent-compensated direction information happens to place the advantage component exactly at the left-middle position LM), reduced filling scaling factor amplitudes, and no end point scaling factors (the direction weighting pushes the operation into zone 1 of Fig. 8 (combined advantage and filling)).
For the five outputs corresponding to the scaling factors of Fig. 14B, the outputs may be expressed as follows:
Lout = Lt * SF(L)
MidLout = ((.92)Lt + (.38)Rt) * SF(MidL)
Cout = ((.45)Lt + (.45)Rt) * SF(C)
MidRout = ((.38)Lt + (.92)Rt) * SF(MidR)
Rout = Rt * SF(R)
Thus, in the example of Fig. 14B, even though the scaling factors (SF) for the four outputs other than MidLout are equal (filling), the corresponding signal outputs are not equal, because Lt is greater than Rt (resulting in more signal output toward the left), and the advantage output at left-middle is greater than the scaling factor alone indicates. Because the nominal ongoing primary direction coincides with the left-middle output channel, the ratio of Lt to Rt is the same as that of the matrix coefficients for the left-middle output channel, namely 0.92 to 0.38. Assume that these values are the actual amplitudes of Lt and Rt. To calculate the output levels, these levels are multiplied by the corresponding matrix coefficients, added, and scaled by the respective scaling factors:
output amplitude (output channel i) = sf(i) * (Lt_coefficient(i) * Lt + Rt_coefficient(i) * Rt)
Although it is preferable to take into account the mix between amplitude addition and energy addition (as in the calculations relating to Fig. 6A), in this example the cross-correlation value is rather high (large advantage scaling factor), so ordinary addition may be used:
Lout=0.1*(1*0.92+0*0.38)=0.092
MidLout=0.9*(0.92*0.92+0.38*0.38)=0.900
Cout=0.1*(0.71*0.92+0.71*0.38)=0.092
MidRout=0.1*(0.38*0.92+0.92*0.38)=0.070
Rout=0.1*(0*0.92+1*0.38)=0.038
Thus, this example shows that the signal outputs at Lout, Cout, MidRout and Rout are unequal, because Lt is greater than Rt, even though the scaling factors applied to those outputs are the same.
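The output amplitude calculation of this example can be reproduced with a short Python sketch. The dictionary layout and the function name are assumptions; the coefficients, scaling factors and input amplitudes are those used in the numeric lines above (including the center coefficient of 0.71 used there).

```python
# (Lt coefficient, Rt coefficient) for each output, as in the numeric example.
COEFFS = {
    "Lout": (1.0, 0.0),
    "MidLout": (0.92, 0.38),
    "Cout": (0.71, 0.71),
    "MidRout": (0.38, 0.92),
    "Rout": (0.0, 1.0),
}
# Scaling factors: 0.9 for the advantage output, 0.1 (filling) elsewhere.
SCALING = {"Lout": 0.1, "MidLout": 0.9, "Cout": 0.1, "MidRout": 0.1, "Rout": 0.1}


def output_amplitudes(lt, rt, coeffs, scaling):
    """Ordinary (amplitude) addition, then scaling, for every output."""
    return {name: scaling[name] * (cl * lt + cr * rt)
            for name, (cl, cr) in coeffs.items()}


outputs = output_amplitudes(0.92, 0.38, COEFFS, SCALING)
```

Evaluated exactly, the MidLout term is 0.9*(0.8464+0.1444), about 0.892, which the text rounds to 0.900; the four filling outputs differ from one another even though their scaling factors are identical.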
Filling scaling factors may be distributed equally to the output channels, as in the examples of Figs. 10B, 11B, 12B and 14B. Alternatively, rather than being uniform, the filling scaling factor components may vary with position in some manner, as a function of the advantage (correlated) and/or end point (uncorrelated) input signal components (or, equivalently, as a function of the value of weighted direction_xcor). For example, for moderately high values of weighted direction_xcor, the filling scaling factor amplitudes may curve convexly, so that output channels near the nominal ongoing primary direction receive more signal level than channels farther away. For weighted direction_xcor = random_xcor, the filling scaling factor amplitudes may flatten to a uniform distribution, and for weighted direction_xcor less than random_xcor, the amplitudes may curve concavely, favoring channels near the end point directions.
Examples of such curved filling scaling factor amplitudes are shown in Figs. 15B and 16B. The output of Fig. 15B results from an input (Fig. 15A) that is the same as that of Fig. 10A described above. The output of Fig. 16B results from an input (Fig. 16A) that is the same as that of Fig. 12A described above.
Communication between module and the watch-dog
Adjacent levels and higher-order adjacent levels
Each module in a multi-module arrangement, such as the examples of Figs. 1 and 2, requires two mechanisms to support communication between it and a watch-dog, such as the watch-dog 201 of Fig. 2:
(a) one mechanism to gather and report the information the watch-dog needs in order to calculate adjacent levels and higher-order adjacent levels (if any). The information needed by the watch-dog is the total estimated internal energy attributable to each module input, produced, for example, by the arrangement of Fig. 6A.
(b) another mechanism to receive and apply the adjacent levels (if any) and higher-order adjacent levels (if any) from the watch-dog. In the example of Fig. 4B, the adjacent levels are subtracted from the smoothed energy level of each input in the respective combiners 431 and 433, and the higher-order adjacent levels (if any) are subtracted from the smoothed energy level of each input and from the common energy across the channels in the respective combiners 431, 433 and 435.
Once the watch-dog knows the total estimated internal energy contributions of every input of every module:
(1) The watch-dog determines whether the total estimated internal energy contributions for each input (summed over all the modules connected to that input) exceed the total available signal level at that input. If the sum exceeds the total available signal level, the watch-dog scales back the internal energy reported by each module connected to that input so that the reported energies sum to the total input level.
(2) The watch-dog informs each module of its adjacent level for each input, namely the sum of all the other internal energy contributions (if any) at that input.
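Steps (1) and (2) above can be sketched for a single shared input as follows. This is a sketch under assumptions: the function name, the use of a dictionary keyed by module name, and the proportional form of the scaling back are illustrative choices, since the text only states that the reported energies are scaled back so that they sum to the total input level.

```python
def adjacent_levels_for_input(reported_energy, available_level):
    """Return the adjacent level each module is told for one shared input.

    reported_energy: {module_name: total estimated internal energy at this input}.
    available_level: total available signal level at this input.
    """
    total = sum(reported_energy.values())
    # Step (1): scale the reports back if together they exceed what is available.
    if total > available_level and total > 0.0:
        scale = available_level / total
        reported_energy = {m: e * scale for m, e in reported_energy.items()}
        total = sum(reported_energy.values())
    # Step (2): each module's adjacent level is the sum of all the other reports.
    return {m: total - e for m, e in reported_energy.items()}


# Hypothetical reports from three modules sharing one input with level 6.
levels = adjacent_levels_for_input({"A": 6.0, "B": 4.0, "C": 2.0}, 6.0)
```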
A higher-order (HO) adjacent level is the adjacent level of one or more higher-order modules that share inputs with a lower-order module. The above calculation of adjacent levels relates only to modules of the same order at a particular input: all the three-input modules (if any), then all the two-input modules, and so on. The HO adjacent level of a module at an input is the sum of all the adjacent levels of all the higher-order modules at that input (that is, the HO adjacent level at an input of a two-input module is the sum for all the third-order, fourth-order and higher-order modules, if any, that share that node with the two-input module). Once a module knows the HO adjacent level at a particular one of its inputs, it subtracts that level, along with the adjacent level of the same order, from the total input energy level of that input to obtain the adjacent-compensated level at that input node. This is shown in Fig. 4B, in which the adjacent levels for input 1 and input m are subtracted from the outputs of variable slow smoothers 425 and 427 in combiners 431 and 433, respectively, and the higher-order adjacent levels for input 1, input m and the common energy are subtracted from the outputs of variable slow smoothers 425, 427 and 429 in combiners 431, 433 and 435, respectively.
One difference between the use of adjacent levels and the use of HO adjacent levels for compensation is that the HO adjacent levels are also used to compensate the common energy across the input channels (for example, by subtracting an HO adjacent level in combiner 435). The rationale for this difference is that the common level of a module is not affected by adjacent modules of the same order, but it can be affected by a higher-order module that shares all of the module's inputs.
For example, assume input channels Ls (left surround), Rs (right surround) and Top, with an internal output channel in the middle of the triangle between them (elevated ring rear), plus an internal output channel on the line between Ls and Rs (main horizontal ring rear). The former output channel requires a three-input module to recover the signal common to all three inputs. The latter output channel, on the line between the two inputs (Ls and Rs), requires a two-input module. However, the total common signal level observed by the two-input module includes the common elements of the three-input module, which do not belong to the latter output channel. Therefore, the square root of the pairwise product of the HO adjacent levels is subtracted from the common energy of the two-input module to determine how much common energy is attributable solely to its own internal channel (the latter one mentioned). Thus, in Fig. 4B, the smoothed common energy level (from square 429) has the derived HO common level subtracted from it to produce an adjacent-compensated common energy level (from combiner 435), which the module uses to calculate (in square 439) adjacent compensation_xcor.
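For a two-input module, the common-energy compensation described above can be sketched as follows. The function name is hypothetical, and the clamping of the result at zero is an assumption added for this sketch; the text only states that the square root of the pairwise product of the HO adjacent levels is subtracted from the module's common energy.

```python
import math


def compensate_common_energy(smoothed_common_energy, ho_level_1, ho_level_2):
    """Remove the higher-order common level from a two-input module's common energy."""
    # Derived HO common level: square root of the pairwise product of the
    # HO adjacent levels at the module's two inputs.
    ho_common_level = math.sqrt(ho_level_1 * ho_level_2)
    # Assumption, not stated in the text: do not let the result go negative.
    return max(smoothed_common_energy - ho_common_level, 0.0)


# Illustrative values only: common energy 10, HO adjacent levels 4 and 9.
compensated_common = compensate_common_energy(10.0, 4.0, 9.0)
```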
The present invention and its various aspects may be implemented in analog circuitry or, more probably, as software functions performed in a digital signal processor, a programmed general-purpose digital computer and/or a special-purpose digital computer. Interfaces between analog and digital signal streams may be implemented in appropriate hardware and/or as functions in software and/or firmware. Although the present invention and its various aspects may involve analog or digital signals, in practical applications most or all of the processing functions are likely to be performed in the digital domain on digital signal streams in which audio signals are represented by samples.
It should be understood that other variations and modifications of the invention and its various aspects will be apparent to those of ordinary skill in the art, and that the invention is not limited by the specific embodiments described herein. It is therefore contemplated that the present invention covers any and all modifications, variations or equivalents that fall within the spirit and scope of the basic underlying principles disclosed and claimed herein.

Claims (32)

1. A process for translating M audio input signals, each associated with a direction, to N audio output signals, each associated with a direction, wherein N is larger than M, M is at least two and N is a positive integer of at least three, the process comprising:
providing an M:N variable matrix,
applying said M audio input signals to said variable matrix,
deriving said N audio output signals from said variable matrix, and
controlling said variable matrix in response to said input signals so that the sound field produced by said output signals has a compact sound image in the nominal ongoing primary direction of the input signals when the input signals are highly correlated, the sound image spreading from compact to broad as the correlation decreases and progressively splitting into a plurality of compact sound images, each in a direction associated with an input signal, as the correlation continues to decrease to highly uncorrelated.
2. The process according to claim 1, wherein said M:N variable matrix is a variable matrix having variable coefficients or a variable matrix having fixed coefficients and variable outputs, and wherein said variable matrix is controlled by varying the variable coefficients or by varying the variable outputs.
3. The process according to claim 1, wherein said variable matrix is controlled in response to measures of:
(1) the relative levels of the input signals, and
(2) the cross-correlation of the input signals.
4. The process according to claim 3, wherein, for a measure of cross-correlation of the input signals having values in a first range, bounded by a maximum value and a reference value, the sound field has a compact sound image when the measure of cross-correlation is said maximum value and a broadly spread sound image when the measure of cross-correlation is said reference value, and, for a measure of cross-correlation of the input signals having values in a second range, bounded by said reference value and a minimum value, the sound field has said broadly spread sound image when the measure of cross-correlation is said reference value and a plurality of compact sound images, each in a direction associated with an input signal, when the measure of cross-correlation is said minimum value.
5, processing according to claim 4, wherein for the identical situation of energy in each output, described reference value approximately is a metric of the cross-correlation of input signal.
6. The process according to claim 3, wherein the measure of the relative levels of the input signals is responsive to the smoothed energy level of each input signal.
7. The process according to claim 3 or claim 6, wherein the measure of the relative levels of the input signals is the nominal ongoing primary direction of the input signals.
8. The process according to claim 3, wherein the measure of the cross-correlation of the input signals is responsive to the smoothed common energy of the input signals divided by the M-th root of the product of the smoothed energy levels of each input signal, where M is the number of inputs.
9. The process according to any one of claims 6, 7 or 8, wherein the smoothed energy level of each input signal is obtained by variable-time-constant time-domain smoothing.
10. The process according to any one of claims 6, 7 or 8, wherein the smoothed energy level of each input signal is obtained by frequency-domain smoothing and variable-time-constant time-domain smoothing.
11. The process according to claim 8, wherein the common energy of the input signals is obtained by cross-multiplying the input amplitude levels.
12. The process according to claim 11, wherein the smoothed common energy of the input signals is obtained by variable-time-constant time-domain smoothing of the common energy of the input signals.
13. The process according to claim 12, wherein the smoothed energy level of each input signal is obtained by variable-time-constant time-domain smoothing.
14. The process according to claim 11, wherein the smoothed common energy of the input signals is obtained by frequency-domain smoothing and variable-time-constant time-domain smoothing of the common energy of the input signals.
15. The process according to claim 14, wherein the smoothed energy level of each input signal is obtained by frequency-domain smoothing and variable-time-constant time-domain smoothing.
16. The process according to any one of claims 9, 10, 12, 13, 14 and 15, wherein said variable-time-constant time-domain smoothing is performed by smoothing having both a fixed time constant and a variable time constant.
17. The process according to any one of claims 9, 10, 12, 13, 14 and 15, wherein said variable-time-constant time-domain smoothing is performed by smoothing having only a variable time constant.
18. The process according to claim 16 or claim 17, wherein said variable time constant varies in steps.
19. The process according to claim 16 or claim 17, wherein said variable time constant is continuously variable.
20. The process according to claim 16 or claim 17, wherein said variable time constant is controlled in response to measures of the relative levels of the input signals and of their cross-correlation.
21. The process according to claim 6, wherein the smoothed energy level of each input signal is obtained by variable-time-constant time-domain smoothing of the energy level of each input signal using substantially the same time constant.
22. The process according to claim 3, wherein the measure of the relative levels of the input signals and the measure of their cross-correlation are each obtained by variable-time-constant time-domain smoothing, the same time constant being used for each smoothing.
23. The process according to claim 8, wherein said measure of cross-correlation is a first measure of the cross-correlation of the input signals, and a further measure of cross-correlation is obtained by applying a measure of the relative levels of the input signals to said first measure, thereby producing a direction-weighted measure of cross-correlation.
24. The process according to claim 23, wherein yet another measure of the cross-correlation of the input signals is obtained by applying a scaling factor approximately equal to the value that the measure of cross-correlation of the input signals takes when the energy in each output is equal.
25. A process for translating M audio input signals, each associated with a direction, into N audio output signals, each associated with a direction, wherein N is greater than M and M is at least 3, the process comprising:
providing a plurality of m:n variable matrices, where m is a subset of M and n is a subset of N,
applying a respective subset of said M audio input signals to each of said variable matrices,
deriving a respective subset of said N audio output signals from each of said variable matrices,
controlling each of said variable matrices in response to the subset of input signals applied to it such that, when those input signals are highly correlated, the sound field generated by the respective subset of output signals derived from the matrix has a compact sound image in the nominal ongoing primary direction of the subset of input signals applied to the matrix; as the correlation decreases, the image spreads from compact to broad; and as the correlation continues to decrease to highly uncorrelated, the image progressively splits into multiple compact sound images, each in a direction associated with one of the input signals applied to the matrix, and
deriving said N audio output signals from the subsets of N audio output channels.
26. The process according to claim 25, wherein said variable matrices are also controlled in response to information that compensates for the effect of one or more other variable matrices receiving the same input signal.
27. The process according to claim 25 or claim 26, wherein deriving said N audio output signals from the subsets of N audio output channels includes compensating for multiple variable matrices producing the same output signal.
28. The process according to any one of claims 25-27, wherein each of said variable matrices is controlled in response to the following measures:
(a) the relative levels of the input signals applied to it, and
(b) the cross-correlation of those input signals.
29. A process for translating M audio input signals, each associated with a direction, into N audio output signals, each associated with a direction, wherein N is greater than M and M is at least 3, the process comprising:
providing an M:N variable matrix responsive to scaling factors that control the matrix coefficients or control the matrix outputs,
applying said M audio input signals to said variable matrix,
providing a plurality of m:n variable-matrix scaling factor generators, where m is a subset of M and n is a subset of N,
applying a respective subset of said M audio input signals to each of said variable-matrix scaling factor generators,
deriving, from each of said variable-matrix scaling factor generators, a set of variable-matrix scaling factors for a respective subset of said N audio output signals,
controlling each of said variable-matrix scaling factor generators in response to the subset of input signals applied to it such that, when the scaling factors it generates are applied to said M:N variable matrix, the sound field generated by the respective subset of output signals has, when those input signals are highly correlated, a compact sound image in the nominal ongoing primary direction of the subset of input signals that produced the applied scaling factors; as the correlation decreases, the image spreads from compact to broad; and as the correlation continues to decrease to highly uncorrelated, the image progressively splits into multiple compact sound images, each in a direction associated with one of the input signals that produced the applied scaling factors, and
deriving said N audio output signals from said variable matrix.
30. The process according to claim 29, wherein said variable-matrix scaling factor generators are also controlled in response to information that compensates for the effect of one or more other variable-matrix scaling factor generators receiving the same input signal.
31. The process according to claim 29 or claim 30, wherein deriving said N audio output signals from said variable matrix includes compensating for multiple variable-matrix scaling factor generators producing scaling factors for the same output signal.
32. The process according to any one of claims 29-31, wherein each of said variable-matrix scaling factor generators is controlled in response to the following measures:
(a) the relative levels of the input signals applied to it, and
(b) the cross-correlation of those input signals.
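
The two control measures recited in the claims (relative level, and cross-correlation computed as smoothed common energy over the M-th root of the product of the smoothed energies) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes only two inputs (M = 2, so the M-th root is a square root), uses a plain one-pole smoother with a fixed coefficient `alpha` in place of the variable-time-constant smoothing of the claims, and omits frequency-domain smoothing. The function names, the `alpha` and `eps` parameters, and the normalized-difference form of `level_balance` are hypothetical choices made for illustration.

```python
import math

def one_pole_smooth(values, alpha):
    """Exponential smoothing: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, state = [], 0.0
    for v in values:
        state += alpha * (v - state)
        out.append(state)
    return out

def correlation_measure(x1, x2, alpha=0.05, eps=1e-12):
    """Smoothed common energy divided by the square root (M = 2) of the
    product of the smoothed energies of the two inputs."""
    # Common energy: cross-multiply the input amplitudes sample by sample.
    common = one_pole_smooth([a * b for a, b in zip(x1, x2)], alpha)
    e1 = one_pole_smooth([a * a for a in x1], alpha)
    e2 = one_pole_smooth([b * b for b in x2], alpha)
    # eps (an assumption) guards against division by zero during silence.
    return [c / (math.sqrt(p * q) + eps) for c, p, q in zip(common, e1, e2)]

def level_balance(x1, x2, alpha=0.05, eps=1e-12):
    """Hypothetical relative-level measure in [-1, 1]:
    -1 when all smoothed energy is in x1, +1 when all is in x2."""
    e1 = one_pole_smooth([a * a for a in x1], alpha)
    e2 = one_pole_smooth([b * b for b in x2], alpha)
    return [(q - p) / (p + q + eps) for p, q in zip(e1, e2)]

# Test tones: a sine with an 8-sample period and its quadrature counterpart.
n = 400
sine = [math.sin(2 * math.pi * k / 8) for k in range(n)]
cosine = [math.cos(2 * math.pi * k / 8) for k in range(n)]

same = correlation_measure(sine, sine)[-1]                     # fully correlated
opposite = correlation_measure(sine, [-s for s in sine])[-1]   # out of phase
quadrature = correlation_measure(sine, cosine)[-1]             # close to zero
```

Under these assumptions the measure settles near +1 for identical inputs, near -1 for polarity-inverted inputs, and near 0 for the quadrature pair, which corresponds to the range from "highly correlated" to "highly uncorrelated" over which the claims steer the variable matrix.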
CN03817877XA 2002-08-07 2003-08-06 Audio channel spatial translation Expired - Lifetime CN1672464B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US40198302P 2002-08-07 2002-08-07
US60/401,983 2002-08-07
PCT/US2003/024570 WO2004019656A2 (en) 2001-02-07 2003-08-06 Audio channel spatial translation

Publications (2)

Publication Number Publication Date
CN1672464A true CN1672464A (en) 2005-09-21
CN1672464B CN1672464B (en) 2010-07-28

Family

ID=33489220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN03817877XA Expired - Lifetime CN1672464B (en) 2002-08-07 2003-08-06 Audio channel spatial translation

Country Status (17)

Country Link
EP (1) EP1527655B1 (en)
JP (1) JP4434951B2 (en)
KR (1) KR100988293B1 (en)
CN (1) CN1672464B (en)
AT (1) ATE341923T1 (en)
AU (1) AU2003278704B2 (en)
BR (2) BR0305746A (en)
CA (1) CA2494454C (en)
DE (1) DE60308876T2 (en)
DK (1) DK1527655T3 (en)
ES (1) ES2271654T3 (en)
HK (1) HK1073963A1 (en)
IL (1) IL165941A (en)
MX (1) MXPA05001413A (en)
MY (1) MY139849A (en)
PL (1) PL373120A1 (en)
TW (1) TWI315828B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101527874B (en) * 2009-04-28 2011-03-23 张勤 Dynamic sound field system
CN102273233A (en) * 2008-12-18 2011-12-07 杜比实验室特许公司 Audio channel spatial translation
CN102714039A (en) * 2010-01-22 2012-10-03 杜比实验室特许公司 Using multichannel decorrelation for improved multichannel upmixing
CN102077276B (en) * 2008-06-26 2014-04-09 法国电信公司 Spatial synthesis of multichannel audio signals
CN106604199A (en) * 2016-12-23 2017-04-26 湖南国科微电子股份有限公司 Digital audio signal matrix processing method and device
CN112562697A (en) * 2015-06-24 2021-03-26 索尼公司 Audio processing apparatus and method, and computer-readable storage medium
CN114327040A (en) * 2021-11-25 2022-04-12 歌尔股份有限公司 Vibration signal generation method, device, electronic device and storage medium

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7508947B2 (en) * 2004-08-03 2009-03-24 Dolby Laboratories Licensing Corporation Method for combining audio signals using auditory scene analysis
CN101048935B (en) 2004-10-26 2011-03-23 杜比实验室特许公司 Method and device for controlling the perceived loudness and/or the perceived spectral balance of an audio signal
EP1817938B1 (en) * 2004-11-23 2008-08-20 Koninklijke Philips Electronics N.V. A device and a method to process audio data, a computer program element and a computer-readable medium
TWI397901B (en) * 2004-12-21 2013-06-01 Dolby Lab Licensing Corp Method for controlling a particular loudness characteristic of an audio signal, and apparatus and computer program associated therewith
JP5461835B2 (en) 2005-05-26 2014-04-02 エルジー エレクトロニクス インコーポレイティド Audio signal encoding / decoding method and encoding / decoding device
US8494667B2 (en) 2005-06-30 2013-07-23 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof
WO2007004831A1 (en) 2005-06-30 2007-01-11 Lg Electronics Inc. Method and apparatus for encoding and decoding an audio signal
US8073702B2 (en) 2005-06-30 2011-12-06 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof
JP4859925B2 (en) 2005-08-30 2012-01-25 エルジー エレクトロニクス インコーポレイティド Audio signal decoding method and apparatus
WO2007055464A1 (en) 2005-08-30 2007-05-18 Lg Electronics Inc. Apparatus for encoding and decoding audio signal and method thereof
US7788107B2 (en) 2005-08-30 2010-08-31 Lg Electronics Inc. Method for decoding an audio signal
KR100880643B1 (en) 2005-08-30 2009-01-30 엘지전자 주식회사 Method and apparatus for decoding an audio signal
US7646319B2 (en) 2005-10-05 2010-01-12 Lg Electronics Inc. Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor
KR100857120B1 (en) 2005-10-05 2008-09-05 엘지전자 주식회사 Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor
US7696907B2 (en) 2005-10-05 2010-04-13 Lg Electronics Inc. Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor
WO2007040355A1 (en) 2005-10-05 2007-04-12 Lg Electronics Inc. Method and apparatus for signal processing and encoding and decoding method, and apparatus therefor
US7672379B2 (en) 2005-10-05 2010-03-02 Lg Electronics Inc. Audio signal processing, encoding, and decoding
US7751485B2 (en) 2005-10-05 2010-07-06 Lg Electronics Inc. Signal processing using pilot based coding
US7742913B2 (en) 2005-10-24 2010-06-22 Lg Electronics Inc. Removing time delays in signal paths
US7752053B2 (en) 2006-01-13 2010-07-06 Lg Electronics Inc. Audio signal processing using pilot based coding
US7965848B2 (en) 2006-03-29 2011-06-21 Dolby International Ab Reduced number of channels decoding
WO2010113434A1 (en) * 2009-03-31 2010-10-07 パナソニック株式会社 Sound reproduction system and method
JP5323210B2 (en) * 2010-09-30 2013-10-23 パナソニック株式会社 Sound reproduction apparatus and sound reproduction method
ES2683821T3 (en) * 2012-03-22 2018-09-28 Dirac Research Ab Audio precompensation controller design using a variable set of support speakers
EP2830052A1 (en) 2013-07-22 2015-01-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio decoder, audio encoder, method for providing at least four audio channel signals on the basis of an encoded representation, method for providing an encoded representation on the basis of at least four audio channel signals and computer program using a bandwidth extension
EP2830332A3 (en) 2013-07-22 2015-03-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method, signal processing unit, and computer program for mapping a plurality of input channels of an input channel configuration to output channels of an output channel configuration
US11019449B2 (en) * 2018-10-06 2021-05-25 Qualcomm Incorporated Six degrees of freedom and three degrees of freedom backward compatibility
AU2019392876B2 (en) 2018-12-07 2023-04-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for encoding, decoding, scene processing and other procedures related to DirAC based spatial audio coding using direct component compensation
TWI740206B (en) * 2019-09-16 2021-09-21 宏碁股份有限公司 Correction system and correction method of signal measurement

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659619A (en) * 1994-05-11 1997-08-19 Aureal Semiconductor, Inc. Three-dimensional virtual audio display employing reduced complexity imaging filters
US6009179A (en) * 1997-01-24 1999-12-28 Sony Corporation Method and apparatus for electronically embedding directional cues in two channels of sound
US6072878A (en) * 1997-09-24 2000-06-06 Sonic Solutions Multi-channel surround sound mastering and reproduction techniques that preserve spatial harmonics
AUPP271598A0 (en) * 1998-03-31 1998-04-23 Lake Dsp Pty Limited Headtracked processing for headtracked playback of audio signals
EP1054575A3 (en) * 1999-05-17 2002-09-18 Bose Corporation Directional decoding

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102077276B (en) * 2008-06-26 2014-04-09 法国电信公司 Spatial synthesis of multichannel audio signals
US10104488B2 (en) 2008-12-18 2018-10-16 Dolby Laboratories Licensing Corporation Audio channel spatial translation
CN102273233A (en) * 2008-12-18 2011-12-07 杜比实验室特许公司 Audio channel spatial translation
US11805379B2 (en) 2008-12-18 2023-10-31 Dolby Laboratories Licensing Corporation Audio channel spatial translation
CN102273233B (en) * 2008-12-18 2015-04-15 杜比实验室特许公司 Audio channel spatial translation
US11395085B2 (en) 2008-12-18 2022-07-19 Dolby Laboratories Licensing Corporation Audio channel spatial translation
US9628934B2 (en) 2008-12-18 2017-04-18 Dolby Laboratories Licensing Corporation Audio channel spatial translation
US10887715B2 (en) 2008-12-18 2021-01-05 Dolby Laboratories Licensing Corporation Audio channel spatial translation
CN104837107B (en) * 2008-12-18 2017-05-10 杜比实验室特许公司 Audio channel spatial translation
US10469970B2 (en) 2008-12-18 2019-11-05 Dolby Laboratories Licensing Corporation Audio channel spatial translation
CN101527874B (en) * 2009-04-28 2011-03-23 张勤 Dynamic sound field system
CN102714039B (en) * 2010-01-22 2014-09-10 杜比实验室特许公司 Using multichannel decorrelation for improved multichannel upmixing
US9269360B2 (en) 2010-01-22 2016-02-23 Dolby Laboratories Licensing Corporation Using multichannel decorrelation for improved multichannel upmixing
CN102714039A (en) * 2010-01-22 2012-10-03 杜比实验室特许公司 Using multichannel decorrelation for improved multichannel upmixing
CN112562697A (en) * 2015-06-24 2021-03-26 索尼公司 Audio processing apparatus and method, and computer-readable storage medium
CN106604199B (en) * 2016-12-23 2018-09-18 湖南国科微电子股份有限公司 A kind of matrix disposal method and device of digital audio and video signals
CN106604199A (en) * 2016-12-23 2017-04-26 湖南国科微电子股份有限公司 Digital audio signal matrix processing method and device
CN114327040A (en) * 2021-11-25 2022-04-12 歌尔股份有限公司 Vibration signal generation method, device, electronic device and storage medium

Also Published As

Publication number Publication date
CN1672464B (en) 2010-07-28
AU2003278704A1 (en) 2004-03-11
PL373120A1 (en) 2005-08-08
EP1527655B1 (en) 2006-10-04
CA2494454A1 (en) 2004-03-04
EP1527655A2 (en) 2005-05-04
DE60308876T2 (en) 2007-03-01
DE60308876D1 (en) 2006-11-16
BRPI0305746B1 (en) 2018-03-20
MXPA05001413A (en) 2005-06-06
ES2271654T3 (en) 2007-04-16
KR100988293B1 (en) 2010-10-18
AU2003278704B2 (en) 2009-04-23
KR20050035878A (en) 2005-04-19
JP4434951B2 (en) 2010-03-17
TW200404222A (en) 2004-03-16
ATE341923T1 (en) 2006-10-15
HK1073963A1 (en) 2005-10-21
IL165941A (en) 2010-06-30
MY139849A (en) 2009-11-30
DK1527655T3 (en) 2007-01-29
JP2005535266A (en) 2005-11-17
BR0305746A (en) 2004-12-07
CA2494454C (en) 2013-10-01
TWI315828B (en) 2009-10-11
IL165941A0 (en) 2006-01-15

Similar Documents

Publication Publication Date Title
CN1672464A (en) Audio channel spatial translation
CN1250045C (en) Vehicle audio reproduction device
CN1116785C (en) Multichannel active matrix sound reproduction with maximum lateral separation
CN1605225A (en) Method and apparatus to create a sound field
CN1116737C (en) User adjustable volume control that accommodates hearing
CN1402952A (en) Method and apparatus to direct sound
CN1281098C (en) Surround-sound processing system
CN1756446A (en) Audio signal processing apparatus and method
CN1926607A (en) Multichannel audio coding
CN1275498C (en) Audio channel translation
RU2667630C2 (en) Device for audio processing and method therefor
CN1402956A (en) Acoustic correction apparatus
AU2023203570A1 (en) Sound processing device and method, and program
CN1342386A (en) Low-frequency audio enhancement system
CN1728892A (en) Sound-field correcting apparatus and method therefor
CN1277180C (en) Apparatus and method for adapting audio signal
CN1747608A (en) Audio signal processing apparatus and method
CN101040564A (en) Audio signal processing device and audio signal processing method
CN1754403A (en) Sound beam loudspeaker system
CN1257639A (en) Audiochannel mixing
CN1495705A (en) Multichannel vocoder
CN1701634A (en) Spectacle hearing aid
CN1468029A (en) Sound image control system
CN1650528A (en) Multi-channel downmixing device
CN1838235A (en) Apparatus and method for reproducing sound by dividing sound field into non-reduction region and reduction region

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term
CX01 Expiry of patent term

Granted publication date: 20100728