SE516467C2

SE516467C2 - Method and apparatus for convolution

Info

Publication number: SE516467C2
Application number: SE0001909A
Authority: SE
Inventors: Lars Philipson
Original assignee: Philipson Lars H G
Priority date: 2000-05-19
Filing date: 2000-05-19
Publication date: 2002-01-15
Also published as: SE0001909D0; WO2001090927A1; SE0001909L; AU5898901A

Description


PRIOR ART

The mathematical definition of convolution for the present application is given by Formula 1 below.

$y(t) = \int_{0} h(v)\,x(t - v)\,dv$ (1)

where x is the input data, h is the convolution kernel (filter) and y is the output data.

Modern signal processing is based on Fourier transformation of the signals concerned from the time domain into transformed signals in the frequency domain. The signal processing then takes place in the frequency domain. The convolution that is to be performed in the time domain corresponds to multiplication in the frequency domain. Since multiplication is a simpler operation, earlier signal processing has, in this context too, been implemented by means of multiplication in fast hardware.
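The correspondence referred to here is the convolution theorem. As a reminder, in notation that is not part of the original text ($\mathcal{F}$ denotes the Fourier transform and $*$ denotes convolution):

```latex
\mathcal{F}\{h * x\} = \mathcal{F}\{h\} \cdot \mathcal{F}\{x\}
\quad\Longleftrightarrow\quad
h * x = \mathcal{F}^{-1}\bigl\{\mathcal{F}\{h\} \cdot \mathcal{F}\{x\}\bigr\}
```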

It is then also necessary to transform from the time domain to the frequency domain and back again. For these operations there are algorithms that are particularly well suited to computers, the so-called Fast Fourier Transform (FFT), and its discrete counterpart, with inverses. Processors tailored for these calculations, so-called digital signal processors (DSPs), have long been commercially available.

One problem with frequency-domain solutions is that the computation time for the FFT grows unfavourably with the length of the filter (N log N). For convolutions of 100,000 points this method is unfavourable, measured in the amount of hardware, compared with performing the calculations in the time domain. A contributing factor is that computing the FFT requires floating-point arithmetic, whereas all calculations in the time domain can be performed with integers. A difficulty with convolution in the time domain is that there is no known method for performing such calculations continuously in real time.

Nor is there any known apparatus by means of which the calculations could be performed in real time.

SUMMARY OF THE INVENTION

An object of the invention is to provide a method for convolution of digital signals. This object is achieved in that the invention has been given the features specified in claims 1 and 6, respectively. The invention solves the problem of performing long convolutions of audio in real time with a reasonable amount of hardware.

According to the invention, the basic operations of convolution are performed in parallel in an efficient manner. Audio samples from an input signal and terms, or filter coefficients, from the impulse response are stored in registers. Each pair of audio sample and filter coefficient is multiplied separately, and all multiplications take place in parallel. The products are then added.

The products can be added in a so-called adder tree, in which the terms are first added in pairs. The sums are again added in pairs in a repeated sequence until a final sum has been calculated. By virtue of the commutative law of addition (the order is immaterial), this procedure gives exactly the same result as if the original numbers had been added one after another.

A key issue in achieving an efficient calculation is how the data making up the impulse response is to be handled. An impulse response can comprise on the order of 100,000 samples. To avoid the need for a massive amount of hardware, the impulse response is, according to the invention, divided into segments. The hardware required is thereby dramatically reduced, since it can be reused in a time-multiplexed procedure.

Furthermore, each segment of the impulse response is used efficiently in that convolution operations are performed with the segment together with several samples from the input signal. The calculated results are summed in associated positions in an output buffer, from which the output data is fed out.

Further advantages and features of the invention are apparent from the following description, the drawings and the dependent claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will now be described in more detail by means of exemplary embodiments, with reference to the accompanying drawings, in which

Fig. 1 schematically shows an implementation of discrete convolution in the time domain,
Fig. 2 schematically shows an embodiment of an implementation of discrete convolution in the time domain in accordance with the invention,
Fig. 3 shows how registers in the embodiment of Fig. 2 cooperate during different phases of the convolution,
Fig. 4 shows how the contents of registers change during different phases of the convolution,
Fig. 5 schematically shows an implementation of an output buffer used in the embodiment of Fig. 2, and
Fig. 6 shows the function of a calculation unit in the embodiment of Fig. 2.

THE INVENTION

Discrete convolution in the time domain can be performed in accordance with Formula 2 below. A practical example of an implementation is shown in Fig. 1.

$y(t) = \sum_{v} h(v)\,x(t - v)$ (2)

where x is the input data, h is the convolution kernel (filter) and y is the output data.
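Formula 2 can be written out directly as a double loop. The following is a minimal sketch for finite lists, not the hardware described here; the function name `convolve_direct` and the convention that samples outside the lists count as zero are assumptions made for illustration.

```python
def convolve_direct(h, x):
    """Return y with y[t] = sum over v of h[v] * x[t - v].

    Samples outside the given lists are treated as zero (an assumption),
    so the result has len(x) + len(h) - 1 points.
    """
    y = [0] * (len(x) + len(h) - 1)
    for t in range(len(y)):
        for v in range(len(h)):
            if 0 <= t - v < len(x):
                y[t] += h[v] * x[t - v]
    return y


h = [1, 2, 3]        # illustrative filter coefficients
x = [4, 5, 6, 7]     # illustrative input samples
print(convolve_direct(h, x))  # [4, 13, 28, 34, 32, 21]
```

With integer inputs every intermediate value is an integer, which matches the remark that time-domain convolution needs no floating-point arithmetic.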

The basic operations of convolution can be parallelized very efficiently. Input samples are fed into a first shift register 10. A corresponding first filter register 18 contains the impulse response. Audio samples and impulse response are stored in the registers 10 and 18, in which every value has its own direct output. A separate multiplication unit is provided for each pair of audio sample and filter coefficient, i.e. all multiplications are performed in parallel. Fig. 1 shows all the multiplication units combined into one multiplication unit 12. All the results of these multiplications are then to be added, and this too can be performed in a single step in an addition unit 13.
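The arrangement of Fig. 1 can be modelled in software as one step per clock cycle: shift in a sample, form all products, add them. This is a behavioural sketch only; the class name `ParallelFir` and its method are hypothetical, and the list comprehension stands in for multipliers that would operate simultaneously in hardware.

```python
from collections import deque


class ParallelFir:
    """Behavioural model of a shift register feeding parallel multipliers."""

    def __init__(self, coefficients):
        # Plays the role of filter register 18.
        self.filter_register = list(coefficients)
        # Plays the role of shift register 10, initially all zeros.
        self.shift_register = deque([0] * len(coefficients),
                                    maxlen=len(coefficients))

    def clock(self, sample):
        # Newest sample enters at position 0; the oldest one drops out.
        self.shift_register.appendleft(sample)
        # Multiplication unit 12: one product per register position.
        products = [h * x for h, x in
                    zip(self.filter_register, self.shift_register)]
        # Addition unit 13: all products are summed to one output value.
        return sum(products)


fir = ParallelFir([1, 2, 3])
outputs = [fir.clock(s) for s in [4, 5, 6, 7, 0, 0]]
print(outputs)  # [4, 13, 28, 34, 32, 21]
```

One output value is produced per input sample, which is the property the hardware relies on.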

An efficient way to organize the addition is first to add the input numbers in pairs. This yields half as many numbers, which can then be added in pairs in the same way. After a few stages a single number remains as the result. A so-called adder tree is used for this. By virtue of the commutative law of addition (the order is immaterial), this is known to be exactly the same result as if the original numbers had been added in parallel or one after another. Since the pairwise additions in each stage are performed in parallel, the time for a stage is the same as for a single addition, i.e. the total calculation time for the whole adder tree is proportional to $\log_2 N$ instead of N, where N is the total number of bits of the input numbers.
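The pairwise scheme can be sketched as follows. The function name `adder_tree` is hypothetical, and the loop over pairs stands in for adders that would work simultaneously in hardware; an odd element is simply passed through to the next stage, which is one possible choice among several.

```python
def adder_tree(values):
    """Sum values by repeated pairwise addition.

    Returns (total, number_of_stages).
    """
    level = list(values)
    stages = 0
    while len(level) > 1:
        # Add neighbours in pairs; in hardware these additions run in parallel.
        next_level = [level[i] + level[i + 1]
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            next_level.append(level[-1])  # odd element passes through unchanged
        level = next_level
        stages += 1
    return (level[0] if level else 0), stages


total, stages = adder_tree(range(1, 101))
print(total, stages)  # 5050 7
```

For 100 operands the tree needs 7 stages (100, 50, 25, 13, 7, 4, 2, 1) rather than 99 sequential additions, and the total is the same as that of a plain sequential sum.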

Without further measures, the delay through the whole calculation chain may nevertheless be too long in relation to a given clock cycle time. This problem can be solved by inserting a number of registers along the path and dividing the calculation over a number of clock cycles. In this way the clock frequency can be increased while still delivering one result per clock cycle (pipelining).
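The effect of pipelining, a fixed latency followed by one result per clock cycle, can be illustrated with a small software model. This is a sketch under simplifying assumptions: `run_pipeline` is a hypothetical helper, each stage is an arbitrary function followed by one register, and `None` marks an empty register.

```python
def run_pipeline(stage_functions, inputs):
    """Clock values through a pipeline; return (cycle, value) for each output."""
    registers = [None] * len(stage_functions)   # one register after each stage
    outputs = []
    # Feed all inputs, then feed empty slots until the pipeline has drained.
    feed = list(inputs) + [None] * len(stage_functions)
    for cycle, item in enumerate(feed, start=1):
        # The last register holds the result that is available in this cycle.
        if registers[-1] is not None:
            outputs.append((cycle, registers[-1]))
        # Advance from back to front so each value moves exactly one stage.
        for i in range(len(registers) - 1, 0, -1):
            previous = registers[i - 1]
            registers[i] = None if previous is None else stage_functions[i](previous)
        registers[0] = None if item is None else stage_functions[0](item)
    return outputs


# Two illustrative stages: doubling, then adding one.
results = run_pipeline([lambda v: v * 2, lambda v: v + 1], [1, 2, 3])
print(results)  # [(3, 3), (4, 5), (5, 7)]
```

The first result appears after a delay of two cycles, after which a new result is available in every cycle even though each value still passes through both stages.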

It is possible to perform the multiplication in a single (combinational) step. It will then limit the clock frequency, and it may suffice to place one register before and one after the multiplier, as well as one inside and one after the adder tree. If an even faster solution is desired, the multiplication must also be divided into several pipeline stages.

The following reasoning is based on an example with a data rate of 50 kHz for the audio, a clock frequency of 50 MHz for the digital electronics (1000 times faster), a reverberation time of 2 seconds, and 16-bit data for both audio and impulse response. An impulse response of 100,000 samples is used. Using such a high clock frequency makes it possible to divide the problem into segments and time-multiplex the hardware, i.e. to make the hardware only one thousandth as wide and instead use it 1000 times. The narrow variant of the hardware thus has exactly the same structure as the original, with adder tree and pipelining.

A practical embodiment of a convolution unit in accordance with the invention is shown schematically in Fig. 2. Input data is taken in via an input buffer 14, which is suitably designed as a shift register. The input buffer 14 is connected via a gate circuit 15 to a first register 10. The first register 10 is operatively connected to a second register 11. The first register 10 and the second register 11 are also suitably designed as shift registers. The first register 10 and the second register 11 are connected back via a feedback loop 25 and the gate circuit 15, so that data can be shifted back and forth between the registers in a circulating process. The process is described in detail below.
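The sizing of the example can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text (50 kHz data rate, 50 MHz clock, 100,000-sample impulse response); the constant and variable names are chosen for illustration.

```python
SAMPLE_RATE_HZ = 50_000             # audio data rate from the example
CLOCK_RATE_HZ = 50_000_000          # clock frequency of the digital electronics
IMPULSE_RESPONSE_SAMPLES = 100_000  # length of the impulse response

# Clock cycles available per audio sample: how often the hardware can be reused.
clock_cycles_per_sample = CLOCK_RATE_HZ // SAMPLE_RATE_HZ

# Width of the time-multiplexed hardware: multipliers working in parallel.
multipliers_needed = IMPULSE_RESPONSE_SAMPLES // clock_cycles_per_sample

# Duration covered by the impulse response at the given data rate.
reverberation_seconds = IMPULSE_RESPONSE_SAMPLES / SAMPLE_RATE_HZ

print(clock_cycles_per_sample, multipliers_needed, reverberation_seconds)
# 1000 100 2.0
```

The 1000 cycles per sample correspond to the 1000 reuses of the hardware, and the 100 parallel multipliers correspond to one thousandth of the full width.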

A memory 16 is provided for storing the impulse response. The impulse response is divided into segments and acts as a filter on the input signal. In the embodiment shown, the memory 16 is designed to store 1000 segments of 100 samples, or terms, each. A segment of the impulse response is processed together with a corresponding segment of input data. A first calculation step takes place in a multiplication unit 12. The multiplication unit 12 comprises a plurality of multiplication means for parallel multiplication of individual samples from the input data signal and from the impulse response. The segment of the impulse response that is to be processed is transferred via a multiplexer 17 to a first filter register 18. The memory 16 is also connected via the multiplexer 17 to a second filter register 19.
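Dividing the impulse response into segments does not change the result: the partial convolutions, each offset by the starting index of its segment, sum to the full convolution. The following minimal sketch uses hypothetical function names and tiny lists in place of 1000 segments of 100 samples; it illustrates the arithmetic only, not the order in which the hardware schedules the work.

```python
def convolve_direct(h, x):
    """Plain convolution: y[t] = sum over v of h[v] * x[t - v]."""
    y = [0] * (len(x) + len(h) - 1)
    for v, coefficient in enumerate(h):
        for n, sample in enumerate(x):
            y[v + n] += coefficient * sample
    return y


def convolve_segmented(h, x, segment_length):
    """Convolve one segment of h at a time and accumulate the partial results."""
    y = [0] * (len(x) + len(h) - 1)
    for start in range(0, len(h), segment_length):
        segment = h[start:start + segment_length]   # one segment of the impulse response
        partial = convolve_direct(segment, x)
        for offset, value in enumerate(partial):
            y[start + offset] += value              # add at the associated output position
    return y


h = [1, 2, 3, 4, 5]
x = [1, 0, -1, 2]
assert convolve_segmented(h, x, 2) == convolve_direct(h, x)
```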

The embodiment shown in Fig. 2 is particularly suitable for pipelining. The constituent components, e.g. registers and calculation units, then comprise several logic blocks, each block performing part of an operation. Several operations arranged in series are thereby performed apparently simultaneously.

A single filter register suffices if the listener's head is completely still. If the listener turns their head, the same echogram can be used, but new impulse responses must be calculated. On a modern PC this can be done in a few tens of milliseconds, fast enough to create a seemingly continuously moving sound image that follows the head rotation in real time.

Correct reproduction even during head rotation can be achieved by duplicating the memory for impulse responses, in a similar way to the corresponding buffers in the convolution unit. While one memory is used for convolution, the other is filled with new content. Switching between the memories can take place instantaneously.

Consequently, while data from the first filter register 18 is being processed, new filter data from the impulse response can be transferred to the second filter register 19. Data is used and loaded alternately in the two filter registers 18 and 19, so that processing can proceed without any delay for loading the registers.
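The alternating use of the two filter registers is a form of double buffering. A minimal software sketch, assuming a hypothetical class `DoubleBufferedFilterRegister`; in hardware the loading and the processing would overlap in time, which a sequential program can only imitate.

```python
class DoubleBufferedFilterRegister:
    """Two registers: one in use for convolution, one being loaded."""

    def __init__(self, length):
        # Stand-ins for filter registers 18 and 19.
        self._registers = [[0] * length, [0] * length]
        self._active = 0

    @property
    def active(self):
        """Contents of the register currently used for convolution."""
        return self._registers[self._active]

    def load_standby(self, segment):
        """Load the next segment into the register that is not in use."""
        standby = self._registers[1 - self._active]
        standby[:] = segment

    def swap(self):
        """Exchange the roles of the two registers."""
        self._active = 1 - self._active


registers = DoubleBufferedFilterRegister(2)
registers.load_standby([1, 2])
registers.swap()
registers.load_standby([3, 4])   # loading does not disturb the active register
print(registers.active)          # [1, 2]
registers.swap()
print(registers.active)          # [3, 4]
```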

In the embodiment of Fig. 2, and with a suitable clock frequency for the oscillator controlling the electronics, all hardware is in use all the time during continuous operation. One way of making the solution still more efficient is to tailor the various calculation units so that no more bits than necessary are used in each case. In this way it is possible both to increase the speed and to reduce the amount of hardware.

Each cell of the second shift register 11 and of the filter registers is connected to an individual multiplication means of the multiplication unit 12, so that the multiplications can take place in parallel. The result of each multiplication is 32 bits long if the factors are 16 bits long. Normally, however, only the 11 most significant bits of the result need to be used. It can be still more efficient to tailor the multiplication means so that they calculate only the bits that are needed. A first stage of the addition unit 13 then adds 11 + 11 bits, which in turn gives a 12-bit result. The number of bits thus increases by one for each stage of the adder tree. Depending on the number of stages and the number of segments, only as many bits as are needed are included in the final result.
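The growth of the word length through the adder tree can be tabulated as follows. This is a sketch assuming one extra bit per stage, as stated in the text, and a tree of pairwise additions over a given number of operands; the function name `adder_tree_widths` is hypothetical.

```python
def adder_tree_widths(input_bits, operand_count):
    """Return the word length at the input and after each adder-tree stage."""
    # Number of pairwise stages needed to reduce operand_count values to one.
    stages = (operand_count - 1).bit_length() if operand_count > 1 else 0
    return [input_bits + stage for stage in range(stages + 1)]


widths = adder_tree_widths(11, 100)
print(widths)  # [11, 12, 13, 14, 15, 16, 17, 18]
```

With 11-bit products and 100 operands per segment, the first stage yields 12 bits and the seventh and last stage yields 18 bits, before any further growth from accumulating over segments.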

An output 20 of the addition unit 13 is operatively connected to a calculation unit 21. A control unit 22 is operatively connected to the calculation unit 21 and to an output buffer 23. The control unit 22 ensures that the partial results from the convolution operations that are available at the output 20 are added to an associated partial result previously calculated and stored in the output buffer 23. The control unit 22 suitably also controls the other components, e.g. the shift registers, the calculation units and the multiplexer. Both the multiplication unit 12 and the addition unit 13 are suitably implemented using pipelining.

The operation of the circuit in Fig. 2 can be described schematically as follows. Audio samples are shifted into the registers, and the different parts of the impulse response are stored in the memory 16 and then loaded into one of the filter registers one part at a time.

The filter register is duplicated: one is loaded while the other is used for convolution, and then the roles are swapped (the swap itself takes place between two clock cycles and takes no time). Since output data is then no longer produced at the correct rate, an output buffer 23 is introduced in the form of a memory with a dedicated calculation unit; see the description of Fig. 6 below.

While the filter segment is loaded anyway, convolution operations are performed for a number of points further ahead in time (points that have already been recorded). This is done with the aid of a three-part input buffer, comprising the input buffer 14 and the shift registers 10 and 11, in which input data is "rocked" back and forth in the two shift registers while new samples are read into the input buffer 14. The results are then superimposed at the correct positions in the output buffer 23 and are eventually fed out at the correct rate.
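The role of the output buffer, accumulating partial results and releasing finished values at a steady rate, can be sketched in software. This is a simplified model under stated assumptions: input arrives in blocks of `block_length` samples, the whole filter is applied to each block at once rather than segment by segment, and the name `stream_convolve` is hypothetical.

```python
def stream_convolve(h, x_blocks, block_length):
    """Process input block by block, accumulating into an output buffer.

    Returns (emitted, remaining_buffer).
    """
    # Long enough to hold every contribution of one block of input.
    out_buffer = [0] * (len(h) + block_length - 1)
    emitted = []
    for block in x_blocks:
        for i, sample in enumerate(block):
            for j, coefficient in enumerate(h):
                out_buffer[i + j] += coefficient * sample   # superimpose partial result
        # The first block_length positions receive no further contributions.
        emitted.extend(out_buffer[:block_length])
        # Shift the buffer and clear the positions that were freed.
        out_buffer = out_buffer[block_length:] + [0] * block_length
    return emitted, out_buffer


emitted, tail = stream_convolve([1, 2, 3], [[4, 5], [6, 7]], 2)
print(emitted)  # [4, 13, 28, 34]
print(tail)     # [32, 21, 0, 0]
```

Each pass releases exactly one block of finished output values, while the unfinished tail stays in the buffer to receive contributions from later input.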

In connection with the handling of head rotation, described schematically above, a minor modification of the control of the output buffer is also required, so that the buffer is reset only when starting with new audio and not when new impulse responses are loaded. In this way a "gliding transition" between filters for two discrete directions is obtained automatically. It is not necessary to change filters more often than corresponds to approximately half the reverberation time, i.e. once per second in the example described above.

Figs. 3A-3D show how input data can be used efficiently. In a basic state, shown in Fig. 3A, new input samples are shifted in via an input 24 of the input buffer 14. The contents of the input buffer 14 are then shifted onward into the first shift register 10 and the second shift register 11. During the convolution operations, the input buffer 14, the first shift register 10 and the second shift register 11 will contain different generations of input data.

During the convolution operations the shift registers are connected to each other as shown in Fig. 3B. The input buffer 14 is then isolated from the shift registers and cannot change any register contents. Incoming input samples nevertheless continue to be shifted into the input buffer 14 at the rate at which they arrive.

While a segment of the impulse response is loaded in one of the filter registers, e.g. the first filter register 18, the input data in the first shift register 10 and the second shift register 11 is "rocked" back and forth until all samples have been combined with the impulse-response samples in the first filter register 18. The partial results arising in each position are added to the associated positions of the output buffer via the calculation unit 21 and are eventually fed out at the correct rate. With a solution adapted to the example, all buffers and registers are 100 positions, or samples, long and 16 bits wide, and the memory 16 for the impulse response holds 1000 segments. In this way the convolution unit is kept busy in every clock cycle once it has started.

In the state shown in Fig. 3C, all samples from a segment of the input signal have been used. The input buffer 14 then contains an entirely new set, or generation, of input data denoted G(n). The first shift register 10 contains the most recently used data, corresponding to a generation G(n-1), and the second shift register 11 contains data used even earlier, corresponding to a generation G(n-2).

Switching then takes place to the state shown in Fig. 3D, in which the contents of the input buffer 14 have been transferred to the first shift register 10. This is possible because the gate circuit 15 has been instructed by the control unit to open for the transfer. The first shift register 10 then contains generation G(n) of the data. The input buffer is zeroed at the same time, in preparation for new input samples. As new data from the input buffer 14 enters the first shift register 10, the most recently used data is simultaneously shifted from the first shift register 10 into the second shift register 11, which then contains generation G(n-1). Data in the second shift register 11 is not fed into the first shift register 10, because the feedback loop 25 between the second shift register 11 and the first shift register 10 is broken. This can be achieved by having the gate circuit 15 break the connection.

In the state shown in Fig. 3E, the gate circuit 15 has again broken the connection between the input buffer 14 and the first shift register 10, while the feedback loop 25 is closed again. A new series of convolution operations can begin, and a new generation of input data G(n+1) is collected in the input buffer 14.

Each convolution produces one output data point, calculated from a particular position of the back-and-forth circulating buffer and a particular segment of the impulse response. To identify the position in the data buffer, the starting point is taken as the moment when 100 new data have just been shifted in, see Fig. 4. This position is called B(0). The next clock pulse gives position B(1), and so on up to B(99). Data is first shifted to the right through the registers until all data from the first register 10 has been shifted into the second register 11. A convolution operation takes place at each position. The same data is then used once more by shifting it in the reverse direction. Because of the "rocking", the next position is B(98).

The very first convolution thus originates from I(0) and B(0). This value is denoted I(0)B(0). The next value produced is I(0)B(1), and so on. After I(0)B(99) comes I(1)B(98).

When all segments of the impulse response have been processed, 100 new samples are shifted in and the process starts over with I(0)B(0). This value is to be added to the previously stored I(1)B(0). To tell the convolution results apart, a designation for the input time must therefore be introduced. The time of the first input buffer is called T(0), and so on. With this addition, the value of the new output point (no. 100) after the update is T(0)I(1)B(0) + T(1)I(0)B(0). The "rocking" makes the time sequence complicated. The calculations are easier to understand by focusing on the result in the output buffer 23.
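The accumulation of sub-results described above can be illustrated with a small Python sketch. It is a software model under stated assumptions, not the circuit itself: the function names, the extra all-zero block used to flush the tail, and the convention that position b means "the window ends at sample b of the newest block" are illustrative choices that do not come from the source. The sketch shows that adding each sub-result T(t)I(i)B(b) to output point (t + i)·m + b reproduces an ordinary time-domain convolution.

```python
def direct_convolution(x, h):
    """Reference: plain time-domain convolution y[t] = sum_u h[u] * x[t - u]."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def segmented_convolution(x, h, m):
    """Convolve x with h using input blocks and impulse-response segments of length m."""
    if len(x) % m or len(h) % m:
        raise ValueError("len(x) and len(h) must be multiples of m")
    n_blocks = len(x) // m
    n_segments = len(h) // m
    # Output buffer, zeroed before the convolution starts.
    out = [0] * ((n_blocks + n_segments) * m)
    previous = [0] * m  # generation G(n-1); silence before the first block
    # One extra all-zero block flushes the tail (assumption of this sketch).
    for t in range(n_blocks + 1):
        current = x[t * m:(t + 1) * m] if t < n_blocks else [0] * m  # G(n)
        window = previous + current  # 2m samples, older generation first
        for i in range(n_segments):
            segment = h[i * m:(i + 1) * m]
            for b in range(m):
                # Sub-result T(t)I(i)B(b): m products summed into one value.
                sub = sum(segment[j] * window[m + b - j] for j in range(m))
                # Add to the sub-results already stored for this output point.
                out[(t + i) * m + b] += sub
        previous = current
    return out[:len(x) + len(h) - 1]


x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [1, -2, 3, 0, 5, 1]
print(segmented_convolution(x, h, m=2) == direct_convolution(x, h))
```

With m = 100, output point no. 100 receives exactly the two contributions named in the text: t = 0, i = 1, b = 0 and t = 1, i = 0, b = 0.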

The following designations are introduced: m is the number of elements in the convolution register, n is the number of segments in the impulse response memory, and O(p) denotes the elements in the output buffer. In the so-called steady state, i.e. when the convolution has been running for a while, Formula 3 applies.

[Formula 3 is illegible in the source text. The legible fragments indicate an expression for O(p) as a sum over an index i starting at i = 0, with p = 0, 1, ... and j = p + q; the definition of q cannot be recovered.]

When the convolution starts, all values in the output buffer 23 must be zeroed. As shown in the table above, it takes some time before all the terms for updating each output point are available. More precisely, it takes exactly as long as the duration of the impulse response.

In practice, the output buffer 23 may be implemented as ordinary RAM organized as a ring buffer, as shown in Fig. 5. The ring buffer has an address pointer 26 for the start and an address pointer 27 for the end. Between start and end, the buffer is zeroed. A control unit generates the addresses needed so that each update takes place at the correct position at every instant and so that data is output correctly. When a data point is output, it is simultaneously zeroed in the ring buffer and is thereby ready to eventually become the first value in the buffer again.
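The ring-buffer behaviour can be sketched in a few lines of Python. This is a simplified model under assumptions: the class and method names are hypothetical, a single start pointer plus the fixed buffer size stands in for the two address pointers 26 and 27, and the address generation that the control unit performs is reduced to an offset from the start.

```python
class RingOutputBuffer:
    """Accumulating output buffer kept in a fixed-size list used as a ring."""

    def __init__(self, size):
        self.cells = [0] * size  # all cells zeroed before the convolution starts
        self.start = 0           # address of the next data point to output

    def accumulate(self, offset, sub_result):
        """Add a sub-result to the cell `offset` positions after the start."""
        if not 0 <= offset < len(self.cells):
            raise IndexError("offset outside the ring buffer")
        address = (self.start + offset) % len(self.cells)
        self.cells[address] += sub_result  # read old value, add, write back

    def output(self):
        """Emit the oldest data point, zero its cell and advance the start."""
        value = self.cells[self.start]
        self.cells[self.start] = 0  # the cell is ready for reuse
        self.start = (self.start + 1) % len(self.cells)
        return value


buf = RingOutputBuffer(4)
buf.accumulate(0, 5)
buf.accumulate(1, 2)
buf.accumulate(0, 1)
print(buf.output(), buf.output())  # the first cell received two contributions
```

Zeroing a cell at the moment it is output means no separate clearing pass is needed before the address wraps around and the cell is reused.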

If the convolution unit produces one result every clock cycle, it becomes necessary, within a single clock cycle, to read out an old value, add a new contribution and write the result back to the output buffer. This can be solved, for example, by designing a RAM circuit so that two values can be accessed at a time.

Between the convolution unit and the memory there must therefore be a calculation unit that reads, accumulates and writes.

Four additions are performed during one process step. Since the four additions are not evenly distributed over the four clock cycles, at least one buffer register must be provided for intermediate storage of convolution results from one clock cycle to another. A practical embodiment of such a calculation unit 21 is shown in Fig. 6. The calculation unit 21 contains two parallel pipelines 28 and 29 which, over four clock cycles, read in two cycles, write in two cycles and perform four additions.

During clock cycle 1, values are transferred from the output memory to buffer register #1 and buffer register #2.

During clock cycle 2, a new result value from the convolution unit is added to the value in buffer register #1, and a result value from the convolution unit is likewise added to the value in buffer register #2. In addition, values are transferred from the output memory to buffer register #3 and buffer register #4.

During clock cycle 3, the values in buffer register #1 and buffer register #2 are transferred to the output buffer 23. At the same time, a new result value from the convolution unit is added to the value in buffer register #3, and a corresponding addition is made to the value in buffer register #4.

During clock cycle 4, the values in buffer register #3 and buffer register #4 are finally transferred to the output buffer 23. The common control unit 22 manages these calculations and addresses the memory.
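The four clock cycles above can be traced with a short Python sketch. It is only an illustration under assumptions: the function name and its parameters are hypothetical, the four memory addresses are assumed to be distinct, and the source does not say which sub-results from the convolution unit reach which buffer register, so they are simply passed in as four values.

```python
def four_cycle_step(memory, addresses, results):
    """Apply one process step: four read-add-write updates in four clock cycles."""
    if len(set(addresses)) != 4:
        raise ValueError("this sketch assumes four distinct addresses")
    a1, a2, a3, a4 = addresses
    r1, r2, r3, r4 = results
    regs = [0, 0, 0, 0]  # buffer registers #1 to #4

    # Clock cycle 1: read two old values from the output memory.
    regs[0], regs[1] = memory[a1], memory[a2]

    # Clock cycle 2: add to #1 and #2, and read two more old values.
    regs[0] += r1
    regs[1] += r2
    regs[2], regs[3] = memory[a3], memory[a4]

    # Clock cycle 3: write #1 and #2 back, and add to #3 and #4.
    memory[a1], memory[a2] = regs[0], regs[1]
    regs[2] += r3
    regs[3] += r4

    # Clock cycle 4: write #3 and #4 back.
    memory[a3], memory[a4] = regs[2], regs[3]
    return memory


print(four_cycle_step([10, 20, 30, 40, 50], (0, 1, 2, 3), (1, 2, 3, 4)))
```

In this model the memory is accessed two values at a time in every cycle, which matches the earlier remark that the RAM should allow two values to be reached at once.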

The solution described above assumes that the listener's head is completely still. If the listener turns their head, the same echogram can be used, but new impulse responses must be calculated. On a modern PC this can be done in a few tens of milliseconds, fast enough to create a seemingly continuously moving sound image that follows the head rotation in real time.

One possible addition for handling head rotation is to duplicate the memory 16 for the impulse response, in the same way as the corresponding filter registers 18 and 19 in the convolution unit. While one memory is used for convolution, the other is filled with new content. Switching between the memories can take place instantaneously, so that the convolution does not have to be interrupted. A minor modification of the control of the output buffer is required, so that the buffer is zeroed only at the start of a new sound and not when new impulse responses are loaded. In this way, a "smooth transition" between filters for two discrete directions is obtained automatically. It is not necessary to change filters more often than corresponds to approximately half the reverberation time, i.e. once per second in the described example.
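The duplicated impulse-response memory can be modelled as two banks with a switch between them. The Python below is a minimal sketch under assumptions: the class and method names are hypothetical, and the instantaneous hardware switch is represented by flipping an index.

```python
class DoubleBufferedImpulseResponse:
    """Two coefficient banks: one in use for convolution, one being refilled."""

    def __init__(self, coefficients):
        self._banks = [list(coefficients), [0] * len(coefficients)]
        self._active = 0  # index of the bank currently used for convolution

    def active(self):
        """Return the coefficients currently used for convolution."""
        return self._banks[self._active]

    def load_standby(self, coefficients):
        """Fill the unused bank while convolution continues on the active one."""
        if len(coefficients) != len(self._banks[0]):
            raise ValueError("new impulse response must have the same length")
        self._banks[1 - self._active][:] = coefficients

    def switch(self):
        """Swap banks; the output buffer is deliberately left untouched."""
        self._active = 1 - self._active


ir = DoubleBufferedImpulseResponse([1, 2, 3])
ir.load_standby([4, 5, 6])
print(ir.active())  # still the old impulse response
ir.switch()
print(ir.active())  # now the new impulse response
```

Because the switch does not zero the output buffer, sub-results computed with the old filter remain in the buffer and are summed with those from the new filter, which is the source of the smooth transition described above.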

Claims


1. A method for convolution of a digital input signal, characterized in that digital signals corresponding to an impulse response for at least one version of a room environment are stored, that samples of the digital input signal are continuously stored, that a time-limited set of samples of the digital input signal is delimited from continuously incoming samples, that the digital signals of the impulse response are divided into segments consisting of coefficients, that convolution operations are repeatedly performed with a segment of the impulse response and the time-limited set of samples of the digital input signal until all segments of the impulse response have been processed, that the convolution operations are performed by parallel multiplication of the coefficients of the impulse response and the time-limited set of samples of the digital input signal and by repeated pairwise addition of the products to form sub-results, and that sub-results are summed with previously calculated associated sub-results to form an output sample, whereby the convolution is performed in the time domain.

2. The method of claim 1, wherein the convolution operations are performed by pipelining.

3. The method of claim 1, wherein the number of coefficients in a segment of the impulse response is equal to the number of samples in the time-limited set of samples of the digital input signal.

4. The method of claim 1, wherein convolution operations are repeatedly performed with a first segment of the impulse response while a second segment of the impulse response is prepared for the convolution operations.

5. The method of claim 1, wherein the sub-results from the convolution operations are stored until all coefficients in all segments of the impulse response have undergone convolution operations.

6. A device for convolution of a digital input signal, characterized in that a memory (16) for storing digital signals corresponding to an impulse response for at least one version of a room environment is operatively connected to at least a first filter register (18), the memory (16) and the first filter register (18) are divided into a number of segments with a number of cells in each segment, each cell being able to contain a coefficient of the digital signals of the impulse response, an input buffer (14) is arranged for storing samples of the digital input signal, the input buffer (14) is operatively connected to memory means (10, 11), the memory means (10, 11) are designed for stepwise shifting of data back and forth between cells of the memory means (10, 11), each cell of the first filter register (18) is connected to a first input of a multiplication unit (12) comprising multiplication means, and cells of the memory means (10, 11) are connected to a second input of the multiplication unit (12) for parallel multiplication of the contents of the cells, an output of each multiplication means is connected to inputs of an addition unit (13) for summing the products of the multiplications into a sub-result, an output of the addition unit (13) is operatively connected via a calculation unit (21) to an output buffer (23), the calculation unit (21) comprises addition elements, and a control unit (22) is operatively connected to the calculation unit (21) and the output buffer (23) for controlling the calculation unit (21) to sum new sub-results with previously calculated associated sub-results to form an output sample.

7. The device of claim 6, wherein the input buffer (14) is operatively connected to a first shift register (10) for transferring a time-limited set of samples of the digital input signal, and the first shift register (10) is operatively connected to a second shift register (11) for stepwise shifting of data back and forth between cells of the first shift register (10) and the second shift register (11).

8. The device of claim 6, wherein the multiplication unit (12) is designed to operate by pipelining.

9. The device of claim 6, wherein the addition unit (13) is designed to operate by pipelining.

10. The device of claim 6, wherein the memory (16) is operatively connected to a second filter register (19) for transferring coefficients to it, and wherein the first filter register (18) and the second filter register (19) are alternately operatively connected to the multiplication unit (12) and to the memory (16), respectively.

11. The device of claim 6, wherein the memory (16) is duplicated to enable uninterrupted convolution when the impulse response is changed.