FI117956B

FI117956B - Circuit and method for calculating a sum of absolute differences quickly and efficiently

Info

Publication number: FI117956B
Application number: FI20050388A
Authority: FI
Inventors: Timo D Haemaelaeinen; Jarno Vanne; Eero Aho
Original assignee: Timo D Haemaelaeinen; Jarno Vanne; Eero Aho
Priority date: 2005-04-15
Filing date: 2005-04-15
Publication date: 2007-04-30
Also published as: FI20050388A; WO2006108912A1; FI20050388A0

Description


Circuit and method for calculating a sum of absolute differences quickly and efficiently

TECHNICAL FIELD

The invention relates to the field of digital video coding. In particular, the invention relates to determining a motion vector in order to find out how far and in which direction a macroblock of a digital video frame has moved relative to the best-matching macroblock of a reference frame.

BACKGROUND OF THE INVENTION

Motion estimation is a known way of reducing the temporal redundancy between consecutive video frames, and it is commonly used in digital video coding methods. A frame of a digital video stream is divided into non-overlapping blocks, commonly called macroblocks. For each macroblock, a reference frame is searched to find the best-matching previous macroblock, i.e. the macroblock of the reference frame whose pixel content is as similar as possible to that of the current macroblock. As part of the encoding of the current frame, a motion vector is formed that indicates how far and in which direction the current macroblock has moved from the best-matching previous macroblock. In addition, a difference signal is needed that indicates how much and in what way the current macroblock differs from even the best-matching previous macroblock.

The usual size of a macroblock is 16 by 16 pixels. Strictly speaking, the general definition of a macroblock means a 16x16 luminance block together with two spatially corresponding 8x8 chrominance blocks, but because the luminance and chrominance components are strongly correlated, it is customary to process only luminance data in motion estimation.


Motion estimation is by far the most costly part of digital video coding, because its computational demand is high and because the hardware parts that perform this computation require a relatively large chip area.

An advantageous motion estimation architecture is one that can be implemented with as few computational operations and as few logic gates as possible. Compromises may be needed, so that a certain increase in chip area may be acceptable if a significant improvement in computation speed is achieved, and vice versa.

Figure 1 shows a generally accepted solution, a so-called pipeline consisting of three stages, which is a common arrangement for determining the so-called SAD value (sum of absolute differences) of the current macroblock and a candidate macroblock.

The SAD measures how closely the candidate macroblock matches the current macroblock, i.e. whether the candidate macroblock could qualify as the best-matching previous macroblock. The first stage consists of a row of absolute difference units 101. A number of pixels are read from the current macroblock and the candidate macroblock, and spatially corresponding pixel values are fed in pairs to the absolute difference units. As the name suggests, each absolute difference unit 101 calculates the absolute value of the difference between the two pixel values it receives. These values are summed in the second stage, commonly called the compression structure 102. It calculates the sum of all the absolute differences it receives from the first stage and sends the result to the third stage, which is the minimum SAD determiner 103. When comparing the current macroblock to a set of candidate macroblocks, the minimum SAD determiner 103 keeps track of which of them led to the smallest SAD value, i.e. which was most similar to the current macroblock.
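The three stages described above can be illustrated in software. The following is only a behavioural sketch, not the circuit of Figure 1; the function names `sad` and `best_match` and the representation of a macroblock as a list of pixel rows are assumptions made for illustration.

```python
def sad(current, candidate):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)                                   # stage 1: absolute difference
        for cur_row, cand_row in zip(current, candidate)
        for a, b in zip(cur_row, cand_row)
    )                                                # stage 2: summation


def best_match(current, candidates):
    """Stage 3: return (index, SAD) of the candidate with the smallest SAD."""
    best_index, best_sad = None, None
    for index, candidate in enumerate(candidates):
        value = sad(current, candidate)
        if best_sad is None or value < best_sad:
            best_index, best_sad = index, value
    return best_index, best_sad
```

A real macroblock would have 16 rows of 16 pixels; the functions work for any block size as long as both blocks have the same shape.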

Motion estimators based on the general solution of Figure 1 are known, for example, from the prior art publications US 5 864 372 and US 5 838 392. Some of their features are examined more closely below.


Figure 2 shows the so-called Chen absolute difference unit, which is used here as an example of a known absolute difference unit 101. It adds together operand A (which in this example is an eight-bit value a7,a6,...,a0) and the two's complement of operand B (here b7,b6,...,b0). The two's complement of B is formed by inverting B bitwise and applying a constant '1' to the carry input of the adder 201. If operand A was originally larger, the adder directly produces the correct positive result and signals a carry out (c7 = '1'), which is inverted: C = 0. If operand B was originally larger, the adder 201 gives a negative two's complement result, which is converted to a positive one in the row of output XOR gates 202, which this time receive the inverted carry, C = 1, at their second input. The C value also constitutes a correction bit, which is passed on to the next stage of the SAD determination to be taken into account there. The apostrophe in the notation 'ABS is a reminder that the correction bit C is missing from the result.
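The behaviour described for the unit of Figure 2 can be modelled bit-accurately in a few lines. This is a sketch under the assumption that the unit works exactly as the paragraph above describes; the function name `chen_abs_diff` and the `width` parameter are hypothetical.

```python
def chen_abs_diff(a, b, width=8):
    """Model of the known unit: returns (uncorrected value 'ABS, correction bit C)."""
    mask = (1 << width) - 1
    total = a + (~b & mask) + 1          # A + inverted B + constant carry-in '1'
    carry_out = total >> width           # c7 in the text
    c = carry_out ^ 1                    # inverted carry = correction bit C
    low = total & mask
    uncorrected = low ^ (mask if c else 0)   # XOR row inverts only when C = 1
    return uncorrected, c
```

The true absolute difference is recovered as `uncorrected + c`, which is why the correction bit has to be carried along to the summation stage.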


Figure 3 shows an adder tree known from the scientific publication Q. Shu and H. Chen: "An efficient implementation of motion estimation algorithms", in Proc. 4th Int. Conf. Solid-State and Integrated Circuit Technology, Oct. 1995, pp. 697-699. The adder tree was in fact only a predecessor of the actual compression structure solutions, but it offers a very intuitive way to understand the purpose of the second stage shown in Figure 1. The absolute differences of the pixels of the current macroblock and the candidate macroblock are fed in pairs to the first-row adders 301. The summation proceeds through the second-row adders 302 and the third-row adders 303 to the output adder 304. Each of the adders 301, 302, 303 and 304 also receives a selected correction bit in order to take it into account. The output of the adder tree consists of the final sum 'SAD0..15 and the remaining correction bit C15.
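A small model makes it clear why one correction bit is left over: sixteen inputs need fifteen two-input adders, each of which can absorb one correction bit at its carry input. In this sketch the order in which correction bits are assigned to adders is an assumption, as is the function name `adder_tree`.

```python
def adder_tree(uncorrected, corrections):
    """Sum 16 uncorrected values pairwise; each adder absorbs one correction bit.

    Returns (sum including 15 correction bits, the one remaining correction bit).
    """
    assert len(uncorrected) == 16 and len(corrections) == 16
    pending = list(corrections)
    level = list(uncorrected)
    while len(level) > 1:
        # One row of adders: pairwise sums, carry input fed with a correction bit.
        level = [
            level[i] + level[i + 1] + pending.pop(0)
            for i in range(0, len(level), 2)
        ]
    return level[0], pending[0]
```

Adding the two returned values gives the complete SAD of the sixteen pixel pairs.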

Figure 4 shows the compression structure known from US 5 864 372, here as a simplified version that receives as its inputs only four absolute differences 'ABS0 - 'ABS3 and the corresponding correction bits C0 - C3. The compression structure 401 includes a number of cascaded pairs of 4:2 compression units, one of which is seen in the partial enlargement 402. Moving toward the most significant bits, a boundary is reached after which the compression unit pairs are replaced by somewhat simpler circuits. Each of said cascaded 4:2 compression unit pairs and said simpler circuits produces one bit of the SAD_S vector, which is the sum vector, and one bit of the SAD_C vector, which is the carry vector. The sum vector SAD_S and the carry vector SAD_C are stored temporarily in the output registers 404 and 403 and fed back as additional inputs to the compression structure 401.
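The idea of keeping the running total as a sum vector and a carry vector, instead of resolving carries at every step, can be sketched with a word-level carry save adder. This models the carry-save principle only, not the gate-level structure of Figure 4; the names `csa`, `compress_4_2` and `accumulate` are assumptions.

```python
def csa(x, y, z):
    """3:2 carry save adder on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def compress_4_2(a, b, c, d):
    """4:2 compression built from two cascaded carry save adders."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def accumulate(values):
    """Keep a running (SAD_S, SAD_C) pair fed back on every step."""
    sad_s, sad_c = 0, 0
    for first, second in zip(values[0::2], values[1::2]):
        # Two new operands plus the fed-back sum and carry vectors.
        sad_s, sad_c = compress_4_2(first, second, sad_s, sad_c)
    return sad_s, sad_c
```

The actual total is `sad_s + sad_c`; a carry-propagating addition is needed only once, when the final value is required.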

Figure 5 shows the minimum SAD determiner proposed in US 5 838 392. One characteristic of this solution is that it never actually calculates real SAD values, but merely processes sum vectors (SAD_S vectors) and carry vectors (SAD_C vectors) separately. Their minimum values are stored in the sum register 501 and the carry register 502, respectively. The combination of a 4:2 compression unit 503 and an adder 504 performs a two's complement subtraction between the current SAD and the minimum SAD, whereby the carry bit obtained at the output of the adder 504 indicates the sign of the result of the subtraction. A control input is needed to determine the correct moment of comparison. If the carry bit from the adder 504 is '0' at the moment of comparison, the current SAD value is smaller than the previously stored minimum SAD, so the current SAD is stored in the registers 501 and 502. An inverter 505 and two logic gates 506 and 507 take care of controlling the storage process appropriately. In addition, the motion vector input data (shown in Figure 5 as (k,l)) is stored in the register 508. A carry bit value of '1' at the moment of comparison means that the current SAD has exceeded the minimum SAD. An "end_tag" indication is available to signal such a situation, and it can be used, for example, to terminate the SAD calculation prematurely.
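A comparison of two values that are each held as a sum vector and a carry vector can be sketched as follows. This is a hedged illustration of the principle, assuming that the stored minimum is subtracted from the current value and that both totals fit in `width` bits; the function names and the exact way the two '+1' terms are injected are assumptions and are not taken from US 5 838 392.

```python
def csa(x, y, z):
    """3:2 carry save adder on whole words: x + y + z == s + c."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def compare_carry(cur_s, cur_c, min_s, min_c, width=16):
    """Carry bit of (current - minimum), both given in carry-save form.

    Returns 0 when the current SAD is smaller than the stored minimum,
    1 otherwise (equal or larger).
    """
    mask = (1 << width) - 1
    # 4:2 compression of the current vectors and the inverted minimum vectors.
    s1, c1 = csa(cur_s, cur_c, ~min_s & mask)
    s2, c2 = csa(s1, c1, ~min_c & mask)
    # The two '+1' terms turn the two one's complements into two's complements.
    total = s2 + c2 + 2
    return (total >> (width + 1)) & 1
```

In this model the carry is also 1 when the two values are equal, so a candidate that merely ties with the stored minimum does not replace it.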

Disadvantages of the known solutions include excessive use of chip area, computation delays, and non-optimal overall performance.

It is an object of the present invention to provide a circuit and a method for calculating the SAD and finding the minimum SAD with the best performance and area efficiency.

The object of the invention is achieved with a structure in which the correction bits are taken into special processing fairly early in the calculation process, most advantageously already from the absolute difference units, and are combined with the processing of the absolute differences only rather selectively in the compression stage.

The circuit according to the invention is characterized by what is set out in the characterizing part of the independent claim directed to a circuit.

The method according to the invention is characterized by what is set out in the characterizing part of the independent claim directed to a method.

In addition, the invention relates to a digital video encoder, which is characterized by what is set out in the characterizing part of the independent claim directed to a digital video encoding device.


A first aspect of the invention is an advanced absolute difference unit in which a one's complement representation is used for one of the operands, rather than a two's complement representation as was usual in earlier solutions. In the absolute difference unit a carry bit is formed, which is used as needed to invert the result bitwise, but neither a low-performance "end-around" carry chain nor any constant '1' carry input is needed. The carry bit also constitutes the correction bit. The correction bits formed in two adjacent absolute difference units can be combined in a half adder or a corresponding circuit unit already before they are taken to the second stage of the three-stage pipeline process.
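The one's complement variant can be modelled in the same way as the known unit. This is a minimal sketch, assuming that operand B is the one that is complemented and that the result is inverted when no carry is produced; the function names are hypothetical.

```python
def ones_complement_abs_diff(a, b, width=8):
    """Returns (uncorrected value, carry bit); the carry doubles as correction bit."""
    mask = (1 << width) - 1
    total = a + (~b & mask)              # A + one's complement of B, no carry-in
    carry = total >> width               # 1 exactly when A > B
    low = total & mask
    uncorrected = low if carry else low ^ mask   # invert bitwise when no carry
    return uncorrected, carry


def half_adder(x, y):
    """Combine two correction bits into a two-bit value: returns (sum, carry)."""
    return x ^ y, x & y
```

Here `uncorrected + carry` equals the absolute difference, and the pair returned by `half_adder` represents the value `sum + 2 * carry`, i.e. the combined correction of two adjacent units.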


A second aspect of the invention is the second stage, i.e. a compression structure in which the correction bits are initially processed in a correction tree that is separate from the main compression tree used to process the uncorrected absolute differences. One advantageous configuration of the compression structure is based on the use of carry save adders (CSA), which can be allocated and connected in various ways. In one advantageous embodiment, CSAs are used on six levels in the main compression tree and on three levels in the correction tree, and compressed correction information is taken from the second and third levels of the correction tree to the appropriate inputs of the CSAs on the third, fourth, fifth and sixth levels of the main compression tree. The strategy by which the compressed correction information is merged into the main compression tree is most advantageously designed so that the final sum and carry vectors, which are obtained as outputs from the compression structure, have as few bit positions in common as possible.

A third aspect of the invention is the third stage, i.e. a minimum SAD determination unit in which various early termination mechanisms can optionally be implemented. These include comparing the value of the current SAD to the previous minimum SAD value already during the calculation, while the calculation of the current SAD is still in progress, so that exceeding the previous minimum ends the calculation before its completion. In addition, a negative bonus that reduces the value can be added to the value of the first, zero-motion SAD, which makes it statistically more likely that no motion vector needs to be transmitted at all. As a third mechanism affecting the operation of the minimum SAD determination unit, the concept of threshold values is introduced. One or more thresholds can be set for the SAD calculation so that whenever the current calculated SAD exceeds a threshold, a certain decision tied to that threshold is made about continuing the current SAD calculation. Unlike in the first-mentioned mechanism, a threshold is a parameter value and not the result of any earlier SAD calculation.
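The three mechanisms can be sketched together in a row-by-row search loop. This is an illustration only: it assumes that the first candidate is the zero-motion one, that the bonus is applied as a negative starting value, and that the decision tied to the single threshold is simply to abandon the candidate. The names `search`, `zero_bonus` and `threshold` are hypothetical.

```python
def search(current, candidates, zero_bonus=0, threshold=None):
    """Return (index, SAD) of the best candidate, or (None, None) if all are abandoned."""
    best_index, best_sad = None, None
    for index, candidate in enumerate(candidates):
        # Negative bonus favours the first (zero-motion) candidate.
        running = -zero_bonus if index == 0 else 0
        abandoned = False
        for cur_row, cand_row in zip(current, candidate):
            running += sum(abs(a - b) for a, b in zip(cur_row, cand_row))
            # Mechanism 1: stop as soon as the previous minimum is reached.
            if best_sad is not None and running >= best_sad:
                abandoned = True
                break
            # Mechanism 3: stop when a fixed parameter threshold is exceeded.
            if threshold is not None and running > threshold:
                abandoned = True
                break
        if not abandoned:
            best_index, best_sad = index, running
    return best_index, best_sad
```

Because the absolute differences are never negative, the running total can only grow, so abandoning a candidate once it reaches the stored minimum never discards a better match.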

The embodiments presented as examples in this patent application should not be interpreted as limiting the applicability of the appended claims. The verb "to comprise" is used in this patent application as an open limitation that does not exclude features not mentioned here. The features recited in the dependent claims may be freely combined unless explicitly stated otherwise.

The novel features regarded as characteristic of the invention are set out in detail in the appended claims. The invention itself, however, its structure and operating principle, together with its further objects and advantages, is described below by means of some embodiments and with reference to the accompanying drawings.


Figure 1 shows the known principle of the three stages of SAD calculation,
Figure 2 shows the known Chen absolute difference unit,
Figure 3 shows a conventional adder tree,
Figure 4 shows the known concept of a compression structure,
Figure 5 shows the known Chen minimum SAD determination unit,
Figure 6 shows a SAD calculation circuit and method according to an embodiment of the invention,
Figure 7 shows a pair of absolute difference units according to an embodiment of the invention,
Figure 8 shows a part of a compression structure according to an embodiment of the invention,
Figure 9 shows another part of a compression structure according to an embodiment of the invention,
Figures 10a - 10d show the composition of the irregular adder units of Figures 8 and 9,
Figure 11 shows a minimum SAD determiner according to a simple embodiment of the invention,
Figure 12 shows a minimum SAD determiner according to a more versatile embodiment of the invention,
Figure 13 shows the SAD calculation circuit and method of Figure 6 adapted to the embodiments shown in Figures 7, 8, 9 and 12,
Figure 14 shows a comparison of the delay and area requirements of different SAD calculation implementations, and
Figure 15 shows a video encoder according to an embodiment of the invention.


Figure 6 is a schematic representation of a motion estimation circuit and method that implement the SAD, for use as part of digital video coding. Consistently with the earlier designations, it is assumed that there is a current macroblock, for which a motion estimate should be formed, and a set of candidate macroblocks, from among which the most suitable one should be identified, i.e. in principle the one that differs least from the current macroblock. The pure application of this basic criterion can be deviated from slightly, for example by using a "zero bonus" that favours one of the candidate macroblocks, but these details are discussed later.


The first stage 601 consists of absolute difference units 602, each of which is arranged to determine the absolute value of the difference between two multi-bit input values. There are as many absolute difference units 602 as there are pixels of the current macroblock and the candidate macroblocks that can be compared simultaneously. In Figure 6 the common assumption has been made that the macroblocks to be compared are 16x16 pixels in size and that one pixel row (or column) of each macroblock is compared at a time. Consequently there are sixteen absolute difference units 602, of which only four are shown separately in the figure. It is also assumed that pixel values are represented with eight bits, so that each of the absolute difference units 602 receives two eight-bit input values and produces an eight-bit uncorrected output value and a carry bit.

It would be very easy to change the number of absolute difference units and/or the number of bit inputs in each absolute difference unit if some other design parameters were used, or to select the pixels to be compared according to some selection strategy other than simply one row or column at a time.

The second stage 611 is a compression structure that receives the uncorrected output values and the correction bits (or more generally "correction values" or just "corrections") from the absolute difference units 602 of the first stage 601. Unlike in the known Chen compression structure, the second stage includes a main compression tree 612 and a separate correction tree 613. The former is arranged to receive the uncorrected absolute differences and, as feedback, the sum and carry vectors previously compressed by the second stage. The correction tree is arranged to receive the corrections. Compressing the correction bits in the correction tree 613 produces a set of compressed correction values, which are fed to suitable points in the main compression tree 612.

As a result, the second stage 611 produces sum and carry vectors that together express the sum of absolute differences for the pixel row (or column or group) that was compared simultaneously. These are passed to the third stage 621, which consists of a minimum SAD determiner. It accumulates the row-wise (or column-wise or group-wise) SAD values, compares the accumulated sum to the minimum value at a suitable moment, and produces output values that include the minimum SAD found. The third stage 621 also receives some control inputs and produces some other outputs, which can be used, for example, to implement early termination mechanisms under the control of a separate control unit (not shown in Figure 6).
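The role of the minimum SAD determiner can be illustrated with a short behavioural sketch in Python. It assumes that macroblocks are given as lists of pixel rows and that candidates are examined one after another; the names `block_sad` and `minimum_sad` are hypothetical and the control inputs of the third stage are not modelled.

```python
def block_sad(current_block, candidate_block):
    """Accumulate the row-wise SAD values of one candidate macroblock."""
    total = 0
    for cur_row, cand_row in zip(current_block, candidate_block):
        total += sum(abs(a - b) for a, b in zip(cur_row, cand_row))
    return total


def minimum_sad(current_block, candidates):
    """Return (index, SAD) of the candidate with the smallest accumulated SAD."""
    best_index, best_sad = None, None
    for index, candidate in enumerate(candidates):
        sad = block_sad(current_block, candidate)
        # The comparison happens once per candidate, after all rows are accumulated.
        if best_sad is None or sad < best_sad:
            best_index, best_sad = index, sad
    return best_index, best_sad
```

The index of the winning candidate is what a motion estimator would translate into a motion vector.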


Figure 7 shows an advantageous arrangement of two adjacent absolute difference units 602 of the first stage. The units shown in the figure produce the absolute differences of the 0th and 1st pixel pairs. The basic design of each absolute difference unit 602 resembles the known solution of Chen (see Figure 2), but there are significant differences in the details. The unit on the right in the figure has an 8-bit adder 711, which operates on operand A (a7...a0) and the inverted operand B (b7...b0, shown before bit inversion). Instead of the two's complement representation, the 8-bit adder 711 uses the one's complement representation of operand B in the addition, i.e. $S = A + \overline{B}$. If operand A was smaller than (or equal to) operand B, the resulting negative result S (s7...s0) is bit-inverted in the XOR gates 712 to obtain the correct positive value, and no additional correction bit is needed (C7 = '0'). If, however, operand A was greater than operand B, the 8-bit adder 711 produces the correct positive result S minus one. Because the required end-around carry addition does not change the outgoing carry information (C7), an immediate end-around carry addition is unnecessary in this case. The carry bit (C7 = '1') produced by the adder 711 is therefore forwarded as a correction bit, and the one-bit error correction is performed later.

The formulas describing the operation of the absolute difference units 602 are as follows:

$$
\text{'ABS} =
\begin{cases}
A + \overline{B} = A + (2^8 - 1 - B) \equiv (A - B - 1) \bmod 2^8 \ \text{ and } C = 1, & A > B \\
\overline{A + \overline{B}} = 2^8 - 1 - \bigl(A + (2^8 - 1 - B)\bigr) = B - A \ \text{ and } C = 0, & A \le B
\end{cases}
$$

Thus the carry input of the adder can be eliminated completely without any hardware cost. This is a significant difference from Chen's method, which required a constant '1' carry input.
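The behaviour of one absolute difference unit can be checked with a small Python model. This is a sketch under the assumptions stated in the text (8-bit operands, no carry input, inversion of S when the carry out is '0'); the function name `abs_diff_unit` is hypothetical.

```python
def abs_diff_unit(a, b):
    """Model one unit: return (uncorrected 8-bit value 'ABS, correction bit C)."""
    assert 0 <= a <= 255 and 0 <= b <= 255
    total = a + (~b & 0xFF)   # S = A + NOT(B), added without any carry input
    carry = total >> 8        # C7, the carry out of the 8-bit adder
    s = total & 0xFF
    if carry == 0:            # A <= B: the XOR gates invert S
        s ^= 0xFF
    return s, carry


# Exhaustive check: the uncorrected value plus the correction bit
# always equals the true absolute difference.
for a in range(256):
    for b in range(256):
        value, c = abs_diff_unit(a, b)
        assert value + c == abs(a - b)
```

The check confirms the point of the formulas above: the deferred correction is exactly one unit, and it is needed only when A > B.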

Another difference from all known methods is that the absolute difference unit 602 enables immediate accumulation of the correction bits without any additional delay. Note that the left-hand absolute difference unit is exactly the same as the one described above; its carry bit output is merely drawn on the opposite side of the 8-bit adder 701 to illustrate the principle of immediate correction bit accumulation more easily. The correction bit of the left-hand 8-bit adder 701 and the correction bit of the right-hand 8-bit adder 711 are fed, without inversion, as inputs C0 and C1 to a half adder 721, which forms the two-bit correction bit vector C0..1. When all absolute difference units in the 16-channel first stage are paired in the same way, the output of the first stage consists of sixteen 8-bit uncorrected absolute differences 'ABS0, 'ABS1, ..., 'ABS15 and eight 2-bit correction bit vectors C0..1, C2..3, ..., C14..15. Using half adders to form the correction bit vectors loses the information about the exact value of each individual correction bit, but this does not harm further processing; on the contrary, the resulting 2-bit correction bit vectors are better suited for use in the compression structure according to an embodiment of the invention.
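The pairing of correction bits can be sketched in Python as follows. The model is behavioural and the names `abs_diff_unit`, `half_adder` and `first_stage` are illustrative assumptions; it only shows that sixteen uncorrected values plus eight 2-bit vectors carry the same information as the row SAD.

```python
def abs_diff_unit(a, b):
    """Return (uncorrected 8-bit value, correction bit) for 8-bit inputs."""
    total = a + (~b & 0xFF)
    carry = total >> 8
    s = total & 0xFF
    return (s if carry else s ^ 0xFF), carry


def half_adder(x, y):
    """Combine two correction bits into a 2-bit vector (value 0, 1 or 2)."""
    return ((x & y) << 1) | (x ^ y)


def first_stage(current_row, candidate_row):
    """Return sixteen uncorrected values and eight 2-bit correction vectors."""
    units = [abs_diff_unit(a, b) for a, b in zip(current_row, candidate_row)]
    abs_values = [value for value, _ in units]
    corrections = [half_adder(units[i][1], units[i + 1][1])
                   for i in range(0, len(units), 2)]
    return abs_values, corrections
```

Summing both returned lists gives the row SAD, which is the invariant the compression trees have to preserve.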


Figures 8 and 9 show advantageous embodiments of the main compression tree and the correction tree. The main compression tree of Figure 8 receives new input data ('ABS0, 'ABS1, ..., 'ABS15; C0..1) from the absolute difference units of the first stage and feedback data (SAD_S(pre), SAD_C(pre)) from its own output registers (not shown in the figure). In addition, there is an INIT input for resetting the feedback values at the beginning of a new SAD calculation. The feedback sum vector SAD_S(pre) has 16 bits, which after the row of gates that enables initialization are denoted ss15, ss14, ..., ss0, or briefly SS15..0. For reasons discussed in more detail later, the feedback carry vector SAD_C(pre) has three bits fewer, i.e. only 13 bits, which after the initialization gates are correspondingly denoted sc15, sc14, ..., sc3, or SC15..3. Note that all subscript numbers, as in the notation "SC15..3", refer to the corresponding bit positions. Whenever a notation refers to several bit positions, they are given in the subscript from the most significant to the least significant bit position. Normal-sized numbers in the correction bit vector notations, such as C0..1 or C12..13, in turn refer only to the absolute difference units whose correction bits were taken into the half adder that formed the correction bit vector in question.

In Figure 8, the input and output bit widths of the CSAs are marked as MSB...LSB (most significant bit ... least significant bit). In every CSA the left output produces the partial carries and the right output produces the partial sum. Most CSAs have three inputs; the exception is one two-input CSA in the correction tree. The CSAs 801, 802, 803 and 804, which have exceptional input widths, are marked with an asterisk, and their structure is explained in terms of ordinary CSAs and gate structures in Figures 10a, 10b, 10c and 10d. In these marked CSAs, the bit positions that receive bits from all three inputs are implemented with full adders, whereas the bit positions that involve only two inputs are implemented with half adders. The unmarked three-input CSAs are implemented entirely with full adders.
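The left/right output convention can be illustrated with a generic three-input carry-save adder in Python. This is a textbook 3:2 compressor operating on whole integers, offered as a sketch; it does not reproduce the exact bit widths of the CSAs in Figure 8, and the name `csa` is an assumption.

```python
def csa(x, y, z):
    """Three-input carry-save adder on non-negative integers.

    Returns (partial_carries, partial_sum), i.e. (left output, right output).
    The carries are already shifted one bit position to the left.
    """
    partial_sum = x ^ y ^ z                              # per-bit sum of a full adder
    partial_carries = ((x & y) | (x & z) | (y & z)) << 1  # per-bit majority, weight x2
    return partial_carries, partial_sum


left, right = csa(0b1011, 0b0110, 0b1101)
assert left + right == 0b1011 + 0b0110 + 0b1101
```

A bit position with only two connected inputs behaves like the same cell with the third input tied to '0', which is why a half adder is sufficient there.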

The top level of the main compression tree has six CSAs with the following inputs and outputs:

Table 1: CSAs numbered from left to right

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA 1 | bits 7..0: SS7..0 | bits 7..3: SC7..3; bit 2: constant '0'; bits 1..0: C0..1 | bits 7..0: 'ABS0 | bits 8..1 | bits 7..0 |
| CSA 2 | bits 7..0: 'ABS1 | bits 7..0: 'ABS2 | bits 7..0: 'ABS3 | bits 8..1 | bits 7..0 |
| CSA 3 | bits 7..0: 'ABS4 | bits 7..0: 'ABS5 | bits 7..0: 'ABS6 | bits 8..1 | bits 7..0 |
| CSA 4 | bits 7..0: 'ABS7 | bits 7..0: 'ABS8 | bits 7..0: 'ABS9 | bits 8..1 | bits 7..0 |
| CSA 5 | bits 7..0: 'ABS10 | bits 7..0: 'ABS11 | bits 7..0: 'ABS12 | bits 8..1 | bits 7..0 |
| CSA 6 | bits 7..0: 'ABS13 | bits 7..0: 'ABS14 | bits 7..0: 'ABS15 | bits 8..1 | bits 7..0 |

The second level of the main compression tree has four CSAs with the following inputs and outputs:

Table 2: CSAs numbered from left to right

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA 1 | bits 8..1: left output of top-level CSA 1 | bits 8..1: left output of top-level CSA 2 | bits 8..1: left output of top-level CSA 3 | bits 9..2 | bits 8..1 |
| CSA 2 | bits 7..0: right output of top-level CSA 1 | bits 7..0: right output of top-level CSA 2 | bits 7..0: right output of top-level CSA 3 | bits 8..1 | bits 7..0 |
| CSA 3 | bits 8..1: left output of top-level CSA 4 | bits 8..1: left output of top-level CSA 5 | bits 8..1: left output of top-level CSA 6 | bits 9..2 | bits 8..1 |
| CSA 4 | bits 7..0: right output of top-level CSA 4 | bits 7..0: right output of top-level CSA 5 | bits 7..0: right output of top-level CSA 6 | bits 8..1 | bits 7..0 |

The third level of the main compression tree has three CSAs with the following inputs and outputs:

Table 3: CSAs numbered from left to right

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA 1 | bits 9..2: left output of second-level CSA 1 | bits 3..2: outputs A and B of the correction tree | bits 9..2: left output of second-level CSA 3 | bits 10..3 | bits 9..2 |
| CSA 2 | bits 8..1: right output of second-level CSA 1 | bits 8..1: left output of second-level CSA 2 | bits 8..1: right output of second-level CSA 3 | bits 9..2 | bits 8..1 |
| CSA 3 | bit 8: ss8; bits 7..0: right output of second-level CSA 2 | bits 8..1: left output of second-level CSA 4; bit 0: output C of the correction tree | bit 8: sc8; bits 7..0: right output of second-level CSA 4 | bits 9..1 | bits 8..0 |

The fourth level of the main compression tree has two CSAs with the following inputs and outputs:

Table 4: CSAs numbered from left to right

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA 1 | bits 10..3: left output of third-level CSA 1; bit 2: output D of the correction tree | bit 10: ss10; bits 9..2: right output of third-level CSA 1 | bit 10: sc10; bits 9..2: left output of third-level CSA 2 | bits 11..3 | bits 10..2 |
| CSA 2 | bit 9: ss9; bits 8..1: right output of third-level CSA 2 | bits 9..1: left output of third-level CSA 3 | bit 9: sc9; bits 8..1: the MSBs (bits 8..1) of the right output of third-level CSA 3 | bits 10..2 | bits 9..1 |

The fifth level of the main compression tree has one CSA with the following inputs and outputs:

Table 5:

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA | bit 11: ss11; bits 10..2: right output of fourth-level CSA 1; bit 1: output E of the correction tree | bit 11: sc11; bits 10..2: left output of fourth-level CSA 2; bit 1: output F of the correction tree | bits 9..1: right output of fourth-level CSA 2 | bits 12..2 | bits 11..1 |

The sixth level of the main compression tree has two CSAs with the following inputs and outputs:

Table 6: CSAs numbered from left to right

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA 1 | bits 15..12: SS15..12 | bit 12: MSB of the left output of the fifth-level CSA | bits 15..12: SC15..12 | bits 15..13 | bits 15..12 |
| CSA 2 | bits 11..3: left output of fourth-level CSA 1; bit 2: output G of the correction tree | bits 11..2: the 10 LSBs of the left output of the fifth-level CSA | bits 11..2: the 10 MSBs of the right output of the fifth-level CSA | bits 12..3 | bits 11..2 |

The top level of the correction tree has two CSAs with the following inputs and outputs:

Table 7: CSAs numbered from left to right

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA 1 | bits 1..0: C2..3 | bits 1..0: C4..5 | bits 1..0: C6..7 | bits 2..1 | bits 1..0 |
| CSA 2 | bits 1..0: C8..9 | bits 1..0: C10..11 | bits 1..0: C12..13 | bit 2; bit 1: output E | bits 1..0 |

The second level of the correction tree has two CSAs with the following inputs and outputs:

Table 8: CSAs numbered from left to right

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA 1 | bit 2: MSB of the left output of top-level CSA 1 | - | bit 2: MSB of the left output of top-level CSA 2 | bit 3: output A | bit 2: output B |
| CSA 2 | bits 1..0: right output of top-level CSA 1 | bits 1..0: right output of top-level CSA 2 | bits 1..0: C14..15 | bit 2: output D; bit 1 | bit 1; bit 0: output C |


The third level of the correction tree has one CSA with the following inputs and outputs:

Table 9:

| CSA | Left input | Middle input | Right input | Left output | Right output |
|---|---|---|---|---|---|
| CSA | bit 1: LSB of the left output of top-level CSA 1 | bit 1: LSB of the left output of second-level CSA 2 | bit 1: MSB of the right output of second-level CSA 2 | bit 2: output G | bit 1: output F |
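The three levels of the correction tree listed in Tables 7 to 9 can be modelled bit by bit in Python to check that the outputs A to G, each weighted by the bit position named in the tables, preserve the sum of the seven correction bit vectors C2..3 to C14..15. The sketch below is a behavioural reading of the tables, not the gate-level design; the helper names and the `WEIGHT` mapping are assumptions derived from the table entries.

```python
import itertools


def csa(x, y, z):
    """Three-input carry-save adder: returns (carries << 1, sum)."""
    return ((x & y) | (x & z) | (y & z)) << 1, x ^ y ^ z


def bit(value, position):
    return (value >> position) & 1


def correction_tree(c23, c45, c67, c89, c1011, c1213, c1415):
    """Return outputs A..G for seven 2-bit correction vectors."""
    # Top level (Table 7): carries occupy bits 2..1, sums bits 1..0.
    carry1, sum1 = csa(c23, c45, c67)
    carry2, sum2 = csa(c89, c1011, c1213)
    e = bit(carry2, 1)
    # Second level (Table 8): CSA 1 is the two-input CSA at bit position 2.
    x, y = bit(carry1, 2), bit(carry2, 2)
    a, b = x & y, x ^ y
    carry3, sum3 = csa(sum1, sum2, c1415)
    d, c = bit(carry3, 2), bit(sum3, 0)
    # Third level (Table 9): one full adder at bit position 1.
    p, q, r = bit(carry1, 1), bit(carry3, 1), bit(sum3, 1)
    g, f = (p & q) | (p & r) | (q & r), p ^ q ^ r
    return {"A": a, "B": b, "C": c, "D": d, "E": e, "F": f, "G": g}


# Bit position at which each output enters the main compression tree.
WEIGHT = {"A": 3, "B": 2, "C": 0, "D": 2, "E": 1, "F": 1, "G": 2}

# Each vector comes from a half adder, so its value is 0, 1 or 2.
for vectors in itertools.product(range(3), repeat=7):
    outputs = correction_tree(*vectors)
    assert sum(outputs[k] << WEIGHT[k] for k in outputs) == sum(vectors)
```

The exhaustive loop covers all 2187 input combinations, so the seven single-bit outputs are a lossless encoding of the fourteen correction bits that do not enter the main tree directly.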

During operation, the sum and carry bits produced by the three-level correction tree of Figure 9 are fed one at a time into suitable, otherwise unused CSA inputs of the main compression tree of Figure 8. The bold letters A - G refer to the connections between the outputs of the correction tree and the inputs of the main compression tree. In addition, the numbers of the bit positions that receive bits from the correction tree are shown in bold in Figure 8. The LSBs SS7..0 and SC7..3 of the conditionally initialized previous sum and carry vectors are fed to the first level of the main compression tree as shown in Figure 8 and Table 1. A set of intermediate bits from these vectors, namely ss11..8 and sc11..8, are fed separately into the main compression tree.

In Figure 8 these intermediate inputs are marked with circles (sum vector bits) and squares (carry vector bits) drawn around the numbers of the bit positions in question. The MSBs SS15..12 and SC15..12 of these vectors are accumulated in the last level of the main compression tree.

The compressed carry vector SAD_C(cur) has 13 bits, of which the 3 most significant bits (MSBs) come from the left output of sixth-level CSA 1. The rest of these bits come from the left output of sixth-level CSA 2.

The compressed sum vector SAD_S(cur) has 16 bits, of which the 4 MSBs come from the right output of sixth-level CSA 1, the next 10 bits come from the right output of sixth-level CSA 2, the next bit (bit position 1) is the least significant bit (LSB) of the right output of the fifth-level CSA, and the last bit (bit position 0) is the LSB of the right output of third-level CSA 3.

The compression tree structure of Figures 8 and 9 has the advantageous property that it reduces the number of operands at the earliest possible stage. Although the earliest possible operand reduction somewhat increases the area requirement of the circuit implementation, it also reduces the number of common bit positions associated with the compressed sum vectors (SAD_S(cur)) and carry vectors (SAD_C(cur)). In this context a common bit position means a bit position that does not occur in only one of the said vectors. Without the introduction of the correction bits, the main compression tree would be able to reduce the number of common bit positions from 16 to 12. Thanks to the separate structures developed for accumulating the correction bits, the compression tree structure of Figures 8 and 9 is able to reduce the number of common bit positions from 16 to 13, which is only one bit position short of the theoretical maximum; the common bit positions are ss15..3 and sc15..3. Reducing the number of common bit positions does not necessarily increase the execution speed of the final addition of the sum and carry vectors, because the delay of a fast adder is typically not a steadily increasing function of the bit width of the adder. A smaller number of common bit positions nevertheless reduces hardware costs, because the adder implementation is narrower and the later stages of the motion estimator contain fewer registers.
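The effect of having only 13 common bit positions on the final addition can be sketched in Python. The sketch assumes that the carry vector is held as a 13-bit field covering bit positions 15..3 and that the result fits in 16 bits (the largest SAD of a 16x16 block of 8-bit pixels is 255 x 256 = 65280); the function name `final_sad` is hypothetical.

```python
def final_sad(sad_s, sad_c_15_3):
    """Add a 16-bit sum vector and a 13-bit carry vector (positions 15..3).

    Only a 13-bit adder is needed: bit positions 2..0 of the result are
    taken from the sum vector unchanged.
    """
    assert 0 <= sad_s < (1 << 16) and 0 <= sad_c_15_3 < (1 << 13)
    low = sad_s & 0b111                           # positions 2..0, sum vector only
    high = ((sad_s >> 3) + sad_c_15_3) & 0x1FFF   # 13-bit addition, positions 15..3
    return (high << 3) | low


# The narrow adder gives the same result as a full 16-bit addition.
for sad_s, sad_c in [(0, 0), (0x1234, 0x0ABC), (0x00FF, 0x0001), (0x7FFF, 0x0FFF)]:
    assert final_sad(sad_s, sad_c) == (sad_s + (sad_c << 3)) & 0xFFFF
```

The three pass-through bits are where the hardware saving mentioned above comes from: neither adder cells nor carry registers are needed for them.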

The first correction bit vector C0..1 can be fed immediately to one of the zeroth-level CSAs of the main compression tree, because the three least significant bits of the compressed carry vector of the previous SAD are unused, as described above. Looking at a prior-art compression structure like the one shown in Figure 4, in which all 16 correction bits are fed to bit position 0 of the CSA inputs, one can see that, compared with a conventional CSA tree with no correction bit addition, the prior-art solution causes an area increase almost as large as the additional CSA tree of the present invention. On the other hand, the prior-art solution allows fewer bit positions of the SAD result to be represented by the sum vector alone, i.e. it involves more common bit positions in the sum and carry vectors than the present invention.

It would be possible to devise trees that use adder types other than CSAs. Such trees could be used to reduce the number of operands at an early stage and to implement a separate correction tree whose results are fed to the main compression tree, as in the example solution described above. The use of CSAs is nevertheless believed to offer certain advantages, because they are structurally relatively simple and yield a regular adder allocation and interconnection scheme.

It would be possible to implement an early termination mechanism in the compression tree following the procedure presented in the prior-art publication US 2003/0043911 A1. The idea is to abort the ongoing SAD computation if the accumulated SAD value exceeds some predetermined, fixed threshold. However, omitting 1-3 MSBs in this way would allow only very small savings in the main compression structure, whereas omitting a larger number of bits would easily impose excessive restrictions on the SAD values that can be handled.

Reducing the bit width leaves the height of the compression structure unchanged.

A full 16-bit implementation of the structure is a very practical solution; any desired output bit of the accumulated carry vector can be used as an interrupt that triggers early termination.


Figure 11 shows, as an example, a simple implementation of the third stage, i.e. the minimum SAD determiner. Note that the proposed implementation does not include determination of the motion vector, which is left to some other suitable circuit element. The implementation of Figure 11 also does not include any early termination mechanisms, which are discussed further below.

The compressed current sum and carry vectors are received through the SAD_S and SAD_C inputs. Comparison with the previously stored minimum SAD value begins by bit-inverting the stored minimum SAD in inverter 1101. Using the 15-bit CSA 1102, the bit-inverted minimum SAD value is added to the SAD_S and SAD_C vectors. The constant addition of one that the two's complement representation requires is handled by OR gate 1103, which is connected to receive the least significant bits of the bit-inverted minimum SAD and of the current SAD_S, and to deliver its result as an additional LSB that extends the left (carry) output of CSA 1102. The CSA compression produces two difference vectors, dif_S15..1 and dif_C16..1, which are added in the 16-bit adder 1104. The adder can be somewhat reduced, because only the most significant sum bit (s16) is needed at its output. This bit indicates the sign of the subtraction.
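By way of illustration only, the comparison described above can be sketched behaviourally as follows. This is not a gate-level model: the names `csa` and `sign_bit_s16` are hypothetical, the OR gate 1103 is folded into a plain `+ 1`, and it is assumed that SAD_S + SAD_C fits in 16 bits.

```python
MASK16 = 0xFFFF

def csa(a: int, b: int, c: int) -> tuple:
    """Carry-save addition: reduce three operands to a sum word and a carry word."""
    s = a ^ b ^ c                                # bitwise sum without carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted one position left
    return s, carry

def sign_bit_s16(sad_s: int, sad_c: int, min_sad: int) -> int:
    """Model of s16: bit 16 of SAD_S + SAD_C + ~MIN_SAD + 1.

    Assumption of this sketch: the bit is 1 when the current SAD is
    greater than or equal to the stored minimum, and 0 when it is smaller.
    """
    inv_min = ~min_sad & MASK16                  # inverter 1101
    dif_s, dif_c = csa(sad_s, sad_c, inv_min)    # CSA 1102
    total = dif_s + dif_c + 1                    # adder 1104 plus the constant one
    return (total >> 16) & 1

print(sign_bit_s16(100, 20, 200))   # current SAD 120 against minimum 200
print(sign_bit_s16(100, 120, 200))  # current SAD 220 against minimum 200
```

Only the top bit of the final addition is used, which is why the text notes that adder 1104 can be reduced.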

Because the computation of one SAD value spans several cycles, a control input CMPR is needed to signal the right moment for the SAD comparison. A gate structure consisting of NAND gate 1105 and AND gate 1106 takes into account both the CMPR signal and the ZERO_SAD input signal, which indicates whether this is the first SAD computation, in which case no previously stored minimum SAD value exists. If the most significant sum bit s16 is '0', or if the ZERO_SAD bit is '1', during comparison, the new minimum SAD value is stored in register 1108. It consists of the 13 most significant bits of the current SAD value, computed in parallel in the 13-bit adder 1107, and the 3 least significant bits taken directly from SAD_S (because the corresponding bit positions were absent from SAD_C).
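As a hedged illustration, the assembly of the complete SAD value from a 13-bit addition and three pass-through bits can be modelled as below. The function name `assemble_full_sad` is hypothetical, and the sketch assumes that the three least significant bits of SAD_C are zero and that the result fits in 16 bits.

```python
def assemble_full_sad(sad_s: int, sad_c: int) -> int:
    """Combine the sum and carry vectors into the complete 16-bit SAD value."""
    # Assumption: SAD_C has no bits in positions 0..2.
    assert sad_c & 0b111 == 0, "sketch assumes the three LSBs of SAD_C are absent"
    upper = ((sad_s >> 3) + (sad_c >> 3)) & 0x1FFF  # 13-bit adder 1107
    lower = sad_s & 0b111                            # taken directly from SAD_S
    return (upper << 3) | lower

print(assemble_full_sad(23, 8))
```

Because no carry can be generated in the three lowest positions, the narrower adder gives the same result as a full 16-bit addition under these assumptions.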

Multiplexer 1109 either recirculates the old minimum SAD or stores a new one, depending on the signal NEW_MIN. Inverter 1110 inverts the incoming ZERO_SAD control bit to ensure correct operation.

A new minimum SAD value activates both output signals SAD_RDY and MIN_RDY, which are stored in output registers 1111 and 1112. If the new value was neither the first one nor smaller than the previous minimum, only SAD_RDY is activated.

The interdependencies of the bits CMPR, ZERO_SAD, s16, SAD_RDY and MIN_RDY are shown in the following table.


Table 10:

CMPR  ZERO_SAD  s16  SAD_RDY  MIN_RDY
0     0         0    0        0
0     0         1    0        0
0     1         0    0        0
0     1         1    0        0
1     0         0    1        1
1     0         1    1        0
1     1         0    1        1
1     1         1    1        1
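For illustration, the dependencies of Table 10 can be expressed as two Boolean expressions. These expressions are inferred from the table and the surrounding description, not taken from a gate-level netlist of Figure 11, and the function name `fig11_flags` is hypothetical.

```python
def fig11_flags(cmpr: int, zero_sad: int, s16: int) -> tuple:
    """Return (SAD_RDY, MIN_RDY) for one combination of control bits."""
    sad_rdy = cmpr                              # a SAD value is ready whenever CMPR is active
    min_rdy = cmpr & (zero_sad | (s16 ^ 1))     # new minimum: first SAD, or s16 is '0'
    return sad_rdy, min_rdy

# Print all eight combinations in the column order of Table 10.
for cmpr in (0, 1):
    for zero_sad in (0, 1):
        for s16 in (0, 1):
            print(cmpr, zero_sad, s16, *fig11_flags(cmpr, zero_sad, s16))
```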

Figure 12 shows a minimum SAD determiner in which several early termination mechanisms are implemented. The minimum SAD determination proceeds in outline in the same way as described above for Figure 11. The first early termination mechanism is based on the fact that if the most significant sum bit s16 becomes '1' even before the current SAD value is complete (i.e. before the CMPR control signal is activated), the SAD accumulated so far has already exceeded the previously stored minimum SAD value. The most significant sum bit s16 is therefore provided as an additional output signal SAD_EXC through output register 1201. AND gate 1202 performs an AND operation between the s16 bit and the inverted ZERO_SAD control bit, so that the SAD_EXC output stays at '0' while the first SAD value is being computed. Compared with Figure 11, the mechanism that indicates that the minimum SAD has been exceeded adds only one AND gate and one 1-bit register to the area, and no additional delay. When the SAD_EXC output becomes '1', the control algorithm is expected to stop computing the current SAD.
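The effect of the SAD_EXC indication on a control algorithm can be sketched in software as follows. This is a behavioural illustration only: the function name `sad_with_early_exit`, the row-per-cycle organisation and the use of `None` for the first candidate are assumptions of the sketch, and the partial SAD is treated as exceeded as soon as it reaches the stored minimum.

```python
def sad_with_early_exit(cur_rows, cand_rows, min_sad=None):
    """Accumulate a SAD row by row and stop once it can no longer beat min_sad.

    Returns (accumulated_sad, exceeded). With min_sad=None, corresponding to
    the first candidate (ZERO_SAD active), no early exit takes place.
    """
    acc = 0
    for cur, cand in zip(cur_rows, cand_rows):
        # One cycle: absolute differences of one row of pixel pairs.
        acc += sum(abs(a - b) for a, b in zip(cur, cand))
        if min_sad is not None and acc >= min_sad:
            return acc, True      # corresponds to SAD_EXC = '1'
    return acc, False

current = [[10, 10], [10, 10]]
candidate = [[0, 0], [0, 0]]
print(sad_with_early_exit(current, candidate, min_sad=15))
```

In the example the computation stops after the first row, because the partial sum already reaches the stored minimum.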


Comparing the current SAD value with a fixed threshold is meaningful essentially only when the first SAD value is being computed, i.e. when the ZERO_SAD control bit is active. The implementation of Figure 12 therefore includes multiplexer 1203, controlled by the ZERO_SAD control bit, which selects as the basis of the comparison either the inverted previous minimum SAD value or the inverted threshold obtained through the TH input and inverter 1204. AND gate 1205 and output register 1206 provide the TH_EXC indication in the same way as described above for the SAD_EXC output, except that AND gate 1205 receives the non-inverted ZERO_SAD bit at one of its inputs, so that TH_EXC can be asserted only during computation of the first SAD value. The additional area required by this mechanism is 16 NOT gates, one 16-bit multiplexer, one AND gate and one 1-bit register. The added delay equals the delay of multiplexer 1203.

The threshold-exceeded detection also supports comparison against several thresholds. Once the current threshold has been exceeded, the next larger threshold can be supplied through the TH input during the following cycle. The one-cycle delay between successive threshold comparisons prevents the remaining larger thresholds from being examined if the current threshold was exceeded only during the last cycle before the SAD was completed. This is not a serious drawback, however, because thresholds are normally used only to speed up execution of the algorithm. A larger threshold that was left uncompared can therefore, in the worst case, cause unnecessary computation, but the correct result is still obtained.

It would be possible to perform threshold detection fully in parallel by duplicating the comparison part of the minimum SAD determiner for each desired threshold level. Parallel threshold detection would reduce the delay by that of the selection multiplexer, which would then be unnecessary, and would eliminate the delay between successive threshold comparisons altogether. The resulting increase in circuit area would, however, be relatively large because of the duplicated comparison logic.

The third mechanism related to early termination is based on deliberately favoring the SAD value of the first, i.e. stationary, candidate macroblock. For this purpose a predetermined bonus, supplied as the parameter input -ZERO_BONUS, is subtracted from the SAD value of the zero motion vector. In practice, the -ZERO_BONUS value is connected through an enabling element 1207, formed of AND gates, to CSA 1208, which adds it to the current SAD value. The AND gates of enabling element 1207 receive their second input, the enable input, from the ZERO_SAD control input, so that the -ZERO_BONUS input appears as a nonzero input to CSA 1208 only if the ZERO_SAD control bit is active. In known solutions a suitable zero bonus is, for example, one hundred when the so-called diamond search algorithm is used. The -ZERO_BONUS value may be a constant fixed at design time, or it may be a run-time parameter controlled by the control algorithm of the digital video encoding. The zero bonus mechanism of Figure 12 adds 16 AND gates and one 16-bit CSA to the area, and adds no delay at all.
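A minimal behavioural sketch of the zero bonus mechanism is given below. The function name `apply_zero_bonus` is hypothetical, the CSA 1208 and the subsequent addition are modelled as a single 16-bit addition, and the bonus value of one hundred is used only because the text mentions it as an example.

```python
MASK16 = 0xFFFF

def apply_zero_bonus(sad: int, zero_bonus: int, zero_sad: int) -> int:
    """Add the gated -ZERO_BONUS input to a SAD value in 16-bit arithmetic."""
    neg_bonus = (-zero_bonus) & MASK16       # -ZERO_BONUS in two's complement
    gated = neg_bonus if zero_sad else 0     # enabling element 1207 (AND gates)
    # If sad < zero_bonus the result wraps around, i.e. it is negative in
    # two's complement; how the circuit treats that case is not modelled here.
    return (sad + gated) & MASK16

print(apply_zero_bonus(500, 100, 1))  # zero motion vector candidate
print(apply_zero_bonus(500, 100, 0))  # any other candidate
```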

The interdependencies of the bits CMPR, ZERO_SAD, s16, SAD_RDY, MIN_RDY, SAD_EXC and TH_EXC in the implementation of Figure 12 are shown in the following table.

Table 11:

CMPR  ZERO_SAD  s16  SAD_RDY  MIN_RDY  SAD_EXC  TH_EXC
0     0         0    0        0        0        0
0     0         1    0        0        1        0
0     1         0    0        0        0        0
0     1         1    0        0        0        1
1     0         0    1        1        0        0
1     0         1    1        0        1        0
1     1         0    1        1        0        0
1     1         1    1        1        0        1
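The dependencies of Table 11 can likewise be written as Boolean expressions. As before, this is a sketch inferred from the table and the description of AND gates 1202 and 1205; the function name `fig12_flags` is hypothetical.

```python
def fig12_flags(cmpr: int, zero_sad: int, s16: int) -> tuple:
    """Return (SAD_RDY, MIN_RDY, SAD_EXC, TH_EXC) for one combination of control bits."""
    sad_rdy = cmpr
    min_rdy = cmpr & (zero_sad | (s16 ^ 1))
    sad_exc = s16 & (zero_sad ^ 1)   # AND gate 1202: s16 with inverted ZERO_SAD
    th_exc = s16 & zero_sad          # AND gate 1205: s16 with non-inverted ZERO_SAD
    return sad_rdy, min_rdy, sad_exc, th_exc

# Print all eight combinations in the column order of Table 11.
for cmpr in (0, 1):
    for zero_sad in (0, 1):
        for s16 in (0, 1):
            print(cmpr, zero_sad, s16, *fig12_flags(cmpr, zero_sad, s16))
```

SAD_EXC and TH_EXC are never active at the same time, since ZERO_SAD selects which of the two comparisons the bit s16 refers to.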

Compared with the known minimum SAD determiner of Chen (see US 5,838,392), the arrangements of Figures 11 and 12 can be said to have the drawback that they require one additional 15-bit adder (or a 13-bit adder in the case of Figure 11), namely adder 1107, which is used to compute the complete SAD value. An advantage of the arrangements of Figures 11 and 12, however, is that they require only about half of the registers and selection logic used for storing the minimum SAD value and only about half of the inversion logic used for inverting the minimum SAD value. The minimum SAD comparison is also faster, because the CSA+OR combination is faster than a 4:2 compressor. A narrower adder suffices for adding the difference vectors, and the actual SAD value is immediately available, which is an advantage because it is needed, for example, for the mode decision between INTRA and INTER coding modes.

Figure 13 shows the absolute difference units of Figure 7, the compression structure of Figures 8 and 9, and the minimum SAD determiner of Figure 12 placed in the overall structure shown earlier in Figure 6. The absolute difference units 602 appear in pairs, and between each pair there is a half adder 721 for forming the two-bit correction bit vectors. Registers 1301 between the first stage 601 and the second stage 611 pass on the uncorrected absolute differences and the correction bit vectors. Correspondingly, registers 1302 between the second stage 611 and the third stage 621 pass on the compressed SAD_S and SAD_C values. The output registers were shown in Figures 11 and 12 as parts of the minimum SAD determiner, so they are not shown separately in Figure 13. The first stage 601 receives 32 operands, which are 16 pixel values from each of the current and candidate macroblocks. The second stage 611 receives 26 operands, which are 16 absolute differences, 8 correction bit vectors and two feedback values from the outputs of the second stage 611. In addition, the second stage 611 receives the control input INIT. The third stage 621 receives the compressed SAD_S and SAD_C values, the feedback value MIN_SAD from its own output, the threshold TH, the zero bonus value -ZERO_BONUS and the control signals CMPR and ZERO_SAD. The outputs of the third stage 621 are the MIN_SAD value and the indication outputs SAD_RDY, MIN_RDY, SAD_EXC and TH_EXC.

Parts of the overall structure shown in Figure 13 could be replaced by alternative solutions. For example, if for some reason one specifically wanted to use Chen's absolute difference units (see Figure 2) instead of the first stage 601 according to the embodiment of the present invention, this would be possible provided that Chen's correction bits were converted into correction bit vectors in an additional intermediate stage before the second stage 611. Or, if one wanted to use Chen's compression structure in place of the second stage 611 according to the embodiment of the present invention, that too would be possible, again using an intermediate stage, which this time would have to intercept the correction bits from the first stage before they are combined into correction bit vectors. A prior-art minimum SAD determiner could also be used in place of the third stage 621 according to the embodiment of the present invention by making small changes that would be easy for a person skilled in the art. Replacing parts of the new and inventive solution described above would, however, inevitably lead to lower performance and a less advantageous solution.


Methods are known for deriving theoretical delay and area estimates for the invented motion estimator. The area cost of a basic gate with n inputs is n cost units (CU), except for XOR and XNOR gates, for which it is 2n CU. The delay cost of every gate used is 1 τ. The calculations conventionally ignore registers, wiring, and fan-out / fan-in factors. Because of these assumptions, the results are better suited for comparison purposes than for use as standalone absolute measurements.


Adders typically account for a large share of the delay and area costs. Three kinds of adders with different properties are therefore considered: RCAs (ripple carry adders), CLAs (carry look-ahead adders) and CSAs (carry save adders). In addition, for CLAs, versions built from 2-input gates (CLA(2)) are analyzed separately from versions built from n-input gates (CLA(n)), because the available gate width has a substantial effect on a CLA. The difference between the CLA versions is that the n-input gates used in CLA(n) are replaced by trees of 2-input gates in CLA(2). It is worth noting that replacing n-input gates directly with trees of 2-input gates leaves room for area optimization of CLA(2). Such optimization has not, however, been taken into account in these calculations.

As a basis for the analysis, the delay and area costs of one half adder are assumed to be 1 τ and 6 CU, whereas the corresponding figures for a full adder are 3 τ and 14 CU. The analysis assumes that all adders contain only the logic needed to produce the requested result. In other words, logic associated with unused input and output ports has been removed.
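The half adder and full adder figures are consistent with the gate cost rule given earlier, as the following sketch illustrates. The gate-level decompositions used here (a half adder as one XOR and one AND gate, a full adder as two half adders and an OR gate) are textbook structures assumed for the illustration; the names `gate_cost` and `area` are hypothetical.

```python
def gate_cost(kind: str, n_inputs: int) -> int:
    """Area in cost units (CU): n for a basic gate, 2n for XOR and XNOR."""
    return 2 * n_inputs if kind in ("XOR", "XNOR") else n_inputs

def area(gates) -> int:
    """Total area of a list of (gate kind, number of inputs) pairs."""
    return sum(gate_cost(kind, n) for kind, n in gates)

# Assumed decompositions, all with 2-input gates.
HALF_ADDER = [("XOR", 2), ("AND", 2)]
FULL_ADDER = [("XOR", 2), ("XOR", 2), ("AND", 2), ("AND", 2), ("OR", 2)]

# Delay at 1 tau per gate along the longest path.
HALF_ADDER_DELAY = 1   # XOR and AND operate in parallel
FULL_ADDER_DELAY = 3   # XOR, then AND, then OR on the carry path

print(area(HALF_ADDER), HALF_ADDER_DELAY)
print(area(FULL_ADDER), FULL_ADDER_DELAY)
```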

The upper and lower charts of Figure 14 show theoretical delay and area estimates for the different estimation units. The column with single hatching represents the RCA/CSA implementation, the white column represents the CLA(2)/CSA implementation, the cross-hatched column represents the CLA(n)/CSA implementation, and the column with double hatching represents the CSA implementation. In each unit, the adder type is varied only between RCA and CLA. These adders are adders 701 and 711 in the absolute difference unit of Figure 7 and adders 1104 and 1107 in the minimum SAD determiner of Figure 11. The other adders are implemented as CSAs in every case.

Consider first the analyzed implementations of the absolute difference unit. In all cases the area cost is computed for 16 units. The invented CLA-based unit is the preferred solution for applications that require high performance, whereas the invented RCA-based unit offers the most area-efficient implementation. The presented compression structure, which contains only CSAs, is in turn a very efficient solution in terms of both speed and area. Finally, a comparison between the RCA-based and CLA-based minimum SAD determiners reveals that the delay increase in the RCA-based implementations is much more significant than the area savings. The CLA-based minimum SAD determiner is therefore the recommended implementation.


Based on the analysis results, two efficient pipeline arrangements are proposed for the most advantageous units. The first follows the arrangement shown in Figure 13. In this three-stage arrangement, RCA-based absolute difference units are fast enough for the first stage. The pipeline stages are in theory well balanced, since the combinational delay of each unit is close to 19 τ. For CLA-based configurations, the average of the theoretical CLA(2) and CLA(n) delay estimates is used. The latency of the structure is three clock cycles. The second proposed pipeline arrangement contains only two pipeline stages. The speed of this second arrangement is increased by using CLA-based absolute difference units. According to the theoretical delay values, the optimal position of the available intermediate pipeline stage is before the last CSA level of the compression structure. The theoretical combinational delay of both pipeline stages is thus about 23 τ.

Area and timing results based on logic synthesis are presented for the proposed three-stage and two-stage structures. Synopsys Design Compiler is a well-known tool used in ASIC synthesis, and a 0.18-micron CMOS process can be assumed as the technology. The adders included in the proposed architectures are implemented with dedicated Synopsys DesignWare components. Specifically, the RCAs and CLAs are implemented with the known DW01_rpl and DW01_cla components, and the CSAs consist of 1-bit DW01_rpl adders. The area metrics (cell counts) are based on equivalent 2-input NAND gates, while the delay values correspond to the critical path of the pipelined architectures. Registers are also included in the delay and area metrics.

Using the selected technology, the proposed three-stage SAD architecture can operate at 780 MHz. Implementing this maximum-frequency architecture costs fewer than 5600 NAND gates. Relaxing the operating frequency requirement clearly reduces the area of the architecture: the same architecture at an operating frequency of 580 MHz costs 4200 NAND gates. Further, an operating frequency of 380 MHz is reached with an architecture that has a minimum area of about 4000 NAND gates.

The proposed two-stage architecture, on the other hand, can be clocked at 700 MHz with 7400 NAND gates. Operating frequencies of 520 MHz and 360 MHz are in turn reached with 4900 and 3900 NAND gates. The conclusion of the study is that the proposed three-stage architecture is the recommended implementation for high-performance, area-efficient systems in which a latency of three cycles is acceptable. The proposed two-stage architecture is better suited to low-latency high-end systems.
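The trade-off between the two architectures can be illustrated with a small selection helper. This is only a sketch built on the synthesis figures quoted above; the names DESIGN_POINTS and pick_design are hypothetical, and the latency of the two-stage structure is assumed to be two clock cycles, which the text implies but does not state.

```python
# Synthesis results quoted in the text: (pipeline stages, max MHz, NAND gates).
# Assumption: latency in clock cycles equals the number of pipeline stages.
DESIGN_POINTS = [
    (3, 780, 5600),  # text says "fewer than 5600"; used here as an upper bound
    (3, 580, 4200),
    (3, 380, 4000),  # text says "about 4000"
    (2, 700, 7400),
    (2, 520, 4900),
    (2, 360, 3900),
]


def pick_design(min_mhz, max_latency_cycles):
    """Return the smallest-area design point meeting both constraints, or None."""
    feasible = [
        point for point in DESIGN_POINTS
        if point[1] >= min_mhz and point[0] <= max_latency_cycles
    ]
    if not feasible:
        return None
    # Smallest gate count wins among the feasible points.
    return min(feasible, key=lambda point: point[2])
```

With a 500 MHz requirement, for example, the helper picks the three-stage design when three cycles of latency are acceptable and falls back to the larger two-stage design when only two cycles are allowed.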

Figure 15 shows a video encoder according to an embodiment of the invention. The incoming video signal 1501 is fed to a motion estimator 1502, which performs macroblock comparisons to determine motion vectors. The motion vectors are fed to a motion compensator 1503, which essentially outputs the macroblocks of the earlier frame to which the motion vectors point. The difference between the original frame and the frame indicated by the motion vectors is calculated in an adder 1504, whose output thus constitutes the prediction error. DCT (discrete cosine transform) coefficients are calculated for the prediction error in a DCT encoder block 1505, and the calculated coefficients are quantized in a quantizer 1506. From there, the quantized DCT coefficients are fed to a Huffman/run-length encoder 1507, whose output constitutes the encoded video signal 1508.


For motion compensation, inverse quantization and DCT decoding are performed in blocks 1509 and 1510, respectively. An adder 1511 and a temporary frame memory 1512 are needed to feed the feedback information appropriately to the motion compensator 1503. The output of the motion compensator 1503 is coupled to the adder 1511. Through the coupling between the temporary frame memory 1512 and the motion estimator 1502, the latter receives information about the content of the reference frame.

A control unit 1513 controls the operation of the video encoder. One of the main tasks of the control unit 1513 is to decide which coding mode to select for each frame: intra coding (i.e. 'independent' or 'self-contained' coding) for frames that would be difficult to encode by prediction from other frames, and the various forms of inter coding for predictable frames. The control unit 1513 also executes all kinds of software- or firmware-based control routines that affect the encoding process. The functions of the control unit 1513 need not be implemented in a single centralized circuit element; they may be distributed among two or more control subunits.


The present invention mainly affects the motion estimator 1502, which in a video encoder according to an embodiment of the invention comprises the absolute-difference units, the compression structure and the minimum SAD determiner described above. The invention also affects the control unit 1513, however, particularly if the early-termination criteria described above in connection with Figures 12 and 13 are used. Simple time-based control signals, such as CMPR, INIT and ZERO_SAD, may come from a timer unit included in the motion estimator 1502, or they may be outputs of the control unit 1513 directed to the motion estimator 1502. Parameter values, which may also be programmable, such as TH and -ZERO_BONUS, typically come from the control unit 1513 or from registers controlled by the control unit 1513.
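The behaviour of one absolute-difference unit, as recited in claim 7 (an adder with one direct and one inverting input, followed by XOR gates driven by the inverted carry), can be modelled in software roughly as follows. This is a bit-level sketch rather than the circuit itself: the function names are hypothetical, and it assumes that the carry output of the adder serves as the correction bit, so that the true absolute difference equals the uncorrected value plus the correction bit.

```python
def abs_diff_unit(a, b, n=8):
    """Model one absolute-difference unit for n-bit unsigned operands.

    Returns (uncorrected, correction_bit).
    Assumption: the adder's carry output doubles as the correction bit.
    """
    mask = (1 << n) - 1
    total = a + (~b & mask)      # adder: direct input a, inverting input b
    carry = total >> n           # 1 exactly when a > b
    sum_bits = total & mask
    # Each XOR gate combines one sum bit with the inverted carry.
    flip = 0 if carry else mask
    return sum_bits ^ flip, carry


def sad(xs, ys, n=8):
    """Sum uncorrected values and correction bits separately, then combine."""
    pairs = [abs_diff_unit(x, y, n) for x, y in zip(xs, ys)]
    main_sum = sum(u for u, _ in pairs)        # role of the main compression tree
    correction_sum = sum(c for _, c in pairs)  # role of the correction tree
    return main_sum + correction_sum
```

For a = 5 and b = 3 the unit yields an uncorrected value of 1 and a correction bit of 1, which together give 2; for a = 3 and b = 5 it yields 2 and 0. The two plain sums in sad only stand in for the carry-save trees of the hardware.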

The outputs SAD_RDY, MIN_RDY, SAD_EXC, TH_EXC and MIN_SAD are most advantageously routed from the motion estimator 1502 to the control unit 1513, which makes control decisions based on the values of these outputs. For example, an activated SAD_EXC output bit indicates to the control unit 1513 that the SAD currently being calculated has already exceeded the previously calculated minimum SAD, so the control unit 1513 can command the motion estimator 1502 to terminate the calculation early and move on to calculating the next SAD value. The control unit 1513 may also be programmed to monitor the MIN_SAD values it receives from the motion estimator 1502, so that, for example, inter/intra coding decisions are based on whether or not the minimum SAD values lie within certain limits. A person skilled in the art is able to devise the structure and operation of the control unit 1513 on the basis of the explanations given above.
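The early-termination decision that the control unit 1513 makes on the basis of SAD_EXC can be sketched in software as follows. This is a simplified model, not the described circuit: the function names are hypothetical, blocks are represented as lists of pixel rows, and the partial SAD is assumed to be checked against the current minimum once per row.

```python
def sad_with_early_exit(cur_rows, ref_rows, min_sad):
    """Accumulate the SAD row by row; return None if it exceeds min_sad."""
    acc = 0
    for cur, ref in zip(cur_rows, ref_rows):
        acc += sum(abs(c - r) for c, r in zip(cur, ref))
        if acc > min_sad:  # plays the role of an activated SAD_EXC bit
            return None    # abandon this candidate early
    return acc


def best_candidate(cur_rows, candidates):
    """Return (index, SAD) of the best-matching candidate block."""
    best_idx, min_sad = None, float("inf")
    for idx, ref_rows in enumerate(candidates):
        sad = sad_with_early_exit(cur_rows, ref_rows, min_sad)
        if sad is not None and sad < min_sad:
            best_idx, min_sad = idx, sad
    return best_idx, min_sad
```

Once a good match has been found, most later candidates are abandoned after only a few rows, which is the saving the early-termination criteria aim for.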


It should be noted that the description above contains illustrative examples that do not limit the applicability of the invention. For example, the invention is in no way limited to comparing 16 pixels of two macroblocks of consecutive image frames. The number of pixels compared simultaneously may be different, the frames compared may have a relationship other than being consecutive, and the "pixels" may be something other than elements of a two-dimensional graphical representation. As a simple generalization, the term "macroblock" can be replaced with "regular set of multi-bit values", which makes the invention applicable to all kinds of cases where the aim is to find out to what extent one such regular set is similar to another. The concept of a "pixel" can be understood to describe a dimensionless entity, such as a sample of a digital audio signal that is compared with another digital audio signal, or a three- or higher-dimensional entity, such as a "voxel" or "volume pixel", which is the atomic building block of three-dimensional digital images. Pixel values in image frames are always positive, which can be regarded as a postulate in an image-processing application; if the proposed solution were applied to some other purpose in which the sign of the compared values could vary, some preprocessing would be needed to ensure correct operation.
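The description does not say what the preprocessing for signed values would be. One possibility, shown here purely as an assumption, is to add the same constant offset to both sets so that every value becomes non-negative; because the offset cancels in each difference, the SAD is unchanged. The function names and the 16-bit default are hypothetical.

```python
def to_unsigned(samples, bits=16):
    """Shift signed two's-complement samples into the unsigned range.

    Assumed preprocessing, not taken from the description: adding 2**(bits-1)
    to every sample leaves pairwise differences unchanged.
    """
    offset = 1 << (bits - 1)
    shifted = [s + offset for s in samples]
    if any(v < 0 or v >= (1 << bits) for v in shifted):
        raise ValueError("sample outside the signed range for this bit width")
    return shifted


def sad(xs, ys):
    """Reference sum of absolute differences over two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(xs, ys))
```

For signed audio samples, for instance, sad(to_unsigned(a), to_unsigned(b)) equals sad(a, b), while the shifted operands satisfy the unsigned-input postulate.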


Claims

1. A circuit for calculating the sum, hereinafter referred to as the SAD value, of absolute difference values between two regular sets of multi-bit values, the circuit comprising:
- a plurality of absolute-difference units (602), each of which is coupled to receive one multi-bit value from each of the two regular sets of multi-bit values and arranged to produce as its output an uncorrected absolute difference value of the two multi-bit values and a correction bit value, which indicates the correction of the uncorrected absolute difference value, and
- a compression structure (611), coupled to receive the uncorrected absolute difference values and indications of the correction bit values from the absolute-difference units (602) and arranged to form a sum vector and a carry vector, which together represent the sum of the outputs of the absolute-difference units (602);
characterized in that the compression structure (611) comprises:
- a main compression tree (612), coupled to receive the uncorrected absolute difference values from the absolute-difference units (602), which main compression tree (612) comprises a plurality of interconnected adders and is arranged to compress a first set of input values into a second set of output values, and
- a correction tree (613), coupled to receive the indications of the correction bit values from the absolute-difference units (602), which correction tree (613) comprises a plurality of interconnected adders and is arranged to compress a third set of input values into a fourth set of output values, and which correction tree (613) is coupled to deliver its output values to the adders of said main compression tree (612).

2. A circuit according to claim 1, characterized in that the couplings between said adders of said main compression tree (612) form an arrangement of levels such that the adders of the first level are coupled to receive at least part of the input values of said main compression tree (612), the adders of the last level are coupled to provide at least part of the output values of said main compression tree, and the adders of the second and subsequent levels have inputs coupled to the outputs of adders of preceding levels.

3. A circuit according to claim 2, characterized in that
- said first level of said main compression tree (612) comprises six adders, the second level comprises four adders, the third level comprises three adders, the fourth level comprises two adders, the fifth level comprises one adder, and the sixth level, which is said last level, comprises two adders,
- each of said adders of the main compression tree (612) comprises three inputs, a carry output and a sum output,
- said adders of said correction tree (613) form an arrangement of levels such that the adders of the first level are coupled to receive at least part of said input values of said correction tree (613) and the adders of the second and subsequent levels have inputs coupled to the outputs of adders of preceding levels,
- said first level of said correction tree (613) comprises two adders, each of which has three inputs, a carry output and a sum output,
- the second level of said correction tree (613) comprises two adders, one of which has two inputs, a carry output and a sum output, and the other of which has three inputs, a carry output and a sum output,
- the third level of said correction tree (613) comprises one adder, which has three inputs, a carry output and a sum output, and
- the adders of the main compression tree and the correction tree are interconnected in the manner shown in Figures 8 and 9.

4. A circuit according to claim 3, characterized in that said adders of said main compression tree and said correction tree are adders of the carry-save type.

5. A circuit according to claim 1, characterized in that the correction bit values of each pair formed by two adjacent absolute-difference units (602) are accumulated into a correction bit vector before entering the compression structure (611), so that the indications of the correction bit values coupled from the absolute-difference units (602) to the compression structure (611) are said correction bit vectors.

6. A circuit according to claim 5, characterized in that it comprises a half adder (721) for each pair formed by adjacent absolute-difference units, said half adder (721) being coupled to receive said correction bit values and arranged to convert said correction bit values into said correction bit vectors.

7. A circuit according to claim 1, characterized in that each absolute-difference unit comprises:
- an adder (701, 711) having a direct input for one multi-bit operand, an inverting input for a second multi-bit operand, a sum output with as many bits as there are bits in said multi-bit operands, and a carry output arranged to produce the bit value "1" if the operand fed to said direct input is greater than the operand fed to said inverting input, and the bit value "0" otherwise, and
- as many XOR gates (702, 712) as there are bits in said sum output of the adder, each XOR gate (702, 712) having one input coupled to receive one bit of said sum output and a second input coupled to receive the inverted value of said carry output.

8. A circuit according to claim 1, characterized in that it comprises a minimum SAD determiner (621), coupled to receive the SAD values formed in said compression structure (611) and arranged to compare a number of successively calculated SAD values in order to find the lowest of them.

9. A circuit according to claim 8, characterized in that said minimum SAD determiner (621) comprises:
- a register (1108) arranged to store the previously found minimum SAD value,
- a sum vector input and a carry vector input, coupled to receive a first output value and a second output value from said main compression tree (612),
- a three-input carry-save adder (1102), one input of which is coupled to receive the sum vector via said sum vector input, one is coupled to receive the carry vector via said carry vector input and one is coupled to receive the inverted value of said previously found minimum SAD value, and which carry-save adder (1102) has a sum output and a carry output,
- a first adder (1104), coupled to receive operands from said sum and carry outputs of said carry-save adder (1102) and having a single-bit output arranged to indicate, through the addition of said operands, whether the SAD value represented by said first and second output values from the main compression tree (612) is lower than the previously found minimum SAD value,
- an arrangement (1105, 1106) of logic gates, arranged to generate an indication when said single-bit output of said first adder (1104) indicates, at a specified suitable comparison time, that the SAD value represented by said first and second output values received from the main compression tree (612) is lower than the previously found minimum SAD value,
- a second adder (1107), arranged to calculate the SAD value from said sum and carry vectors, and
- a multiplexer (1109), coupled to receive said indication from said arrangement (1105, 1106) of logic gates and to respond to an active indication by storing the SAD value calculated by said second adder (1107) in said register (1108), and to the absence of an active indication by retaining in said register (1108) the minimum SAD value found up to that time.

10. A circuit according to claim 9, characterized in that said minimum SAD determiner (621) comprises:
- a ZERO_SAD input, arranged to convey an indication of whether the sum and carry vectors at said sum and carry vector inputs represent the SAD value associated with two regular sets of multi-bit values that involve no relative motion in the frame, and
- a bonus input and an activation arrangement (1207) formed of logic gates, which responds to the indication at said ZERO_SAD input by activating the inclusion of a bonus value, received via said bonus input, in the calculation of the SAD value associated with two regular sets of multi-bit values that involve no relative motion in the frame.

11. A circuit according to claim 9, characterized in that said minimum SAD determiner (621) comprises:
- a threshold value input, coupled to receive a threshold value,
- a selection multiplexer (1203), arranged to controllably replace the coupling of the previously found minimum SAD value to the input of said carry-save adder (1102) with the coupling of the threshold value received via said threshold value input, and
- an output register (1206) associated with the threshold value, coupled to receive and forward an indication when said single-bit output of said first adder (1104) indicates that the SAD value represented by said first and second output values from said main compression tree (612) is higher than said threshold value.

12. A circuit according to claim 11, characterized in that
- said minimum SAD determiner (621) comprises a ZERO_SAD input, arranged to convey an indication of whether the sum and carry vectors at said sum and carry vector inputs represent the SAD value associated with two regular sets of multi-bit values that involve no relative motion in the frame, and
- said selection multiplexer (1203) is responsive to the indication at said ZERO_SAD input.

13. A circuit according to claim 9, characterized in that said minimum SAD determiner (621) comprises:
- a ZERO_SAD input, arranged to convey an indication of whether the sum and carry vectors at said sum and carry vector inputs represent the SAD value associated with two regular sets of multi-bit values that involve no relative motion in the frame, and
- an output register (1201) associated with the minimum SAD value, coupled to receive and forward an indication when said single-bit output of said first adder (1104) indicates that the SAD value represented by said first and second output values received from the main compression tree (612) is higher than the minimum SAD value found up to that time, at a moment that differs from the specified comparison time and at which the sum and carry vectors at said sum and carry vector inputs do not represent the SAD value associated with two regular sets of multi-bit values that involve no relative motion in the frame.

14. A digital video encoder for encoding a digital video signal, characterized in that it comprises a circuit according to claim 1.

15. A method for calculating the sum, hereinafter referred to as the SAD value, of absolute difference values between two regular sets of multi-bit values, in which method:
- differences between multi-bit values taken in pairs from corresponding locations of said two regular sets are calculated (602), thereby forming uncorrected absolute difference values and correction bits, and
- the calculated uncorrected absolute difference values and correction bits are compressed (611) into a SAD value;
characterized in that, as part of said compression (611), the correction bits are first compressed separately from the compression (612) of the uncorrected absolute difference values, and the compressed correction bits are fed at given points into the main process comprising the compression (612) of the uncorrected absolute difference values.

16. A method according to claim 15, characterized in that the correction bits are taken from the calculation (602) of two adjacent absolute difference values, and said correction bits are accumulated (721) into a correction bit vector before they are fed to the compression (613) of the correction bits.

17. A method according to claim 15, characterized in that a number of successively calculated SAD values are compared in order to find the lowest of them.

18. A method according to claim 17, characterized in that priority is given to the SAD value associated with two regular sets of multi-bit values that involve no relative motion in the frame.

19. A method according to claim 17, characterized in that the SAD value associated with two regular sets of multi-bit values that involve no relative motion in the frame is compared with a threshold value, and further measures are decided on depending on whether the SAD value is lower or higher than said threshold value.