FR2645985A1

FR2645985A1 - Circuit for carrying out linear conversion of digital data

Info

Publication number: FR2645985A1
Application number: FR8905121A
Authority: FR
Inventors: Jean Gobert
Original assignee: Laboratoires dElectronique et de Physique Appliquee
Current assignee: Laboratoires dElectronique Philips SAS
Priority date: 1989-04-18
Filing date: 1989-04-18
Publication date: 1990-10-19
Anticipated expiration: 2009-04-18
Also published as: FR2645985B1

Abstract

Circuit for carrying out a linear transformation Xi/xi of N digital data items Xi into N digital data items xi. It comprises calculating means, which determine the data xi by carrying out all of the processing in serial mode with 1-bit operators organised in a layered structure. During processing, a data item Xi passes through the same number of layers whatever the path followed. It also comprises input/output means made up of N modules. At the input, these modules serialise the data Xi received in parallel. At the output, they de-serialise the data xi, which are delivered in parallel. The circuit can carry out discrete cosine transforms (DCT), inverse discrete cosine transforms (IDCT) or other discrete linear transforms. Applications: image and speech processing.

Description

DESCRIPTION: "CIRCUIT FOR PERFORMING A LINEAR TRANSFORMATION OF DIGITAL DATA"

The invention relates to a circuit for performing a linear transformation Xj/xj of N digital data Xj of b bits into N digital data xj of c bits. The circuit comprises input/output means for the data Xj and xj respectively, and means for calculating the linear transformation Xj/xj.

Such a circuit can carry out linear transformations such as a discrete cosine transform (DCT), an inverse discrete cosine transform (IDCT), or other discrete linear transforms, that is, transforms in which the signals to be processed are digitized.

Such transforms are used in image processing, speech processing, and similar fields.

A transform of this kind is known from document FR-2 596 892. Known linear transforms involve series of multiplication and addition operations. Various architectures have been proposed to carry out these series of operations. They are generally based on general-purpose processors or on operators of conventional design. The cited prior-art document exploits particular features of some of the operations to be performed. It defines dedicated operators that yield a more compact overall architecture, with a better performance/price ratio than the earlier solutions. Each multiplier is hard-wired according to the value of the fixed coefficient associated with the branch in which it sits. Likewise, the addition/subtraction operations are carried out with pre-wired adders/subtractors.

Such a circuit uses parallel-serial multipliers that operate in a closed loop over several bits. An accumulator stores the intermediate results before the final result of the multiplication is delivered. This result must be serialized at the end of each multiplication so that it can feed serial adders. In such a circuit, the digital data usually enter and leave in parallel mode, and nothing is said about how to interface the circuit with the outside world.

The problem addressed is to produce a very economical integrated circuit that simplifies the paths followed by the data inside the circuit and eliminates any associated control. Its input/output interfaces must also be adapted to optimize both the integration density and the exchange of data with the outside world, while the calculation is performed with high precision.

The solution to this problem has two parts. First, the calculation means determine the data xj by performing all of the processing in serial mode. They use 1-bit operators organized in a layered structure controlled by a clock H, and each datum Xj passes through the same number of layers during processing, whatever path it follows. Second, the input/output means comprise N serialization/deserialization modules. At the input, these modules serialize the data Xj received in parallel. At the output, they deserialize the data xj delivered in parallel by the circuit. Each module comprises:
- a bank of l registers operating in parallel/serial mode, where l is the greater of the numbers b and c;
- a sign bit expander, which extends each serialized input datum Xj to a predetermined size of p bits. This size is set by the number of bits the calculation means need to operate with a predetermined calculation precision.
The calculation of the transformed samples xj can be arranged so as to minimize the number of operations to be performed. This well-known arrangement, called a butterfly, applies to many transforms. For example, with this arrangement a one-dimensional IDCT of eight samples requires only 16 multiplications and 30 additions or subtractions. According to the invention, all operations are performed bit-serially: the operators process only one bit of their operands at a time. For every operation, the least significant bit of the operands is used first, and the result is likewise generated least significant bit first. For every operator in the circuit, the time taken to consume one bit of an operand (one period) equals the time taken to generate one bit of the result. As a result, the output of one operator can be wired directly to the input of the next, up to a constant delay provided by one or more flip-flops. This greatly simplifies the data-path hardware and removes the associated control.
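The patent gives no gate-level detail for its 1-bit operators. As a minimal sketch, assuming the usual bit-serial adder (one full adder plus a carry flip-flop), the following Python models LSB-first addition and subtraction with one result bit per operand bit. The function names and word sizes are hypothetical. Subtraction a - b is done as a + (not b) + 1.

```python
def to_bits(value, p):
    """Signed integer -> p two's-complement bits, LSB first."""
    return [(value >> i) & 1 for i in range(p)]


def from_bits(bits):
    """LSB-first two's-complement bits -> signed integer."""
    v = sum(bit << i for i, bit in enumerate(bits))
    return v - (1 << len(bits)) if bits[-1] else v


def serial_add(a_bits, b_bits, subtract=False):
    """Bit-serial adder/subtractor: one full adder and one carry flip-flop.

    Each period consumes one bit of each operand (LSB first) and emits one
    bit of the result, so the output can feed the next operator directly.
    """
    carry = 1 if subtract else 0   # carry flip-flop, reset at start of word
    out = []
    for a, b in zip(a_bits, b_bits):
        if subtract:
            b ^= 1                 # invert b: a - b = a + ~b + 1
        out.append(a ^ b ^ carry)  # sum bit for this period
        carry = (a & b) | (carry & (a ^ b))
    return out


print(from_bits(serial_add(to_bits(5, 8), to_bits(-3, 8))))  # 2
```

The carry flip-flop is the only state, which is why each operator's registers need only be reset at the start of a word.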

To ensure efficient data transfer, at any given moment each register bank holds l bits. These are taken in succession, and continuously, from the bit sequence Xj(LSB) to Xj(MSB), xj-1(LSB) to xj-1(MSB), where LSB and MSB denote the least and the most significant bit respectively.

For external loading and unloading, each register bank, controlled by a command signal VP, copies the contents of the preceding bank in parallel mode. A datum therefore passes successively from the last bank to the first. The first bank is connected to an output port and the last bank to an input port. In N cycles, the N data are thus entered at the input port while the N previous results are presented at the output port. No internal truncation is performed before the final result is obtained with the required precision. All calculation precision is thus preserved, and all operators run at a constant rate. To this end, the circuit has a sign bit expander on each input. The expander comprises a latch, which stores the sign bit of datum Xj, and a selector, which extends that sign bit to give datum Xj the predetermined number of bits p.
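The parallel chain described above can be modelled at word level. This is a sketch under the assumption that one list element stands for one register bank. The function and variable names are hypothetical, and banks[0] plays the role of R1 (output side).

```python
def exchange(banks, new_inputs):
    """Model of the parallel load/unload chain of register banks.

    banks[0] stands for R1 (wired to the output port) and banks[-1] for RN
    (wired to the input port). At each cycle R1 is presented on the output
    port, every Ri copies Ri+1 in parallel (signal VP), and RN takes the
    word present on the input port.
    """
    banks = list(banks)
    outputs = []
    for word in new_inputs:
        outputs.append(banks[0])    # output port reads R1
        banks = banks[1:] + [word]  # Ri <- Ri+1, RN <- input port
    return banks, outputs


# N = 8: previous results x1..x8 leave while new inputs X1..X8 enter.
old = [f"x{i}" for i in range(1, 9)]
new = [f"X{i}" for i in range(1, 9)]
banks, out = exchange(old, new)
print(out == old, banks == new)  # True True
```

After N cycles every old result has been read out in order, and X1 has reached R1, X2 has reached R2, and so on.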

The invention will be better understood with the aid of the following figures, which show:
FIG. 1: a diagram of a circuit according to the invention;
FIG. 2: a timing diagram of the operation of the circuit according to the invention;
FIG. 3: a diagram of a register bank R;
FIG. 4: a diagram of the connections to the input port and the output port for exchanging data in parallel mode.

By way of example, FIG. 1 shows a diagram for performing the inverse discrete cosine transform with N = 8 samples. The circuit comprises N register banks R1 to R8, each a shift register of l = 16 cells. These banks serve both to input the data X1 to X8 of b = 12 bits and to output the results x1 to x8 of c = 16 bits.

The interface with the outside world is parallel and goes through these registers. The parallel inputs of any register bank Ri are connected to the parallel outputs of bank Ri+1 (except for the last bank). The input port S1 is connected to the input of register bank R8, and the output port S2 to the output of register bank R1. The serial outputs of the banks are connected to the 1-bit adders 121 to 124, to a subtractor 13 and to flip-flops 111 to 116. The connection goes through selectors 91 to 98, which route (control BS) either the outputs of the banks R or the outputs of flip-flops 101 to 108. Flip-flops 101 to 108 have stored the sign bit of the data on a control clock Hs. This provides the sign extension needed for the operation of the whole circuit. The linear transform proper is computed in layers C2 to C7. The data pass through the first two layers in two clock cycles. Downstream of these two layers, a bank of 16 serial multipliers 311 to 3116 of known type (layer C4) multiplies the signals delivered by layer C3 by constant coefficients coded on q = 10 bits and delivers 16 results. The value q is chosen according to the predetermined precision of the calculations. These 16 results are taken up by three successive layers C5, C6 and C7, which form the combinations required by the IDCT. A final layer C8 of adders performs the rounding. Finally, the results x1 to x8 are fed into the serial inputs ES1 to ES8 of the register banks through delay cells 801 to 808 (layer C9).
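The source says only that the multipliers are "of known serial type". The sketch below assumes one common textbook serial/parallel scheme, with the constant held in parallel and the data arriving serially, LSB first. All names and sizes are hypothetical. The scheme emits only the low p bits of the product. That is exactly why the input must first be sign-extended to p bits, with p wide enough to hold the product.

```python
def to_bits(value, p):
    """Signed integer -> p two's-complement bits, LSB first."""
    return [(value >> i) & 1 for i in range(p)]


def from_bits(bits):
    """LSB-first two's-complement bits -> signed integer."""
    v = sum(bit << i for i, bit in enumerate(bits))
    return v - (1 << len(bits)) if bits[-1] else v


def serial_mul_const(x_bits, coef):
    """Multiply an LSB-first bit stream by a constant, one bit per period.

    Each period, the incoming bit gates the constant into an accumulator;
    the accumulator's LSB is emitted as the next product bit and the
    accumulator is shifted right by one. After p periods the p low-order
    bits of the product have been emitted.
    """
    acc = 0
    out = []
    for bit in x_bits:
        acc += bit * coef
        out.append(acc & 1)  # next product bit, LSB first
        acc >>= 1            # arithmetic shift: handles a negative constant
    return out


# Hypothetical sizes: a 12-bit datum sign-extended to p = 20 bits,
# times the Table 1 coefficient -334 (which fits in q = 10 bits).
p = 20
print(from_bits(serial_mul_const(to_bits(-1000, p), -334)))  # 334000
```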

The means for calculating the linear transformation comprise layers C2 to C8. The input/output means comprise the input port S1 and the output port S2, the registers R1 to R8, and layers C1 and C9. The whole circuit is controlled by a clock H, which is not shown in FIG. 1 so as not to clutter the drawing.

The register banks R are loaded one after the other in parallel mode, under the control of signal VP (FIG. 3), with the values X1 to X8. In FIG. 4, the b bits occupy the rightmost positions of the l registers of each bank. These values are presented successively over 8 cycles on the input port S1. At each cycle, register bank Ri copies the contents of bank Ri+1. The registers then operate as serializers over l cycles under the control of signal VS (FIG. 3). Because of the successive addition/subtraction stages and the multiplication by an m-bit coefficient, numbers must be representable over a wider range between column 2 and column 9. Running the whole circuit in pipeline mode requires allocating the same number of cycles to every operator. To keep this single rate throughout the circuit, the data size is aligned to the maximum value, p bits, even though the range of the numbers to be represented grows progressively from 2^b at column 2 to 2^p at column 9. To do this, after the b cycles of serialization, the flip-flops 101 to 108 are selected. These flip-flops are connected at the input of each bank to the line carrying the sign bit. The sign bit of each datum is then replicated over k cycles, which extends the representation range of the data to 2^p, with p = b + k.
After all the calculations, only the c most significant bits are kept. A rounding operation is needed to avoid the systematic downward error that careless truncation would cause. This step requires a last layer C8 of adders 721 to 728. One input of each adder receives the number to be rounded. The other receives a number equal to half the largest value that can be discarded when only the c most significant bits are kept. This number equals 2^(p-c-1) and is very easy to generate. For example, in a bank of (p-c-2) registers it suffices to place a 1 in the highest-weight register and 0 in all the others. The contents of this bank are used to perform the rounding in serial mode. The rounding value is known in advance from the desired precision. The results pass through registers 801 to 808, which introduce a delay D. At the output of these registers, the low-order bits are discarded and the c high-order bits are fed through the serial inputs ES1 to ES8 into register banks R1 to R8. The results x1 to x8 held in banks R1 to R8 are then sent to output port S2 successively over 8 cycles. At each cycle, bank Ri copies the contents of bank Ri+1.
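The serial sign extension can be sketched as follows. This is a hypothetical bit-level model of one flip-flop (101 to 108) and its selector (91 to 98). For the first b periods the selector passes the register-bank output, and the flip-flop captures the sign bit as it goes by. For the next k periods the selector repeats that bit. The value k = 8 is an illustrative choice, not taken from the source.

```python
def to_bits(value, p):
    """Signed integer -> p two's-complement bits, LSB first."""
    return [(value >> i) & 1 for i in range(p)]


def from_bits(bits):
    """LSB-first two's-complement bits -> signed integer."""
    v = sum(bit << i for i, bit in enumerate(bits))
    return v - (1 << len(bits)) if bits[-1] else v


def sign_extend_stream(bits_b, k):
    """Extend a b-bit LSB-first stream to p = b + k bits, serially."""
    out = []
    sign_ff = 0
    for i, bit in enumerate(bits_b):
        out.append(bit)              # selector on the register-bank output
        if i == len(bits_b) - 1:
            sign_ff = bit            # flip-flop loads the sign bit (clock Hs)
    out.extend([sign_ff] * k)        # selector switched to the flip-flop (BS)
    return out


b, k = 12, 8  # b from the FIG. 1 example; k is a hypothetical choice
print(from_bits(sign_extend_stream(to_bits(-5, b), k)))  # -5
```

Because every datum now lasts p periods, all operators can keep the same word rate. The extra sign bits leave room for the growth from 2^b to 2^p without overflow.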

An inverse discrete cosine transform with the so-called butterfly structure shown in FIG. 1 requires addition, subtraction and multiplication operations. After loading, the input data X1 to X8 are held in register banks R1 to R8 respectively. In this example, the following operations are performed.
Layer C2 comprises 11 1-bit operators:
- adder 121 computes X3+X7
- adder 122 computes X1+X5
- adder 123 computes X4+X6
- adder 124 computes X2+X8
- subtractor 13 computes X1-X5
- 6 delay cells 111 to 116 delay the data X7, Xg, Xi, X2, X4, Xi respectively by one period of clock H.

Layer C3 comprises 12 1-bit operators:
- adder 22, which adds the outputs of adders 123 and 124;
- 11 delay cells 211 to 2111, one placed after each of the 1-bit operators of layer C2.
Layer C4 comprises 16 1-bit operators formed by the multipliers 311 to 3116.

Each multiplier multiplies the datum delivered by an operator of layer C3 by a fixed coefficient. Table 1 lists, for each multiplier, the operator whose output it receives and the coefficient it applies.

Table 1
Multiplier   Operator   Coefficient
311          211        -334
312          212          98
313          213         138
314          214         181
315          215         181
316          216         301
317          217          71
318          218        -464
319          219         201
3110         2110        355
3111         22          213
3112         219         355
3113         2110        301
3114         21          163
3115         21s          71
3116         217         201
Since the example is an IDCT, the coefficients are taken from a cosine table. For example, the value 181 equals 256/√2, that is, 256·cos(π/4), where the maximum value 256 represents cos 0.
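To make the scaling concrete, the sketch below builds a cosine table scaled so that cos 0 = 256, as in the text, and checks that every Table 1 coefficient fits in q = 10 bits. It assumes two's-complement coding, which the source does not state. The function names are hypothetical. The values 181, 98 and 213 from Table 1 appear in this table. The others (for example 334 or 464) are not single entries, and the source does not say how they are derived.

```python
import math

SCALE = 256  # value representing cos 0, as in the text


def scaled_cos(k, scale=SCALE):
    """round(scale * cos(k * pi / 16)): one entry of a scaled cosine table."""
    return round(scale * math.cos(k * math.pi / 16))


def fits_signed(value, q):
    """True if value fits in a q-bit two's-complement word."""
    return -(1 << (q - 1)) <= value < (1 << (q - 1))


# Coefficients transcribed from Table 1 (multipliers 311 to 3116).
TABLE1 = [-334, 98, 138, 181, 181, 301, 71, -464,
          201, 355, 213, 355, 301, 163, 71, 201]

print([scaled_cos(k) for k in range(8)])  # [256, 251, 237, 213, 181, 142, 98, 50]
print(all(fits_signed(v, 10) for v in TABLE1))  # True
```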

The outputs of the multipliers are delivered to the operators of layer C5 as follows:
- adder 421 is connected to multipliers 311 and 312
- adder 422 is connected to multipliers 312 and 313
- adder 423 is connected to multipliers 318 and 3111
- adder 424 is connected to multipliers 3112 and 3113
- subtractor 431 is connected to multipliers 316(+) and 317(-)
- subtractor 432 is connected to multipliers 31s(+) and 3110(-)
- subtractor 433 is connected to multipliers 3111(+) and 31rs(-)
- subtractor 434 is connected to multipliers 311s(-) and 3lis(+)
- the two delay cells 411 and 412 delay the data from multipliers 314 and 315 respectively.

Layer C6 is connected to layer C5 as follows:
- adder 521 is connected to operators 421 and 411
- adder 522 is connected to operators 422 and 412
- adder 523 is connected to operators 431 and 423
- adder 524 is connected to operators 423 and 432
- adder 525 is connected to operators 433 and 434
- subtractor 531 is connected to operators 421(-) and 411(+)
- subtractor 532 is connected to operators 412(+) and 422(-)
- subtractor 533 is connected to operators 424(-) and 433(+)
Layer C7 is connected to layer C6 as follows:
- adder 621 is connected to operators 531 and 523
- adder 622 is connected to operators 521 and 524
- adder 623 is connected to operators 532 and 533
- adder 624 is connected to operators 522 and 525
- subtractor 631 is connected to operators 531(+) and 523(-)
- subtractor 632 is connected to operators 521(+) and 524(-)
- subtractor 633 is connected to operators 532(+) and 533(-)
- subtractor 634 is connected to operators 522(+) and 525(-)
The result of the linear transform is available at the output of layer C7, but only c bits of it are to be kept. A rounding operation is therefore needed to avoid the systematic downward error that plain truncation would cause. To this end, each operator of layer C7 is connected to one adder of layer C8 (adders 721 to 728). The other input of that adder receives a rounding value equal to half the largest value that truncation can discard. One way of generating it was described above.
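The effect of the rounding constant 2^(p-c-1) can be checked numerically. This word-level sketch compares plain truncation with add-then-truncate. It assumes hypothetical sizes p = 20 and c = 16, and it ignores the overflow case where adding the constant exceeds the p-bit range. The function names are illustrative, not from the source.

```python
def truncate_to_c_bits(value, p, c):
    """Keep the c most significant of p bits by simply dropping the rest."""
    return value >> (p - c)  # floor: always rounds toward minus infinity


def round_to_c_bits(value, p, c):
    """Add 2**(p - c - 1), half the weight of the lowest kept bit, then drop."""
    drop = p - c
    return (value + (1 << (drop - 1))) >> drop


# Hypothetical sizes: p = 20 internal bits, c = 16 kept bits (4 dropped).
p, c = 20, 16
step = 1 << (p - c)
values = range(-4096, 4096)
err_trunc = sum(truncate_to_c_bits(v, p, c) * step - v for v in values) / len(values)
err_round = sum(round_to_c_bits(v, p, c) * step - v for v in values) / len(values)
print(err_trunc, err_round)  # -7.5 0.5
```

Truncation is biased downward by almost half a kept LSB (-7.5 out of 16). Adding the constant leaves only the small bias of rounding exact ties upward.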

The final result appears at the output of layer C8. So that this result can be introduced into the input register banks R1 to R8 at the optimal instant, a layer C9 consisting solely of delay cells 801 to 808 delays the data by an amount D, so that the first bit of each data item x1 to x8 arrives exactly when the first register of each bank is freed by the serial transfer, this instant corresponding to the start of the next calculation cycle. The data thus flow continuously through each bank R1 to R8. The delay cells 801 to 808 respectively deliver the transformed data in the order given in FIG. 1. As shown in FIG. 1, each transformed data item x1 to x8 is then fed serially into the corresponding register bank R1 to R8.

All registers of the adders and subtracters of a layer, including those of the serial multipliers, must be reset at the start of the computation performed by that layer. A shift device (not shown) performs these resets.

The time sequence of operations is shown in the timing diagram of FIG. 2, where time runs from left to right. It shows the operations performed on an input data item Xj, which is transformed at the end of a complete cycle into a data item xj. A complete cycle lasts from time t to time tB.

Each unhatched horizontal rectangle represents the period of activity of the layer concerned. As soon as a layer has finished processing all the bits of a data item j, it can start on data item j+1. For practical reasons it may be advantageous to leave a small interval G of duration r-p between the two processing runs, as shown in the timing diagram, but this does not change the principle of the system. The operations in the various operators are fully pipelined: the cell of one layer processes one bit of its operands while the cell of the upstream layer performs the preceding operation on the next more significant bit.
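The bit-level pipelining, together with the per-layer reset described above, can be sketched in Python. This is a behavioural model under stated assumptions, not the circuit of FIG. 1: a 1-bit adder is taken to be a full adder plus a carry flip-flop, a subtracter to be the same cell with its second input inverted and its carry preset to 1, and `SerialAdder`, `to_bits`, `from_bits` and `two_layer_example` are hypothetical names.

```python
class SerialAdder:
    """1-bit serial adder (or subtracter): a full adder plus a carry flip-flop.

    Operands arrive LSB first, one bit per clock. A subtracter computes
    a - b as a + (not b) + 1, so its carry flip-flop is preset to 1 on reset.
    """

    def __init__(self, subtract: bool = False) -> None:
        self.subtract = subtract
        self.reset()

    def reset(self) -> None:
        # Done at the start of each word, like every register of a layer.
        self.carry = 1 if self.subtract else 0

    def clock(self, a: int, b: int) -> int:
        if self.subtract:
            b ^= 1
        total = a ^ b ^ self.carry
        self.carry = (a & b) | (self.carry & (a ^ b))
        return total


def to_bits(value: int, p: int) -> list[int]:
    """p-bit two's-complement representation of value, LSB first."""
    return [(value >> i) & 1 for i in range(p)]


def from_bits(bits: list[int]) -> int:
    """Interpret an LSB-first bit list as a two's-complement integer."""
    raw = sum(bit << i for i, bit in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


def two_layer_example(a: int, b: int, c: int, d: int, p: int) -> int:
    """(a + b) - (c + d) through two pipelined layers of 1-bit operators."""
    add1, add2, sub = SerialAdder(), SerialAdder(), SerialAdder(subtract=True)
    streams = [to_bits(v, p) for v in (a, b, c, d)]
    latch = None  # 1-bit registers between layer 1 and layer 2
    result = []
    for n in range(p + 1):  # p bits plus one clock of pipeline latency
        if latch is not None:
            # Layer 2 handles bit n-1 ...
            result.append(sub.clock(*latch))
        if n < p:
            # ... while layer 1 handles bit n, the next more significant bit.
            latch = (add1.clock(streams[0][n], streams[1][n]),
                     add2.clock(streams[2][n], streams[3][n]))
        else:
            latch = None
    return from_bits(result)


print(two_layer_example(5, -3, 7, 2, 8))
```

Results wrap modulo 2**p, which is why the operands must first be sign-extended to enough bits (p) to hold every intermediate value. Both inputs of the subtracter pass through one adder layer, so they stay bit-aligned without extra delay cells.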

If shifting of the register banks R1 to R8 starts at time t, the b bits of Xj enter layer C1 least significant bit (LSB) first. After these b bits, the sign bit is repeated for k clock cycles; the control signal BS performs this replication. Layer C2 delivers its first data at time t+1 and layer C3 at time t+2. The data enter the multipliers at time t+2 and leave them at time t+2+d, where d is the latency of a multiplier and depends on the serial multiplier chosen. From time t to time t+b the register banks empty out their data Xj.
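The serial sign extension can be sketched as follows, assuming p = b + k (the text gives b data bits followed by k repetitions of the sign bit) and modelling the latch and selector of claim 3 with a plain variable; `serialize_with_sign_extension` and `value_of` are hypothetical names.

```python
def serialize_with_sign_extension(x: int, b: int, k: int) -> list[int]:
    """Serialize a b-bit two's-complement word LSB first, then repeat its
    sign bit for k further clocks, giving a p = b + k bit serial word."""
    if not -(1 << (b - 1)) <= x < (1 << (b - 1)):
        raise ValueError("x does not fit in b bits")
    bits = [(x >> i) & 1 for i in range(b)]  # b bits shifted out of the bank
    sign_latch = bits[-1]                     # MSB captured on the last data clock
    return bits + [sign_latch] * k            # selector (signal BS) repeats it


def value_of(bits: list[int]) -> int:
    """Two's-complement value of an LSB-first bit list."""
    raw = sum(bit << i for i, bit in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


word = serialize_with_sign_extension(-3, b=4, k=4)
print(word, value_of(word))
```

The 8-bit serial word still has the value -3, so the downstream operators see the same number with k extra bits of headroom for intermediate results.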

At the output of C7 the result is on p bits, and its useful part is formed by the c most significant bits (unhatched part). Layer C8 performs the rounding, which takes one clock period, and delivers the c bits of xj. The data xj therefore leave layer C9 between times t+r and t+r+c.

It can be seen that a register is freed at every clock period between t and tA. This is used to load into each register bank, serially behind the data Xj, the data xj-1 computed in the previous cycle. To this end, the result xj (which will be loaded into the registers during the next cycle) is delayed by a duration D so that it enters the register banks R1 to R8 at time t of that cycle. Once the delayed data xj-1 from the previous cycle have been loaded over c clock periods (from time t to time tA in the example), they are unloaded in parallel (time tA) in a single clock period for the first bank through the output port, and likewise for the other banks by passing the data from one bank to the next. Using the register banks in this way avoids the need for additional output registers. Preferably, in the general case, the number l of registers equals the larger of the bit widths of Xj (b) and xj (c).
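The double use of a register bank during the serial phase can be sketched as below. The sketch assumes b = c = l so that Xj and xj-1 exactly fill the bank, lists the registers from the first (serial input) to the last (serial output Ss), and ignores the sign-extension clocks and the delay D; `serial_phase` is a hypothetical name.

```python
def serial_phase(bank: list[int], incoming: list[int]) -> tuple[list[int], list[int]]:
    """Clock a bank of l one-bit registers once per incoming bit.

    bank[0] is the first register and bank[-1] the last one, which drives the
    serial output Ss, so a word loaded in parallel appears MSB first in the list.
    Each clock shifts one bit of Xj out (LSB first) and one bit of x(j-1) in
    (LSB first). Returns (bits shifted out, final bank contents).
    """
    shifted_out = []
    for bit in incoming:
        shifted_out.append(bank[-1])    # serializer: Xj leaves LSB first
        bank = [bit] + bank[:-1]        # deserializer: x(j-1) follows behind it
    return shifted_out, bank


# l = 4: Xj = 0b0110 sits in the bank; x(j-1) = 0b1011 arrives LSB first.
out_bits, bank = serial_phase([0, 1, 1, 0], incoming=[1, 1, 0, 1])
print(out_bits, bank)
```

After l clocks every bit of Xj has been sent towards layer C1 and the bank holds xj-1, ready for the parallel unload at tA, so no separate output register is needed.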

Thus, within one cycle, the register banks act simultaneously (from t to tA) as serializers for the data Xj and as deserializers for the data xj-1. In another phase of the calculation cycle (from tA to tB) they also allow the results xj-1 of all the banks to be output in parallel and the data Xj+1 for all the banks to be input in parallel. FIG. 2 shows, for example, that during the same clock period a register bank unloads the data xj-1 onto port 52 and is loaded with the data Xj+1 from port 52.

FIG. 3 shows the diagram of one of these register banks, for example bank R1. It comprises l shift registers R11, R12, ..., R1l, each provided at its input with a selector 301, 302, ..., 30l that allows the registers to operate either in serial mode or in parallel mode. Each selector receives a command VS that selects serial or parallel data input according to its logic state. The clock H is applied to an AND gate 32 whose other input receives the output of an OR gate 33, which itself receives the command VS for serial transfer and a command VP for parallel transfer (FIG. 2). The clock controlling the registers is thus enabled whenever either VS or VP is active.
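The selector and clock-gating logic of FIG. 3 can be summarised as a next-state function. This Python sketch is a behavioural assumption, not the gate-level circuit: registers are listed from R11 to R1l, VS is assumed to take priority if both commands are active, and `bank_next_state` is a hypothetical name.

```python
def bank_next_state(regs: list[int], serial_in: int, parallel_in: list[int],
                    vs: bool, vp: bool) -> list[int]:
    """Contents of one bank (R11 .. R1l) after a rising edge of clock H.

    AND gate 32 only passes H when OR gate 33 sees VS or VP active, so with
    both inactive the registers simply hold. When the clock is passed, the
    selectors 301 .. 30l pick the serial path if VS is active (each register
    takes the previous register's output) and the parallel inputs otherwise.
    """
    if not (vs or vp):
        return list(regs)                     # gated clock: no edge, contents held
    if vs:
        return [serial_in] + list(regs[:-1])  # serial: shift towards output Ss
    return list(parallel_in)                  # parallel: load every register at once


print(bank_next_state([0, 1, 1, 0], 1, [1, 1, 1, 1], vs=True, vp=False))
```

Gating the clock rather than adding a hold path to each selector keeps every register cell a simple two-input multiplexer feeding a flip-flop.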

In serial mode, the input of each register receives the output of the preceding register. The serial data output Ss is taken from the last register.

FIG. 4 shows the parallel-mode transfer mechanism. The parallel outputs Sp,i of the registers of one bank are connected to the inputs Ep-1,i of the registers of the preceding bank (i being the running index). The output bank R1 is connected to the output port 52 through its c leftmost outputs in FIG. 4, and the input bank R8 is connected to the input port 51 through its b rightmost inputs. This arrangement, with output on the left and input on the right, follows directly from the example as described; the data layout can also be reversed.
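Because each bank's parallel outputs feed the preceding bank, the N banks form a word-wide shift chain from the input port to the output port. A minimal Python sketch, treating each bank's contents as one opaque word, numbering the banks R1 (output side) to RN (input side) as assumed here, and using the hypothetical names `parallel_clock` and `unload_and_load`:

```python
def parallel_clock(banks: list[str], input_word: str) -> tuple[str, list[str]]:
    """One clock with VP active.

    banks[0] is R1, whose contents go to the output port; every other bank
    copies into the bank before it, and the last bank takes the input port.
    """
    return banks[0], banks[1:] + [input_word]


def unload_and_load(banks: list[str], new_inputs: list[str]) -> tuple[list[str], list[str]]:
    """N parallel clocks: present the N results and take in N new inputs."""
    outputs = []
    for word in new_inputs:
        out, banks = parallel_clock(banks, word)
        outputs.append(out)
    return outputs, banks


results = [f"x{i}" for i in range(1, 9)]   # x(j-1) results held in R1..R8
inputs = [f"X{i}" for i in range(1, 9)]    # X(j+1) words arriving on port 51
outputs, banks = unload_and_load(results, inputs)
print(outputs)
print(banks)
```

After N = 8 parallel clocks the results have left through the output port in bank order and the new input words occupy R1 to R8, matching the N-cycle behaviour described for the register banks.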

The example just described concerns the computation of the inverse discrete cosine transform. Other transforms can of course be computed without departing from the scope of the invention, the interconnections then being adapted accordingly.

Claims

1. Circuit for performing a linear transformation Xj/xj of N digital data Xj of b bits into N digital data xj of c bits, comprising input/output means for the data Xj and xj respectively and means for calculating the linear transformation Xj/xj, characterized in that, all the data being signed integers coded in two's-complement binary representation, the calculation means determine the data xj by carrying out the entire processing in serial mode using 1-bit operators organized in a layered structure and controlled by a clock H, a data item Xj passing through the same number of layers during processing whatever the path followed, and the input/output means comprising N serialization/deserialization modules which, at the input, serialize the data Xj received in parallel and, at the output, deserialize the data xj delivered in parallel by the circuit, each module comprising:

a bank of l registers operating in parallel/serial mode, where l is equal to the larger of the numbers b and c,

a sign bit expander that gives each serialized input data item Xj a predetermined size extended to p bits, related to the number of bits required for the calculation means to operate with a predetermined calculation precision.

2. Circuit according to claim 1, characterized in that the 1-bit operators comprise adders, subtracters, multipliers and unit delay cells.

3. Circuit according to claim 1 or 2, characterized in that the sign bit expander comprises a latch that stores the sign bit of the data Xj and a selector that replicates said sign bit to give the data Xj the predetermined number of bits p.

4. Circuit according to any one of claims 1 to 3, characterized in that, at a given instant, l bits circulate in each register bank, taken successively and continuously from the bit sequence Xj(LSB) to Xj(MSB), xj-1(LSB) to xj-1(MSB), where LSB and MSB denote the least and most significant bits respectively.

5. Circuit according to claim 4, characterized in that each register bank, under the control of a control signal VP, copies in parallel mode the contents of the preceding bank, so that a digital data item passes successively from the last bank to the first bank, the first bank being connected to an output port and the last bank to an input port, thereby performing in N cycles the N data inputs on the input port and, simultaneously, the presentation of the N previous results on the output port.

6. Circuit according to claim 5, characterized in that the input port and the output port are connected to a single bus, a validation stage being placed at the output of bank R1 to control the output of the data onto the bus.