GB2238893A

GB2238893A - Convolution apparatus

Info

Publication number: GB2238893A
Application number: GB8927828A
Authority: GB
Inventors: Christopher Brian Marshall; Andrew Michael Dennis
Original assignee: Philips Electronic and Associated Industries Ltd
Current assignee: Philips Electronics UK Ltd
Priority date: 1989-12-08
Filing date: 1989-12-08
Publication date: 1991-06-12
Also published as: GB8927828D0

Abstract

Apparatus for determining the result of convolution of a first sequence of values with a second sequence of values, e.g. for the purpose of digital filtering, comprises k parallel processing channels (11,21; ...; 1k,2k) each of which performs the convolution modulo a respective integer using number theoretic transforms. The first sequence is processed in blocks and the successive partial results in each channel are combined by means of the overlap-add technique, these partial results being of different lengths in at least two of the channels. In order to reduce the processing power required, the block lengths in those channels giving longer partial results are chosen to be themselves longer, preferably half the length of the corresponding partial results.

Description

CONVOLUTION APPARATUS

This invention relates to apparatus for determining the result of convolution of a first sequence of values with a second sequence of values.

Convolution operations can be used, for example, for the purpose of digital filtering, a sequence of input data values then being convolved with the filter coefficients. If the first and second sequences are x(n) and h(n) respectively, where x(n) is potentially non-zero only for values of n between zero and (L-1) inclusive, and h(n) is potentially non-zero only for values of n between zero and (M-1) inclusive, the linear or aperiodic convolution of the first sequence with the second sequence yields the (L + M - 1)-point sequence y(n) defined as

$$y(n) = \sum_{m=0}^{n} h(m)\,x(n-m), \qquad n = 0, 1, \ldots, L+M-2,$$

where h(m) and x(n-m) are zero outside the appropriately defined intervals. The sequence y(n) could, of course, be computed directly using the above expression, but it is well known that, for large values of L + M, the same result can be obtained more efficiently by forward-transforming each of the first and second sequences and then, for example, multiplying each point of the resulting transform of one sequence by the corresponding point of the resulting transform of the other sequence and inverse-transforming the set of resulting products (see, for example, pages 61-68 of the book "Theory and Application of Digital Signal Processing" by L.R. Rabiner and B. Gold (Prentice Hall, 1975)). The particular transform used may be, for example, the Discrete Fourier Transform (DFT).

In order to employ such a transform method each of the forward transforms must include at least (L + M - 1) points (because the convolution result comprises (L + M - 1) sample values), which means in turn that each of the first and second sequences has to be effectively augmented by zero values so that it comprises a number of values equal to the number of points in the forward transform. The number of zero values added may be just sufficient to make each sequence length (L + M - 1) or may be more than this (see, for example, pages 61-63 of the above-mentioned book).

If the first sequence is very much longer than the second it is often advantageous to perform the convolution operation in sections, each, for example, by means of the transform method outlined above, partial results being produced which can then be pieced together to produce the overall results.
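The role of the zero padding can be illustrated with a minimal Python sketch. It assumes nothing beyond the definitions above; the function names `linear_convolve`, `cyclic_convolve` and `pad` are illustrative and do not come from the specification. Transform methods naturally yield a cyclic convolution, and the sketch checks that the cyclic convolution of the two sequences, each padded to (L + M - 1) points, equals the linear convolution.

```python
def linear_convolve(x, h):
    """Direct linear convolution: y(n) = sum over m of h(m) * x(n - m)."""
    L, M = len(x), len(h)
    y = [0] * (L + M - 1)
    for n in range(L + M - 1):
        for m in range(M):
            if 0 <= n - m < L:
                y[n] += h[m] * x[n - m]
    return y


def cyclic_convolve(x, h):
    """Cyclic convolution of two sequences of the same length N."""
    N = len(x)
    return [sum(h[m] * x[(n - m) % N] for m in range(N)) for n in range(N)]


def pad(seq, N):
    """Append zeros so that the sequence has N values."""
    return list(seq) + [0] * (N - len(seq))


x = [1, 2, 3, 4]          # L = 4
h = [1, 1, 1]             # M = 3
N = len(x) + len(h) - 1   # minimum number of points, L + M - 1 = 6

y_direct = linear_convolve(x, h)                 # [1, 3, 6, 9, 7, 4]
y_cyclic = cyclic_convolve(pad(x, N), pad(h, N))
print(y_direct == y_cyclic)                      # True
```

Padding to more than (L + M - 1) points gives the same values followed by extra zeros.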

Either of two techniques may be used to this end, so-called "overlap-add" or "overlap-save", as discussed, for example, at pages 63-66 of the above-mentioned book. When either technique is used the first sequence is effectively partitioned into equal-length successive blocks (contiguous or overlapping, respectively). Overlap-add is often the technique preferred in practice. If L is now the block length into which the first sequence is partitioned, and the second sequence is of length M, then when overlap-add is used the (L + 1)th, (L + 2)th, ..., (L + M - 1)th potentially non-zero values of each partial result are added to, respectively, the first, second, ..., (M - 1)th potentially non-zero values of the next partial result, and so on, to create the overall result.
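The overlap-add procedure just described can be sketched as follows. This is a minimal illustration in plain integer arithmetic; the names `linear_convolve` and `overlap_add` are assumptions made for the example, and each partial result is computed directly rather than by a transform.

```python
def linear_convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, L):
    """Convolve x with h in contiguous blocks of L values of x."""
    M = len(h)
    y = [0] * (len(x) + M - 1)
    for start in range(0, len(x), L):
        block = x[start:start + L]
        partial = linear_convolve(block, h)   # up to L + M - 1 values
        # The last M - 1 values of this partial result overlap the
        # first M - 1 values of the next one and are summed with them.
        for j, v in enumerate(partial):
            y[start + j] += v
    return y


x = list(range(1, 12))
h = [2, -1, 3]
print(overlap_add(x, h, L=4) == linear_convolve(x, h))   # True
```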

It is often advantageous to perform the calculations in a Galois field, as then all the numbers involved are integers, so that the result can be exact. The transform used when this is the case is, as is known, a Number Theoretic Transform (NTT) which is analogous to the DFT. Convolution and correlation (the latter being convolution with one of the sequences reversed) using such NTTs is discussed, for example, in the above-mentioned book at pages 419-433. The forward NTT is defined as

$$X(m) = \sum_{n=0}^{N-1} x(n)\,a^{nm} \pmod{P}, \qquad m = 0, 1, \ldots, N-1,$$

and the inverse NTT is defined as

$$x(n) = N^{-1} \sum_{m=0}^{N-1} X(m)\,a^{-nm} \pmod{P}, \qquad n = 0, 1, \ldots, N-1,$$

where in each case the multiplications and additions are carried out modulo P, the number a is the Nth root of unity (modulo P), and (P-1)/N is an integer (N is the transform length).

The modulus P has to be large enough to provide sufficient dynamic range for the convolution. However it can be chosen to be of the form (P1)(P2) ... (Pk) where the Pi are mutually prime factors of P, preferably the primes in the factorization of P, in which case the computations of the forward transforms, products, inverse transforms and overall result from the partial results can be carried out in parallel channels for the respective moduli P1, P2, ..., Pk, the final result then being reconstructed utilizing the Chinese Remainder Theorem. For each forward and inverse NTT then used the relevant number ai has to be the Nth root of unity modulo the relevant Pi, and (Pi-1)/N has to be an integer.
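A minimal Python sketch of a forward and inverse NTT and of transform-based cyclic convolution is given below. The values P = 7, N = 6 and a = 3 are assumptions chosen for illustration (3 has multiplicative order 6 modulo 7); the function names are likewise illustrative. The sketch uses the direct O(N²) sums rather than any fast algorithm, and relies on `pow(value, -1, P)` for modular inverses, which requires Python 3.8 or later.

```python
def ntt(x, a, P):
    """Forward NTT: X(m) = sum_n x(n) * a^(n*m) mod P."""
    N = len(x)
    return [sum(x[n] * pow(a, n * m, P) for n in range(N)) % P
            for m in range(N)]


def intt(X, a, P):
    """Inverse NTT: x(n) = N^-1 * sum_m X(m) * a^(-n*m) mod P."""
    N = len(X)
    a_inv = pow(a, -1, P)   # inverse of the root of unity modulo P
    n_inv = pow(N, -1, P)   # inverse of the transform length modulo P
    return [n_inv * sum(X[m] * pow(a_inv, n * m, P) for m in range(N)) % P
            for n in range(N)]


def ntt_cyclic_convolve(x, h, a, P):
    """Cyclic convolution modulo P via transform, multiply, inverse transform."""
    X, H = ntt(x, a, P), ntt(h, a, P)
    return intt([(u * v) % P for u, v in zip(X, H)], a, P)


P, N, a = 7, 6, 3                # assumed example values; (P - 1) / N = 1
x = [1, 2, 1, 0, 0, 0]           # three values padded with zeros to N = 6
h = [1, 1, 0, 0, 0, 0]           # two values padded with zeros to N = 6
print(ntt_cyclic_convolve(x, h, a, P))   # [1, 3, 3, 1, 0, 0]
```

The result is exact here because every value of the true convolution is smaller than the modulus.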

Convolution apparatuses which operate in this way are discussed, for example, in a paper by W.K. Jenkins entitled "Composite Number Theoretic Transforms for Digital Filtering" in the Proceedings of the 9th Annual Asilomar Conference on Circuits, Systems and Computing (1975) at pages 421-425. In this paper it is pointed out that it is not essential to use the same transform length N in each channel (which with a simple forward transform/multiple multiplication/inverse transform method of convolving a sequence of L values with a sequence of M values would put a very difficult constraint on the choice of N, and hence of L+M, because the maximum value of N, and hence of (L+M-1), which could then be employed would be the greatest common divisor of the various (Pi-1)). The respective transform length Ni (i=1,2,...,k) employed in each of the k channels in such a case can in fact be chosen to be equal to the relevant (Pi-1) and the length-L first sequence and the length-M second sequence can each be padded with zeros to a length (Pi-1) as far as the relevant channel is concerned.

The Jenkins paper considers explicitly only the case where a first sequence of total length L is convolved with a second sequence of total length M by means of a single forward-NTT/multiple-multiplication/inverse-NTT sequence of operations in each channel, the value of L being implicitly the same for each channel, as is the value of M, and the value of (L+M-1) being equal to the smallest (Pi-1). However it will be evident that such a method may be used to calculate each partial result when a first sequence which is very much longer than a second sequence is convolved in sections with the second sequence and the partial results are pieced together using the overlap-add technique outlined above. Each set of partial results, one from each channel and the potentially non-zero part of each of which will be of length (L+M-1), whatever the transform length for the relevant channel, can be converted to a single conventionally-represented length-(L+M-1) partial result, e.g. using the Chinese Remainder Theorem, and these conventionally-represented partial results can then be overlapped and added in the required manner to produce the overall result.

However, the generation of each output data value by means of such a procedure requires a substantial number of arithmetic etc. operations, and hence apparatus implementing such a procedure must be provided with a substantial amount of processing power (e.g. arithmetic units, latches etc.). It is an object of the invention to provide alternative apparatus in which the amount of processing power required can be less.

The invention provides apparatus for determining the result of convolution of a first sequence of values with a second sequence of M values, each value of the first sequence being represented by k residue values modulo respective numbers Pi (i=1,...,k) which are mutually prime, said apparatus comprising a respective data processing channel corresponding to each of the k numbers Pi, which channel comprises first calculating means for calculating, via forward and inverse Number Theoretic Transforms, the results of convolution of successive blocks of Li values of the first sequence represented modulo Pi and effectively augmented to a length Ni by the addition of zero values with the M values of the second sequence represented modulo Pi and effectively augmented to the length Ni by the addition of zero values, Ni being not less than Li+M-1, and second calculating means for adding the (Li+1)th, (Li+2)th, ..., (Li+M-1)th potentially non-zero values of each result calculated by the first calculating means to, respectively, the first, second, ..., (M-1)th potentially non-zero values of the next result calculated by the first calculating means, N1 being greater than N2 and L1 being greater than L2.

It has now been recognised that the length of the blocks into which the first sequence is effectively sectioned need not be the same for each channel, and that there can be an advantage in choosing them to be greater for those channels employing greater partial convolution lengths Ni. This is because the number of arithmetic etc. operations required to produce a partial result in a given channel i depends on the value of the partial convolution length Ni employed in that channel and in general is larger for larger values of Ni. (The length Ni may or may not correspond to the actual transform length employed. As will be evident from copending patent application 8910974.8 (PHB33554), which is incorporated herein by reference, it may correspond to a multiple of the transform length employed).

If the second sequence is of length M and the first sequence were sectioned into blocks of length L which is the same for each channel then each partial result from each channel would comprise the same number (L+M-1) of potentially non-zero values which would in general have been derived by means of more arithmetic etc. operations in the channels employing greater partial convolution lengths Ni than they would have been in the channels employing shorter partial convolution lengths. Thus the amount of processing power (e.g. arithmetic units, latches etc.) required per useful partial result value would in general be greater in the channels employing the larger values of Ni than it would be in the other channels.

Choosing the length Li of the blocks into which the first sequence is effectively sectioned to be greater for those channels employing greater partial convolution lengths Ni, thereby increasing the number (Li+M-1) of useful values in each partial result produced by these channels, will therefore in general reduce the amount of processing power required in these channels to produce each useful partial result value.

Although the generation of partial results having different useful lengths by the various channels necessitates the performance of separate overlap-add operations in each channel, i.e. prior to the outputs of the various channels being combined to produce conventionally-represented output values, the increase in required processing power this entails will often be considerably outweighed by the reduction in required processing power referred to above.
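The effect of the block length on the workload can be illustrated with a rough count. The sketch below is only a counting exercise under stated assumptions: it uses M = 3 with N1 = 10 and N2 = 6 (the values of the simple two-channel example in the description of Figure 1), a hypothetical run of 1000 input values, and block lengths chosen so that Ni - (Li + M - 1) is zero. It counts how many length-Ni partial results each channel must compute, and ignores the cost of the separate per-channel overlap-add operations mentioned above.

```python
import math

M = 3                  # length of the second sequence
N = {1: 10, 2: 6}      # partial convolution length Ni for each channel
num_inputs = 1000      # hypothetical number of first-sequence values


def partial_results_needed(num_inputs, L):
    """Number of blocks of L values needed to cover the input."""
    return math.ceil(num_inputs / L)


# Same block length in every channel: limited by the shortest Ni,
# since L + M - 1 must not exceed any Ni.
L_common = min(N.values()) - M + 1                 # 4
common = {i: partial_results_needed(num_inputs, L_common) for i in N}

# Block length chosen per channel so that Ni - (Li + M - 1) = 0.
L_own = {i: n - M + 1 for i, n in N.items()}       # {1: 8, 2: 4}
own = {i: partial_results_needed(num_inputs, L_own[i]) for i in N}

print(common)   # {1: 250, 2: 250}
print(own)      # {1: 125, 2: 250}
```

Under these assumptions the channel with the longer partial convolution length computes half as many partial results when it is given its own, longer block length.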

The optimum reduction in required processing power will often be obtained if the various Li are chosen so that the value of Ni-(Li+M-1) is zero for each channel. However, particularly if each first calculating means outputs each of its results in a serial manner, a comparatively simple construction can be effected for each of the second calculating means if the various Li are chosen to be equal to half the corresponding Ni.

If this is the case then half of each length-Ni partial result outputted by each first calculating means will effectively exactly overlap half of the preceding such partial result and the other half will effectively exactly overlap half of the succeeding such partial result, so the corresponding second calculating means can be constructed merely to add the sequence outputted by the first calculating means to a delayed version of itself, the delay being Ni/2, and to discard alternate length-Ni/2 groups of the resulting sums (which will have been derived from the same partial result rather than from an effective overlap of successive partial results).
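The delayed-add form of overlap-add for Li = Ni/2 can be checked with a short sketch. The function name `half_overlap_add`, the test data and the use of plain integer arithmetic (rather than arithmetic modulo Pi) are assumptions made for illustration. The serial stream of length-N partial results is added to itself delayed by N/2, and alternate groups of N/2 sums are discarded. One all-zero partial result is appended so that the tail of the last real partial result is flushed out.

```python
def linear_convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def half_overlap_add(stream, N):
    """Add the stream to itself delayed by N/2; keep alternate N/2 groups."""
    half = N // 2
    out = []
    for i, v in enumerate(stream):
        delayed = stream[i - half] if i >= half else 0
        if i % N < half:
            # First half of a partial result plus second half of the
            # preceding partial result: a genuine overlap, so keep it.
            out.append(v + delayed)
        # Otherwise both terms come from the same partial result: discard.
    return out


N, L = 6, 3                     # L = N / 2
h = [1, 2, 1]                   # M = 3, so L + M - 1 = 5 <= N
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]

stream = []
for start in range(0, len(x), L):
    partial = linear_convolve(x[start:start + L], h)
    stream += partial + [0] * (N - len(partial))   # length-N partial result
stream += [0] * N                                  # flush block of zeros

y = half_overlap_add(stream, N)
print(y == linear_convolve(x, h) + [0])            # True
```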

An embodiment of the invention will now be described, by way of example, with reference to the accompanying diagrammatic drawings in which Figure 1 is a block diagram of the embodiment, Figure 2 shows a possible construction for part of one of the blocks of Figure 1, Figure 3 shows a possible construction for the remaining part of the said block of Figure 1, Figure 4 shows a possible architecture for part of the construction of Figure 3, and Figure 5 shows a possible construction for another of the blocks of Figure 1.

In Figure 1 apparatus for determining the result of convolution of a first sequence of values with a second sequence of M values, each value of the first sequence being represented by k residue values modulo respective numbers Pi (i = 1, ..., k) which are mutually prime, comprises a respective data processing channel 11,21; 12,22; ...; 1k,2k corresponding to each of the k numbers Pi.

In this embodiment the conventionally represented values of the first sequence are applied in succession to the input 3 of a conventional-representation-to-plural-residue-representation conversion circuit 4 which generates the residues of these successive values modulo the various Pi (the remainder after each value is divided by the relevant Pi) on respective outputs 5i (i = 1,...,k). Conversion circuit 4 may comprise, for example, a suitably programmed look-up table having its address input connected to the input 3 and respective fields of its output connected to the outputs 51, 52, ..., 5k. The outputs 51, 52, ..., 5k of circuit 4 are connected to inputs 61, 62, ..., 6k respectively of the channels 11,21; 12,22; ...; 1k,2k respectively. The result of the data processing in each channel 1i,2i appears on a respective output 7i and takes the form of a succession of values modulo the relevant Pi. The outputs 7i are connected to respective inputs 8i of a plural-residue-representation-to-conventional-representation conversion circuit 9 which operates in the converse manner to the circuit 4, i.e. it converts each set of input values, one value from each channel 1i,2i, to a conventionally represented output value which appears on an output 10. To this end circuit 9 may, for example, take the form of a hardware arrangement which implements the Chinese Remainder Theorem.

The Chinese Remainder Theorem states that the number y represented in the modulus set P1, P2, ..., Pk by the set of residues y1, y2, ..., yk is given by the summation of the weighted residues, modulo P:

$$y = (C_1 y_1 + C_2 y_2 + \cdots + C_k y_k) \bmod P$$

where

$$P = P_1 P_2 \cdots P_k \quad\text{and}\quad C_i = \frac{P}{P_i}\left[\left(\frac{P}{P_i}\right)^{-1} \bmod P_i\right], \qquad i = 1, 2, \ldots, k.$$

(The coefficients Ci are in fact the numbers which are unity in the respective modulus and zero in all the others). Thus, for example, the circuit 9 may comprise a set of multipliers for weighting the various yi (i = 1,2,...,k) of each set, an addition unit to calculate the sum, and a modulo P unit such as a divide-by-P arrangement to bring the result back within range.
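The two conversions, to residues (circuit 4) and back via the weighted sum (circuit 9), can be sketched in a few lines of Python. The function names are illustrative and the moduli 11 and 7 are assumed example values; `math.prod` and `pow(value, -1, p)` require Python 3.8 or later.

```python
from math import prod


def to_residues(value, moduli):
    """Residue representation: the remainder of value for each modulus."""
    return [value % p for p in moduli]


def crt_coefficients(moduli):
    """Ci = (P / Pi) * ((P / Pi)^-1 mod Pi) for mutually prime moduli."""
    P = prod(moduli)
    return [(P // p) * pow(P // p, -1, p) for p in moduli]


def from_residues(residues, moduli):
    """Weighted sum of the residues, reduced modulo P."""
    P = prod(moduli)
    C = crt_coefficients(moduli)
    return sum(c * y for c, y in zip(C, residues)) % P


moduli = [11, 7]                       # assumed example moduli, P = 77
print(crt_coefficients(moduli))        # [56, 22]
print(to_residues(45, moduli))         # [1, 3]
print(from_residues([1, 3], moduli))   # 45
```

Each coefficient is unity in its own modulus and zero in the other (56 mod 11 = 1 and 56 mod 7 = 0; 22 mod 7 = 1 and 22 mod 11 = 0), as the text notes.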

Each of the data processing channels 1i,2i comprises a first calculating means 1i and a second calculating means 2i connected in cascade. The first calculating means 1i is constructed to calculate, via forward and inverse Number Theoretic Transforms (NTTs), the results of convolution of successive contiguous blocks of Li successive values applied to its input 6i, each block being effectively augmented to a length Ni by the addition of zero values, with the M values of a second sequence also effectively augmented to the length Ni by the addition of zero values, and generate these results as a succession of values on its output 11i. For each channel the value of Ni is chosen to be not less than the relevant (Li + M - 1). Moreover, for at least one channel for which the value of Ni is greater than the corresponding value for another channel, the value of Li is also greater than the corresponding value for the other channel. Thus, in a very simple example comprising two channels for which P1=11 and P2=7 respectively, and for which M = 3, N1 may be equal to ten, N2 may be equal to six, L1 may be equal to five and L2 may be equal to three. (Preferably each Li is chosen to be equal to half the corresponding Ni in this manner, for reasons which will become apparent below).
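The two-channel example (P1 = 11, P2 = 7, M = 3, N1 = 10, N2 = 6, L1 = 5, L2 = 3) can be simulated end to end with the hedged sketch below. The roots of unity a = 2 (order 10 modulo 11) and a = 3 (order 6 modulo 7), the input data and all function names are assumptions made for illustration. The code models the arithmetic only, not the serial hardware. Each channel performs its own block convolution and overlap-add modulo Pi, and the channel outputs are then combined with the Chinese Remainder Theorem.

```python
from math import prod


def ntt(x, a, P):
    N = len(x)
    return [sum(x[n] * pow(a, n * m, P) for n in range(N)) % P
            for m in range(N)]


def intt(X, a, P):
    N = len(X)
    a_inv, n_inv = pow(a, -1, P), pow(N, -1, P)   # Python 3.8+
    return [n_inv * sum(X[m] * pow(a_inv, n * m, P) for m in range(N)) % P
            for n in range(N)]


def channel(x, h, P, a, N, L):
    """One channel: block NTT convolution, then overlap-add, all modulo P."""
    M = len(h)
    H = ntt([v % P for v in h] + [0] * (N - M), a, P)
    y = [0] * (len(x) + M - 1)
    for start in range(0, len(x), L):
        block = [v % P for v in x[start:start + L]]
        X = ntt(block + [0] * (N - len(block)), a, P)
        partial = intt([(u * v) % P for u, v in zip(X, H)], a, P)
        for j in range(len(block) + M - 1):
            y[start + j] = (y[start + j] + partial[j]) % P
    return y


def crt(residues, moduli):
    P = prod(moduli)
    return sum((P // p) * pow(P // p, -1, p) * r
               for p, r in zip(moduli, residues)) % P


def linear_convolve(x, h):
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


x = [3, 1, 0, 2, 3, 1, 2, 0, 1, 3, 2, 2, 0, 1, 3]   # assumed test data
h = [1, 2, 1]                                        # M = 3

ch1 = channel(x, h, P=11, a=2, N=10, L=5)
ch2 = channel(x, h, P=7, a=3, N=6, L=3)
y = [crt([r1, r2], [11, 7]) for r1, r2 in zip(ch1, ch2)]
print(y == linear_convolve(x, h))                    # True
```

The result is exact because every output value is smaller than P = 77, the dynamic range available with these two moduli.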

Each of the second calculating means 2i has its input 12i connected to the output 11i of the corresponding first calculating means and is constructed to perform an overlap-add function for the corresponding channel, i.e. to add the (Li+1)th, (Li+2)th, ..., (Li+M-1)th potentially non-zero values of each result calculated by the corresponding first calculating means 1i to, respectively, the first, second, ..., (M-1)th potentially non-zero values of the next result calculated by the means 1i.

Provided that each Ni is chosen to be equal to the product of two mutually prime integers Ai and Bi (which of course is not necessarily the case) each first calculating means 1i of Figure 1 may comprise, in cascade, an arrangement as shown in Figure 2 of the drawings followed by an arrangement as shown in Figure 3 of the drawings. The arrangement of Figure 2 receives the successive values of the first sequence represented modulo the respective Pi on its input 6 from the relevant output 5i of the circuit 4, augments each successive block of Li of these values to the length Ni by the addition of zero values, and outputs the result on its output 61. The sequence of data values appearing on the output 61 therefore comprises blocks of Li potentially non-zero data values applied to the input 6 alternating with Ni-Li zero data values. In the present example Li is chosen to be equal to Ni/2 (although of course this is not necessarily the case) so that the data rate at the output 61 is twice that at the input 6. The arrangement of Figure 3 receives the data from the output 61 on its input 60 and determines, for each successive sequence of Ni data values comprising a block of Li input values together with the Ni-Li zeros which succeed it, the result of its convolution with a sequence of Ni reference data values (M potentially non-zero reference data values augmented by Ni-M zeros) also represented modulo the respective Pi. The length-Ni result sequences appear in succession on the arrangement output 11.

The arrangement of Figure 2 comprises three data multiplexers 72,73,74, a pair of shift registers 75,76 each having a capacity Li=Ni/2, and a pair of (electronic) changeover switches 77,78. The data input 6 is connected to inputs 79 and 80 of the multiplexers 72 and 73 respectively, the other inputs of these multiplexers, 81 and 82 respectively, being fed with a zero data value on a permanent basis. Control signal inputs 83 and 84 of the multiplexers 72 and 73 are fed with control signals applied to an input 85 from a clock pulse generator (not shown). These control signals result in the data values applied to input 6 being fed to the output 86 of multiplexer 72 simultaneously with zero values being conducted by multiplexer 73 to its output 87, and vice versa.

The outputs 86 and 87 of the multiplexers 72 and 73 respectively are connected to data inputs 88 and 89 respectively of the shift registers 75 and 76 respectively. Clock signal inputs 90 and 91 of the registers 75 and 76 respectively are fed from the outputs 92 and 93 respectively of the switches 77 and 78 respectively. Changeover contacts 94 and 95 of switch 77 are fed with clock pulses from inputs 96 and 97 respectively, as are changeover contacts 98 and 99 of switch 78. The inputs 96 and 97 are connected to respective outputs of the aforementioned clock pulse generator. Control signal inputs 100 and 101 of the switches 77 and 78 respectively are connected to the input 85 and these switches are controlled by the clock pulses applied thereto in such manner that when multiplexer 72 is connecting its input 79 to its output 86 the switches 77 and 78 connect their inputs 94 and 99 to their outputs 92 and 93 respectively, and when multiplexer 72 is connecting its input 81 to its output 86 the switches 77 and 78 connect their inputs 95 and 98 to their outputs 92 and 93 respectively.

The data outputs 102 and 103 of the registers 75 and 76 respectively are connected to inputs 104 and 105 respectively of the multiplexer 74 the output 106 of which is connected to the output 61. The control input 107 of multiplexer 74 is connected to the input 85 and multiplexer 74 is controlled by the clock pulses applied thereto in such manner that it connects its input 104 to its output 106 when multiplexer 72 is connecting its input 81 to its output 86 and connects its input 105 to its output 106 when multiplexer 72 is connecting its input 79 to its output 86.

The period of the (1:1 mark-to-space ratio) signal applied to input 85 is arranged to be 2Li=Ni times the period of the data applied to input 6, the period of the signal applied to input 96 is arranged to be equal to this data period, and the period of the (1:1 mark-to-space ratio) signal applied to input 97 is arranged to be equal to half this data period. Thus successive blocks of Li data values applied to input 6 are applied to and clocked into the registers 75 and 76 alternately. While this is occurring in a given register 75 or 76 the other register is being fed with zeros and being clocked out at double the data rate to the output 61, giving blocks of Li potentially non-zero data values alternating with blocks of Li zero data values at the output 61, as required.
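The net effect of the Figure 2 arrangement on the data stream can be modelled functionally as below. This is a behavioural sketch only: the function name `insert_zero_blocks` is an assumption, and the model ignores the one-block latency and the clocking details of the two shift registers.

```python
def insert_zero_blocks(values, L):
    """Follow each block of L input values with L zeros (Li = Ni / 2)."""
    out = []
    for start in range(0, len(values), L):
        block = list(values[start:start + L])
        # Pad to 2L so that a short final block still occupies a full frame.
        out.extend(block + [0] * (2 * L - len(block)))
    return out


data = [1, 2, 3, 4, 5, 6]
print(insert_zero_blocks(data, L=3))
# [1, 2, 3, 0, 0, 0, 4, 5, 6, 0, 0, 0]
```

The output carries twice as many values as the input in the same time, which is why the registers are clocked out at double the input data rate.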

The arrangement of Figure 3, which corresponds substantially to Figure 1 of the aforesaid co-pending patent application 8910974.8, has a serial input 60 for the input data values from the output 61 of the arrangement of Figure 2 and a serial output 11 for the successive length-Ni convolution result sequences. The arrangement comprises, in cascade, a first set 62 of Ai multiply/accumulate arithmetic units, a second set 63 of Bi multiply/accumulate arithmetic units, and a third set 64 of Ai multiply/accumulate arithmetic units. Sets 62, 63 and 64 have first serial data inputs 65, 66 and 67 respectively which are connected to the serial data input 60, a serial data output 68 of set 62 and a serial data output 69 of set 63 respectively. The arrangement output 11 is connected to a serial data output 70 of set 64. The sets 62, 63 and 64 also have second serial data inputs 71, 13 and 14 respectively which are connected to the data outputs 15, 16 and 17 respectively of memory devices 18, 19 and 20 respectively. Address inputs 21, 22 and 23 of the memory devices 18, 19 and 20 respectively are connected to the output 24 of an address counter 25 which has a clock signal input 26 and a reset signal input 27. The sets 62, 63 and 64 have clock signal inputs 28, 29 and 30 respectively, control signal inputs 31, 32 and 33 respectively, and reset signal inputs 34, 35 and 36 respectively. Although not shown in the drawing for the sake of clarity the clock signal inputs 26, 28, 29 and 30 are all connected to a clock signal output 37 of a clock and control signal generator 38, which generator is in fact the same one as that referred to in the description of Figure 2. Again although not shown, the reset signal inputs 27, 34, 35, 36 and the control signal inputs 31, 32, 33 are connected to respective outputs 39, 40, 41, 42, 43, 44 and 45 of generator 38.

The set 62 of Ai arithmetic units receives each sequence of Ni input data values (Li potentially non-zero values to which are appended Li zero values) on its input 65 and, in a manner which will be elaborated upon below, calculates for each sequence an Ai-point modulo-Pi number theoretic transform of each of the Bi length-Ai rows of the input data values mapped to a Bi x Ai matrix and outputs the resulting Ni transform points serially on its output 68. Ai is chosen so that it divides Pi-1. The set 63 of Bi arithmetic units receives the serial output of the set 62 on its input 66, calculates in a manner which will be elaborated upon below the modulo-Pi results of cyclic convolution of each length-Bi column of each Bi x Ai transform matrix generated by the set 62 with the corresponding column of the corresponding Bi x Ai transform matrix of the Ni reference data values, and outputs the resulting Ni convolution points serially on its output 69. The set 64 of Ai arithmetic units receives the serial output of the set 63, calculates in a manner which will be elaborated upon below an Ai-point modulo-Pi inverse number theoretic transform of each of the Bi length-Ai rows of each Bi x Ai convolution matrix calculated by the set 63, and outputs the resulting Ni inverse transform points, mapped back to one dimension, serially on its output 70.

The aforesaid mapping is known, for example, from a paper entitled "New Algorithms for Digital Convolution" by R.C. Agarwal and J.W. Cooley in IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-25, no. 5 (October 1977) pages 392-410, particularly pages 398-400. If the calculations are carried out in a Galois field modulo Pi a one-dimensional convolution of two Ni-point sequences, where Ni divides Pi-1 and is chosen to be equal to the product of two mutually prime integers Ai and Bi, maps to a two-dimensional convolution of size Ai x Bi according to n = (Ai.m + Bi.l) modulo Ni, where the n are the indices of the points of each one-dimensional sequence of length Ni, the l are the indices of the points of the corresponding Bi x Ai matrix along the dimension of length Ai, and the m are the indices of the points of the corresponding Bi x Ai matrix along the dimension of length Bi. (The index of the first point is taken as zero). Thus, as a very simple example, if Ai = 3 and Bi = 2, the mapping of the indices of the (six-point) one-dimensional sequence is

0 2 4
3 5 1

because for

l = 0, m = 0, n = (3.0 + 2.0) mod 6 = 0
l = 1, m = 0, n = (3.0 + 2.1) mod 6 = 2
l = 2, m = 0, n = (3.0 + 2.2) mod 6 = 4
l = 0, m = 1, n = (3.1 + 2.0) mod 6 = 3
l = 1, m = 1, n = (3.1 + 2.1) mod 6 = 5
l = 2, m = 1, n = (3.1 + 2.2) mod 6 = 1

If such mapping is used and the convolution is performed via forward and inverse transforms then the central operation of the sequence of operations transform/multiply/inverse-transform which would have to be performed in the one-dimension case itself becomes a convolution operation (because the Ni-point transform and inverse-transform each become Bi Ai-point transforms).
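The index mapping can be reproduced with a few lines of Python. The function name `index_map` is illustrative. Row m and column l of the returned matrix hold the one-dimensional index n = (A.m + B.l) mod N.

```python
def index_map(A, B):
    """Return the B x A matrix of one-dimensional indices n."""
    N = A * B
    return [[(A * m + B * l) % N for l in range(A)] for m in range(B)]


print(index_map(3, 2))   # [[0, 2, 4], [3, 5, 1]]
```

Because A and B are mutually prime, every index from 0 to N-1 appears exactly once, so the mapping can be inverted to bring the result back to one dimension.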

For the sake of clarity and ease of understanding the more detailed description of the arrangement of Figure 3 which follows with reference to Figure 4 will be of a very simple example for which Ai = 3 and Bi = 2 (Ni = 6 and, for example, Pi = 7) although it will be appreciated that the architecture described can be extended at will to accommodate much larger values of Ai and Bi.

Figure 4 (which corresponds substantially to Figure 6 of copending UK patent application 8900301.6 (PHB33251), which is incorporated herein by reference) shows a possible construction for each of the sets of arithmetic units 62 and 64 of Figure 3, for these values of Ai and Bi. As shown in Figure 4, each of the sets 62 and 64 comprises three identical multiply/accumulate arithmetic units 46, 46' and 46" respectively. Each unit 46 comprises a modulo-Pi (modulo-7) multiplier 47, a modulo-Pi (modulo-7) adder 48, two delay devices 49 and 50 respectively, for example shift registers, and a multiplexer 51. The serial data input 65 or 67 is connected to one input 52 of the multiplier 47, thence via delay device 50 to the corresponding input 52' of multiplier 47', and thence via delay device 50' to the corresponding input 52" of multiplier 47". Delay device 50" is not in fact used. The other data input 71 or 14 comprises three data paths which are connected to the other inputs 53, 53' and 53" of the multipliers 47, 47' and 47" respectively. The control input 31 or 33 comprises three lines which are connected to changeover control inputs 54, 54' and 54" respectively of the (2-input) multiplexers 51, 51' and 51" respectively. The clock signal input 28 or 30 is connected to an input 55 of each unit 46 and controls, for example, the delay devices 49 and 50 in known manner. The reset signal input 34 or 36 comprises three lines which are connected to respective reset signal inputs 56, 56' and 56" of the units 46, 46' and 46".

The output 57 of multiplexer 51 is connected to one input 58' of multiplexer 51', the output 57' of multiplexer 51' is connected to one input 58" of multiplexer 51" and the output 57" of multiplexer 51" is connected to a serial data output 68 or 70. In each unit 46 the output of the multiplier 47 is connected to one input of the adder 48, the output of the adder 48 being connected both directly to the other input of the multiplexer 51 and also to the other input of the adder 48 via the delay device 49. The delay devices 49 are each arranged to delay data applied thereto by Bi data periods, i.e. by two data periods in the present case, so that each accumulator constituted by an adder 48 and its associated feedback delay device 49 accumulates Bi = 2 sets of data applied to it from the associated multiplier 47 in a multiplexed manner. The delay devices 50 are also each arranged to delay data applied thereto by Bi = 2 data periods so that the data applied to input 65 or 67 is applied to unit 46' two data periods after it is applied to unit 46, and is applied to unit 46" two data periods after it is applied to unit 46'.

As discussed in the aforementioned U.K. patent application 8900301.6, the set 62 or 64 of Ai = 3 units 46 shown in Figure 4 essentially forms a matrix x vector product calculating apparatus with multiplexed calculations and serial output if the components of the Bi (=2) length-Ai (=3) vectors are applied serially to the input 65 or 67 in multiplexed manner during respective successive data periods, the components of the length-Ai first rows of the respective matrices are applied similarly to the input 53 during these data periods, the components of the second rows of the matrices are applied similarly to the input 53' but after a delay of Bi data periods, and finally in the present case because Ai = 3, the components of the third rows of the matrices are applied similarly to the input 53" but after a delay of 2Bi data periods, provided that the multiplexers 51 are controlled from the control input 31, 33 to normally connect their input 58 to their output 57 and so that multiplexer 51 connects its other input to its output during the fifth and sixth data periods, i.e. the (AixBi-Bi+1)th, ..., (AixBi)th data periods, multiplexer 51' connects its other input to its output during the seventh and eighth data periods, i.e. the (AixBi+1)th, ..., (AixBi+Bi)th data periods, and the multiplexer 51" connects its other input to its output during the ninth and tenth, i.e. the (AixBi+Bi+1)th, ..., (AixBi+2Bi)th, data periods. The various components of the product vectors appear serially on the output 68 or 70 in a multiplexed manner, i.e. such that every Bi (=2) components which appear in direct succession consist of one component from each of the length-Ai product vectors (which effectively constitute together a (BixAi) product matrix).
The reset signals applied to the respective lines of the reset signal input 34 or 36 are arranged to momentarily reset the contents of the delay devices 50,50' and 50" to zero at instants coinciding with the instants at which the corresponding multiplexers 51, 51' and 51" respectively are switched to connect their inputs 58, 58' or 58" to their output 57, 57' or 57" after having connected their other input to their output 57, 57' or 57".

Thus, if an effectively continuous succession of data values is applied to the input 65 or 67 and the data applied to each of the lines of the input 71 or 14 is repeated in a cyclic manner, successive sets of Ni input data items applied to the input 65 or 67 will be processed in an identical manner to effectively result in the components of successive corresponding BixAi output matrices appearing on the output 68 or 70.

Referring back to the passage above where the mapping of a one-dimensional sequence to two dimensions for the purpose of convolution was discussed, it will be appreciated that the calculations performed by the arrangement of Figure 4 are precisely those required to produce the NTT points for the Bi length-Ai rows of the sequence after it has been mapped to two dimensions (block 62 of Figure 3) and to produce the inverse NTT points for the Bi length-Ai rows of the two-dimensional results sequence (block 64 of Figure 3), if the components of the matrices applied to the lines of the input 71 or 14 constitute the relevant NTT coefficients in one case and the relevant inverse NTT coefficients in the other. For example, if a length-six input sequence x5, x4, x3, x2, x1, x0 is mapped to two dimensions to give

    x0 x2 x4
    x3 x5 x1

then what is required from the block 62 are the points

    (a00x0+a01x2+a02x4) (a10x0+a11x2+a12x4) (a20x0+a21x2+a22x4)
    (a30x3+a31x5+a32x1) (a40x3+a41x5+a42x1) (a50x3+a51x5+a52x1)

where the various a are the appropriate transform coefficients. It will be noted that the order in which the additions are carried out to calculate each point is immaterial, so that the correct mapping can be obtained by a simple choice of which of the coefficients a are applied to the lines of the input 71, 14 at any given time, so that they are always associated with the correct input point x. A similar comment applies to the calculations of the output (inverse transform) points in the block 64 of Figure 3, in that a correct choice can be made to ensure that the inverse transform points are outputted in the correct order. The transform coefficients are stored in the store 18 of Figure 3 and the inverse transform coefficients are stored in the store 20, and are outputted at the appropriate times due to the clocking of the address counter 25 and its periodic resetting.
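The remark that the order of the additions is immaterial can be checked numerically. The sketch below is illustrative only: it assumes a modulus of 7 and takes the coefficient for output point k and column j to be a power of 2 (which has order 3 modulo 7); these numerical choices and the name `transform_point` are assumptions made for the illustration and are not taken from the description.

```python
P = 7   # assumed modulus for the illustration
W = 2   # assumed root: 2**3 % 7 == 1, so 2 has order 3 modulo 7

def transform_point(k, arrivals):
    """Compute one length-3 transform point from samples in any arrival order.

    arrivals: list of (column index j, sample value) pairs. The coefficient
    is selected to match whichever sample arrives, as the text describes.
    """
    total = 0
    for j, value in arrivals:
        coefficient = pow(W, j * k, P)   # plays the role of the coefficient a_kj
        total = (total + coefficient * value) % P
    return total

row = [3, 1, 4]                                  # sample values in one row
natural = list(enumerate(row))                   # arrival order j = 0, 1, 2
reordered = [natural[2], natural[0], natural[1]] # arrival order j = 2, 0, 1

points_natural = [transform_point(k, natural) for k in range(3)]
points_reordered = [transform_point(k, reordered) for k in range(3)]
```

Both arrival orders give the same three points, since each sample is always multiplied by the coefficient belonging to its own position before the sum is formed.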

With the simple example being discussed of Ai=3 and Bi=2, the block 63 of Figure 3 is required to cyclically convolve the columns of the array of transform points specified above with corresponding columns of a corresponding array of transform points derived from the Ni=6 point reference sequence (M=3 potentially non-zero values augmented with three zeros). It is assumed that the points of the latter array are precalculated and are stored in the store 19 of Figure 3, whence they are outputted at appropriate times (cyclically) under the control of the address counter 25 and applied to the block 63 concurrently with appropriate points of the above array from the block 62. If the above array is written as

    X0 X1 X2
    X3 X4 X5

and the corresponding array for the reference sequence is

    H0 H1 H2
    H3 H4 H5

the results required for application to the block 64 are

    (X0H0+X3H3) (X1H1+X4H4) (X2H2+X5H5)
    (X0H3+X3H0) (X1H4+X4H1) (X2H5+X5H2)

The block 63 of Figure 3 may therefore be constructed in a similar fashion to that shown in Figure 4 with the unit 46" omitted, i.e. with Bi=2 units 46 each with suitably chosen delays in the devices 49 and 50, the input 65,67 then constituting the input 66, the input 71,14 constituting the input 13, the output 68,70 constituting the output 69, and the inputs 28,30; 31,33; and 34,36 then constituting the inputs 29, 32 and 35 respectively. The order in which the points X0-X5 are outputted from the block 62 can be adjusted to some extent by adjusting the choice of which of the coefficients a are applied to which lines of the input 71 (although points with odd-numbered and even-numbered coefficients will always alternate). It is assumed in the present example that this order is such that each pair of points which have to be combined with each other in the block 63, i.e. the points X0 and X3, X1 and X4, and X2 and X5, occurs during two directly successive data periods, so that the total order of succession is, for example, X5 X2 X4 X1 X3 X0.
If this is arranged to be the case the delay devices 49 in each of the two units 46 which make up the block 63 may each be arranged to give rise to a delay of one data period, the delay device 50 coupling one block to the other may be arranged to also give rise to a delay of one data period and the two multiplexers 51 may be arranged to be changed over (with opposite phases) at the end of each data period. If the results quoted above as being required for application to the block 64 are written as

    Y0 Y1 Y2
    Y3 Y4 Y5

these will then be outputted in such an order that points of the top line alternate with points of the bottom line.
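The chain of operations attributed to the blocks 62, 63 and 64 can be checked end to end for the Ai=3, Bi=2 example. The following is a minimal sketch under stated assumptions: the modulus 7 and the root 2 are illustrative choices, the index mapping n = (3b + 2a) mod 6 is inferred from the two-dimensional layout quoted above, and all function names are hypothetical. It models the arithmetic only, not the serial hardware.

```python
P = 7        # assumed modulus for this channel
A, B = 3, 2  # row length Ai and number of rows Bi (Ni = A * B = 6)
W = 2        # assumed root of order 3 modulo 7 (2**3 % 7 == 1)

def to_2d(seq):
    """Map a length-6 sequence to B rows of A values: x0 x2 x4 / x3 x5 x1."""
    return [[seq[(A * b + B * a) % (A * B)] for a in range(A)] for b in range(B)]

def from_2d(rows):
    """Invert to_2d."""
    out = [0] * (A * B)
    for b in range(B):
        for a in range(A):
            out[(A * b + B * a) % (A * B)] = rows[b][a]
    return out

def row_ntt(rows, w):
    """Length-A number theoretic transform of every row (block 62 or 64)."""
    return [[sum(r[a] * pow(w, a * k, P) for a in range(A)) % P
             for k in range(A)] for r in rows]

def column_cyclic(X, H):
    """Length-B cyclic convolution down each column (block 63)."""
    return [[sum(X[c][k] * H[(b - c) % B][k] for c in range(B)) % P
             for k in range(A)] for b in range(B)]

def channel_convolve(x, h):
    """Cyclic convolution of two length-6 sequences modulo P via the 2-D route."""
    Y = column_cyclic(row_ntt(to_2d(x), W), row_ntt(to_2d(h), W))
    scale = pow(A, -1, P)                       # 1/A modulo P (Python 3.8+)
    inverse = row_ntt(Y, pow(W, -1, P))         # inverse transform uses 1/W
    return from_2d([[scale * v % P for v in r] for r in inverse])

def direct_cyclic(x, h):
    """Reference result computed straight from the definition."""
    n = len(x)
    return [sum(x[m] * h[(i - m) % n] for m in range(n)) % P for i in range(n)]

x = [1, 2, 3, 0, 0, 0]   # block of 3 values augmented with 3 zeros
h = [4, 5, 6, 0, 0, 0]   # M = 3 reference values augmented with 3 zeros
result = channel_convolve(x, h)
```

For B = 2 the function `column_cyclic` produces exactly the pairs of products listed above, for example X0H0+X3H3 in the top row and X0H3+X3H0 in the bottom row.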

The delays in the devices 49 and 50 in the units 46 making up the block 64 may be the same as those quoted for the block 62.

As an alternative, insofar as the construction of Figure 4 is used for the blocks 62 and 64 of Figure 3, the delay devices 50 may be omitted completely, the input 52' of unit 46' thus being fed from the output of delay device 49 and the input 52" of unit 46" thus being fed from the output of delay device 49'. This possibility is discussed in detail in the aforementioned U.K. patent application 8900301.6, particularly with reference to Figure 7 of the drawings thereof and, if employed, necessitates modification to the values of the data fed to those lines of the input 71,14 which are connected to the units 46' and 46".

The clock and control signal generator 38 of Figure 3 may take the form of, for example, a clock pulse generator which feeds the output 37 directly or via a frequency divider and also clocks a cycling counter (not shown). If this is the case the signals required on the outputs 39-45 and for application to the inputs 85 and 96 of Figure 2 may be generated by appropriate decoders connected to the parallel output of the counter. The generator output 37 may be applied directly to the input 97 of Figure 2.

In a very simple possible two-channel construction for an apparatus as described hereinbefore with reference to Figure 1, the construction for the first calculating means 1i described with reference to Figures 3 and 4 may be used, for example, for the calculating means 12, so that for the second channel L2=M=3 and N2=6 (P2=7 for example). For the first channel one could then choose, for example, L1=5 and N1=10 (P1=11 for example), so that the calculating means 11 could be again constructed as described with reference to the block diagrams of Figures 2 and 3 but, this time, with the blocks 62 and 64 each comprising a set of five arithmetic units similar to, and interconnected in a similar way to, the three units 46 of Figure 4. For this channel the memory 19 of Figure 3 will have to be arranged to store the precalculated transform points of a two-dimensional array derived from the reference sequence, which in this case has N1=10 points (M=3 potentially non-zero values augmented with seven zeros). It will be appreciated that these values for Li, M, Ni and Pi are only given for the sake of illustration and, in practice, considerably larger values for these quantities and/or more than two channels will often be employed.
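The residue representation on which the two channels rely can be illustrated with the example moduli P1=11 and P2=7. The sketch below is illustrative only: it convolves directly in each channel rather than by transforms, and it recombines the channel outputs with the standard Chinese Remainder Theorem, a step that is assumed here for the purpose of the illustration. The function names are hypothetical.

```python
from math import prod

MODULI = [11, 7]   # the example values P1 and P2; mutually prime

def convolve_mod(x, h, p):
    """Linear convolution with every value reduced modulo p (one channel)."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] = (y[i + j] + xi * hj) % p
    return y

def from_residues(residues, moduli):
    """Recover the value in 0 .. prod(moduli)-1 having the given residues."""
    total = prod(moduli)
    value = 0
    for r, p in zip(residues, moduli):
        q = total // p
        value += r * q * pow(q, -1, p)   # pow(q, -1, p) needs Python 3.8+
    return value % total

x = [1, 2, 3]
h = [1, 2, 1]

# Each channel sees only the residues of the data modulo its own Pi.
channel_outputs = [convolve_mod([v % p for v in x], [v % p for v in h], p)
                   for p in MODULI]

# Recombine the channels value by value.
combined = [from_residues(pair, MODULI) for pair in zip(*channel_outputs)]
```

The recombined values are exact only while every true result is smaller than the product of the moduli, 77 in this example, which is one reason why larger values and more channels would be used in practice.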

Provided that each Li is chosen to be equal to half the corresponding Ni, each second calculating means 2i of Figure 1 may be constructed as shown in Figure 5 of the drawings.

The second calculating means shown in Figure 5 comprises three shift registers 110, 111 and 112 each having a capacity Li=Ni/2, a digital modulo-Pi adder 113, a data demultiplexer 114, a data multiplexer 115 and a pair of (electronic) changeover switches 116 and 117. The data produced at the output 11 of the corresponding first calculating means 1, for example that described above with reference to Figures 2-4, is fed to the input 12 and thence directly to a first input 118 of adder 113 and also via register 110 to a second input 119 of adder 113. A clock signal input 120 of register 110 is fed with clock pulses applied to an input 121, for example from the output 37 of the generator 38 of Figure 3.

These clock pulses, which are also applied to the adder 113, occur at a rate equal to the data rate at the input 12, so that register 110 delays its input data by Li data periods before applying it to the adder input 119. The serial data appearing at the output 122 of adder 113 therefore consists of the result of adding the serial data applied to the input 12 to a delayed version of itself, the delay being Li. Because Li is chosen to be equal to Ni/2, i.e. exactly half the length of each partial convolution result produced by the corresponding first calculating means and applied to the input 12, exactly half of each of these partial convolution results effectively overlaps half of the preceding partial result and the other half effectively overlaps half of the succeeding partial result, as far as the required overlap-add operations are concerned. Thus adder 113 in conjunction with register 110 performs, inter alia, the required adding together of the overlapping portions of the successive partial results. However, in between the generation by adder 113 of the Li results of adding the first half of each partial result to the last half of the preceding partial result as required, adder 113 generates the Li results of adding the second half of a given partial result to the first half thereof. These latter results do not form part of the required output signal and the remainder of the arrangement of Figure 5 is provided in order that they may be discarded and the required results be rendered uniform in time, rather than occur in isolated blocks of Li values.

To this end blocks of Ni data values appearing at the adder output 122, each of these blocks consisting of Li unwanted values followed by Li wanted values, are directed alternately to register 111 and to register 112 by means of the demultiplexer 114. The demultiplexer 114 is accordingly controlled by a 1:1 mark-to-space ratio control signal applied to an input 123, for example from a suitable output of the generator 38 of Figure 3, this signal having a period equal to 2Ni periods of the data appearing at the adder output 122. While such a block is being directed to a given register 111 or 112 this register is supplied with clock signals from the input 121 via the changeover switch 116 or 117, these switches also being controlled by the signal applied to input 123.

Because the registers 111 and 112 each have a capacity Li only the second half of each length-Ni block, i.e. the half consisting of wanted result values, actually remains stored in the relevant register 111 or 112 at the end of each of these operations. While such a block is being directed to a given register 111 or 112 in this way the other one of these registers is supplied with clock signals from a further input 124 via the changeover switch 116 or 117 and multiplexer 115 is controlled by the signal applied to input 123 to connect the data output of this other register to the output 7. The clock signals applied to input 124 have a period equal to twice the period of the data appearing at the adder output 122 and may be derived from a suitable output of the generator 38 of Figure 3. The overall result is therefore that the second half of each successive length-Ni block of data values appearing at the adder output 122, this second half consisting of required output data values, becomes stored in the register 111 and the register 112 alternately. While a given such second half is becoming stored in this way the second half which was stored immediately previously in the other register is read out to the output 7 at half rate, i.e. with its duration extended to occupy the data periods which contained the unwanted results.
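The arithmetic carried out by the arrangement of Figure 5 can be sketched as follows. This is an illustration under stated assumptions, not a model of the circuit: the partial results are produced here by direct convolution rather than by transforms, the registers 111 and 112 and the half-rate read-out are not modelled (the wanted sums are simply selected by their position in the stream), and the function names are hypothetical.

```python
def linear_convolve(x, h, p):
    """Linear convolution modulo p, used both as stand-in and as reference."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] = (y[i + j] + xi * hj) % p
    return y

def overlap_add_half(x, h, l, p):
    """Overlap-add with block length l and partial results of length 2*l."""
    n = 2 * l
    assert len(h) <= l + 1, "partial result must fit in n = 2*l values"
    assert len(x) % l == 0, "input is processed in whole blocks of l values"

    # Stand-in for the first calculating means: one zero-padded partial
    # result of n values per block of l input values, placed end to end.
    stream = []
    for start in range(0, len(x), l):
        partial = linear_convolve(x[start:start + l], h, p)
        stream += partial + [0] * (n - len(partial))

    wanted = []
    for t, value in enumerate(stream):
        delayed = stream[t - l] if t >= l else 0   # role of register 110
        total = (value + delayed) % p              # role of adder 113
        # First half of a partial result plus last half of the previous one
        # is wanted; a partial result added to its own first half is not.
        if t % n < l:
            wanted.append(total)
    return wanted

x = [1, 2, 3, 4, 5, 6, 0, 1, 2]   # three blocks of l = 3 values
h = [4, 5, 6]                     # M = 3 reference values
out = overlap_add_half(x, h, l=3, p=7)
```

Each block of l input values yields exactly l wanted output values, so the output rate matches the input rate once the unwanted sums are dropped, which is what the half-rate read-out from the registers 111 and 112 achieves in hardware.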

From reading the present disclosure, other modifications will be apparent to persons skilled in the art. Such modifications may involve other features which are already known in the design, manufacture and use of convolution apparatuses and component parts thereof and which may be used instead of or in addition to features already described herein. Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present application also includes any novel feature or any novel combination of features disclosed herein either explicitly or implicitly or any generalisation thereof, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention. The applicants hereby give notice that new claims may be formulated to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.

Claims

1. Apparatus for determining the result of convolution of a first sequence of values with a second sequence of M values, each value of the first sequence being represented by k residue values modulo respective numbers Pi (i=1,...,k) which are mutually prime, said apparatus comprising a respective data processing channel corresponding to each of the k numbers Pi, which channel comprises first calculating means for calculating, via forward and inverse Number Theoretic Transforms, the results of convolution of successive blocks of Li values of the first sequence represented modulo Pi and effectively augmented to a length Ni by the addition of zero values with the M values of the second sequence represented modulo Pi and effectively augmented to the length Ni by the addition of zero values, Ni being not less than Li+M-1, and second calculating means for adding the (Li+1)th, (Li+2)th, ... (Li+M-1)th potentially non-zero values of each result calculated by the first calculating means to, respectively, the first, second, ... (M-1)th potentially non-zero values of the next result calculated by the first calculating means, N1 being greater than N2 and L1 being greater than L2.

2. Apparatus as claimed in claim 1, wherein each Li is equal to half the corresponding Ni.

3. Apparatus for determining the result of convolution of a first sequence of values with a second sequence of M values, substantially as described herein with reference to Figure 1 of the drawings, or to said Figure 1 together with Figures 2-5 of the drawings.