SU478306A1

SU478306A1 - Matrix parallel processor for calculating the Hadamard transform

Info

Publication number: SU478306A1
Application number: SU1948110A
Authority: SU
Inventors: Анатолий Иванович Гречишников
Priority date: 1973-07-25
Filing date: 1973-07-25
Publication date: 1975-07-25

Description

... of the computational units of the third column of the matrix are connected directly to the inputs and outputs of the processor.

FIG. 1 shows a block diagram of the processor; FIG. 2 shows a diagram of a computational unit.

The matrix parallel processor contains computational units 1 arranged in three columns. Numerals 2 and 3 denote the inputs and outputs of the processor, respectively.

Computational unit 1 contains adders 4 and 5 and inverter 6. Numerals 7, 8 and 9, 10 denote the inputs and outputs of the computational unit, respectively. Information enters the processor encoded as power increments (SP). Coding is performed according to the following algorithm:

Scaling of the information to the range from 0 to 1.

Addition of the number 0.5. Multiplication by 2. Addition of the number 0.5.

The first two binary digits to the left of the binary point are the next power increment.

Table 1 gives an example of encoding the number A = 0.0010011 as power increments; here $B_t$ is the number being encoded at coding step $t$, and $B_1 = A + 0.5 = 0.1010011$.
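The coding steps above can be sketched as follows. This is a minimal sketch, assuming that the fractional part left after each increment is fed into the next step; the text does not say this explicitly, but the assumption reproduces the increments the description quotes for 0.0010011. The function name `encode_sp` and the use of `Fraction` are illustrative choices, not part of the patent.

```python
from fractions import Fraction

def encode_sp(a, steps):
    """Encode a number scaled to 0..1 as two-bit power increments, most significant first."""
    b = Fraction(a) + Fraction(1, 2)          # add 0.5 to the scaled number
    increments = []
    for _ in range(steps):
        v = 2 * b + Fraction(1, 2)            # multiply by 2, then add 0.5
        whole = v.numerator // v.denominator  # digits left of the binary point
        increments.append(format(whole, "02b"))
        b = v - whole                         # assumption: the fraction feeds the next step
    return increments

a = Fraction(0b0010011, 2**7)                 # A = 0.0010011 in binary
print(".".join(encode_sp(a, 7)))              # prints 01.10.00.01.10.01.00
```

Exact rational arithmetic is used so that the two integer bits are never affected by floating-point rounding.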

Table 1. Recoverable column headings: B, 2B, 2B + 0.5, SP.

Decoding is the algebraic addition of the quantitative equivalents (weights) of the power increments (see the last column of Table 1).
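Decoding can be sketched along the same lines. The sketch assumes that the increments 00, 01 and 10 stand for the digits -1, 0 and +1 and that the increment at step t carries the weight 2^-t. These weights are inferred from the worked example, not read from the table, so treat them as an assumption; `decode_sp` is a hypothetical name.

```python
from fractions import Fraction

def decode_sp(increments):
    """Sum the weighted power increments (most significant first)."""
    total = Fraction(0)
    for t, code in enumerate(increments, start=1):
        digit = int(code, 2) - 1           # assumed mapping: 00 -> -1, 01 -> 0, 10 -> +1
        total += Fraction(digit, 2**t)     # assumed weight of step t: 2**-t
    return total

sp = "01.10.00.01.10.01.00".split(".")     # increments quoted for A = 0.0010011
value = decode_sp(sp)
print(value == Fraction(0b0010011, 2**7))  # prints True
```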

As can be seen, the addition operation is possible only if there are no carries one position forward, i.e., if the power-increment combinations 10.10 and 00.00 do not occur. This condition always holds by virtue of a number of theorems.
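The text relies on unnamed theorems for this property. As a hedged illustration only, the sketch below checks it by brute force for every 7-bit fraction below 0.5, using the coding rule under the assumption that the fractional part of each step feeds the next one. The restriction to values below 0.5 and the helper names are choices made for this sketch, not statements from the patent.

```python
from fractions import Fraction

def encode_sp(a, steps):
    """Two-bit power increments of a, most significant first (assumed rule)."""
    b = Fraction(a) + Fraction(1, 2)
    out = []
    for _ in range(steps):
        v = 2 * b + Fraction(1, 2)
        whole = v.numerator // v.denominator
        out.append(format(whole, "02b"))
        b = v - whole                      # assumption: fraction feeds the next step
    return out

def has_forbidden_pair(increments):
    """True if two adjacent increments are both 10 or both 00."""
    pairs = zip(increments, increments[1:])
    return any(x == y and x in ("10", "00") for x, y in pairs)

# Exhaustive check over all 7-bit fractions in [0, 0.5).
violations = [k for k in range(64)
              if has_forbidden_pair(encode_sp(Fraction(k, 128), 7))]
print(len(violations))                     # prints 0
```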

An example of adding the previously encoded number A = 0.0010011 (SP 01.10.00.01.10.01.00) and the number A = 0.0010101 (SP 01.10.00.10.00.10.11) is given in Table 2.

Table 2.

Addition is performed as the increments arrive, in the example shown from top to bottom, i.e., from the most significant increments to the least significant.
Each computational unit 1 of the matrix parallel processor implements the expressions

$$A_{i+1} = A_i + B_i, \quad (1)$$
$$B_{i+1} = A_i - B_i, \quad (2)$$

where $i$ is the column number of the matrix parallel processor, $A_i$ and $B_i$ are the operands arriving at inputs 7 and 8 of computational unit 1, respectively, and $A_{i+1}$ and $B_{i+1}$ are the results delivered at outputs 9 and 10 of computational unit 1, respectively.

Computational units 1 are interconnected according to the graph describing the fast Hadamard transform. The number of inputs of the matrix parallel processor is always a power of two. The number of processor columns then equals the exponent, and the number of rows equals the number of inputs divided by two. Each computational unit 1 processes information serially, starting with the most significant digits. Computational unit 1 implements formulas (1) and (2): adder 4 implements formula (1) and adder 5 implements formula (2). Inverter 6 is needed to multiply the number $B_i$ by -1.

The matrix parallel processor operates as follows. Information encoded as power increments arrives serially at inputs 2. The most significant digits of the result processed in the first column pass to the second column for further processing, from there to the third, and so on. While the most significant digits of the final Hadamard transform result are being delivered at outputs 3, the least significant digits of the original array of numbers are still arriving at inputs 2. Once enough most significant digits of the result have been received from outputs 3 to provide the required accuracy, the computation can be stopped. It is easy to see that the operation of computational unit 1 consists in receiving successive power increments of the operands and adding (subtracting) them in accordance with formulas (1) and (2).
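The sum and difference formed by each unit, together with the column structure, can be illustrated in ordinary integer arithmetic, ignoring the serial power-increment coding. The sketch below is one reading of the wiring described in the claims, and that reading is an assumption: each unit takes two adjacent values, all first outputs (sums) feed the upper half of the next column, and all second outputs (differences) feed the lower half. The function names are hypothetical.

```python
def unit(a, b):
    """One computational unit: adder 4 gives a + b; adder 5 with inverter 6 gives a - b."""
    return a + b, a - b

def column(values):
    """One processor column: N/2 units working on adjacent pairs."""
    results = [unit(values[2 * k], values[2 * k + 1]) for k in range(len(values) // 2)]
    firsts = [s for s, _ in results]       # first outputs (sums)
    seconds = [d for _, d in results]      # second outputs (differences)
    return firsts + seconds                # assumed wiring to the next column

def hadamard(values):
    """Pass the data through log2(N) identical columns."""
    n = len(values)
    assert n >= 2 and n & (n - 1) == 0, "number of inputs must be a power of two"
    for _ in range(n.bit_length() - 1):    # number of columns = log2(N)
        values = column(values)
    return values

def hadamard_direct(values):
    """Reference: multiply by the natural-order (Sylvester) Hadamard matrix."""
    n = len(values)
    return [sum(x if bin(i & j).count("1") % 2 == 0 else -x
                for j, x in enumerate(values)) for i in range(n)]

data = [1, 0, 2, 5, 3, 1, 4, 2]
print(hadamard(data) == hadamard_direct(data))  # prints True
```

With eight inputs this gives three columns of four units each, matching the three-column arrangement in the description.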
The power increments of the result are output from computational unit 1, and the next pairs of increments arrive at its input simultaneously with the output. Note that each adder in unit 1 of the matrix parallel processor delays the information by one clock cycle, which follows from the algorithm given above for adding two numbers represented as power increments. Evidently, computational unit 1 as a whole also delays the information by one clock cycle.

Let us estimate the speed of the matrix parallel processor. It is determined by the delay of all computational units 1, i.e., by the delay in the shortest sequential chain made up of computational units 1. In this case the chain length equals the number of columns of the matrix parallel processor, which can be calculated by the formula $\log_2 N$. The time required to process the entire original array of numbers (in clock cycles) is then

$$T = \log_2 N + M,$$

where $N$ is the number of inputs of the matrix parallel processor, equal to the number of numbers in the original array being processed, and $M$ is the number of most significant digits of the result that provides the required computational accuracy. If $N = 1024$, one clock cycle lasts 1 µs, and $M = 10$, then $T$ = … µs, which is 800 times faster than the known processor.
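The estimate can be worked through numerically. The timing formula is damaged in the source, so this sketch assumes T = log2(N) + M, i.e. one clock cycle of delay per column plus the number of result digits retained. Both the formula and the function names are assumptions made for illustration.

```python
def column_count(n_inputs):
    """Number of processor columns: the exponent of the power of two."""
    assert n_inputs >= 2 and n_inputs & (n_inputs - 1) == 0
    return n_inputs.bit_length() - 1

def processing_cycles(n_inputs, m_digits):
    """Assumed estimate: one cycle of delay per column plus M result digits."""
    return column_count(n_inputs) + m_digits

cycle_us = 1                               # clock period from the example, in microseconds
cycles = processing_cycles(1024, 10)
print(cycles, "cycles,", cycles * cycle_us, "microseconds under this assumption")
```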
Subject of the Invention

A matrix parallel processor for calculating the Hadamard transform, containing at the nodes of the matrix computational units made in the form of adders, the inputs of each of which are connected to the inputs of the computational unit, the first output of each computational unit being connected to the output of the corresponding adder, characterized in that, in order to increase speed, each computational unit contains an inverter and an additional adder whose output is connected to the second output of the computational unit, one input of the computational unit being connected directly to the first input of the additional adder and the other input being connected through the inverter to the second input of the additional adder; the inputs of the first computational unit of the second and third columns of the matrix are connected to the first outputs of the first and second computational units of the first and second columns of the matrix, respectively; the inputs of the second computational unit of the second and third columns are connected to the first outputs of the third and fourth computational units of the first and second columns, respectively; the inputs of the third computational unit of the second and third columns are connected to the second outputs of the first and second units of the first and second columns, respectively; the inputs of the fourth computational unit of the second and third columns are connected to the second outputs of the third and fourth computational units of the first and second columns, respectively; and the inputs of the computational units of the first column and the outputs of the computational units of the third column of the matrix are connected to the inputs and outputs of the processor, respectively.

FIG. 2 (diagram of a computational unit)