SU1388897A1

SU1388897A1 - Device for performing matrix operations

Info

Publication number: SU1388897A1
Application number: SU864134886A
Authority: SU
Inventors: Виктор Павлович Якуш; Станислав Георгиевич Седухин; Валентин Александрович Мищенко; Леонид Болеславович Авгуль; Олег Владимирович Подрубный
Original assignee: Минское Высшее Инженерное Зенитное Ракетное Училище Противовоздушной Обороны; Вычислительный Центр Со Ан Ссср
Priority date: 1986-10-15
Filing date: 1986-10-15
Publication date: 1988-04-15

Abstract

The invention relates to computer engineering and can be used in specialized computers and data processing devices. The aim of the invention is to extend the functional capabilities of the device by performing additional operations and to increase its speed. The aim is achieved in that, in a device containing $m \times p$ identical processor elements, where m and p are the dimensions of the matrices A and B respectively, each having three registers, a multiplier and an adder, two registers, three flip-flops, an AND element and a NOT element are added to each processor element. The distinguishing features of the device's operation are the parallel-pipelined organization of the computations and the synchronous execution of the different parts of the algorithm. 3 figures.

Description

The invention relates to computer engineering and can be used in specialized computers and data processing devices to perform the matrix operation C + AB.

The aim of the invention is to extend the functional capabilities of the device by performing additional operations and to increase its speed.

Fig. 1 shows the block diagram of the device for performing the matrix operation C + AB with matrix dimensions A (3×2), B (2×4) and C (3×4), that is, for m = 3, n = 4 and p = 2; Fig. 2 shows the functional diagram of four interconnected processor elements; Fig. 3 shows timing diagrams of the device's operation.

The device for performing the operation C + AB on matrices (Fig. 1), for the dimensions given above, contains information inputs $1_1$, $1_2$ and $1_3$ of the first group, information inputs $2_{11}$, $2_{12}$, $2_{21}$ and $2_{31}$ of the second group, information inputs $3_1$ and $3_2$ of the third group, clock input 4, processor elements $5_{11}$, $5_{12}$, ..., $5_{32}$, and information outputs $6_1$, $6_2$ and $6_3$ of the output group of the device.

The processor element (Fig. 2) contains the first 7, second 8 and third 9 information inputs, registers 10-14, flip-flops 15-17, multiplier 18, adder 19, AND element 20 and NOT element 21, as well as the first 22, second 23 and third 24 outputs.

The operation of the device is based on the algorithm for multiplying the $(m \times p)$ matrix $A = [a_{ik}]$ by the $(p \times n)$ matrix $B = [b_{kj}]$, which defines the resulting matrix $D = [d_{ij}]$:

$$d_{ij} = c_{ij} + \sum_{k=1}^{p} a_{ik} b_{kj}, \qquad 1 \le i \le m, \quad 1 \le j \le n.$$

If $d_{ij}^{0} = c_{ij}$, then at each subsequent recurrent step $k = 1, \dots, p$ the following set of computations is performed:

$$d_{ij}^{k} = d_{ij}^{k-1} + a_{ik} b_{kj}.$$
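The recurrence can be checked numerically with a short sketch. This is only an illustration in plain Python, not a model of the hardware: the function name `matrix_accumulate` and the sample matrices are hypothetical, chosen to match the 3×2, 2×4 and 3×4 dimensions of Fig. 1.

```python
def matrix_accumulate(A, B, C):
    """Return D = C + A*B via the recurrence d^k = d^(k-1) + a_ik * b_kj."""
    m, p, n = len(A), len(B), len(B[0])
    # Step 0: d^0_ij = c_ij (copy so that C is not modified).
    D = [row[:] for row in C]
    # Recurrent steps k = 1..p (0-based index here).
    for k in range(p):
        for i in range(m):
            for j in range(n):
                D[i][j] += A[i][k] * B[k][j]
    return D


# Hypothetical sample data: m = 3, n = 4, p = 2.
A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2, 1], [0, 1, 1, 3]]
C = [[1, 1, 1, 1], [0, 0, 0, 0], [2, 2, 2, 2]]
D = matrix_accumulate(A, B, C)
print(D)
```

After the last step k = p, each `D[i][j]` holds the final element of C + AB.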

The distinguishing features of the device's operation are the parallel-pipelined organization of the computations, the synchronous execution of the different parts of the algorithm, the shift of the accumulated sums $d_{ij}$ on each successive clock cycle from the $5_{i,j}$-th processor element to the $5_{i,j+1}$-th processor element ($i = 1, \dots, m$; $j = 1, \dots, p-1$), and the writing of the elements $a_{ij}$ into the corresponding $ij$-th processor element.

The device operates as follows.

In the initial state, registers 10-14 and flip-flops 15-17 are set to zero. The elements $(b_{ij}, 1)$ and $(c_{ij}, 1)$ are applied to the inputs on the respective clock cycle together with an additional bit set to one. The order in which the elements of the matrices B and C are applied is shown as parallelograms in Fig. 1. The elements $a_{ij}$ are applied to the inputs of the processor elements without an additional bit; the order in which the elements of matrix A are applied is shown as a triangle in Fig. 1.
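The pairing of a data element with its additional bit can be pictured as a (K + 1)-bit word. The sketch below is an assumption-laden illustration: the word length `K = 8`, the helper names `pack` and `unpack`, and the placement of the additional bit as the most significant bit are all hypothetical and not taken from the source.

```python
K = 8  # assumed word length of a matrix element, in bits


def pack(value, flag):
    """Combine a K-bit value with the additional (K+1)-th bit."""
    if not 0 <= value < (1 << K):
        raise ValueError("value does not fit in K bits")
    # The additional bit is placed above the K data bits (an assumption).
    return ((flag & 1) << K) | value


def unpack(word):
    """Split a (K+1)-bit word into its K-bit value and additional bit."""
    return word & ((1 << K) - 1), (word >> K) & 1


word = pack(5, 1)          # element with the additional bit set to one
value, flag = unpack(word)
print(word, value, flag)
```

An element of A, which travels without the additional bit, would simply be the K-bit value on its own.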

On cycle zero, the elements $a_{11}$, $(b_{11}, 1)$ and $(c_{11}, 1)$ are applied to the corresponding inputs of element $5_{11}$. On the leading edge of the clock pulse (Fig. 3), element $a_{11}$ is written to registers 11 and 12, because registers 11 and 12 are built from single-stage flip-flops and flip-flop 16 in its initial state enables writing to register 12. On the trailing edge of the clock pulse, flip-flop 16 switches to the opposite state and disables writing to register 12. A logic-one signal then appears at the output of NOT element 21 and enables writing of an element $a$ to register 13, which is also built from single-stage flip-flops. Registers 10 and 14 are built from two-stage flip-flops, so the elements $c_{11}$ and $b_{11}$ respectively are written to them on the trailing edge of the clock pulse. Similarly, the two-stage flip-flops 15 and 17 are set to one by the additional one bits. From the output of multiplier 18, the product $a_{11} b_{11}$ is applied to one input of adder 19, whose second input receives $c_{11}$. The adder output forms

$$d_{11}^{1} = c_{11} + a_{11} b_{11}.$$

On the first cycle, the elements $a_{22}$, $(b_{12}, 1)$ and $(c_{12}, 1)$ are applied to the corresponding inputs of element $5_{11}$; $(b_{21}, 1)$ and $(d_{11}^{1}, 1)$ to the corresponding inputs of element $5_{12}$; and $a_{21}$, $(b_{11}, 1)$ and $(c_{21}, 1)$ to the corresponding inputs of element $5_{21}$.

In element $5_{11}$, element $a_{22}$ is written to register 13, element $c_{12}$ to register 10 and element $b_{12}$ to register 14, and the adder output forms $d_{12}^{1}$. In element $5_{12}$, element $a_{12}$ is written to register 12, $d_{11}^{1}$ to register 10 and $b_{21}$ to register 14, and the adder output forms $d_{11}^{2} = d_{11}^{1} + a_{12} b_{21}$. In element $5_{21}$, element $a_{21}$ is written to register 12, element $c_{21}$ to register 10 and $b_{11}$ to register 14, and the adder output forms $d_{21}^{1} = c_{21} + a_{21} b_{11}$.

On the second cycle, in element $5_{22}$ element $a_{22}$ is written to register 12, flip-flop 16 changes its state and disables writing to register 12, and the adder output forms $d_{21}^{2} = d_{21}^{1} + a_{22} b_{21}$. On the subsequent cycles the values $d_{ij}$ shown in the timing diagrams (Fig. 3) are formed in the elements $5_{ij}$ in the same way.
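The cycle-by-cycle walk-through can be mimicked with a small scheduling sketch. It rests on an assumption inferred from the described cycles 0 to 2 rather than stated in the text: processor element $5_{ik}$ handles result element (i, j) on cycle t = (i − 1) + (j − 1) + (k − 1). The function name `simulate` and the sample matrices are hypothetical, and the registers and flip-flops are not modeled.

```python
def simulate(A, B, C):
    """Apply d_ij += a_ik * b_kj in the assumed cycle order.

    Returns the result matrix and a dict mapping each cycle number to the
    list of (i, k, j) triples (1-based) processed on that cycle.
    """
    m, p, n = len(A), len(B), len(B[0])
    schedule = {}
    for i in range(m):
        for k in range(p):
            for j in range(n):
                t = i + j + k  # assumed activation cycle of element 5_ik
                schedule.setdefault(t, []).append((i + 1, k + 1, j + 1))
    D = [row[:] for row in C]
    # Cycles are executed in order, so step k always follows step k-1.
    for t in sorted(schedule):
        for i, k, j in schedule[t]:
            D[i - 1][j - 1] += A[i - 1][k - 1] * B[k - 1][j - 1]
    return D, schedule


# Hypothetical sample data: m = 3, n = 4, p = 2.
A = [[1, 2], [3, 4], [5, 6]]
B = [[1, 0, 2, 1], [0, 1, 1, 3]]
C = [[1, 1, 1, 1], [0, 0, 0, 0], [2, 2, 2, 2]]
D, schedule = simulate(A, B, C)
for t in sorted(schedule):
    print(t, schedule[t])
```

Under this assumption the schedule for m = 3, n = 4, p = 2 spans seven cycles, numbered 0 to 6, and only element $5_{11}$ is active on cycle zero.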

The values of the corresponding elements $d_{ij}$ of the resulting matrix D are formed at the corresponding outputs of the processor elements. The number of operating cycles of the device is (m + n + p - 2). The duration of one operating cycle of the device is given by the expression

$$T = t_p + t_m + t_s,$$

where $t_p$ is the register write time, $t_m$ is the multiplication time, and $t_s$ is the summation time.
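As a rough illustration of these two figures, the sketch below multiplies the cycle count by the cycle duration. It assumes that the cycle duration is the sum of the three component times, since the operators in the printed expression are not legible; the function name `total_time` and the nanosecond values are hypothetical.

```python
def total_time(m, n, p, t_write, t_mul, t_add):
    """Return (cycles, cycle duration, total run time)."""
    cycles = m + n + p - 2          # number of operating cycles
    T = t_write + t_mul + t_add     # assumed: the three times add up
    return cycles, T, cycles * T


# Hypothetical timings in nanoseconds for the 3x2 by 2x4 example.
cycles, T, total = total_time(3, 4, 2, t_write=10, t_mul=100, t_add=40)
print(cycles, T, total)
```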

To perform the matrix operation C + AB on a new data stream, zero additional (K + 1)-th bits are applied on the corresponding cycles to inputs 1 and 3 of the elements $5_{ij}$ in order to reset flip-flops 16, which enable writing of the elements $a_{ij}$ to registers 12.

To perform the operations $AB_1$, $AB_2$, $AB_3$, etc., registers 12 of the elements $5_{ij}$ hold the corresponding elements $a_{ij}$; data are applied to inputs 1, and the elements $b_{ij}$ of the successive matrices to inputs 3.

Thus, the proposed device has broader functional capabilities than the known one: the known device performs only the multiplication of two matrices, whereas the proposed device implements the following matrix operations: matrix accumulation C + AB, and computation of the matrix chains $AB_1$, $AB_2$, $AB_3$, etc.

In addition, the structure of the proposed device can be built for arbitrary dimensions m, n and p.

Claims

Invention Formula

A device for performing matrix operations on a matrix A of size $(m \times p)$, a matrix B of size $(p \times n)$ and a matrix C of size $(m \times n)$, containing $m \times p$ identical processor elements, each containing three registers, a multiplier and an adder, wherein the first information input of the $(i,1)$-th processor element ($i = 1, \dots, m$) is connected to the $i$-th information input of the first group of the device; the second information inputs of the $(i,1)$-th and $(1,j)$-th processor elements are connected respectively to the $(i,1)$-th and $(1,j)$-th information inputs of the second group of the device; the third information input of the $(1,j)$-th processor element ($j = 1, \dots, p$) is connected to the $j$-th information input of the third group of the device; the clock input of the device is connected to the clock inputs of the $(i,j)$-th processor elements; the first output of the $(i,j)$-th processor element ($j = 1, \dots, p-1$) is connected to the first information input of the $(i,j+1)$-th processor element; the second output of the $(i,j)$-th processor element is connected to the second information input of the $(i+1,j+1)$-th processor element; the third output of the $(i,j)$-th processor element ($i = 1, \dots, m-1$) is connected to the third information input of the $(i+1,j)$-th processor element; and the first output of the $(i,p)$-th processor element is connected to the $i$-th output of the output group of the device, characterized in that, in order to extend the functional capabilities by performing additional operations and to increase the speed, two registers, three flip-flops, an AND element and a NOT element are introduced into each processor element, wherein the information input of the first register and the information input of the first flip-flop are connected respectively to bits 1 through K and to the (K + 1)-th bit of the first information input of the processor element, where K is the word length of a matrix element; the output of the first register is connected to the first input of the adder; the output of the adder and the output of the first flip-flop are connected respectively to bits 1 through K and to the (K + 1)-th bit of the first output of the processor element; the information input of the second register is connected to the second information input of the processor element; the write-enable input of the second register is connected to the input of the NOT element, whose output is connected to the write input of the fourth register, whose information input is combined with the information input of the third register and connected to the output of the second register; the output of the fourth register is connected to the second output of the processor element; the write-enable input of the third register is connected to the output of the AND element, whose first input is connected to the information input of the first flip-flop and whose second input is connected to the information input of the third flip-flop; the information input of the fifth register and the information input of the third flip-flop are connected respectively to bits 1 through K and to the (K + 1)-th bit of the third information input of the processor element; the outputs of the third and fifth registers are connected respectively to the first and second inputs of the multiplier, whose output is connected to the second input of the adder; the output of the fifth register and the output of the third flip-flop are connected respectively to bits 1 through K and to the (K + 1)-th bit of the third output of the processor element; and the write-enable inputs of the first, second and fifth registers and of the first, second and third flip-flops are combined and connected to the clock input of the processor element.

Fig. 2