SU1656560A1

SU1656560A1 - Sparse matrix multiplier

Info

Publication number: SU1656560A1
Application number: SU884498372A
Authority: SU
Inventors: Лариса Дмитриевна Елфимова; Владимир Викторович Коломейко; Игорь Григорьевич Мороз-Подворчан; Валерий Дисанович Петущак
Original assignee: Институт кибернетики им.В.М.Глушкова
Priority date: 1988-10-24
Filing date: 1988-10-24
Publication date: 1991-06-15

Abstract

The invention relates to computer engineering and can be used in special-purpose computers for multiplying sparse and supersparse matrices. The aim of the invention is to reduce hardware costs. The device contains two memory blocks for storing the nonzero elements of the sparse matrices, a memory block for storing the nonzero elements of the i-th row of one of the source matrices together with their row index values, a computing unit, registers, blocks of OR and AND elements, OR elements, NOT elements, and an AND element. The aim is achieved by storing and processing only the nonzero elements of the matrices being multiplied, which makes it possible to use a single computing unit regardless of the order of the matrices. 3 figures.

Description

The invention relates to computer engineering and can be used in special-purpose computers for multiplying sparse matrices of the same order.

The aim of the invention is to reduce hardware costs.

Fig. 1 shows the block diagram of the device; Fig. 2 shows the diagram of the computing unit; Fig. 3 shows the functional diagram of the control unit.

The device (Fig. 1) contains first 1, second 2 and third 3 memory blocks; first 4, fourth 5, fifth 6, second 7, sixth 8 and seventh 9 registers; computing unit 10; third 11, eighth 12 and ninth 13 registers; fourth 14, fifth 15, sixth 16, seventh 17, first 18, second 19, third 20, eighth 21 and ninth 22 blocks of AND elements; first 23, second 24 and third 25 blocks of OR elements; AND element 26; second 27, third 28 and first 29 OR elements; first 30 and second 31 NOT elements; control unit 32; first (33-35) and second (36-38) groups of information inputs of the device; write-control input 39 and clock input 40 of the device; and a group of outputs 41-43.

The computing unit (Fig. 2) includes multiplier 44 and accumulating adder 45.

The control unit (Fig. 3) consists of address counter 46, flip-flops 47-49 and AND elements 50-53.

The operation of the device is based on the algorithm for multiplying a matrix A = {a_ij} by a matrix B = {b_ij}, which defines the matrix C = {c_ij} (i = 1, ..., n; j = 1, ..., n, where n is the order of the matrices):

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} \qquad (1)$$

For dense matrices, determining any element c_ij would require an n-fold execution of the operation of accumulating pair products

$$c_{ij}^{k} = c_{ij}^{k-1} + a_{ik} b_{kj} \qquad (2)$$
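The accumulation in (2) can be illustrated with a minimal Python sketch. It is only a software analogy of the dense case, not part of the device; the function name `dense_element` and the use of nested lists with 0-based indices are assumptions made for the example.

```python
def dense_element(a, b, i, j):
    """Accumulate c_ij step by step, one pair product per value of k.

    a and b are square matrices given as nested lists (0-based indices,
    an assumption of this sketch; the patent text counts from 1).
    """
    n = len(a)
    c = 0
    for k in range(n):
        # One accumulation step of recurrence (2): c <- c + a_ik * b_kj
        c += a[i][k] * b[k][j]
    return c
```

For dense matrices the loop always runs n times, which is the cost the sparse scheme described next tries to avoid.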

For sparse and supersparse matrices of arbitrary structure, determining any element c_ij requires at most a ξ-fold execution of the accumulation operation, and in the general case far fewer than ξ executions, where ξ is the maximum number of nonzero elements in the row and column of the source matrices being multiplied, and ξ ≪ n. The value ξ reflects the degree of sparsity of the row and column of the source matrices.

In addition, in sparse matrices of general form the nonzero elements are distributed arbitrarily over the rows and columns, so when any element c_ij of the resulting matrix is determined, the pair factors have to be found. Formula (1) shows that the index k is the same for the paired elements a_ik and b_kj of matrices A and B. Thus, if the nonzero elements a_ik of the i-th row of the sparse matrix A, together with the value i of their row index, are written into a memory block at the address equal to the index k of each element, then the pair factor from this row for an element b_kj of a column of matrix B is obtained by accessing this memory block at the address equal to the index k of the element b_kj. The elements c_ij of the i-th row of the resulting matrix C then have a row index equal to the index i of the element a_ik and a column index equal to the index j of the element b_kj.
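The addressing idea in this paragraph can be sketched in a few lines of Python. This is an illustrative model only: the names `scatter_row` and `pair_factor`, the `(value, i, k)` triple layout, and the use of `None` for an empty cell are assumptions of the sketch, not details taken from the patent.

```python
def scatter_row(row_triples, n):
    """Write the nonzero a_ik of one row into a table addressed by k.

    row_triples: iterable of (value, i, k) with 1-based indices.
    Cell 0 is left unused so that k can serve directly as the address.
    """
    table = [None] * (n + 1)  # None marks "no element stored here"
    for value, i, k in row_triples:
        table[k] = (value, i)
    return table


def pair_factor(table, k):
    """Return (a_ik, i) for an element b_kj, or None if a_ik is zero."""
    return table[k]
```

For example, with row 2 of A holding a_21 = 3 and a_24 = 5, an element b_43 finds its pair factor at address 4, so the product 5 · b_43 contributes to c_23, while an element b_2j finds nothing and is skipped.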

In the present case, a sparse matrix B with one nonzero element per column is considered, for any degree of sparsity of matrix A.

The device operates as follows.

On a signal from the first output of the control unit, addresses are generated sequentially, and the numbers of the first and second three-dimensional arrays, which represent the sparse matrices A and B respectively, are written at these addresses into memory blocks 1 and 2. At the same time, the nonzero elements a_ik of the i-th row of matrix A, with their row index and column index values, are written into registers 4-6 through blocks 14-16 of AND elements. The control unit issues clock pulses at its second output; these open blocks 14-16 of AND elements, and through blocks 23-25 of OR elements the values held in registers 4 and 5 are written into memory block 3 at the address held in register 6. The end of the i-th row of the first array is indicated by the appearance of a zero code in register 6; the control unit detects it at its end-of-row flag input and, through its second output, blocks the transfer of numbers through blocks 14-16 of AND elements. After the numbers of both arrays have been written into memory blocks 1 and 2 and the i-th row of the first array into memory block 3, the control unit generates the address of the first cell at its first output.
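The row-loading phase can be modelled roughly as follows. This is a software sketch under stated assumptions, not a description of the actual circuit: the function name `load_row`, the `(value, i, k)` ordering of the incoming triples (mirroring registers 4, 5 and 6), and the representation of an empty cell as `(0, 0)` are choices made for the example. The only behaviour taken from the text is that a zero code in the address position ends the row.

```python
def load_row(stream, n):
    """Copy one row of A into the row memory until a zero code appears.

    stream: iterable of (value, i, k) triples with 1-based indices.
    A triple whose k is 0 plays the role of the zero code that marks
    the end of the row; anything after it is not written.
    """
    # (0, 0) stands for an empty cell: a zero row index means "no element".
    memory3 = [(0, 0)] * (n + 1)
    for value, i, k in stream:
        if k == 0:  # zero code in the address register: row finished
            break
        memory3[k] = (value, i)  # store value and row index at address k
    return memory3
```

Reserving index 0 as a flag is what makes 1-based indexing convenient here: every real element has nonzero indices, so a zero can never be mistaken for data.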

In accordance with the first address, the value of the element b_kj, the value of the index k and the value of the index j are read from the second memory block into registers 7, 8 and 9 respectively.

A signal at the third output of the control unit opens block 17 of AND elements, and the contents of register 8 are transferred to register 6 through block 25 of OR elements. At this address, the value a_ik and the value of its index i are read from the third memory block 3 into registers 4 and 5 respectively. The values of the elements a_ik and b_kj are passed simultaneously to computing unit 10 through blocks 18 and 19 of AND elements.

If memory block 3 holds no number at this address, which is indicated by a zero code in register 5, the signal from the output of register 5 closes blocks 18 and 19 of AND elements through OR element 28. The next element of the array is then read from memory block 2 into registers 7-9.

The end of the first column of the array stored in memory block 2 is indicated by the appearance of a zero code in register 8. The signals from its output, passing through OR element 27 and NOT element 31, open blocks 20-22 of AND elements, and the following transfers take place: the number c_ij from the computing unit to register 11 through block 20 of AND elements, the row index i of the element c_ij from register 5 to register 12 through block 21 of AND elements, and the column index j of the element c_ij from register 9 to register 13 through block 22 of AND elements. Thus the element c_ij of the resulting matrix, formed by obtaining the pair products in multiplier 44 and accumulating them in adder 45, is taken from outputs 41-43 of the device. Accumulating adder 45 is then cleared.
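The multiply-accumulate pass over the columns of B can be sketched in Python as below. It is a behavioural approximation under several assumptions that are not stated in the patent: the function name `multiply_row_by_columns`, the `(b_value, k, j)` record layout (mirroring registers 7, 8 and 9), the convention that the end-of-column record still carries the column index j, and the decision to keep the last nonzero row index for output. What is taken from the text is the zero code in the k position ending a column, the zero row index signalling a missing pair factor, and the clearing of the accumulator after each output.

```python
def multiply_row_by_columns(memory3, b_stream):
    """Multiply the row held in memory3 by the columns streamed from B.

    memory3: list of (a_value, i) addressed by k; (0, 0) is an empty cell.
    b_stream: iterable of (b_value, k, j); a record with k == 0 marks the
    end of column j (assumed to carry j, an assumption of this sketch).
    Returns a list of (c_ij, i, j) triples, one per column.
    """
    results = []
    acc = 0.0
    row_index = 0
    for b_value, k, j in b_stream:
        if k == 0:  # zero code: the current column is finished
            results.append((acc, row_index, j))
            acc = 0.0  # the accumulator is cleared after each output
            continue
        a_value, i = memory3[k]  # look up the pair factor at address k
        if i == 0:  # empty cell: no pair factor, nothing is multiplied
            continue
        row_index = i
        acc += a_value * b_value  # multiply, then accumulate
    return results
```

A column with no matching pair factor yields a zero result in this sketch; whether the device emits or suppresses such a zero is not specified in the text.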

Thus, cycle by cycle, the elements of all columns of matrix B are read from memory block 2 into registers 7-9 and multiplied by the elements of the i-th row of matrix A, and an element c_ij of the resulting matrix is taken from outputs 41-43 of the device. All columns of matrix B are multiplied in the same way by the next, (i+1)-th, row of matrix A, which is read from memory block 1 on a signal from the fourth output of the control unit. The control unit latches the signal at its last-column flag input, which arrives from the output of the last bit of register 8 through AND element 26.
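The overall row-by-row schedule can be summarized with a compact Python model. It is a sketch of the data flow only: the function name `sparse_multiply`, the `(value, row, col)` triple format with 1-based indices, the dictionary result, and the choice to omit results for columns without any matching pair are assumptions of the example rather than properties of the device.

```python
def sparse_multiply(a_triples, b_triples, n):
    """Multiply two sparse n x n matrices given as (value, row, col) triples.

    Each row of A is loaded into a table addressed by k, then every
    column of B is streamed past it, as in the description above.
    Returns {(i, j): c_ij} for the pairs that produced a product.
    """
    rows = {}
    for value, i, k in a_triples:
        rows.setdefault(i, []).append((value, k))
    columns = {}
    for value, k, j in b_triples:
        columns.setdefault(j, []).append((value, k))

    result = {}
    for i in sorted(rows):
        row_memory = [None] * (n + 1)  # plays the role of the row memory
        for value, k in rows[i]:
            row_memory[k] = value
        for j in sorted(columns):
            acc = 0.0
            hit = False
            for b_value, k in columns[j]:
                if row_memory[k] is not None:  # pair factor found at k
                    acc += row_memory[k] * b_value
                    hit = True
            if hit:
                result[(i, j)] = acc
    return result
```

Only one accumulator is used throughout, whatever the order n, which mirrors the single computing unit that the text presents as the source of the hardware saving.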

Claims

Formula of the invention

A device for multiplying sparse matrices, containing two memory blocks, a computing unit, three blocks of AND elements, three registers and a control unit, wherein the first information inputs of the first and second memory blocks are connected respectively to the first information inputs of the first and second groups of device inputs, the first output of the control unit is connected to the address inputs of the first and second memory blocks, the first output of the first register is connected to the first input of the first block of AND elements, the output of which is connected to the first information input of the computing unit, the information input of the second register is connected to the first output of the second memory block, the output of the second register is connected to the first input of the second block of AND elements, the output of which is connected to the second information input of the computing unit, the output of which is connected to the first input of the third block of AND elements, the output of which is connected to the information input of the third register, the output of which is the first output of the device, characterized in that, in order to reduce hardware costs, the device contains a third memory block, fourth through ninth registers, fourth through ninth blocks of AND elements, three blocks of OR elements, three OR elements, two NOT elements and an AND element, wherein the first information input of the first group of device inputs is connected to the first input of the fourth block of AND elements; the second and third information inputs of the first memory block are combined with the first inputs of the fifth and sixth blocks of AND elements respectively and are the second and third information inputs of the first group of device inputs respectively; the second inputs of the fourth, fifth and sixth blocks of AND elements are connected to the second output of the control unit; the outputs of the fourth, fifth and sixth blocks of AND elements are connected respectively to the first inputs of the first, second and third blocks of OR elements, the second inputs of which are connected respectively to the first, second and third outputs of the first memory block; the outputs of the first, second and third blocks of OR elements are connected respectively to the first information input of the first register and the information inputs of the fourth and fifth registers; the second output of the first register and the first output of the fourth register are connected respectively to the first and second information inputs of the third memory block, the first and second outputs of which are connected to the second information inputs of the first and fourth registers; the output of the fifth register is connected to the address input of the third memory block and to the inputs of the first OR element, the output of which is connected to the input of the first NOT element, the output of which is connected to the end-of-row flag input of the control unit, whose write-control input and clock input are connected to the like-named inputs of the device; the third output of the control unit is connected to the first input of the seventh block of AND elements, the output of which is connected to the third input of the third block of OR elements; the fourth output of the control unit is connected to the write/read control input of the first memory block; the second and third information inputs of the second memory block are connected respectively to the second and third information inputs of the second group of device inputs; the second and third outputs of the second memory block are connected respectively to the information inputs of the sixth and seventh registers; the output of the sixth register is connected to the second input of the seventh block of AND elements and to the inputs of the second OR element, the output of which is connected to the input of the second NOT element, the output of which is connected to the second input of the third block of AND elements, to the first inputs of the eighth and ninth blocks of AND elements and to the first input of the AND element, the second input and the output of which are connected respectively to the output of the last bit of the seventh register and to the last-column flag input of the control unit; the output of the seventh register is connected to the second input of the ninth block of AND elements; the second input of the eighth block of AND elements is connected to the second output of the fourth register and to the inputs of the third OR element, the output of which is connected to the second inputs of the first and second blocks of AND elements; and the outputs of the eighth and ninth blocks of AND elements are connected to the information inputs of the eighth and ninth registers respectively, the outputs of which are the second and third outputs of the device respectively.

Figures: Fig. 1; Fig. 2; Fig. 3 (drawing labels: elements of matrix A a_ik, elements of matrix B, elements of matrix C).