SU1737462A1

SU1737462A1 - Device for performing operations on matrices

Info

Publication number: SU1737462A1
Application number: SU904810759A
Authority: SU
Inventors: Валерий Анатольевич Грачев; Георгий Александрович Кухарев
Original assignee: Центральный научно-исследовательский институт "Морфизприбор"
Priority date: 1990-04-06
Filing date: 1990-04-06
Publication date: 1992-05-30

Abstract

The invention relates to computing and can be used in specialized digital information processing systems. The aim of the invention is to extend the functional capabilities of the device by solving the problem of the Cholesky decomposition of symmetric matrices. The aim is achieved by introducing, into a device containing n memory blocks and (n - 1) computing blocks, an additional n-th computing block that performs division and square-root operations and makes it possible to organize the computation so that at each iteration the matrix is represented in factorized form. 1 dependent claim, 5 figures.

Description

The invention relates to the field of computing and can be used in specialized digital information processing systems.

A number of devices that implement operations on matrices are known, for example a homogeneous parallel computing structure for computing the product of a matrix and a vector, a matrix calculator, and a device for performing matrix operations.

These devices contain a two-dimensional array of operational blocks and means for organizing the computation (registers, flip-flops, synchronization blocks). All of them can perform operations on matrices (multiplication of a matrix by a vector, etc.) and share the following drawbacks: high hardware cost of implementing two-dimensional arrays of operational blocks; difficulty of organizing multidimensional data input; inefficient use of the computing resource when the dimension of the problem changes; and inability to perform matrix decomposition.

Devices intended for the LU decomposition of matrices are known, for example systolic processors for matrix computations.

These devices also contain two-dimensional arrays of operational blocks and therefore have the drawbacks listed above. In addition, they cannot decompose matrices by the Cholesky algorithm.

A number of devices for performing matrix operations are known, for example systolic processors for matrix operations.

These devices contain a linear array of operational blocks and use the computing resource more efficiently, but they cannot perform the Cholesky decomposition of symmetric matrices.

The closest to the proposed device is a device containing two arrays of n memory cells and n computing cells respectively, in which each memory cell is a dual buffer built from FIFO memory and each computing cell contains registers, a multiplier, and an adder.

That device processes multidimensional signals efficiently, in particular for spectral analysis. Its drawback is limited functionality in digital signal processing tasks: it cannot execute a number of algorithms that rely on the Cholesky decomposition of symmetric matrices.

The aim of the invention is to extend the functional capabilities of the device by performing the Cholesky decomposition of symmetric matrices.

The stated aim is achieved by introducing into the device an additional computing cell with the corresponding connections, which performs square-root and division operations, together with a new architecture for the computing cells that make up the array of computing cells.

To compute the Cholesky decomposition of symmetric matrices, an additional computing cell is introduced into the known device; at each step of the iterative process it computes one column vector of the lower triangular matrix of the Cholesky decomposition. In addition, the new architecture of the computing cells makes it possible to organize the computation so that at each iteration the matrix $A^{(i)}$ is represented in the factorized form

$$A^{(i)} = F_{i+1} B^{(i+1)} F_{i+1}^{T},$$

where $F_{i+1}$ is a Frobenius matrix whose (i + 1)-th column coincides with the (i + 1)-th column of the lower triangular matrix of the Cholesky decomposition, and the matrix $B^{(i+1)}$ has the form

$$B^{(i+1)} = \begin{pmatrix} 1 & 0 \\ 0 & A^{(i+1)} \end{pmatrix}.$$
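As a sanity check of the factorized form, the first step ($i = 0$) can be written out in blocks. This is only a sketch: it assumes the usual partition of a symmetric matrix into a leading element $a$, a column $b$ and a trailing block $B$, and these symbols, like the identity block $E$, are introduced here for illustration.

```latex
\begin{pmatrix} a & b^{T} \\ b & B \end{pmatrix}
=
\begin{pmatrix} \sqrt{a} & 0 \\ b/\sqrt{a} & E \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & B - b\,b^{T}/a \end{pmatrix}
\begin{pmatrix} \sqrt{a} & b^{T}/\sqrt{a} \\ 0 & E \end{pmatrix}
```

Multiplying out the right-hand side returns the original blocks, so the trailing block of the middle factor is the matrix handed to the next iteration.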

Fig. 1 shows the functional diagram of the proposed device as an example of a specific implementation; Fig. 2 shows the functional diagram of the n-th computing cell; Fig. 3 shows the functional diagram of the j-th computing cell (j = 1, ..., n - 1); Fig. 4 shows the organization of information flows in the computing cells; Fig. 5 shows the timing diagram of the computing cells.

The device (Fig. 1) for the Cholesky decomposition of symmetric matrices contains information input 1, array 2 of n memory cells 3, information output 4, the n-th computing cell 5, and array 6 of (n - 1) computing cells 7. The first input and first output of the j-th (j = 2, ..., n) memory cell 3 are connected to the first output and first input, respectively, of the (j - 1)-th computing cell 7, and the first input and first output of the first memory cell 3 are connected to the first output and first input of the n-th computing cell 5. The second output of the j-th (j = 1, ..., n - 1) memory cell 3 is connected to the second input of the (j + 1)-th memory cell 3, the second output of the n-th memory cell 3 is information output 4 of the device, and the second input of the first memory cell 3 is information input 1 of the device. The second and third outputs of the j-th (j = 1, ..., n - 2) computing cell 7 are connected to the second and third inputs, respectively, of the (j + 1)-th computing cell 7, and the second and third outputs of the n-th computing cell 5 are connected to the second and third inputs of the first computing cell 7. The fourth input of the j-th (j = 1, ..., n - 2) computing cell 7 is connected to the fourth output of the (j + 1)-th computing cell 7, and the fourth output of the first computing cell 7 is connected to the second input of the n-th computing cell 5.

The computing cell with number n contains (Fig. 2) first input 5.1, first output 5.2, third output 5.3, second input 5.4, second output 5.5, register 8, square-root block 9, multiplexer 10, reciprocal block 11, adder 12, register 13, multiplier 14, sign-change block 15, and register 16. Input 5.1 is connected to the first input of adder 12 and to the input of square-root block 9, whose output is connected to the input of reciprocal block 11 and to the first input of multiplexer 10. The second input of multiplexer 10 is connected to the output of adder 12, and the output of multiplexer 10 is connected to the input of register 8, whose output is output 5.2 of the n-th computing cell 5. The output of reciprocal block 11 is connected to the input of register 13, whose output is output 5.5 of the n-th computing cell 5. The second input of adder 12 is connected to the output of multiplier 14, whose first input is joined with the input of sign-change block 15 and is input 5.4 of the n-th computing cell 5; the second input of multiplier 14 is connected to the output of sign-change block 15 and to the input of register 16. The output of register 16 is output 5.3 of the n-th computing cell 5.
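The datapath of the n-th computing cell can be sketched as a single function call per microcycle. This is a behavioural model under stated assumptions, not the patented circuit: the function name `cell5_step`, the flag `pivot_step` (standing in for the select signal of multiplexer 10) and the dictionary keys are hypothetical, and register 13 is reported as `None` when the reciprocal is not needed.

```python
import math


def cell5_step(in_5_1, in_5_4, pivot_step):
    """Model one microcycle of the n-th cell.

    Returns the values latched into registers 8, 13 and 16
    (outputs 5.2, 5.5 and 5.3 respectively).
    """
    negated = -in_5_4              # sign-change block 15
    product = in_5_4 * negated     # multiplier 14: x * (-x)
    total = in_5_1 + product       # adder 12: input 5.1 plus the product
    if pivot_step:
        root = math.sqrt(in_5_1)   # square-root block 9
        inverse = 1.0 / root       # reciprocal block 11
        # multiplexer 10 passes its first input (the square root)
        return {"reg8": root, "reg13": inverse, "reg16": negated}
    # multiplexer 10 passes its second input (the adder output)
    return {"reg8": total, "reg13": None, "reg16": negated}
```

With `pivot_step=True` the cell yields the square root and its reciprocal; otherwise it yields the input on 5.1 minus the square of the input on 5.4, which is the form of a diagonal update.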

Computing cell 7 with number j (j = 1, ..., n - 1) contains (Fig. 3) third input 7.1, first input 7.2, first output 7.3, third output 7.4, fourth input 7.5, second output 7.6, second input 7.7, fourth output 7.8, multiplexers 17-19, multiplier 20, adder 21, registers 22-25, and multiplexer 26. Input 7.2 is connected to the first inputs of multiplexers 17 and 18, whose outputs are connected to the first inputs of multiplier 20 and adder 21 respectively. The second input of multiplexer 17 is joined with the second input of multiplexer 26 and is input 7.5 of computing cell 7, and the second input of multiplexer 18 is connected to the constant-0 input. The first input of multiplexer 26 is connected to the output of multiplier 20 and to the second input of adder 21, whose output is connected to the input of register 22. The output of register 22 is output 7.3 of computing cell 7. The output of multiplexer 26 is connected to the input of register 25, whose output is output 7.8 of computing cell 7. The second input of multiplier 20 is connected to the output of multiplexer 19, whose first input is joined with the input of register 23 and is input 7.7 of computing cell 7, and whose second input is joined with the input of register 24 and is input 7.1 of computing cell 7. The outputs of registers 23 and 24 are outputs 7.6 and 7.4 of computing cell 7 respectively.
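The wiring of computing cell 7 can be modelled in the same spirit. The sketch below assumes each multiplexer has a select value of 1 (first input) or 2 (second input); the function name `cell7_step`, the `sel*` parameters and the dictionary keys are hypothetical and serve only to mirror the numbered elements of Fig. 3.

```python
def cell7_step(in_7_1, in_7_2, in_7_5, in_7_7, sel17, sel18, sel19, sel26):
    """Model one microcycle of computing cell 7.

    Returns the values latched into registers 22-25
    (outputs 7.3, 7.6, 7.4 and 7.8 respectively).
    """
    mux17 = in_7_2 if sel17 == 1 else in_7_5   # feeds multiplier 20
    mux18 = in_7_2 if sel18 == 1 else 0.0      # feeds adder 21 (or constant 0)
    mux19 = in_7_7 if sel19 == 1 else in_7_1   # feeds multiplier 20
    product = mux17 * mux19                    # multiplier 20
    total = mux18 + product                    # adder 21
    mux26 = product if sel26 == 1 else in_7_5  # feeds register 25
    return {"reg22": total, "reg23": in_7_7, "reg24": in_7_1, "reg25": mux26}


# Scaling a column element by a reciprocal square root (0.5 here):
scaled = cell7_step(0.0, 12.0, 0.0, 0.5, sel17=1, sel18=2, sel19=1, sel26=1)

# Updating a matrix element: -43 + (-8) * (-6)
updated = cell7_step(-6.0, -43.0, -8.0, 0.0, sel17=2, sel18=1, sel19=2, sel26=2)
```

The two calls correspond to the two ways the cell is used: forming a column element as a product, and forming an updated matrix element as a sum of a stored element and a product.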

The device operates as follows.

A symmetric positive definite matrix can be factorized in a unique way as

$$A^{(0)} = L L^{T}, \tag{1}$$

where L is a lower triangular matrix.
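A small numerical instance may help fix the notation. The 2 x 2 matrix below is an illustrative choice and does not come from the patent.

```latex
\begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}
=
\begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix},
\qquad
L = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}
```

Here the first column of $L$ is the first column of the matrix divided by $\sqrt{4} = 2$, and the remaining diagonal entry is $\sqrt{5 - 1^{2}} = 2$.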

The matrix L can be represented as

$$L = F_1 F_2 \cdots F_n, \tag{2}$$

where $F_i = E + (f_i - e_i) e_i^{T}$, $E$ is the identity matrix, $e_i$ is the i-th unit vector, and the vector $f_i$ is determined from the first column of the matrix $A^{(i-1)}$, which is formed as follows. If the matrix $A^{(i)}$, $i = 0, \ldots, n-1$, is written as

$$A^{(i)} = \begin{pmatrix} a_{11}^{(i)} & b_i^{T} \\ b_i & B_i \end{pmatrix}, \tag{3}$$

where $B_i$ is a matrix of order (n - i - 1), then the matrix $A^{(i+1)}$ of order (n - i - 1) is formed according to the expression

$$A^{(i+1)} = B_i - \frac{b_i b_i^{T}}{a_{11}^{(i)}}. \tag{4}$$

The elements of the vector $f_i$ are determined from the expression

$$f_i(j) = \begin{cases} 0, & j < i, \\ a_{j-i+1,1}^{(i-1)} \big/ \bigl(a_{11}^{(i-1)}\bigr)^{1/2}, & j \ge i, \end{cases} \tag{5}$$

where $a_{k1}^{(i-1)}$ is the k-th element of the first column of the matrix $A^{(i-1)}$.
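Expressions (3)-(5) amount to a column-by-column (outer-product) Cholesky procedure. The following is a minimal software sketch of that arithmetic, assuming a dense list-of-lists matrix; the function name `cholesky_by_columns` is hypothetical, and the code models only the mathematics, not the timing or the hardware.

```python
import math


def cholesky_by_columns(a):
    """Return lower-triangular L with a = L * L^T, one column per iteration."""
    n = len(a)
    cur = [row[:] for row in a]            # current matrix A^(i), order n - i
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        pivot = cur[0][0]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        root = math.sqrt(pivot)
        inv_root = 1.0 / root
        # Expression (5): nonzero part of column i of L
        lower[i][i] = root
        for k in range(1, len(cur)):
            lower[i + k][i] = cur[k][0] * inv_root
        # Expression (4): A^(i+1) = B_i - b_i b_i^T / a_11
        cur = [[cur[r][c] - lower[i + r][i] * lower[i + c][i]
                for c in range(1, len(cur))]
               for r in range(1, len(cur))]
    return lower
```

For the matrix with rows (4, 12, -16), (12, 37, -43), (-16, -43, 98) the function returns the rows (2, 0, 0), (6, 1, 0), (-8, 5, 3).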

Thus the columns of the matrix L coincide with the vectors $f_i$, i.e. $l_i = f_i$, where $l_i$ is the i-th column of L. The matrix L is formed in n iterations. Each iteration forms one column vector of L: the i-th iteration (i = 1, ..., n) forms the i-th column vector. Since the i-th column vector of L contains (n - i + 1) nonzero elements, (n - i + 1) computing cells take part in forming it: the i-th element of the column vector is formed by the n-th computing cell 5, and the element with number i + j (j = 1, ..., n - i) is formed by the j-th computing cell 7. The organization of the data stream at the input of the array of computing cells is shown in Fig. 5.

Consider the organization of the computation at one of the iterations, for example the first, which forms the first column vector of L and the matrix $A^{(1)}$. The device processes information on the pipeline principle. Formation of the first column vector begins in microcycle $t_0$, when the element $a_{11}^{(0)}$ of the matrix $A^{(0)}$ is fed into the n-th computing cell 5. In microcycle $t_0$ this cell forms two parameters using square-root block 9 and reciprocal block 11. The first parameter, equal to $(a_{11}^{(0)})^{1/2}$, is the computed value of the first coordinate of the column vector $l_1$ and is passed through multiplexer 10 and register 8 to the first memory cell 3. The second parameter, equal to $(a_{11}^{(0)})^{-1/2}$, takes part in forming the remaining coordinates of $l_1$ in accordance with expression (5). This parameter is propagated through array 6 of computing cells 7 by register 13 of computing cell 5 and registers 23 of computing cells 7 (j = 1, ..., n - 1). In microcycle $t_k$ (k = 1, ..., n - 1) the parameter $(a_{11}^{(0)})^{-1/2}$ arrives at input 7.7 of the k-th computing cell 7 and passes through multiplexer 19 to one input of multiplier 20; at the same moment the element $a_{k+1,1}^{(0)}$ of the matrix $A^{(0)}$ is applied to the other input of multiplier 20. The product $a_{k+1,1}^{(0)} (a_{11}^{(0)})^{-1/2}$ is passed through adder 21 and register 22 to the corresponding memory cell 3 in the next microcycle. In this microcycle ($t_k$) multiplexer 18 applies the code 0 to the other input of adder 21. The same product passes through multiplexer 26 to the input of register 25. Thus, in microcycle $t_k$ the information flows in the k-th computing cell 7 are switched as follows:

output of multiplexer 19: input 7.7
output of multiplexer 18: constant 0
output of multiplexer 26: output of multiplier 20
output of multiplexer 17: input 7.2

In the remaining microcycles the second inputs of the multiplexers are switched to the multiplexer outputs of the computing cell. In the n-th computing cell 5, square-root block 9 is connected to the output of multiplexer 10 in microcycle $t_0$. The multiplexers of the computing cells are controlled by control bits that accompany the elements of the matrices $A^{(i)}$.
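The timing of the first column can be summarized as a list of events, one per microcycle. The sketch below assumes that microcycle 0 belongs to the n-th cell and microcycle k to the k-th cell, as the text describes; the function name `first_column_schedule` and the string labels are hypothetical, and register delays are ignored.

```python
import math


def first_column_schedule(first_col):
    """List (microcycle, producing cell, value) for the first column of L."""
    root = math.sqrt(first_col[0])    # formed in the n-th cell at t_0
    inv_root = 1.0 / root             # parameter passed down the cell chain
    events = [(0, "cell n", root)]
    for k in range(1, len(first_col)):
        # at t_k the k-th cell multiplies element k+1 by the parameter
        events.append((k, "cell %d" % k, first_col[k] * inv_root))
    return events
```

For the first column (4, 12, -16) this yields the values 2, 6 and -8 at microcycles 0, 1 and 2.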

Thus the elements of the column vector $l_1$ are formed in the first n microcycles ($t_k$, k = 0, ..., n - 1). The elements of the matrix $A^{(1)}$ are formed column by column starting from microcycle $t_2$; only the elements of the lower triangle and the main diagonal are formed, since $A^{(1)}$ is symmetric. The elements of the m-th column vector begin to be formed in microcycle $t_{2(m-1)}$, and the j-th (j = 1, ...) computing cell 7 forms the (j + m)-th element of the m-th column vector. From expressions (4) and (5) we have

$$a_{kj}^{(1)} = b_{kj}^{(0)} - l_{k1} l_{j1}, \quad k, j = 1, \ldots, n-1, \tag{6}$$

where $l_{k1}$ and $l_{j1}$ are the elements of the column vector $l_1$ determined from (5) that correspond to row k and column j of $B_0$, and $b_{kj}^{(0)}$ are the elements of the block $B_0$ in the block representation (3).
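Because the updated matrix is symmetric, only its lower triangle has to be produced. A minimal sketch of expression (6) restricted to the lower triangle follows; the function name `update_lower_triangle` and the argument `l_tail` (the elements of the first column of L below the diagonal) are hypothetical names chosen for the example.

```python
def update_lower_triangle(b, l_tail):
    """Apply expression (6) to the lower triangle of the trailing block.

    b      -- trailing block B_0 as a list of rows (order n - 1)
    l_tail -- elements 2..n of the first column of L
    Returns a ragged list: row k holds elements 0..k of the updated matrix.
    """
    order = len(b)
    return [[b[k][j] - l_tail[k] * l_tail[j] for j in range(k + 1)]
            for k in range(order)]
```

For a block of order m this takes m(m + 1)/2 multiply-subtract steps instead of m squared.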

The elements of the column vector $l_1$ used in the computations by formula (6) propagate through array 6 of computing cells 7 in two directions. The pipeline formed by multiplexers 26 and registers 25 carries the elements of $l_1$ from right to left up to cell 5. In computing cell 5, sign-change block 15 multiplies each element of $l_1$ by -1, and the elements of $-l_1$ are sent to computing cells 7 along the pipeline formed by register 16 of cell 5 and registers 24 of cells 7. In each computing cell 7, every other microcycle, multiplier 20 and adder 21 perform the computation in accordance with expression (6).

The elements of $-l_1$ arrive at multiplier 20 of each computing cell 7 from input 7.1 through multiplexer 19, and the elements of $l_1$ arrive from input 7.5 through multiplexer 17. Adder 21 of each computing cell 7 forms the element $a_{kj}^{(1)}$, which is passed through register 22 to the corresponding memory cell 3. The timing diagram explaining the organization of data flows in array 6 of computing cells 7 is shown in Fig. 5 for the case n = 4. In the remaining iterations the processor operates in the same way. The difference is that each subsequent iteration uses one computing cell fewer than the previous one: the rightmost computing cell that takes part in a given iteration does not take part in the next.
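The shrinking set of active cells can be tabulated directly. The sketch below assumes, following the description, that iteration i uses the n-th cell plus cells 1 to n - i; the function name `active_cells` and the string labels are hypothetical.

```python
def active_cells(n):
    """Map iteration number i (1-based) to the labels of the cells it uses."""
    schedule = {}
    for i in range(1, n + 1):
        # the n-th cell forms element i; cells 1..n-i form elements i+1..n
        schedule[i] = ["cell n"] + ["cell %d" % j for j in range(1, n - i + 1)]
    return schedule
```

For n = 4 the iterations use 4, 3, 2 and 1 cells respectively, so the total work over all iterations grows as n(n + 1)/2 cell activations.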

Each memory cell 3 is a dual RAM buffer, which makes it possible to load the next matrix into array 2 of memory cells 3 through bus 1 while the current matrix is being decomposed. Simultaneously with the loading of the next matrix, the elements of the matrix L obtained by decomposing the previous matrix are output from array 2 of memory cells 3. The output is performed through output 4.
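The dual-buffer behaviour can be illustrated with a two-bank (ping-pong) model. This is a sketch of the general technique only: the class name `DualBuffer` and its methods are hypothetical, and it assumes that results overwrite the working bank in place, which the description does not state explicitly.

```python
class DualBuffer:
    """Two-bank buffer: one bank is computed on, the other is loaded/unloaded."""

    def __init__(self):
        self.banks = [[], []]
        self.active = 0                 # index of the bank used for computation

    def load(self, values):
        """Write the next data set into the idle bank.

        Returns whatever the idle bank held before, i.e. the results
        left there by the previous computation.
        """
        idle = 1 - self.active
        previous = self.banks[idle]
        self.banks[idle] = list(values)
        return previous

    def working(self):
        """Return the bank currently used for computation."""
        return self.banks[self.active]

    def swap(self):
        """Exchange the roles of the two banks between matrices."""
        self.active = 1 - self.active
```

A typical cycle is: load the next matrix while computing on the working bank, swap, then load again to collect the previous results.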

Thus, by introducing into the known device an additional computing cell, a new architecture for the computing cells combined into an array, and a special organization of the computation, the functional capabilities are extended by solving the problem of the Cholesky decomposition of symmetric matrices.

Claims

1. A device for performing operations on matrices, containing n - 1 computing blocks (n is the dimension of the matrices being processed) and n memory blocks, wherein the first information input and the first output of the j-th memory block (j = 2, ..., n) are connected respectively to the first output and the first information input of the (j - 1)-th computing block, the second output of the i-th memory block (i = 1, ..., n - 1) is connected to the second information input of the (i + 1)-th memory block, the second information input of the first memory block and the second output of the n-th memory block are respectively the information input and the information output of the device, and the second and third outputs of the k-th computing block (k = 1, ..., n - 2) are connected respectively to the second and third information inputs of the (k + 1)-th computing block, characterized in that, in order to extend the functional capabilities by performing the Cholesky decomposition of matrices, an n-th computing block is introduced into it, the first information input and the first output of which are connected respectively to the first output and the first information input of the first memory block, the second and third outputs of the n-th computing block are connected respectively to the second and third information inputs of the first computing block, the fourth information input of the k-th computing block is connected to the fourth output of the (k + 1)-th computing block, and the fourth output of the first computing block is connected to the second information input of the n-th computing block.

2. The device according to claim 1, characterized in that the i-th computing block contains a multiplier, an adder, four registers and four multiplexers, wherein the first information inputs of the first and second multiplexers are joined and connected to the first information input of the computing block, and the outputs of the first and second multiplexers are connected to the first inputs of the multiplier and the adder respectively; the second information input of the first multiplexer is connected to the second information input of the third multiplexer and is the fourth information input of the computing block; the second information input of the second multiplexer is connected to the zero input of the device; the output of the third multiplexer is connected to the information input of the first register, the output of which is the fourth output of the computing block; the first information input of the third multiplexer is connected to the output of the multiplier and to the second input of the adder, the output of which is connected to the information input of the second register, the output of which is the first output of the computing block; the second input of the multiplier is connected to the output of the fourth multiplexer, the first and second information inputs of which are connected respectively to the information inputs of the third and fourth registers, the outputs of which are respectively the second and third outputs of the computing block, the second and third information inputs of which are connected to the information inputs of the third and fourth registers respectively.

3. The device according to claim 1, characterized in that the n-th computing block contains a square-root node, a node for computing the reciprocal of a number, three registers, a multiplier, an adder, a multiplexer and a sign-change node, wherein the first information input of the computing block is connected to the first input of the adder and to the input of the square-root node, the output of which is connected to the input of the reciprocal node, the output of which is connected to the information input of the first register; the first output of the computing block is connected to the output of the second register, the information input of which is connected to the output of the multiplexer, the information inputs of which are connected respectively to the outputs of the square-root node and the adder, the second input of which is connected to the output of the multiplier, the first input of which is connected to the output of the sign-change node and to the information input of the third register, the output of which is the third output of the n-th computing block, the second output of which is connected to the output of the first register; the second input of the n-th computing block is connected to the input of the sign-change node and to the second input of the multiplier.

Fig. 2