RU1833892C

RU1833892C - Computing node of a device for solving partial differential equations

Info

Publication number: RU1833892C
Application number: SU914933357A
Authority: RU
Inventors: Леонид Григорьевич Козлов
Original assignee: Институт кибернетики им.В.М.Глушкова
Priority date: 1991-05-05
Filing date: 1991-05-05
Publication date: 1993-08-15

Abstract

The invention relates to computer engineering and can be used in building specialized and problem-oriented processors for solving elliptic partial differential equations. The device contains a multi-input adder, a shift register, an AND element, four groups of delay elements, and a group of AND elements. It is intended for solving three-dimensional problems of mathematical physics, differs from known devices in its high speed and solution accuracy, and makes it possible to reduce the number of nodes in the digital grid. 2 figures.

Description


The invention relates to computer engineering and can be used in building digital grids and processors for solving problems of mathematical physics.

The aim of the invention is to increase computational accuracy.

Fig. 1 shows a diagram of the proposed node.

It contains a multi-input adder 1, a shift register 2, an AND element 3, a group of AND elements 4, delay elements 5-8, the first 9, second 10, and third 11 groups of information inputs, the first 12, second 13, and third 14 control inputs, and the parallel 15 and serial 16 result outputs of the node.

The first group 9 of information inputs of the node is connected to the first group of inputs of the multi-input adder 1, whose output is connected to the information input of the shift register 2. The information output of the shift register 2 is connected through the AND elements of group 4 to the parallel result output 15 of the node, and the serial result output 16 of the node is connected through the AND element 3 to the serial output of the shift register 2. The third 14, second 13, and first 12 control inputs of the node are connected, respectively, to the shift control input of the shift register 2, the second input of the AND element 3, and the second inputs of the AND elements of group 4. The second group 10 of information inputs of the node is connected through the delay elements 5 of the first group to the second group of inputs of the multi-input adder 1 and to the inputs of the delay elements 6 of the second group, whose outputs are connected to the third group of inputs of the multi-input adder 1 and to the inputs of the delay elements 7 of the third group, whose outputs are connected to the fourth group of inputs of the multi-input adder 1. The fifth group of inputs of the adder is connected to the third group 11 of information inputs of the node and to the inputs of the delay elements 8 of the fourth group, whose outputs are connected to the sixth group of inputs of the multi-input adder 1.

The first group 9 of information inputs of the node receives the serial code of eight (for the Laplace equation) or nine (for the Poisson equation) numbers, least significant bits first. The second 10 and third 11 groups of information inputs of the node receive the serial code of six and twelve numbers, respectively, least significant bits first. Each successive bit of these numbers arrives at the inputs of the multi-input adder 1, at whose output the summation result is formed. Carries into the following bit positions are stored in the corresponding memory elements (not shown) of the multi-input adder 1 and are used in subsequent clock cycles when the following bits (i+1 and so on) of the numbers are summed.

When the Poisson equation is solved, one of the information inputs 9 of the first group receives the serial code of the value $-60h^{2}F_{ijk}$, where $F_{ijk}$ is the value of the right-hand side of the equation. The remaining information inputs 9 of the first group receive, synchronously with the bits of the right-hand side, the bits of the values ($U_{i\pm1,j\pm1,k\pm1}$) of the eight nearest corner nodes of the three-dimensional grid in the two adjacent ($i\pm1$) layers. The second group 10 of information inputs receives in serial code the bits of the values ($U_{i,j\pm1,k\pm1}$; $U_{i\pm1,j,k\pm1}$; $U_{i\pm1,j\pm1,k}$) of the twelve nearest diagonal nodes of the three-dimensional grid in the i-th layer and in the two adjacent ($i\pm1$) layers. The third group 11 of information inputs of the node receives in serial code the bits of the values ($U_{i\pm1,j,k}$; $U_{i,j\pm1,k}$; $U_{i,j,k\pm1}$) from the serial result outputs 16 of the neighboring nodes along the layer (i), column (j), and row (k) of the three-dimensional digital grid model (Fig. 2).

These values pass through the delay elements 5-8, each of which delays by one clock cycle, the time needed to process one bit. As a result, the numbers arriving at the second 10 and third 11 groups of information inputs of the node are multiplied by the coefficients fourteen and three, respectively. A delay of one clock cycle is equivalent to multiplication by two. Thus the numbers arriving at the third group 11 of information inputs are fed to the input groups of the multi-input adder 1 directly, with a coefficient of one, and, after passing through the delay elements 8, with a coefficient of two, so the total coefficient is three. Likewise, the numbers arriving at the second group 10 reach the adder after one, two, and three delays (elements 5, 6, and 7), with coefficients two, four, and eight, so the total coefficient is fourteen.
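The relation between delay and weighting can be illustrated in software. The following Python sketch is only a model under stated assumptions, not the circuit of the invention: the numbers are taken to be non-negative integers, and the helper names `to_serial`, `delay`, `serial_sum`, and `from_serial` are hypothetical. It shows that delaying a least-significant-bit-first stream by one, two, and three clock cycles and summing the results weights the number by 2 + 4 + 8 = 14, while summing the undelayed stream with its one-cycle delay weights it by 1 + 2 = 3.

```python
def to_serial(value, width):
    """Bits of a non-negative integer, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_serial(bits):
    """Rebuild the integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def delay(bits, cycles):
    """Model a chain of one-cycle delay elements: the stream starts later."""
    return [0] * cycles + bits


def serial_sum(streams):
    """Add LSB-first streams one bit per clock, keeping the carry between clocks."""
    length = max(len(s) for s in streams)
    carry = 0
    out = []
    for t in range(length):
        total = carry + sum(s[t] for s in streams if t < len(s))
        out.append(total & 1)   # sum bit produced on this clock
        carry = total >> 1      # carry stored for the next clock
    while carry:                # flush the remaining carry bits
        out.append(carry & 1)
        carry >>= 1
    return out


def weight_via_three_delays(value, width=8):
    """Stream tapped after 1, 2, and 3 delays: total weight 2 + 4 + 8 = 14."""
    s = to_serial(value, width)
    return from_serial(serial_sum([delay(s, 1), delay(s, 2), delay(s, 3)]))


def weight_via_one_delay(value, width=8):
    """Stream taken directly and after 1 delay: total weight 1 + 2 = 3."""
    s = to_serial(value, width)
    return from_serial(serial_sum([s, delay(s, 1)]))
```

For an input of 5, `weight_via_three_delays` returns 70 and `weight_via_one_delay` returns 15.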

To enter the value $-60h^{2}F_{ijk}$ quickly into all nodes of the digital grid and to increase speed, this value can be loaded beforehand in parallel code into a register provided for this purpose, at whose output, under the action of control signals, the serial code of the value $-60h^{2}F_{ijk}$ is formed at each iteration of the solution.

The summation result for each bit of the numbers, obtained at the output of the multi-input adder 1, is entered into the shift register 2. The code of the value of the sought function $U_{ijk}$ from the previous, r-th iteration, held in the shift register 2, is shifted on each clock cycle under the control of the signal arriving at the third control input 14 of the node and is delivered to the serial result output 16 through the AND element 3 and to the parallel result output 15 through the AND elements of group 4. The AND element 3 is opened by the signal arriving at the second control input 13 for all n clock cycles (where n is the word length of the numbers). After n clock cycles the AND element 3 closes, and over the next seven clock cycles the shift register 2 in the node is shifted by seven bits. During this time the seven high-order bits of the numbers arriving from the serial result outputs 16 of the neighboring nodes are processed as they pass through the delay elements 5-8 over one to seven clock cycles. Over (n+7) clock cycles, the shift register 2 forms the code of the new approximate value of the sought solution:

$$U_{i,j,k}^{(r+1)} = \frac{1}{128}\Bigl[\sum U_{i\pm1,j\pm1,k\pm1}^{(r)} + 3\sum\bigl(U_{i,j\pm1,k\pm1}^{(r)} + U_{i\pm1,j,k\pm1}^{(r)} + U_{i\pm1,j\pm1,k}^{(r)}\bigr) + 14\sum\bigl(U_{i\pm1,j,k}^{(r)} + U_{i,j\pm1,k}^{(r)} + U_{i,j,k\pm1}^{(r)}\bigr)\Bigr]$$

which is exactly what is required to implement the relation

$$U_{i,j,k}^{(r+1)} = \frac{1}{128}\Bigl[\sum U_{i\pm1,j\pm1,k\pm1}^{(r)} + 3\sum\bigl(U_{i,j\pm1,k\pm1}^{(r)} + U_{i\pm1,j,k\pm1}^{(r)} + U_{i\pm1,j\pm1,k}^{(r)}\bigr) + 14\sum\bigl(U_{i\pm1,j,k}^{(r)} + U_{i,j\pm1,k}^{(r)} + U_{i,j,k\pm1}^{(r)}\bigr) - 60h^{2}F_{i,j,k}\Bigr] \qquad (1)$$

which approximates the Laplace differential operator for the three-dimensional Laplace equation with an error of up to $h^{8}$, and for the Poisson equation with an error of up to $h^{6}$ or $h^{4}$, depending on the right-hand-side function F.
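As an illustration of this relation for the Laplace case (F = 0), the following Python sketch applies the weights to the 26 neighbors of a grid point. It assumes, from the neighbor counts (6·14 + 12·3 + 8·1 = 128), that the weight 14 belongs to the six nearest neighbors along the grid axes, 3 to the twelve diagonal neighbors, and 1 to the eight corner neighbors. The function name `stencil_update` and the sample function `harmonic` are hypothetical.

```python
from itertools import product


def stencil_update(u, i, j, k):
    """New value at (i, j, k) from the 26 neighbors of a grid function u(i, j, k)."""
    faces = edges = corners = 0
    for di, dj, dk in product((-1, 0, 1), repeat=3):
        # Number of indices that differ from the center point.
        shifted = (di != 0) + (dj != 0) + (dk != 0)
        if shifted == 0:
            continue  # the center point itself is not part of the sum
        value = u(i + di, j + dj, k + dk)
        if shifted == 1:
            faces += value    # 6 nearest neighbors along the axes
        elif shifted == 2:
            edges += value    # 12 diagonal neighbors
        else:
            corners += value  # 8 corner neighbors
    return (14 * faces + 3 * edges + corners) / 128


def harmonic(i, j, k):
    """A sample harmonic function on the integer grid (its Laplacian is zero)."""
    return i * i - j * j + 2 * i * k
```

Because the weights sum to 128 and the sample function is a harmonic polynomial of second degree, `stencil_update(harmonic, 3, -2, 5)` reproduces `harmonic(3, -2, 5)`, which is 35.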

Subsequent iterations are carried out in the same way. When the solution has been obtained, as established by comparing the results of two successive iterations, a signal arrives at the first control input 12 and opens the group 4 of AND elements, through which the result is delivered from the shift register 2 to the parallel result output 15 of the node.
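The stopping rule described here, comparing the results of two successive iterations, can be sketched in Python as follows. This is a software illustration under assumptions, not the device itself: all nodes are updated simultaneously from the previous iteration, F = 0 (Laplace equation), the weights 14, 3, and 1 are assigned to the axis, diagonal, and corner neighbors, and the names `solve_laplace`, `boundary`, and `tol` are hypothetical.

```python
from itertools import product

# All 26 neighbor offsets and the weight for each, chosen by how many
# indices differ from the center (1 -> 14, 2 -> 3, 3 -> 1). Assumed assignment.
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
WEIGHT = {1: 14, 2: 3, 3: 1}


def solve_laplace(boundary, n, tol=1e-12, max_iter=10_000):
    """Iterate on an n*n*n grid; boundary(i, j, k) fixes the outer layer."""
    points = list(product(range(n), repeat=3))
    interior = [p for p in points if 0 not in p and n - 1 not in p]
    u = {p: 0.0 for p in points}
    for p in points:
        if p not in interior:
            u[p] = float(boundary(*p))
    for iteration in range(1, max_iter + 1):
        new = dict(u)
        for i, j, k in interior:
            total = sum(
                WEIGHT[sum(c != 0 for c in d)] * u[i + d[0], j + d[1], k + d[2]]
                for d in OFFSETS
            )
            new[i, j, k] = total / 128
        # Compare the results of two successive iterations.
        change = max(abs(new[p] - u[p]) for p in interior)
        u = new
        if change < tol:
            return u, iteration
    raise RuntimeError("no convergence within max_iter iterations")
```

For example, `solve_laplace(lambda i, j, k: i * i - j * j + i * k, 5)` iterates on a 5×5×5 grid whose boundary values come from a harmonic polynomial and stops once no interior value changes by more than `tol`.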

Compared with the prototype, the proposed device provides higher solution accuracy, since the approximation error of the differential operator for the Laplace equation is of eighth order in the discretization step ($h^{8}$), whereas in the prototype it is of second order ($h^{2}$). Since a digital grid device for solving the three-dimensional Laplace or Poisson equation requires $1/h^{3}$ nodes, using the proposed node to build the grid reduces hardware cost by substantially reducing the number of nodes, because the discretization step h for a given solution accuracy can be made much larger than in the prototype. For example, if a solution of the Laplace equation is required with accuracy $\delta = 10^{-8}$, the step is 0.1 for the proposed node and 0.0001 for the prototype. The digital grid must then contain 1000 nodes for the proposed device or $10^{12}$ nodes for the prototype. This is because the solution accuracy of the proposed device is $1/h^{6}$ times higher than that of the prototype. In addition, the number of iterations needed to reach the solution is substantially reduced, since the number of iterations is proportional to the square of the number of nodes; that is, the speed of the device increases by a factor of $1/h^{6}$.
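The arithmetic of this example can be checked with a short Python sketch. It takes the error as exactly $h^{p}$ for approximation order p and the node count as $1/h^{3}$, which are simplifying assumptions for illustration; the function names are hypothetical.

```python
from fractions import Fraction


def approximation_error(h, order):
    """Error estimate h**order for a scheme of the given order (constant factor ignored)."""
    return h ** order


def nodes_required(h):
    """Number of nodes of a three-dimensional grid with step h on a unit cube: 1 / h**3."""
    return (1 / h) ** 3


h_proposed = Fraction("0.1")      # step for the eighth-order node
h_prototype = Fraction("0.0001")  # step for the second-order prototype

# Both steps give the same error estimate of 10**-8.
same_error = approximation_error(h_proposed, 8) == approximation_error(h_prototype, 2)

nodes_proposed = nodes_required(h_proposed)    # 1000
nodes_prototype = nodes_required(h_prototype)  # 10**12
```

Exact fractions are used instead of floating-point numbers so that the powers of ten compare equal without rounding.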

Formula (1) approximates the differential operator of the original Laplace equation with an error of $O(h^{8})$, which is verified by expanding the differential operator at the nodes of the three-dimensional grid in a Taylor series and summing the expansions with the coefficients of formula (1).

Claims

A computing node of a device for solving partial differential equations, comprising a multi-input adder, a shift register, an AND element, and a group of AND elements, wherein the first group of information inputs of the node is connected to the first group of inputs of the multi-input adder, the information output of the shift register is connected to the first inputs of the AND elements of the group, whose second inputs and outputs are connected, respectively, to the first control input of the node and to the parallel result output of the node, the serial output of the shift register is connected to the first input of the AND element, whose second input and output are connected, respectively, to the second control input of the node and to the serial result output of the node, and the third control input of the node is connected to the shift control input of the shift register, characterized in that, in order to increase computational accuracy, it contains four groups of delay elements, the second group of information inputs of the node being connected through the delay elements of the first group to the second group of inputs of the multi-input adder and to the inputs of the delay elements of the second group, whose outputs are connected to the third group of inputs of the multi-input adder and to the inputs of the delay elements of the third group, whose outputs are connected to the fourth group of inputs of the multi-input adder, the fifth group of inputs of which is connected to the third group of information inputs of the node and to the inputs of the delay elements of the fourth group, whose outputs are connected to the sixth group of inputs of the multi-input adder, the output of which is connected to the information input of the shift register.

Fig. 1. Diagram of the proposed node.

Fig. 2. Three-dimensional digital grid model.