RU2486576C1

RU2486576C1 - Homogeneous computing environment for conveyor calculations of sum of m-n-digit numbers

Info

Publication number: RU2486576C1
Application number: RU2012114964/08A
Authority: RU
Inventors: Владимир Сергеевич Князьков; Илья Петрович Осинин
Priority date: 2012-04-17
Filing date: 2012-04-17
Publication date: 2013-06-27

Abstract

FIELD: information technologies.

SUBSTANCE: the homogeneous computing environment performs parallel-pipelined summation of m n-bit operands and consists of identical cells, each made of two two-input AND elements, two two-input EXCLUSIVE OR elements, one two-input OR element, one NOT element and three data flip-flops; the homogeneous computing environment has p columns, where p=log₂m, and the j-th column contains m/2^j cells.

EFFECT: higher speed due to parallel-pipelined extraction of the carry bits of each bit slice of the summands into the next bit slice, and generation of a bit of the desired sum at every step of device operation.

4 dwg

Description

The invention relates to computer engineering and is intended for building homogeneous computing environments that sum m n-bit operands by parallel-pipelined extraction of the carry bits of each bit slice of the summands into the next bit slice and formation of the bits of the desired sum.

A homogeneous computing environment is a regular structure of identical interconnected cells that performs a specific function.

A cell of a homogeneous computing environment is an element of this regular structure.

A bit slice is the set of bits at the i-th position of the m operands participating in the operation.

A device for summing single-digit numbers is known from inventor's certificate SU 1023922 A1 dated 27.06.2000. The device comprises a binary summing unit containing groups of one-bit binary adders, the one-bit adders of each group forming a pyramid. The inputs of the one-bit adders of the first group are connected to the first group of inputs and to the first input of the second group of inputs of the unit; the inputs of the one-bit adders of each subsequent group are connected to the carry outputs of the one-bit adders of the previous group and to the corresponding input of the second group of inputs of the unit; and the outputs of each group of one-bit adders are connected to the corresponding outputs of the unit. The drawback of this device is that it processes numbers in binary-coded decimal form and does not implement pipelined processing, which considerably reduces its speed.

A parallel asynchronous adder is also known, patented as an invention: patent RU 2097826 C1 of 27.11.1997. The device contains n units 6₁-6_n for parallel processing of bit slices, n-1 pulse shapers 7₁-7_n-1, a starting pulse shaper 7₀ and a NOR element 8; each unit 6 contains four switches 1-4 with high-impedance (tri-state) outputs and an arithmetic half-adder 5. Its drawback is that it does not implement pipelined processing, which considerably reduces its speed.

The closest prior art to the claimed solution is a device based on the pyramidal (pairwise, or "doubling") method of summing numbers (http://parallel.ru/fpga/Summ2/SummColamo.html#pirsum). Although it is one of the fastest devices for summing m operands, it requires substantial hardware, which reduces reliability, and it does not implement pipelined processing, which would increase its speed further.

The technical result of using the device for pipelined calculation of the sum of m n-bit numbers is higher speed, achieved by parallel-pipelined extraction of the carry bits of each bit slice of the summands into the next bit slice and formation of a bit of the desired sum in every clock cycle of the device. In addition, the number of cells in the homogeneous computing environment grows only linearly with the number of summands.

Description of the homogeneous computing environment for pipelined calculation of the sum of m n-bit numbers. The claimed device consists of m-1 cells of a homogeneous computing environment connected by regular links.

A cell of the homogeneous computing environment consists of two two-input AND elements, two two-input EXCLUSIVE OR elements, one two-input OR element, one NOT element and three data flip-flops.

The clock input of the cell is connected to the clock inputs of the first and second flip-flops and to the input of the NOT element, whose output is connected to the clock input of the third flip-flop. The first and second data inputs of the cell are connected, respectively, to the first and second inputs of the first AND element, whose output is connected to the second input of the OR element. The first and second data inputs of the cell are also connected, respectively, to the first and second inputs of the first EXCLUSIVE OR element, whose output is connected to the first input of the second EXCLUSIVE OR element and to the first input of the second AND element. The output of the second AND element is connected to the first input of the OR element, whose output is connected to the data input of the second flip-flop. The output of the second flip-flop is connected to the data input of the third flip-flop, whose output is connected to the second input of the second AND element and to the second input of the second EXCLUSIVE OR element. The output of the second EXCLUSIVE OR element is connected to the data input of the first flip-flop, whose output is the data output of the cell.
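The cell netlist can be modelled at gate level to see how it behaves over time. The Python sketch below assumes the data flip-flops are edge-triggered D flip-flops, with the first and second loading on the rising clock edge and the third (clocked through the NOT element) on the falling edge; the gate variable names, the `Cell` class and the `serial_add` helper are hypothetical labels, not the patent's. Fed two operands one bit per clock, least significant bit first, a single cell behaves as a bit-serial adder.

```python
class Cell:
    """Gate-level sketch of one cell (hypothetical names, not from the patent)."""

    def __init__(self) -> None:
        self.ff1 = 0  # first flip-flop, clocked by CLK: registered sum (data output)
        self.ff2 = 0  # second flip-flop, clocked by CLK: captures the new carry
        self.ff3 = 0  # third flip-flop, clocked by NOT(CLK): carry fed back to the gates

    def _gates(self, a: int, b: int) -> tuple[int, int]:
        and1 = a & b            # first AND: carry generated by the inputs
        xor1 = a ^ b            # first XOR
        and2 = xor1 & self.ff3  # second AND: stored carry propagated
        or1 = and2 | and1       # OR: new carry -> data input of the second flip-flop
        xor2 = xor1 ^ self.ff3  # second XOR: sum bit -> data input of the first flip-flop
        return xor2, or1

    def rising_edge(self, a: int, b: int) -> None:
        self.ff1, self.ff2 = self._gates(a, b)

    def falling_edge(self) -> None:
        self.ff3 = self.ff2     # the NOT element makes the third flip-flop load here

    @property
    def out(self) -> int:
        return self.ff1


def serial_add(x: int, y: int, nbits: int) -> int:
    """Add two nbits-wide numbers through one cell, least significant bit first."""
    cell = Cell()
    total = 0
    for k in range(nbits + 1):  # one extra cycle with zero inputs flushes the last carry
        cell.rising_edge((x >> k) & 1, (y >> k) & 1)
        cell.falling_edge()
        total |= cell.out << k
    return total


print(serial_add(11, 6, 4))  # 17
```

Splitting the carry storage between a rising-edge and a falling-edge flip-flop means the new carry is captured together with the sum bit but only reaches the gates half a cycle later, so it affects the next bit slice rather than the current one.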

Fig. 1 shows the functional diagram of a cell of the homogeneous structure, where 1, 2 are the data inputs; 3 is the clock input; 4, 6 are EXCLUSIVE OR elements; 5, 7 are AND elements; 9, 10, 13 are data flip-flops; 12 is the data output. The cell can be implemented on any element base capable of representing Boolean functions, including very-large-scale integrated circuits (VLSI) and field-programmable gate arrays (FPGAs). An implementation in the VHDL hardware description language is shown in Fig. 2.

In the claimed solution, the homogeneous computing environment is a set of cells interconnected as follows.

The clock input of the homogeneous computing environment is connected to the clock inputs of all cells. The array of source m n-bit binary numbers is processed as n binary m-dimensional vectors, where m must be a power of two. The first and second data inputs of the cells of the first column of the homogeneous computing environment are connected to the corresponding bits of the m-bit input vector. The data outputs of cells (i-1, j) and (i, j) are connected, respectively, to the first and second data inputs of cell (i/2, j+1), where i∈[2, m/2^j] and i takes only even values.

The j-th column of the homogeneous computing environment contains m/2^j cells, and the number of columns is p, where p=log₂m. The data output of the cell in the last column of the homogeneous computing environment is the output of the circuit, from which the result is taken.
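The column sizes and the (i-1, j), (i, j) → (i/2, j+1) wiring rule can be tabulated with a short Python sketch. The function name `build_tree` is hypothetical; indices are 1-based as in the text, m is assumed to be a power of two, and the first-column assignment (odd input bits to first inputs, even bits to second inputs) follows the claims. The sketch also confirms the total cell count m/2 + m/4 + ... + 1 = m-1 stated earlier.

```python
def build_tree(m: int) -> tuple[list[int], list[tuple]]:
    """Cells per column and inter-column links of the homogeneous environment.

    Column j (1-based) holds m / 2^j cells. A link ((i-1, j), (i, j), (i/2, j+1))
    means the outputs of cells (i-1, j) and (i, j) drive the first and second
    data inputs of cell (i/2, j+1). In column 1, cell i takes input-vector bits
    2i-1 (first input) and 2i (second input).
    """
    p = m.bit_length() - 1                       # p = log2(m) for a power of two
    if m < 2 or 1 << p != m:
        raise ValueError("m is assumed to be a power of two, m >= 2")
    sizes = [m >> j for j in range(1, p + 1)]    # m / 2^j cells in column j
    links = []
    for j in range(1, p):                        # column p drives the output Y
        for i in range(2, sizes[j - 1] + 1, 2):  # even i in [2, m / 2^j]
            links.append(((i - 1, j), (i, j), (i // 2, j + 1)))
    return sizes, links


sizes, links = build_tree(8)
print(sizes, sum(sizes))  # [4, 2, 1] 7
print(links)  # [((1, 1), (2, 1), (1, 2)), ((3, 1), (4, 1), (2, 2)), ((1, 2), (2, 2), (1, 3))]
```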

Operation of the device

The device performs parallel-pipelined addition of m n-bit operands; each cell of the homogeneous computing environment (Fig. 1) implements the following system of logic functions:

$s(t) = a \oplus b \oplus p(t-1);$

$p(t) = (a \land b) \lor \big(p(t-1) \land (a \oplus b)\big),$

where a and b are the signal states at data inputs 1 and 2 of the cell, respectively;

s(t) is the signal state at data output 12 of the cell;

p(t) is the carry signal fed back inside the cell.
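A quick way to see that this system of functions makes each cell a full adder is to check, for all eight input combinations, that s(t) + 2·p(t) = a + b + p(t-1). The Python sketch below does exactly that; `cell_step` is a hypothetical name for one clock of the cell's combinational logic.

```python
def cell_step(a: int, b: int, p_prev: int) -> tuple[int, int]:
    """One clock of the cell's logic: returns (s(t), p(t)) from a, b and p(t-1)."""
    s = a ^ b ^ p_prev
    p = (a & b) | (p_prev & (a ^ b))
    return s, p


# s is the low bit and p the high bit of a + b + p(t-1), i.e. a full adder.
for a in (0, 1):
    for b in (0, 1):
        for p_prev in (0, 1):
            s, p = cell_step(a, b, p_prev)
            assert s + 2 * p == a + b + p_prev
```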

Each i-th binary positional summand can be represented as a bit sequence $A_i = (a_{i,n}, a_{i,n-1}, \dots, a_{i,1})$, where n is the word length and i∈[1, m]. The m summands can then be written as a matrix:

$$\begin{pmatrix} a_{1,n} & a_{1,n-1} & \cdots & a_{1,1} \\ a_{2,n} & a_{2,n-1} & \cdots & a_{2,1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,n} & a_{m,n-1} & \cdots & a_{m,1} \end{pmatrix}$$

The columns of the matrix, with elements $(a_{1,j}, a_{2,j}, \dots, a_{m,j})$, are the input vectors fed to the homogeneous computing environment for processing.
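Extracting these column vectors (bit slices) from a list of integers is a simple transpose. The Python sketch below is illustrative: `bit_slices` is a hypothetical name, the slices are returned least significant first (the order in which carries move from slice to slice), and indexing is 0-based, so slice k corresponds to matrix column k+1.

```python
def bit_slices(operands: list[int], n: int) -> list[list[int]]:
    """Columns of the operand matrix: slice k holds bit k of every operand.

    Slice 0 (least significant) comes first, since carries travel from each
    slice into the next one.
    """
    if any(x < 0 or x >> n for x in operands):
        raise ValueError("operands must be non-negative n-bit integers")
    return [[(x >> k) & 1 for x in operands] for k in range(n)]


print(bit_slices([5, 3, 6, 0], 3))  # [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0]]
```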

In every clock cycle, a clock signal is applied to the clock inputs of all flip-flops. The first and second data inputs of the first-column cells receive the corresponding bits of a bit slice, each subsequent bit slice being applied in the next clock cycle. At each step to the next column the number of bits in the vector is halved, and this continues until the vector is reduced to a single bit. After passing through all columns of the homogeneous computing environment, the i-th bit slice has been reduced to one bit, which is the i-th bit of the desired sum of the source operands. The carries that arise are passed, through the feedback inside each cell, to the processing of the next bit slice.
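The whole environment can be simulated cycle by cycle to check that the tree of cells really produces the sum. The Python sketch below makes several assumptions that the text does not state: slices are fed least significant first; after the n real slices, zero slices are fed so that carries flush out and the result has n + log₂m bits; and the second/third flip-flop pair is collapsed into one stored carry per cell, updated once per clock. `pipeline_sum` is a hypothetical name.

```python
def pipeline_sum(operands: list[int], n: int) -> int:
    """Cycle-level model of the log2(m)-column tree of bit-serial adder cells."""
    m = len(operands)
    p = m.bit_length() - 1
    if m < 2 or 1 << p != m:
        raise ValueError("m is assumed to be a power of two, m >= 2")
    if any(x < 0 or x >> n for x in operands):
        raise ValueError("operands must be non-negative n-bit integers")
    # Per cell: [s, c] = registered sum bit (data output) and stored carry.
    levels = [[[0, 0] for _ in range(m >> j)] for j in range(1, p + 1)]
    result = 0
    for t in range(n + 2 * p - 1):
        slice_t = [(x >> t) & 1 for x in operands]  # all zeros once t >= n (padding)
        # One shared clock: snapshot every column's outputs before any cell updates.
        inputs = [slice_t] + [[cell[0] for cell in col] for col in levels[:-1]]
        for col, vec in zip(levels, inputs):
            for i, cell in enumerate(col):
                a, b, c = vec[2 * i], vec[2 * i + 1], cell[1]
                x = a ^ b
                cell[0], cell[1] = x ^ c, (a & b) | (c & x)
        k = t - p + 1  # index of the sum bit now held by the last column
        if k >= 0:
            result |= levels[-1][0][0] << k
    return result


print(pipeline_sum([3, 3, 3, 3], 2))  # 12
```

Each column adds one clock of latency, so after clock t the last column holds sum bit t - p + 1; this is why the loop reads the output only once k is non-negative.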

As a result, the least significant bit of the sum of the m n-bit numbers is formed after log₂m clock cycles of the device (m must be a power of two). From then on the pipeline is full, and a result bit is available at the device output in every subsequent clock cycle. Since in every clock cycle each vector moves to the adjacent column on the right, a new vector can be applied to the device input in every clock cycle; the device thus implements pipelined processing. Since the longest signal propagation path in a cell passes through three logic elements, the propagation delay is 3·t, where t is the delay of one logic element.
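The timing described in this paragraph reduces to a few formulas. The Python sketch below uses hypothetical function names; the total output width of n + log₂m bits (with zero padding slices) is an assumption not stated in the text, and the clock-period bound counts only the three gate delays, ignoring flip-flop setup and clock-to-output times.

```python
def first_bit_cycle(m: int) -> int:
    """Clock cycles until the least significant sum bit leaves the last column (p = log2 m)."""
    p = m.bit_length() - 1
    if m < 2 or 1 << p != m:
        raise ValueError("m is assumed to be a power of two, m >= 2")
    return p


def cycles_for_full_sum(m: int, n: int) -> int:
    """Cycles until all n + log2(m) sum bits are out, assuming zero-padded slices.

    After the first p cycles the pipeline is full and one sum bit leaves per cycle.
    """
    p = first_bit_cycle(m)
    return p + (n + p) - 1


def min_clock_period(t_gate: float) -> float:
    """Longest in-cell path: three gates (XOR -> AND -> OR); flip-flop timing ignored."""
    return 3 * t_gate


print(first_bit_cycle(64))          # 6
print(cycles_for_full_sum(64, 16))  # 27
```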

If the time to add a pair of n-bit numbers is taken as n clock cycles of the device, then in pipelined mode the proposed device computes the sum in m clock cycles, whereas the pyramidal method takes p·n clock cycles, where p=log₂m. Thus, a device based on the described method is log₂m times faster than a device based on the known iterative summation method. For example, with m=64 summands the proposed device is 8 times faster.

Fig. 3 shows a general block diagram of the homogeneous computing environment built from cells of the homogeneous structure and intended for pipelined calculation of the sum of m n-bit numbers, where CELL denotes the cells of the homogeneous structure, X₁-X_m are the data inputs and Y is the data output. The cell can be implemented on any element base capable of representing Boolean functions, including very-large-scale integrated circuits (VLSI) and field-programmable gate arrays (FPGAs). An implementation in the VHDL hardware description language is shown in Fig. 4.

Claims

Homogeneous computing environment for pipelined calculation of the sum of m n-bit numbers, consisting of cells each made of two two-input AND elements, two two-input EXCLUSIVE OR elements, one two-input OR element, one NOT element and three data flip-flops, wherein the clock input of the cell is connected to the clock inputs of the first and second flip-flops and to the input of the NOT element, the output of which is connected to the clock input of the third flip-flop; the first and second data inputs of the cell are connected respectively to the first and second inputs of the first AND element, the output of which is connected to the second input of the OR element; the first and second data inputs of the cell are connected respectively to the first and second inputs of the first EXCLUSIVE OR element, the output of which is connected to the first input of the second EXCLUSIVE OR element and to the first input of the second AND element, the output of which is connected to the first input of the OR element, the output of which is connected to the data input of the second flip-flop, the output of which is connected to the data input of the third flip-flop, the output of which is connected to the second input of the second AND element and to the second input of the second EXCLUSIVE OR element, the output of which is connected to the data input of the first flip-flop, the output of which is the data output of the cell; wherein the clock input of the environment is connected to the clock inputs of all cells; the array of source m n-bit binary numbers is processed as n binary m-dimensional vectors, m being a power of two; the first data inputs of the cells of the first column of the homogeneous computing environment are connected to the odd bits of the m-bit input vector, and the second data inputs of the cells of the first column are connected to the even bits of the m-bit input vector; the data outputs of cells (i-1, j) and (i, j) are connected respectively to the first and second data inputs of cell (i/2, j+1), where i∈[2, m/2^j] and i takes only even values; the j-th column of the homogeneous computing environment contains m/2^j cells; the number of columns of the homogeneous computing environment is p, where p=log₂m; and the data output of the cell of the last column of the homogeneous computing environment is the output of the circuit from which the result is taken.