RU2449354C1

RU2449354C1 - Vector normalising apparatus

Info

Publication number: RU2449354C1
Application number: RU2010134146/08A
Authority: RU
Inventors: Виктор Николаевич Бабенко (RU)
Original assignee: Виктор Николаевич Бабенко
Priority date: 2010-08-13
Filing date: 2010-08-13
Publication date: 2012-04-27
Also published as: RU2010134146A

Abstract

FIELD: information technology.

SUBSTANCE: apparatus has an inverting unit, comprising shift code generating circuits, shift circuits, circuits for generating the code for setting the adder-subtractor operating mode and n normalising units, each having shift circuits and adder-subtractors.

EFFECT: faster operation.

1 dwg

Description

The invention relates to computing (processor) technology and can be used:

1) in high-performance computing systems,

2) in personal computers as a means of increasing their performance, implemented as a subcircuit within an arithmetic processor or as part of a separate device (a special-purpose processor).

High-performance computing systems include n multipliers, intended primarily for fast multiplication of the components of an n-dimensional vector by an arbitrary number; this set of multipliers is therefore called a vector multiplier [1]. The i-th component of the vector x (i = 1, ..., n) is fed to the first input of the i-th multiplier, an arbitrary number a is fed to the second input of each multiplier, and the vector y = ax is obtained at the output of the vector multiplier, with all multipliers computing in parallel (simultaneously). The vector multiplier can also perform vector division. To do this, all multipliers are switched to division mode, the i-th component of the vector x is fed to the first input of the i-th multiplier, an arbitrary number b is fed to the second input of each multiplier, and the vector y = [x₁/b … x_n/b] is obtained at the output. However, division on a multiplier is slower than multiplication by a factor on the order of m, where m is the number of bits allocated to the mantissa of machine numbers.

In mathematics, multiplying a vector x by an arbitrary number a is called normalization of the vector. In computational mathematics, vector normalization is performed very often (it is a so-called basic operation). Unfortunately, what is usually known is not the normalizing factor a but the number b related to it by a = 1/b, so to normalize the vector one has to switch the vector multiplier to division mode.

Another known device (Cordic) [2] performs normalization of a two-dimensional vector. This device is designed to rotate a two-dimensional vector. The Cordic device implements two stages of computation: 1) the pseudo-rotation stage and 2) the normalization stage. At the first stage, calculations are performed according to the formulas:

These formulas are implemented in the pseudo-rotation unit. The second stage normalizes the vector [x_{m-1}, y_{m-1}] in order to remove the stretching of the original vector [x, y] caused by the pseudo-rotation stage. The stretch factor a is determined by the formula

The calculations of the second stage are as follows

They follow from representing the inverse of the number a as a product

Obtaining such a representation is a simple task, since the number a is fixed. As we can see, the Cordic device performs the vector normalization without the costly division of the components of the vector [x_{m-1}, y_{m-1}] by the number a. Technically, formulas (2) are implemented in normalization units, each of which is a chain of (shift register, subtractor) pairs: the output of the subtractor of the i-th pair is connected to the first input of the subtractor of the (i+1)-th pair and to the input of the shift register of the (i+1)-th pair, and the output of the shift register of the (i+1)-th pair is connected to the second input of the subtractor of the (i+1)-th pair. Thus, the Cordic device consists of three units: a pseudo-rotation unit and two normalization units, with the first and second outputs of the pseudo-rotation unit connected to the inputs of the first and second normalization units, respectively. The vector [x, y] is fed to the input of the device, and the vector [u, ν] is obtained at the output; its components are related to the vector [x, y] by relations (1) and (2). The author is not aware of other devices that implement the vector normalization operation, or of specialized vector normalization devices.
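The two Cordic stages described above can be illustrated with a short sketch. Formulas (1) and (2) are not shown in this text, so the code uses the textbook CORDIC recurrences as an assumption, and the greedy search for the shift list in `shift_subtract_factors` is a hypothetical way of obtaining a product of factors (1 - 2^-k) close to 1/a. All function names are illustrative.

```python
import math

def cordic_rotate(x, y, angle, m=24):
    """Pseudo-rotation stage: m shift-and-add micro-rotations (textbook CORDIC).

    The result is the rotated vector stretched by the factor returned
    by stretch_factor(m).
    """
    for i in range(m):
        d = 1.0 if angle >= 0 else -1.0
        # The right-hand side is evaluated with the old x and y.
        x, y = x - d * math.ldexp(y, -i), y + d * math.ldexp(x, -i)
        angle -= d * math.atan(math.ldexp(1.0, -i))
    return x, y

def stretch_factor(m=24):
    """Fixed stretch factor a of m pseudo-rotations."""
    return math.prod(math.sqrt(1.0 + math.ldexp(1.0, -2 * i)) for i in range(m))

def shift_subtract_factors(target, max_shift=30):
    """Greedy list of shifts k with prod(1 - 2**-k) close to target (0 < target <= 1).

    Hypothetical helper: it is computed once, offline, because a is fixed.
    """
    shifts, p = [], 1.0
    for k in range(1, max_shift + 1):
        while p - math.ldexp(p, -k) >= target:
            p -= math.ldexp(p, -k)
            shifts.append(k)
    return shifts

def normalize(v, shifts):
    """Normalization stage: a chain of (shift, subtract) pairs, no division."""
    for k in shifts:
        v -= math.ldexp(v, -k)
    return v

# Rotate (1, 0) by 0.5 rad, then remove the stretching.
shifts = shift_subtract_factors(1.0 / stretch_factor())
x, y = cordic_rotate(1.0, 0.0, 0.5)
u, v = normalize(x, shifts), normalize(y, shifts)
```

The only division in the sketch, `1.0 / stretch_factor()`, happens once when the shift list is prepared; the per-vector work in `normalize` uses shifts and subtractions only, which is the property the text attributes to the Cordic normalization units.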

The closest to the claimed invention in technical essence is the normalization unit that is part of the plane rotation device (Cordic). The drawback of this unit is the narrowness of the problem it solves: 1) the dimension of the vector being normalized is fixed and equal to two; 2) the divisor is a fixed number. On the other hand, dividing the vector components on multipliers is expensive and not fast enough (considerably slower than multiplication).

These characteristics of the analogues define the aim of the invention: to create a specialized high-performance device for normalizing an n-dimensional vector, in which the calculations are performed by formulas with the structure shown in (2), while the divisor is an arbitrary number.

This aim is achieved by including in the claimed device a specially designed number inversion unit. The divisor a is fed to its input. This unit performs calculations by the formulas: a₁ = a,

i = 1, ..., [m/2], where [m/2] is the integer part of m/2 and σ_i ∈ {-1, 0, 1}. In addition to this unit, the device includes n normalization units. The components of the vector x = [x₁ … x_n] are fed to the inputs of the normalization units. The normalization units implement the formulas:

,

, i = 1, ..., [m/2],

, j = 1, ..., n. The vector u = [x₁/a … x_n/a] is obtained at the output of the normalization device.

The number inversion unit and the normalization units are chains of stages connected in series. Each stage of the inversion unit (except the last) contains a shift code generation circuit, a shift circuit, a circuit that generates the adder-subtractor operating mode code (mode code generation circuit), and an adder-subtractor. Connections within a stage: the output of the shift code generation circuit is connected to the second input of the shift circuit, the output of the shift circuit is connected to the second input of the adder-subtractor, and the output of the mode code generation circuit is connected to the mode-setting input of the adder-subtractor. Connections between stages: the first input of the adder-subtractor, the input of the shift code generation circuit, the first input of the shift circuit and the input of the mode code generation circuit of the (i+1)-th stage are connected to the output of the adder-subtractor of the i-th stage. The number a is fed to the first input of the adder-subtractor, the input of the shift code generation circuit, the first input of the shift circuit and the input of the mode code generation circuit of the 1st stage. The last stage (number [m/2]) contains a shift code generation circuit and a mode code generation circuit. The inputs of these circuits are connected to the output of the adder-subtractor of the penultimate stage.

Each stage of a normalization unit has a shift circuit and an adder-subtractor. Connections within a stage: the output of the shift circuit is connected to the second input of the adder-subtractor. Connections between stages: the first input of the adder-subtractor and the first input of the shift circuit of the (i+1)-th stage are connected to the output of the adder-subtractor of the i-th stage.

Finally, connections between units: all normalization units are connected to the inversion unit. The output of the shift code generation circuit of the i-th stage of the inversion unit is connected to the second input of the shift circuit of the i-th stage of each normalization unit, and the output of the mode code generation circuit of the i-th stage of the inversion unit is connected to the mode-setting input of the adder-subtractor of the i-th stage of each normalization unit.
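The data flow just described can be sketched in software. The recurrences themselves are not shown in this text, so the sketch assumes the form suggested by the shift circuit and adder-subtractor in each stage: a_{i+1} = a_i + σ_i·2^(-k_i)·a_i in the inversion unit, with the same σ_i and shift k_i applied to every vector component. The rule in `choose_step` for picking σ_i and k_i is hypothetical; it does not reproduce the patent's shift code circuit or its accuracy for [m/2] stages. The divisor is assumed to be a mantissa-like value in [0.5, 2).

```python
import math

def choose_step(a):
    """Hypothetical stand-in for the shift code and mode code circuits.

    Returns (sigma, k) with sigma in {-1, 0, 1} such that
    a * (1 + sigma * 2**-k) moves a toward 1.
    """
    err = 1.0 - a
    if err == 0.0:
        return 0, 0
    sigma = 1 if err > 0 else -1
    _, ex = math.frexp(abs(err))          # 2**(ex - 1) <= |err| < 2**ex
    candidates = (1 - ex, -ex)            # the two nearest power-of-two shifts
    k = min(candidates,
            key=lambda s: abs(a + sigma * math.ldexp(a, -s) - 1.0))
    return sigma, k

def normalize_vector(x, a, stages=12):
    """Drive a toward 1 with shift-and-add steps and apply the same steps to x.

    Since every step multiplies a and each x_j by the same factor,
    x_j ends up close to x_j / a once a is close to 1.
    """
    if not 0.5 <= a < 2.0:
        raise ValueError("sketch assumes 0.5 <= a < 2")
    x = list(x)
    for _ in range(stages):
        sigma, k = choose_step(a)                       # inversion unit, stage i
        a = a + sigma * math.ldexp(a, -k)
        x = [xj + sigma * math.ldexp(xj, -k) for xj in x]   # n normalization units
    return x

u = normalize_vector([3.0, -1.5, 0.25], 1.37)
```

The list comprehension stands for the n normalization units working in parallel: they contain no decision logic of their own and only receive the shift and mode codes produced for the divisor, which matches the connections between units described above.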

A comparison with the prototype shows that the claimed device differs both in its composition and in the way its computing elements are connected.

Thus, the claimed device meets the criterion of "novelty".

Comparing the claimed technical solution not only with the prototype but also with other technical solutions leads to the conclusion that it meets the criterion of "significant differences".

The invention is illustrated by the block diagram shown in Fig. 1.

The vector normalization device contains an inversion unit and n normalization units. The inversion unit is a chain of [m/2] stages, each of which, except the last, contains a shift code generation circuit 1, a shift circuit 2, an adder-subtractor operating mode code generation circuit 3 and an adder-subtractor 4, connected as shown in Fig. 1. The last stage contains a shift code generation circuit 1 and an adder-subtractor operating mode code generation circuit 3. Each normalization unit is likewise a chain of [m/2] stages, each containing a shift circuit 2 and an adder-subtractor 4 (see Fig. 1).

The device is designed for 32-bit numbers in floating-point format (24 bits for the mantissa and 8 for the exponent). The number a and the components of the vector x = (x₁ … x_n), where n ≥ 2, are fed to the input of the claimed device. The components of the vector u = (u₁ … u_n), defined by u_i = x_i/a, are delivered at the output.

To ensure the accuracy of the output values, intermediate calculations were carried out on (m+r)-bit adders, where r is the number of additional low-order bits allocated to the mantissa. With r = 5, the error of the output values does not exceed the weight of their least significant bit.

To ensure convergence of the computation, each normalization unit contains only [m/2] adders. In total, the device contains

adders.

For n = 2, the device was implemented in hardware on an "EP1K50FC484-1" programmable logic device (FPGA) of the ACEX1K family manufactured by Altera.

The components of the vector x can also be divided by the number a on a device consisting of n multipliers, each containing m m-bit adders. The cost of such a device is therefore determined by the number nm² (comparison circuits are not counted). The cost of the claimed device is determined by the number (n+1)(m+r)[m/2]. Comparing these numbers, we see that the claimed device is approximately 2 times cheaper than a device built on multipliers.
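As a quick check of the two cost expressions, the sketch below evaluates nm² and (n+1)(m+r)[m/2] for m = 24 and r = 5. The function names and the sample values of n are illustrative assumptions. Under these formulas the ratio depends on n and r: it grows with n toward m²/((m+r)[m/2]), and approaches 2 only when r is small compared with m.

```python
def multiplier_cost(n, m):
    """Cost of n multipliers with m m-bit adders each: n * m**2."""
    return n * m * m

def claimed_cost(n, m, r):
    """Cost of the claimed device: (n + 1) * (m + r) * [m / 2]."""
    return (n + 1) * (m + r) * (m // 2)

m, r = 24, 5
for n in (2, 8, 64):          # sample vector dimensions, chosen for illustration
    old, new = multiplier_cost(n, m), claimed_cost(n, m, r)
    print(f"n={n}: multipliers={old}, claimed={new}, ratio={old / new:.2f}")

# Limit of the ratio for large n
limit = m * m / ((m + r) * (m // 2))
```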

Let us compare the speed of these devices.

The algorithm for dividing the number x by the number a is as follows:

1.

2.

3.

4.

We introduce the notation: t_cp is the signal delay of an elementary comparison cell, t_p is the delay of the carry signal of an elementary adder, and t_c is the delay of the sum signal of an elementary adder. At the i-th step of the algorithm for dividing one number by another, a signal delay of m(t_cp + t_p) occurs. Since the algorithm consists of m steps, the execution time of the division operation is estimated as m²(t_cp + t_p). Taking the execution time of step 2 to be approximately equal to that of step 3, we can characterize the execution time of the division operation on the prototype by the number

(the signal delay t_c is not taken into account, since only m summations are performed, which is m times fewer than the number of carries).

It is difficult to give a formula for the response time of the claimed device, since different values of a may correspond to different sequences of shifts, so we give an upper bound for this quantity. It has been established that the response time of the claimed device T_z (upper bound) is determined by the relations:

N₁ = 0, N_i = N_{i-1} + 2(i-1), i = 2, ..., [m/2],

For m = 24 and r = 5, T_pr = 1152t_p and T_z = 151t_p + 12t_c. Given that t_c ≈ 2t_p, we can write T_z ≈ 175t_p. It follows that

Thus, we see that in this case (m = 24 and r = 5) the claimed device is more than 7.6 times faster than a device built on multipliers.
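The arithmetic behind these figures can be replayed in a few lines. The sketch takes t_p as the time unit and uses only numbers quoted in the text; the variable names are illustrative. Two observations from the arithmetic, offered as a reading aid rather than as the author's derivation: 1152 equals 2·24², and the recurrence N_i = N_{i-1} + 2(i-1) with N₁ = 0 has the closed form N_i = i(i-1). How N_i enters the bound for T_z is not shown in the text, so the code does not attempt to rebuild it.

```python
m, r = 24, 5
t_p = 1.0            # carry delay of an elementary adder, used as the time unit
t_c = 2.0 * t_p      # sum delay, using the stated approximation t_c ~ 2 t_p

T_pr = 1152 * t_p                 # multiplier-based device, as quoted
T_z = 151 * t_p + 12 * t_c        # upper bound for the claimed device, as quoted

# N_1 = 0, N_i = N_{i-1} + 2(i-1) for i = 2 .. [m/2]
N = [0]
for i in range(2, m // 2 + 1):
    N.append(N[-1] + 2 * (i - 1))

ratio_full = T_pr / T_z                 # with the 12 t_c term included
ratio_carry_only = T_pr / (151 * t_p)   # with the 12 t_c term left out

print(f"T_z = {T_z:g} t_p")
print(f"T_pr / T_z = {ratio_full:.2f}")
print(f"T_pr / (151 t_p) = {ratio_carry_only:.2f}")
```

With these inputs the ratio is about 6.6 when the sum delays are included and about 7.6 when only the 151t_p carry term is counted, so the quoted figure of 7.6 corresponds to the second case.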

Sources of information

1. Ortega J. Introduction to Parallel and Vector Solution of Linear Systems. Moscow: Mir, 1991.

2. VLSI and Modern Signal Processing. Ed. S. Kung, H. Whitehouse, T. Kailath. Moscow: Radio i Svyaz, 1989, pp. 269-272.

Claims

A device for normalizing an n-dimensional vector, containing a divisor inversion unit and n normalization units, each unit being a chain of [m/2] stages connected in series, wherein each stage of the inversion unit (except the last) contains a shift code generation circuit, a shift circuit, a circuit for generating the adder-subtractor operating mode code, and an adder-subtractor, and each stage of a normalization unit contains a shift circuit and an adder-subtractor, the elements of the device being connected as follows:

1) connections within a stage of the inversion unit: the output of the shift code generation circuit is connected to the second input of the shift circuit, the output of the shift circuit is connected to the second input of the adder-subtractor, and the output of the circuit for generating the adder-subtractor operating mode code is connected to the mode-setting input of the adder-subtractor;

2) connections between stages of the inversion unit: the first input of the adder-subtractor, the input of the shift code generation circuit, the first input of the shift circuit and the input of the circuit for generating the adder-subtractor operating mode code of the (i+1)-th stage are connected to the output of the adder-subtractor of the i-th stage; the divisor a is fed to the first input of the adder-subtractor, the input of the shift code generation circuit, the first input of the shift circuit and the input of the circuit for generating the adder-subtractor operating mode code of the 1st stage; the last stage (number [m/2]) contains a shift code generation circuit and a circuit for generating the operating mode code, the inputs of these circuits being connected to the output of the adder-subtractor of the penultimate stage;

3) connections within a stage of a normalization unit: the output of the shift circuit is connected to the second input of the adder-subtractor;

4) connections between stages of a normalization unit: the first input of the adder-subtractor and the first input of the shift circuit of the (i+1)-th stage are connected to the output of the adder-subtractor of the i-th stage;

5) connections between units: all normalization units are connected to the inversion unit, the output of the shift code generation circuit of the i-th stage of the inversion unit being connected to the second input of the shift circuit of the i-th stage of each normalization unit, and the output of the circuit for generating the adder-subtractor operating mode code of the i-th stage of the inversion unit being connected to the mode-setting input of the adder-subtractor of the i-th stage of each normalization unit; the j-th component of the vector x being normalized is fed to the first input of the adder-subtractor and the first input of the shift circuit of the 1st stage of the j-th normalization unit (j = 1, ..., n), and the j-th component of the normalized vector u is obtained at the output of the last stage (number [m/2]) of the j-th normalization unit.