RU2276805C2

RU2276805C2 - Method and device for separating integer and fractional components from floating point data

Info

Publication number: RU2276805C2
Application number: RU2004104325/09A
Authority: RU
Inventors: Андрей Алексеевич Нарайкин (RU); Ольга Владимировна Дрыжакова (RU); Александр Владимирович Исаев (RU); Роберт Семьюэл НОРИН (US)
Original assignee: Интел Зао
Priority date: 2001-07-13
Filing date: 2001-07-13
Publication date: 2006-05-20
Also published as: RU2004104325A

Abstract

FIELD: data processing, in particular a method and device for reducing the number of floating-point operations required to extract integer and fractional components.

SUBSTANCE: the device has a computing component that generates a first constant; generates a second value by shifting the rounded integer part of a floating-point value; generates a floating-point integer value by subtracting the first constant from the second value; extracts a plurality of mantissa bits from the second value to produce an integer value; generates a remainder value from the floating-point value; extracts a portion of the bits from the integer value to produce an integer component; and stores the remainder value, the integer component and the floating-point component in memory.

EFFECT: reduced computation time without loss of precision in the result.

7 cl, 8 dwg

Description

BACKGROUND OF THE INVENTION

FIELD OF THE INVENTION

The invention relates to computational data processing and, more specifically, to a method and apparatus for reducing the number of floating-point operations required to extract integer and fractional components.

State of the art

In many current data processing systems, such as personal computers (PCs), mathematical computation plays an important role. Numerical algorithms for evaluating many mathematical functions, such as exponentiation and trigonometric operations, require decomposing floating-point numbers into their integer and fractional parts. Such operations can be used for argument reduction, for forming indices into tables of values, or for constructing a result from a number of component parts. Decompositions of floating-point numbers into integer and fractional parts often lie on critical computational paths. As a result, the speed at which mathematical functions can be evaluated is often limited.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and is not limited by the accompanying drawings, in which like references denote like elements. Note that references to "one" or "some" implementation in this description do not necessarily refer to the same implementation; such references mean at least one implementation.

Figure 1 illustrates the single-precision, double-precision and extended double-precision floating-point representations of ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, IEEE, New York, 1985 (IEEE).

Figure 2 depicts a typical method for computing the integer and floating-point values for certain equations.

Figure 3 illustrates one implementation of the present invention in which the number of floating-point operations needed to compute the integer and fractional components is reduced.

Figure 4 shows one implementation of the invention used to generalize the choice of the constant S.

Figure 5 illustrates a typical process for loading constants and computing the quantities needed to decompose floating-point numbers into integer and fractional parts.

Figures 6A-B show an implementation of the invention for loading constants and decomposing floating-point numbers into integer and fractional parts.

Figure 7 shows an implementation of the present invention having a computing component.

DETAILED DESCRIPTION OF THE INVENTION

In general, the invention relates to a method and apparatus for reducing the number of floating-point operations needed to compute integer and fractional components. Typical implementations of the present invention are described below with reference to the figures. These implementations are chosen to illustrate the invention and should not be construed as limiting its scope.

Figure 1 illustrates the ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, IEEE, New York, 1985 (IEEE) layouts for the single-precision floating-point representation 105, the double-precision representation 106 and the extended double-precision representation 107. The IEEE single-precision representation 105 requires a 32-bit word. This 32-bit word can be represented as bits numbered from left to right (0 to 31). The first bit, labeled S 110, is the sign bit. The next eight bits, labeled E 120, are the exponent bits. The last 23 bits, bits 9 through 31, labeled F 110, represent the significand of the number (also called the mantissa).

In the IEEE double-precision representation 106, bit S 110 is the sign bit, bits E 140 are the exponent bits (11 bits), and the final bits F 150 are the 52 bits representing the significand (also called the mantissa).

In the IEEE extended double-precision representation 107, bit S 110 is the sign bit, bits E 160 are the exponent bits (15 bits), and the final bits F 170 are the 64 bits representing the significand (also called the mantissa).
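The single-precision field layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `decode_single` is hypothetical and does not appear in the patent, and the code numbers bits from the least significant end, as is usual in software, rather than from left to right as in Figure 1.

```python
import struct

def decode_single(value):
    """Split an IEEE 754 single-precision number into its three fields.

    Returns (sign, biased_exponent, fraction), i.e. the 1-bit S field,
    the 8-bit E field and the 23-bit F field.
    """
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31                 # leftmost bit
    exponent = (word >> 23) & 0xFF    # next eight bits
    fraction = word & 0x7FFFFF        # last 23 bits
    return sign, exponent, fraction

# -6.5 = -1.101b * 2^2, so the biased exponent is 127 + 2 = 129
print(decode_single(-6.5))
```

The double-precision layout can be decoded the same way with the `">d"` and `">Q"` format codes and field widths of 1, 11 and 52 bits.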

As an example of decomposing floating-point numbers into integer and fractional parts, consider the following equations:

Given

w = x*A (equation 1),

where A = 1/B (equation 2).

Find n and r such that x = n*B + r (equation 3),

where n is an integer, and A, B, r and w are floating-point numbers. The problem can thus be restated as follows: for a given input x and constants A and B, how many times n does B fit into x, and what remainder is left? Moreover, n is often used as an index for a table lookup or as the exponent of some quantity, such as 2ⁿ. It is therefore necessary to represent n both as an integer (n_i) and as a floating-point number (n_f). Thus three quantities must be obtained from the computation: n_i (n as an integer), n_f (n as a floating-point number) and r as a floating-point number.

Figure 2 depicts a typical method for computing n_i, n_f and r. In Figure 2, process 200 begins with block 210, where w = x*A. In block 220, the number w is converted to an unnormalized rounded integer. The value computed in block 220 is then used in block 230 to compute n_f by normalizing it as an integer. Block 240 also uses the value from block 220: n_i is computed by converting that value to an integer. In block 250, the resulting value n_i is sent to the arithmetic logic unit (ALU) or stored in memory. In block 260, r is computed by subtracting n_f*B from x. In block 270, the value r can be sent to the ALU or stored in memory.

Table I illustrates the typical method for computing n_i, n_f and r in terms of pseudocode instructions. As can be seen from Table I, there are three floating-point operations, which are performed by the floating-point arithmetic and logic unit (Falu), and one integer operation, performed by the integer arithmetic and logic unit (Ialu). Note that the numbers in parentheses are cumulative cycle counts (latency) for a processor such as the Intel Itanium™ processor.

Table I

Unit, op.     Operation                                                   Cycle
Falu op. 1    w = x*A                                                     (1)
Falu op. 2    w_rshifted = convert_to_unnormalized_rounded_integer(w)     (6)
Falu op. 3    n_f = convert_to_normalized_integer(w_rshifted)             (13)
Ialu op. 1    n_i = convert_to_integer(w_rshifted)                        (14)
              n_i available                                               (18)
Falu op. 4    r = x - n_f*B                                               (18)
              r available                                                 (23)
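For comparison with the method introduced next, the conventional sequence of Table I can be written as a minimal Python sketch. The function name `split_conventional` is hypothetical, and Python's built-in `round` (round half to even) stands in for the hardware conversion instructions, which is an assumption rather than something the patent specifies.

```python
def split_conventional(x, a, b):
    """Conventional decomposition x = n*b + r with a = 1/b (Table I sketch)."""
    w = x * a            # Falu op. 1: w = x*A
    n_i = round(w)       # rounding conversion to an integer
    n_f = float(n_i)     # the same n as a floating-point number
    r = x - n_f * b      # Falu op. 4: r = x - n_f*B
    return n_i, n_f, r

print(split_conventional(10.0, 1.0 / 3.0, 3.0))
```

Each conversion here corresponds to a separate step in Table I, which is the cost the shifted-constant approach aims to reduce.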

Figure 3 illustrates one implementation of the present invention in which the number of floating-point operations needed to compute n_i, n_f and r is reduced. Process 300 begins at block 310, which computes x*A + S, where S and A are constants and x is a floating-point number. In one implementation of the invention, the constant S is chosen so that adding S to x*A shifts the rounded integer part of x*A into the rightmost bits of the mantissa. Then, in block 320, n_f is computed by subtracting S from the value computed in block 310. This yields an integer. In block 330, n_i + S is obtained by extracting the mantissa bits from the result of block 310. In block 340, r is computed by subtracting n_f*B from x. In block 350, the low-order bits are extracted from the value computed in block 330, giving n_i. In block 360, the value n_i is available for transfer to the ALU or for storage in memory. In block 370, the value r is available for transfer to the ALU or for storage in memory.

Table II illustrates an implementation of the present invention as pseudocode instructions in which the number of floating-point operations is reduced. Note that the numbers in parentheses, given by way of example, are cumulative cycle counts (latency) for a processor such as the Intel Itanium™ processor. In one implementation of the invention, the constant S is chosen so that adding S to x*A shifts the rounded integer part of x*A into the rightmost bits of the mantissa. The result can therefore be converted to the integer n_i after one Falu operation instead of two. Moreover, the floating-point representation n_f can be obtained directly with the second Falu operation, which subtracts S from the result of the first Falu operation. Thus one fewer Falu instruction is used to obtain the required quantities, and this implementation of the invention saves seven cycles on a processor such as the Intel Itanium™ processor.

Table II

Unit, op.     Operation                                                   Cycle
Falu op. 1    w_plus_S_rshifted = x*A + S                                 (1)
Falu op. 2    n_f = w_plus_S_rshifted - S                                 (6)
Ialu op. 1    n_i_plus_S = extract_mantissa_bits(w_plus_S_rshifted)       (9)
Falu op. 3    r = x - n_f*B                                               (11)
Ialu op. 2    n_i = extract_low_order_bits(n_i_plus_S)                    (11)
              n_i available                                               (12)
              r available                                                 (16)
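The sequence of Table II can be sketched in Python, under stated assumptions. The sketch uses IEEE double precision (53-bit significand) instead of the extended double-precision format discussed in the text, so S = 1.5 * 2^52. The name `split_with_shift` and the `bits` parameter are hypothetical. Plain Python also rounds the product x*A before adding S, whereas a fused multiply-add would round only once.

```python
import struct

S = 1.5 * 2.0 ** 52   # 1.10...0b * 2^52: the shift constant for IEEE doubles

def split_with_shift(x, a, b, bits=32):
    """Decompose x = n*b + r with a = 1/b using the shift constant S."""
    w_plus_s = x * a + S          # Falu op. 1: integer part lands in the low mantissa bits
    n_f = w_plus_s - S            # Falu op. 2: n as a floating-point number
    # Ialu op. 1: reinterpret the double as a 64-bit integer to reach its mantissa bits.
    (raw,) = struct.unpack("<Q", struct.pack("<d", w_plus_s))
    # Ialu op. 2: keep the low-order bits and read them as a two's complement integer.
    low = raw & ((1 << bits) - 1)
    n_i = low - (1 << bits) if low >> (bits - 1) else low
    r = x - n_f * b               # Falu op. 3: remainder
    return n_i, n_f, r

print(split_with_shift(10.0, 1.0 / 3.0, 3.0))
print(split_with_shift(-7.0, 1.0 / 3.0, 3.0))
```

The sketch is only valid while the rounded value of x*A fits in the chosen number of low-order bits; larger magnitudes would spill into the higher mantissa bits.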

The performance benefit grows when this implementation of the invention is used inside software loops. Many loops are limited by the number of floating-point instructions their computations require. Because this implementation of the invention uses one fewer floating-point operation than the typical method, the maximum loop throughput increases.

The following discussion concerns the choice of the constant S in one embodiment of the invention. For simplicity, assume that the floating-point representation has b mantissa bits (for example, 64 bits): an explicit integer bit and b-1 fraction bits. The exponent field of the floating-point representation determines the position of the binary point inside or outside the significant digits. Thus the integer part of a normalized floating-point number is found in the rightmost mantissa bits after applying a denormalization operation, which shifts the mantissa b-1 bits to the right, rounds the mantissa, and adds b-1 to the exponent. The mantissa then contains the integer as a b-bit two's complement number. The low-order mantissa bits containing the integer part of the original floating-point number can be obtained by adding the constant 1.10...000 * 2^(b-1) to the number. This constant is one of the values of S chosen in one implementation of the present invention.

The resulting mantissa contains the integer as a (b-2)-bit two's complement number. The bit immediately to the left of the b-2 zeros in the significand is used to ensure that, for negative numbers, the result is not renormalized, which would shift the integer to the left of its required position in the rightmost bits of the mantissa. If fewer than b-2 bits are used in subsequent integer operations, the instructions for computing n_i, n_f and r in Table II are equivalent to the corresponding instructions in Table I.

In one implementation of the invention, the choice of S can be generalized for the case where the desired result is m, where m = n*2^k. In this case the exponent of the constant is (b-k-1). This choice of S is useful when the desired integer must be split into several indices for lookups in multiple tables. For example, n can be split as n = n₀*2⁷ + n₁*2⁴ + n₂ in order to compute indices into tables with 16 and 8 entries. This implementation requires S to be available at the same time as the constant A. In one implementation of the invention, the constant S can be loaded from memory or, on processors such as the Intel Itanium™, S can easily be generated with the following instructions: 1) movl, to form the 64-bit IEEE double-precision bit pattern, and 2) setf.d, to load S into a floating-point register.
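The generalized constant can be sketched in Python under one reading of the text, labeled here as an assumption: with the exponent of S lowered by k, the low-order mantissa bits hold the value scaled by 2^k and rounded. The sketch uses IEEE doubles (b = 53), and the names `scaled_integer` and `split_indices` are hypothetical.

```python
import struct

def scaled_integer(w, k, bits=32):
    """Return round(w * 2**k), read from the low mantissa bits of w + S_k."""
    s_k = 1.5 * 2.0 ** (52 - k)   # exponent b-k-1 with b = 53 (IEEE double)
    (raw,) = struct.unpack("<Q", struct.pack("<d", w + s_k))
    low = raw & ((1 << bits) - 1)
    # Interpret the extracted bits as a two's complement integer.
    return low - (1 << bits) if low >> (bits - 1) else low

def split_indices(m):
    """Split m = n0*2**7 + n1*2**4 + n2 into its three bit fields."""
    n0 = m >> 7            # remaining high-order bits
    n1 = (m >> 4) & 0x7    # 3 bits: index into an 8-entry table
    n2 = m & 0xF           # 4 bits: index into a 16-entry table
    return n0, n1, n2

m = scaled_integer(5.8125, 4)     # 5.8125 * 16 = 93
print(m, split_indices(m))
```

Extracting the fields with shifts and masks after a single floating-point addition avoids a separate scaling multiply before the integer conversion.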

In one implementation of the present invention, the constant may have the following form: a "1", followed by the binary point, j-1 bits (zeros or ones) immediately to the right of the binary point, a "1" after those j-1 bits, and then b-j-1 zero bits. Note that the implementation described earlier used a constant of this form with j=1.

The following discussion concerns an implementation of the invention that includes generating the constants needed to compute n_i, n_f and r. The accuracy requirements of mathematical library algorithms usually mean that the multiplication w = x*A is performed in extended double precision (64-bit mantissa). The constant A must therefore be loaded in extended double-precision representation. This is usually done by storing the constant statically in memory and then loading it into a floating-point register (for example, with the ldfe instruction on the Intel Itanium™ processor).

Because the library must be position-independent (that is, shareable), the load is performed indirectly. In such an indirect load, the address of the pointer to the constant is computed first, then the pointer to the constant is loaded, and then the constant itself is loaded. On a processor such as the Intel Itanium™, this sequence takes at least 13 cycles. It can take more than 13 cycles if the pointer and the constants are not in the cache.

On some processors, such as the Intel Itanium™, there is no way to load an extended double-precision constant directly without using memory instructions. There is, however, a way to load the mantissa of a floating-point constant directly: the 64-bit mantissa is first formed in an integer register, and then an instruction (for example, setf.sig on the Intel Itanium™ processor) is used to load the mantissa into a floating-point register. Such an instruction sets the exponent to 2⁶³. On a processor such as the Intel Itanium™, this sequence takes 10 cycles. In one implementation of the invention, three cycles can be saved by using a constant S that has the correct mantissa but a modified exponent.

Figure 4 illustrates one implementation of the invention used to generalize the choice of the constant S when determining n_i, n_f and r. In process 400, block 410 computes x*A' + S' (where S' is a variant of S discussed below). In block 420, the result of block 410 is multiplied by T (T is a factor equal to 2^-(b-1-j)), and S is subtracted from the product. In block 430, the mantissa bits are extracted from the result of block 410, yielding an integer. In block 440, r is computed as x - n_f*B. In block 450, the low-order bits are extracted from the result of block 430. In block 460, the value n_i is available for transfer to the ALU or for storage in memory. In block 470, the value r is available for transfer to the ALU or for storage in memory. In process 400, A equals 2^j * F, where F is a mantissa of the form 1.xxxxxxxx, 1.0 ≤ |F| < 2.0. Also, A' = 2^(b-1) * F.

Table III contains pseudocode for the steps of process 400 shown in Figure 4.

Table III
Unit and operation | Pseudocode | Cycle
FP ALU op 1 | w_plus_S_rshifted = x*A' + S' | (1)
FP ALU op 2 | n_f = w_plus_S_rshifted*T - S | (6)
Integer ALU op 1 | n_i_plus_S = extract_mantissa_bits(w_plus_S_rshifted) | (9)
FP ALU op 3 | r = x - n_f*B | (11)
Integer ALU op 2 | n_i = extract_low_bits(n_i_plus_S) | (11)
 | n_i available | (12)
 | r available | (16)

In one implementation of the invention, for the shift to occur correctly, FP ALU op 1 requires a variant of the number S, namely S', where S' = S*2^(b-1-j). When n_f is obtained in FP ALU op 2, the number w_plus_S_rshifted is scaled back by the multiplier T, where T = 2^-(b-1-j). This implementation generates four constants: A', S', S and T. In one implementation of the invention, these four constants are set up in parallel.
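The scaling by S' and T can be tried out with a short Python sketch on IEEE-754 doubles. This is only an illustration under stated assumptions, not the patented implementation: b is taken to be the 53-bit significand width of a double, S is taken to be 1.5*2^52 (a common choice whose unit in the last place is 1), the low 32 mantissa bits are treated as a signed integer, and the function name split_scaled is hypothetical.

```python
import struct

B_BITS = 53                      # assumption: b is the significand width of a double
S = 1.5 * 2.0 ** (B_BITS - 1)    # assumption: shifter constant with ulp equal to 1

def split_scaled(x, F, j, B):
    """Return (n_i, n_f, r) where n = round(x*A), A = 2**j * F, r = x - n*B.

    Valid only while |n| < 2**31 and nothing overflows.
    """
    k = B_BITS - 1 - j
    A_prime = 2.0 ** (B_BITS - 1) * F   # A' = 2^(b-1) * F
    S_prime = S * 2.0 ** k              # S' = S * 2^(b-1-j)
    T = 2.0 ** -k                       # T  = 2^-(b-1-j)

    w = x * A_prime + S_prime           # FP op 1: rounded integer lands in the low mantissa bits
    n_f = w * T - S                     # FP op 2: scale back, remove S (both steps exact)

    bits = struct.unpack("<Q", struct.pack("<d", w))[0]
    mantissa = bits & ((1 << 52) - 1)   # integer op 1: the 52 stored mantissa bits
    r = x - n_f * B                     # FP op 3: remainder
    low = mantissa & 0xFFFFFFFF         # integer op 2: low-order 32 bits
    n_i = low - (1 << 32) if low & 0x80000000 else low
    return n_i, n_f, r
```

For example, with B = 0.25 (so A = 4 = 2^2 * 1.0, that is j = 2 and F = 1.0), split_scaled(3.3, 1.0, 2, 0.25) yields n_i = 13, n_f = 13.0 and a remainder close to 0.05.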

FIG. 5 illustrates a typical process 500 for loading constants and calculating coefficients to decompose floating-point numbers into integer and fractional parts. On a typical processor such as the Intel Itanium™, the whole sequence from loading the constants to computing r takes 36 cycles. Process 500 begins at block 510, which computes the address of the pointer to A and B. Block 520 loads the address of the pointer to A and B. Block 530 loads A and B. Block 540 computes w = x*A. Block 550 converts the result of block 540 (the number w) to an unnormalized integer. Block 560 normalizes the result of block 550 as an integer, which gives n_f. Block 570 converts the value from block 550 to an integer to obtain n_i. In block 580, n_i is available for transfer to the ALU or for storage in memory. Block 590 computes r by subtracting n_f*B from x. In block 595, r is available for transfer to the ALU or for storage in memory.
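For comparison, the arithmetic of the conventional process 500 can be sketched in a few lines of Python. This is a simplified illustration, not the patent's Table IV: the separate unnormalized-integer and normalization steps are collapsed into Python's round() and float(), and the function name split_conventional is hypothetical.

```python
def split_conventional(x, A, B):
    """Conventional decomposition: n = round(x*A), r = x - n*B."""
    w = x * A            # block 540
    n_i = round(w)       # float-to-integer conversion (round half to even)
    n_f = float(n_i)     # integer converted back to floating point
    r = x - n_f * B      # block 590
    return n_i, n_f, r
```

The conversions to and from an integer sit on the critical path between computing w and computing r, which is the dependency the shifted-constant processes avoid.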

Table IV illustrates process 500 as pseudocode instructions. The numbers on the right-hand side of Table IV are typical clock cycles for a processor such as the Intel Itanium™.

FIGS. 6A-B show an implementation of the invention for loading constants and decomposing floating-point numbers into integer and fractional parts. Process 600 begins at block 605, which forms the bit pattern of S' in an integer register. Block 610 forms the bit pattern of the mantissa of A in an integer register. Block 615 generates S' in a floating-point register. Block 620 generates A' in a floating-point register. Block 625 forms the bit pattern of S in an integer register. Block 630 forms the bit pattern of T in an integer register. Block 635 computes the address of the pointer to B. Block 640 generates S in a floating-point register. Block 645 generates T in a floating-point register. Block 650 loads the address of the pointer to B. Block 655 loads B. Block 660 computes x*A' + S'. Block 665 multiplies the result of block 660 by T and subtracts S from the product; the result of block 665 is n_f. Block 670 extracts the mantissa bits from the result of block 660, which yields an integer. Block 675 computes r as x - n_f*B. Block 680 extracts the low-order bits from the result of block 670; the result of block 680 is n_i. In block 685, n_i becomes available for transfer to the ALU or for storage in memory. In block 690, r becomes available for transfer to the ALU or for storage in memory.

Table V contains pseudocode for the steps of process 600 (see FIGS. 6A-B). Note that the numbers in parentheses on the right-hand side of Table V are clock cycles for a processor such as the Intel Itanium™. In one implementation of the present invention, process 300 and process 600 are loaded into math libraries used by various compilers. In another implementation, the same processes, loaded into a math library, can be used to compute functions such as scalar double-precision tangent, sine, cosine, exponential functions, hyperbolic cosine, hyperbolic sine, hyperbolic tangent and so on. Using this implementation reduces the number of clock cycles needed to perform the operations compared with currently existing methods. It should be noted that other implementations of the invention can be used for functions such as scalar single-precision functions, vector double-precision functions and vector single-precision functions.

Table V
Unit and operation | Pseudocode | Cycle
Integer ALU op 1 | Form the bit pattern of S' in an integer register (movl) | (1)
Integer ALU op 2 | Form the bit pattern of the mantissa of A in an integer register (movl) | (1)
Integer ALU op 3 | Generate S' in a floating-point register (setf.d) | (2)
Integer ALU op 4 | Generate A' in a floating-point register (setf.d) | (2)
Integer ALU op 5 | Form the bit pattern of S in an integer register (movl) | (2)
Integer ALU op 6 | Form the bit pattern of T in an integer register (movl) | (2)
Integer ALU op 7 | Compute the address of the pointer to B | (3)
Integer ALU op 8 | Generate S in a floating-point register (setf.d) | (4)
Integer ALU op 9 | Generate T in a floating-point register (setf.d) | (4)
Integer ALU op 10 | Load the address of the pointer to B | (5)
Integer ALU op 11 | Load B | (8)
FP ALU op 1 | w_plus_S_rshifted = x*A' + S' | (11)
FP ALU op 2 | n_f = w_plus_S_rshifted*T - S | (16)
Integer ALU op 12 | n_i_plus_S = extract_mantissa_bits(w_plus_S_rshifted) | (19)
FP ALU op 3 | r = x - n_f*B | (21)
Integer ALU op 13 | n_i = extract_low_bits(n_i_plus_S) | (21)
 | n_i available | (22)
 | r available | (26)
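Table V builds each constant as a bit pattern in an integer register (movl) and then moves it into a floating-point register (setf.d), instead of loading it from memory. A rough Python analogue of that bit-pattern transfer is sketched below. The helper names are hypothetical, and the two example constants (S = 1.5*2^52 and T = 2^-50) are assumed values chosen for illustration; the patent text here does not give numeric values.

```python
import struct

def bits_to_double(pattern):
    """Reinterpret a 64-bit integer bit pattern as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", pattern))[0]

def double_to_bits(value):
    """Reinterpret an IEEE-754 double as its 64-bit integer bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

# Assumed example constants, written as immediates the way movl would carry them.
S_BITS = 0x4338000000000000   # biased exponent 0x433, top fraction bit set: 1.5 * 2**52
T_BITS = 0x3CD0000000000000   # biased exponent 0x3CD, zero fraction: 2**-50

S = bits_to_double(S_BITS)
T = bits_to_double(T_BITS)
```

Because the patterns are immediates, no memory access is needed for these constants; in Table V only B is still loaded through a pointer.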

FIG. 7 shows an implementation of the present invention having a computational component 710. Circuit 700 also includes a microprocessor 720, cache memory 730, memory 740, disk storage 750, prefetch queue 755, decode/assign/predict unit 760, integer pipeline A 770, integer pipeline B 775, floating-point pipeline A 780, ALUs 781-782, floating-point ALU 783, integer register sets 785-786, floating-point register set 787 and data bus 790. In one implementation of the present invention, computational component 710 includes process 300, 400 or 600, illustrated in FIGS. 3, 4 and 6A-B respectively.

The implementations described above can be used whenever the integer and fractional components of floating-point numbers are needed for argument reduction of scalar and vector double-precision functions, scalar and vector single-precision functions, and various mathematical functions, and for preprocessing before mathematical functions are computed. With the embodiments described above, computation time is reduced without adversely affecting the accuracy of the result.
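As one illustration of such argument reduction, the sketch below uses the decomposition to evaluate an exponential as exp(x) = 2^n * exp(r), with A = 1/ln 2 and B = ln 2. It is a sketch under assumptions, not the patent's library code: S = 1.5*2^52 is an assumed shifter constant, math.exp stands in for the short polynomial a real library would apply to the reduced argument, B is held in a single double (real libraries typically split it into high and low parts), and the name exp_via_split is hypothetical.

```python
import math
import struct

S = 1.5 * 2.0 ** 52          # assumed shifter constant for IEEE-754 doubles
LN2 = math.log(2.0)          # B
INV_LN2 = 1.0 / LN2          # A

def exp_via_split(x):
    """exp(x) via n = round(x/ln2), r = x - n*ln2; valid for moderate |x|."""
    w = x * INV_LN2 + S                      # rounded n lands in the low mantissa bits
    n_f = w - S                              # n as a floating-point value
    low = struct.unpack("<Q", struct.pack("<d", w))[0] & 0xFFFFFFFF
    n_i = low - (1 << 32) if low & 0x80000000 else low   # n as a signed integer
    r = x - n_f * LN2                        # reduced argument, |r| <= about ln2/2
    return math.ldexp(math.exp(r), n_i)      # 2**n_i * exp(r)
```

Both forms of n are used here: n_f feeds the floating-point remainder computation, while n_i feeds the exponent adjustment, which is why the method produces them side by side.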

The implementations described above can also be stored on a device or machine-readable medium and read by a machine to execute instructions. Machine-readable media include any mechanism that provides (that is, stores and/or transmits) information in a form readable by a machine (for example, a computer). For example, a machine-readable medium may be read-only memory (ROM); random access memory (RAM); magnetic disk storage; an optical disk; flash memory; or an electrical, optical, acoustic or other form of propagated signal (for example, carrier waves, infrared signals, digital signals and so on). The device or machine-readable medium may be a semiconductor memory device and/or a rotating magnetic or optical disk. The device or machine-readable medium may be distributed when parts of the instructions are divided among different machines, for example among networked computers.

Although certain typical implementations have been described and shown in the accompanying drawings, it should be understood that such implementations are merely illustrative and do not limit the scope of the invention, and that the invention is not limited to the specific structures and arrangements shown and described here, since a person skilled in the art may propose many other modifications.

Claims

1. A method of extracting integer and fractional components from a floating point value, comprising decomposing the floating point value into a plurality of parts, comprising shifting the rounded integer part of the floating point value to generate a second value; generating a floating point value from the second value by subtracting the first constant from the second value; extracting a plurality of mantissa bits from a second value; generating a residual value from a floating point value; extracting a portion of the bits from the plurality of mantissa bits to generate an integer component, where the floating point value, residual value, and integer component are stored in memory and transferred to an arithmetic logic unit (ALU).

2. The method according to claim 1, characterized in that the floating point values, the first constant and the second value are presented in floating point format.

3. The method according to claim 1, characterized in that the rounded integer part is shifted to the rightmost bits of the mantissa of the first value.

4. The method according to claim 1, characterized in that the shift of the rounded integer part contains such a choice of a floating point value that adding the first constant to the floating point value shifts the rounded integer part.

5. A method for decomposing a floating point value into integer and fractional components, including decomposing a floating point value into a plurality of parts, comprising generating a first constant; scaling the first constant with a multiplier to get the second constant; generating a second value by shifting the rounded integer part of the floating-point value; extracting a plurality of mantissa bits from the second value to generate an integer value; extracting a portion of bits from an integer value to generate an integer component.

6. The method according to claim 5, characterized in that the rounded integer part is shifted to the rightmost bits of the mantissa of the floating point value.

7. The method according to claim 5, characterized in that the first constant, the second constant and the floating point value and the second value are floating point numbers.

8. The method according to claim 5, characterized in that the shift of the rounded integer part includes adding a second constant to the floating point value.

9. A method for extracting integer and fractional components from a floating point value, which includes generating the first constant; scaling the first constant to get the second constant; formation of the binary code of the first, second, third and fourth constants in separate registers for processing integers; determination of the address of the pointer to the fifth constant; loading the address of the pointer to the fifth constant in memory; generating a second value by shifting the rounded integer part of the floating-point value; extracting a plurality of mantissa bits from the second value to generate an integer value; extracting a portion of bits from an integer value to generate an integer component.

10. The method according to claim 9, characterized in that the rounded integer part is shifted to the rightmost bits of the mantissa of the first value.

11. The method according to claim 9, characterized in that it further comprises creating a representation of the first constant in the first floating-point register; creating a representation of the second constant in the second floating-point register; creating a representation of a scaled third constant in a third floating-point register; creating a fourth constant representation in the fourth floating-point register.

12. The method according to claim 9, characterized in that the first, second, third and fourth constants and the first and second values are floating point numbers.

13. A machine-readable medium with instructions whose execution on a machine causes the machine to perform operations including decomposing a floating point value into a plurality of parts, the instructions, when executed on the machine, causing the machine to perform operations comprising generating a first constant; shifting the rounded integer part of the floating point value to generate a second value; generating a floating point value from the second value by subtracting the first constant from the second value; extracting a plurality of mantissa bits from the second value; generating a residual value from the floating point value; extracting a portion of the bits from the plurality of mantissa bits to generate an integer component, where the floating point value, residual value, and integer component are stored in memory and transferred to an arithmetic logic unit (ALU).

14. The machine-readable medium according to claim 13, wherein the rounded integer part is shifted to the rightmost bits of the mantissa of the floating point value.

15. The machine-readable medium according to claim 13, wherein the floating point value, the first constant and the second value are floating point numbers.

16. A machine-readable medium with instructions whose execution on a machine causes the machine to perform operations including decomposing a floating point value into a plurality of parts, the instructions, when executed on the machine, causing the machine to perform operations comprising generating a first constant; scaling the first constant to obtain a second constant; generating a second value by shifting the rounded integer part of the floating point value; extracting a plurality of mantissa bits from the second value to generate an integer value; extracting a portion of bits from the integer value to generate an integer component.

17. The machine-readable medium of claim 16, wherein the rounded integer part is shifted to the rightmost bits of the mantissa of the floating point value.

18. Machine-readable medium according to claim 17, wherein the first and second constants, the floating point value and the second value are floating point numbers.

19. Machine-readable medium with instructions whose execution on the machine causes the machine to perform operations including generating a first constant; generating a second constant by scaling the first constant; formation of the binary code of the first, second, third and fourth constants in separate registers for processing integers; determination of the address of the pointer to the fifth constant; loading the address of the pointer to the fifth constant in memory; generating a second value by shifting the rounded integer part of the floating-point value; generating a floating point value from the second value by subtracting the constant; extracting a plurality of mantissa bits from the second value to generate an integer value; extracting part of the bits from an integer value to obtain an integer component; generating a residual value from a floating point value; storage of residual value, integer component and floating point value in memory.

20. Machine-readable medium according to claim 19, characterized in that the rounded integer part is shifted to the rightmost bits of the mantissa of the floating point value.

21. Machine-readable medium according to claim 19, characterized in that it further comprises instructions whose execution on the machine causes the machine to perform operations including creating a representation of the first constant in the first floating-point register; creating a representation of the second constant in the second floating-point register; creating a representation of a scaled third constant in a third floating-point register; creating a fourth constant representation in the fourth floating-point register.

22. Machine-readable medium according to claim 19, characterized in that the first, second, third and fourth constants, the first, second and third values are floating point numbers.

23. A computing device comprising a processor having a computational component; a bus connected to the processor; memory connected to the processor; a plurality of arithmetic logic units (ALUs) connected to the processor; a plurality of register sets connected to the plurality of ALUs, where the computational component generates a first constant, generates a second value by shifting the rounded integer part of a floating point value, generates a floating point value by subtracting the first constant from the floating point value, extracts a plurality of mantissa bits from the second value to generate an integer value, generates a residual value from the floating point value, extracts a portion of the bits from the integer value to obtain an integer component, and stores the residual value, the integer component, and the floating point value in memory.

24. The device according to claim 23, wherein the rounded integer part is shifted to the rightmost bits of the mantissa of the floating point value.

25. The device according to claim 23, wherein the floating point value, the first and second constants and the second value are floating point numbers.