JP4359258B2 - Arithmetic apparatus and arithmetic method

Info

Publication number: JP4359258B2
Application number: JP2005126326A
Authority: JP
Inventors: 秀和森田; 信行浜口; 朗都倉
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2005-04-25
Filing date: 2005-04-25
Publication date: 2009-11-04
Anticipated expiration: 2025-04-25
Also published as: JP2006302180A

Description

The present invention relates to an arithmetic apparatus and an arithmetic method that perform calculations using floating-point numbers.

In general, when an arithmetic apparatus such as a computer performs operations such as multiplication and addition, it carries them out on numerical values expressed in binary while rounding down (truncating) or rounding up at a predetermined digit.

For example, IEEE (Institute of Electrical and Electronics Engineers) 754 defines two binary floating-point formats, single precision and double precision. Single precision represents one numerical value in 32 bits (1 sign bit, 8 exponent bits, 23 mantissa bits), and double precision represents one numerical value in 64 bits (1 sign bit, 11 exponent bits, 52 mantissa bits).
Extended formats of single and double precision are also sometimes used for computation.
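As a rough illustration of the double-precision layout described above, the following Python sketch unpacks the three fields of a 64-bit value. The helper name `double_fields` is an assumption made for this example and does not appear in the patent.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) of an IEEE 754 double."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 sign bit
    exponent = (bits >> 52) & 0x7FF        # 11 exponent bits, biased by 1023
    mantissa = bits & ((1 << 52) - 1)      # 52 stored mantissa bits
    return sign, exponent, mantissa

print(double_fields(1.0))    # (0, 1023, 0)
print(double_fields(-2.5))   # (1, 1024, 1125899906842624)
```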

Although IEEE 754 thus defines floating-point formats, an inner product computed on a computer with these floating-point numbers suffers from rounding error, loss of trailing digits (absorption of a small operand by a large one), and cancellation of significant digits, so an accurate inner product cannot be obtained.
Several apparatuses have therefore been proposed to improve the accuracy of inner product computation.
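The loss of trailing digits mentioned above can be reproduced in ordinary double precision. The sketch below is an illustration only; the operand values are chosen for this example and are not taken from the patent.

```python
big, small = 1e16, 1.0

# Near 1e16 adjacent doubles are 2 apart, so adding 1.0 leaves big unchanged
# and the small operand is lost entirely.
absorbed = (big + small) - big

# Summing the same three terms in a different order keeps the small operand.
kept = (big - big) + small

print(absorbed, kept)   # 0.0 1.0
```

An inner product accumulates many such additions, which is why the order and size of the partial terms affect the final result.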

For example, Patent Document 1 discloses an apparatus that applies the transformation α(x−βI) to an input vector x, that is, subtracts a scalar β from every vector component and then multiplies by a scalar α, before performing the inner product operation.

Patent Document 2 discloses an arithmetic apparatus that reduces the loss of accuracy as follows. Based on bit slices corresponding to a plurality of input data, it looks up and outputs from a lookup table the partial inner product corresponding to the bit pattern of each slice, adds that partial inner product to an initial value or an intermediate accumulated value, outputs the sum, and holds the value of the low-order bits truncated from the sum. This addition is repeated for a number of cycles corresponding to the number of digits of the input data, and the truncated bits held from a predetermined number of cycles before the final addition cycle up to the final cycle are then appended to the final accumulated result.
Patent Document 1: JP-A-11-96140 (paragraphs 0014 to 0031, FIG. 3)
Patent Document 2: JP-A-2000-132539 (paragraphs 0066 to 0073, FIG. 1)

However, both Patent Document 1 and Patent Document 2 improve the accuracy of the inner product in hardware, and the number of digits is fixed at the hardware design stage. Consequently, in an inner product of arbitrary IEEE 754 floating-point numbers, non-negligible rounding, absorption, and cancellation errors can still occur, and an accurate result cannot be obtained.
The present invention has been made in view of this problem, and its object is to perform floating-point computation more accurately.

To solve the above problem, the invention provides an arithmetic apparatus that computes the product of a first number and a second number in binary floating point, comprising a storage unit and a processing unit. The storage unit stores the first number, the second number, and numbers produced by computation as numbers within a predetermined number of digits, by rounding down or rounding up the bits below that number of digits. The processing unit separates the first number stored in the storage unit into a first upper number, consisting of a predetermined number of upper digits, and a first lower number, consisting of the remaining lower digits, and likewise separates the second number into a second upper number and a second lower number. It multiplies the first upper number by the second upper number to obtain a first product, the first upper number by the second lower number to obtain a second product, the first lower number by the second upper number to obtain a third product, and the first lower number by the second lower number to obtain a fourth product. When it sums the four products by three additions in an arbitrary order, it stores the error of each addition in the storage unit separately from the sum of that addition, adds all of these errors to obtain a total error, and adds the total error to the sum obtained after the three additions, thereby calculating the total sum together with the error of that final addition (the total-sum error). Other means are described later.

According to the present invention, floating-point computation can be performed more accurately.

A data processing apparatus (arithmetic apparatus) according to the present invention is described below with reference to the drawings as appropriate.

First, the configuration of the data processing apparatus is described with reference to FIG. 1, which is a configuration diagram of the apparatus.
The data processing apparatus 1 comprises an auxiliary storage unit 101, a memory (storage unit) 102, a CPU (Central Processing Unit) 103 serving as the processing unit, an input unit 104, an output unit 105, and an input/output interface 106.

The auxiliary storage unit 101 stores various kinds of information and is realized by, for example, a hard disk or a ROM (Read Only Memory). Here it stores a program 111, a compiler 112, and an execution module 113.

The program 111 is a program (hereinafter, the source program) that describes the processing of the flowchart shown in FIG. 2.
The compiler 112 compiles and links the program 111.
The execution module 113 is the module produced when the compiler 112 compiles and links the program 111.

The memory 102 is a temporary storage means that stores various kinds of information and is realized by, for example, a RAM (Random Access Memory).
The CPU 103 performs various arithmetic operations and executes the execution module 113 via the memory 102 and the input/output interface 106.
The input unit 104 is an input device for entering execution commands and the like.
The output unit 105 is an output device that outputs execution results and the like.
The input/output interface 106 serves as the input/output interface between the components, as shown in FIG. 1.

The data processing apparatus 1 configured as described above operates as follows.
First, a compile command entered by the operator from the input unit 104 is stored in the memory 102 via the input/output interface 106. In the memory 102, the compiler 112 compiles and links the program 111 held in the auxiliary storage unit 101 and generates the execution module 113, which is machine code.

Next, when the operator enters an execution command from the input unit 104, the CPU 103 loads the execution module 113 into the memory 102. Once the execution module 113 is loaded, the CPU 103 fetches the processes S201 to S208 of the flowchart shown in FIG. 2 from the memory 102 one after another, executes each of them, and stores the results in the memory 102.
The CPU 103 then outputs the results stored in the memory 102 to the output unit 105 via the input/output interface 106.

Next, the processing of the data processing apparatus is described with reference to FIG. 2 (see also FIG. 1 as appropriate). FIG. 2 is a flowchart of the overall processing. The case of improving the accuracy of the inner product of N-dimensional vectors A(i) and B(i) (i = 1, ..., N) is used as an example.

First, the input process S201 is called from the memory 102 into the CPU 103. In this process, as shown in FIG. 3, the CPU 103 repeats S301 to S303 and loads into the memory 102 the N-dimensional data A(i) and B(i) entered by the operator from the input unit 104.

Returning to FIG. 2, the initialization process S202 is then called from the memory 102 into the CPU 103. In this process, as shown in FIG. 4, the CPU 103 in S401 allocates areas for the variables SUM and DSUM in the memory 102 and initializes both to 0.

SUM is the variable that holds the sum before the final error correction is applied, and DSUM is the variable that holds the error-correction term.
Using SUM and DSUM in this way has the following effect when the data processing apparatus 1 performs operations such as multiplication and addition. Because the number of digits that the memory 102 and the CPU 103 can store (hold) is fixed, the resulting deviation in each value is stored as an error, and that error is carried into the subsequent operations, which reduces or eliminates the error of the computation as a whole.

Next, processes S203 to S206 in FIG. 2 repeat once for each of the N dimensions of the vectors, that is, they carry out the inner product computation. In the two-dimensional case, for example, N = 2, and A(i) and B(i) are written as A = (A(1), A(2)) and B = (B(1), B(2)).

Specifically, for the current index i (i = 1, ..., N), the CPU 103 first corrects the error in A(i)*B(i) (S204) and then corrects the error in ΣA(i)*B(i) (S205).

FIG. 5 is a detailed view of S204 shown in FIG. 2. First, in S501, the CPU 103 allocates areas for the variables A(i), A1(i), A2(i), B(i), B1(i), and B2(i) in the memory 102.
The CPU 103 then separates A(i) into A1(i) and A2(i) so that the most significant 1 bit and the least significant 1 bit fall into different parts.

Likewise, the CPU 103 separates B(i) into B1(i) and B2(i) so that the most significant 1 bit and the least significant 1 bit fall into different parts.
This separation may be applied to only one of A(i) and B(i).
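The separation in S501 can be sketched as follows for integer-valued operands. This is a minimal illustration, not the patent's program: the function name `split` and the parameter `low_bits` are assumptions, and the default of 4 low bits matches the 8-bit worked example given later in the description.

```python
def split(x: int, low_bits: int = 4) -> tuple[int, int]:
    """Split a non-negative integer into an upper and a lower part.

    The upper part is x with its `low_bits` lowest bits cleared;
    the lower part is the cleared bits. upper + lower == x.
    """
    lower = x & ((1 << low_bits) - 1)
    upper = x - lower
    return upper, lower

print(split(183))   # (176, 7)   i.e. 10110000 and 00000111
print(split(218))   # (208, 10)  i.e. 11010000 and 00001010
```

Each part then carries at most half of the significant bits, so a product of two parts needs no more bits than one full operand.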

In S502, the CPU 103 allocates areas for the variables S1, S2, S3, and S4 in the memory 102 and assigns A1(i)*B1(i) to S1, A1(i)*B2(i) to S2, A2(i)*B1(i) to S3, and A2(i)*B2(i) to S4.
As a result, A(i)*B(i) is expressed as S1 + S2 + S3 + S4.

In S503, the CPU 103 allocates areas for the variables T1 and T2 in the memory 102 and assigns the sum of S1 and S2 to T1. It also computes the error of that sum as S2 − (T1 − S1) and assigns it to T2.
This captures the error due to rounding or absorption that arises in the sum S1 + S2.
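The error formula of S503 can be tried directly in double precision. The sketch below is illustrative; the function name `add_with_error` and the operand values are assumptions. In the numerical-analysis literature this pattern is commonly called fast two-sum, and with round-to-nearest binary arithmetic the recovered error is exact when |a| ≥ |b|, a condition the description does not state.

```python
def add_with_error(a: float, b: float) -> tuple[float, float]:
    """Return (s, e): s is the rounded sum a + b, e = b - (s - a)."""
    s = a + b          # rounded sum, as T1 = S1 + S2
    e = b - (s - a)    # recovered error, as T2 = S2 - (T1 - S1)
    return s, e

t1, t2 = add_with_error(1e16, 1.0)
print(t1, t2)   # 1e+16 1.0
```

Here the rounded sum drops the 1.0 completely, and the error term returns it, so t1 + t2 still represents the true sum as a pair.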

In S504, the CPU 103 allocates areas for the variables T3 and T4 in the memory 102 and assigns the sum of S3 and S4 to T3. It also computes the error of that sum as S4 − (T3 − S3) and assigns it to T4.
This captures the error due to rounding or absorption that arises in the sum S3 + S4.

In S505, the CPU 103 allocates areas for the variables T5 and T6 in the memory 102 and assigns the sum of T1 and T3 to T5. It also computes the error of that sum as T3 − (T5 − T1) and assigns it to T6.
This captures the error due to rounding or absorption that arises in the sum T1 + T3.

In S506, the CPU 103 allocates an area for the variable T7 in the memory 102 and assigns the sum of T2, T4, and T6 to T7.
Because T7 is unlikely to incur a large error, its error need not be tracked.

In S507, the CPU 103 allocates areas for the variables D and E in the memory 102 and assigns the sum of T5 and T7 to D. It also computes the error of that sum as T7 − (D − T5) and assigns it to E.
The CPU 103 performs S501 to S507 in this way for every given i.
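Steps S501 to S507 can be put together in a small sketch. It assumes a toy number format that keeps only the top 8 significant bits of a non-negative integer and truncates the rest, the same model the description uses for its worked example. The names `trunc`, `add_err`, and `product_step` are hypothetical, and the error terms are kept as exact Python integers, which is sufficient for the values shown.

```python
def trunc(x: int, digits: int = 8) -> int:
    """Keep the top `digits` significant bits of a non-negative integer."""
    extra = max(x.bit_length() - digits, 0)
    return (x >> extra) << extra

def add_err(a: int, b: int) -> tuple[int, int]:
    """Truncated sum of a and b, plus its error b - (s - a)."""
    s = trunc(a + b)
    return s, b - (s - a)

def product_step(a: int, b: int, low_bits: int = 4) -> tuple[int, int]:
    """S501 to S507 for one pair of operands; returns (D, E)."""
    mask = (1 << low_bits) - 1
    a1, a2 = a & ~mask, a & mask                 # S501: upper and lower parts
    b1, b2 = b & ~mask, b & mask
    # S502: each product of two 4-bit parts fits in 8 significant bits.
    s1, s2, s3, s4 = a1 * b1, a1 * b2, a2 * b1, a2 * b2
    t1, t2 = add_err(s1, s2)                     # S503
    t3, t4 = add_err(s3, s4)                     # S504
    t5, t6 = add_err(t1, t3)                     # S505
    t7 = trunc(trunc(t2 + t4) + t6)              # S506: error not tracked
    return add_err(t5, t7)                       # S507

print(product_step(183, 218))   # (39680, 214)
print(product_step(149, 227))   # (33792, 30)
```

For 183 × 218 the pair sums to 39894, the exact product. For 149 × 227 it sums to 33822, one less than the exact 33823, because the untracked truncation in S506 drops one unit.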

FIG. 6 is a detailed view of S205 shown in FIG. 2. S601 to S605 in FIG. 6 are the same kind of processing as S503 to S507 in FIG. 5.

In S601, the CPU 103 allocates areas for the variables S11 and S12 in the memory 102 and assigns the sum of SUM and D to S11. It also assigns the error of that sum to S12.

In S602, the CPU 103 allocates areas for the variables S13 and S14 in the memory 102 and assigns the sum of DSUM and E to S13. It also assigns the error of that sum to S14.

In S603, the CPU 103 allocates areas for the variables S15 and S16 in the memory 102 and assigns the sum of S11 and S13 to S15. It also assigns the error of that sum to S16.

In S604, the CPU 103 allocates an area for the variable S17 in the memory 102 and assigns the sum of S12, S14, and S16 to S17.

In S605, the CPU 103 assigns the sum of S15 and S17 to SUM and the error of that sum to DSUM.
The CPU 103 performs S601 to S605 in this way for every given i.
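The accumulation in S601 to S605 can be sketched in the same way. The code assumes a toy format of 8 significant bits with truncation; the names `trunc`, `add_err`, and `accumulate` are hypothetical, and the two (D, E) pairs fed in are the values that the worked example in the description obtains for i = 1 and i = 2.

```python
def trunc(x: int, digits: int = 8) -> int:
    """Keep the top `digits` significant bits of a non-negative integer."""
    extra = max(x.bit_length() - digits, 0)
    return (x >> extra) << extra

def add_err(a: int, b: int) -> tuple[int, int]:
    """Truncated sum of a and b, plus its error b - (s - a)."""
    s = trunc(a + b)
    return s, b - (s - a)

def accumulate(total: int, dtotal: int, d: int, e: int) -> tuple[int, int]:
    """S601 to S605: fold one (D, E) pair into (SUM, DSUM)."""
    s11, s12 = add_err(total, d)             # S601
    s13, s14 = add_err(dtotal, e)            # S602
    s15, s16 = add_err(s11, s13)             # S603
    s17 = trunc(trunc(s12 + s14) + s16)      # S604
    return add_err(s15, s17)                 # S605 -> (SUM, DSUM)

state = (0, 0)                               # S401: SUM = DSUM = 0
for d, e in [(39680, 214), (33792, 30)]:
    state = accumulate(*state, d, e)

print(state)        # (73216, 500)
print(sum(state))   # 73716
```

SUM alone holds only what 8 bits can represent; the digits that do not fit accumulate in DSUM and are added back once at the end (S207).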

Returning to FIG. 2, in S207 the CPU 103 computes the sum of SUM and the correction term DSUM and assigns it to SUM.
In S208, as the output process, the CPU 103 outputs the value of SUM assigned in S207 to the output unit 105 as the result of the computation.

Performing the processes S201 to S208 in this way yields an accurate result for the inner product A·B.
Because an accurate result is obtained, the present invention can be applied in fields that require high-precision inner products (for example, neural networks, vector quantization, and areas of physics that depend on sensitive numerical simulation).

Next, a specific example of the inner product computation described above is given (refer to the figures as appropriate). For ease of understanding, the inner product of two-dimensional vectors is used. For every value that the data processing apparatus 1 (see FIG. 1) uses in a computation, the mantissa can hold 8 bits (the exponent is not counted). When a result exceeds 8 bits, only the upper 8 bits are kept and the remaining bits are truncated.

Under these conditions, the two-dimensional vectors A and B are set as follows.
A = (A(1), A(2)), B = (B(1), B(2))
A(1), A(2), B(1), and B(2) are binary numbers with the following values.
A(1) = 10110111 (183 in decimal; decimal values are likewise given in parentheses below)
A(2) = 10010101 (149)
B(1) = 11011010 (218)
B(2) = 11100011 (227)

Under these conditions, the exact result of the inner product A·B, denoted A·B (exact), is as follows.
A·B (exact) = 1001101111010110 (39894) + 1000010000011111 (33823)
= 10001111111110101 (73717)

When the data processing apparatus 1 of this embodiment (see FIG. 1) is not used, that is, when the error caused by truncation is not corrected, as in the conventional method, the result of the inner product, denoted A·B (conventional), is as follows.
A·B (conventional) = 10011011 * 2⁸ (39680) + 10000100 * 2⁸ (33792)
= 10001111 * 2⁹ (73216)

In other words, the conventional method produces an error of 73717 − 73216 = 501.
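The conventional result can be reproduced with a few lines of Python. The sketch models the 8-bit mantissa by truncating every product and every partial sum to its top 8 significant bits; the helper name `trunc` is an assumption made for this illustration.

```python
def trunc(x: int, digits: int = 8) -> int:
    """Keep the top `digits` significant bits of a non-negative integer."""
    extra = max(x.bit_length() - digits, 0)
    return (x >> extra) << extra

a = [183, 149]
b = [218, 227]

exact = sum(x * y for x, y in zip(a, b))   # Python integers are exact

naive = 0
for x, y in zip(a, b):
    # Truncate each product, then truncate the running sum.
    naive = trunc(naive + trunc(x * y))

print(exact, naive, exact - naive)   # 73717 73216 501
```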

Next, the computation of A·B (present invention), the result of the inner product A·B obtained with the data processing apparatus 1 of this embodiment (see FIG. 1) under the same conditions, is described (refer to the figures as appropriate).

First, in S201, data is input from the input unit 104 as shown in S301, and the arrays are initialized as follows.
A(1) = 10110111 (183)
A(2) = 10010101 (149)
B(1) = 11011010 (218)
B(2) = 11100011 (227)

Next, in S202, the CPU 103 initializes the variables SUM and DSUM to 0, as shown in S401.
Because this example computes a two-dimensional inner product, the variable N in S203 is set to 2.

After S202, S204 is executed with i = 1. To correct the error in the product A(1)*B(1), the CPU 103 separates the vector components in S501 as follows.
A(1) = A1(1) + A2(1)
B(1) = B1(1) + B2(1)
A1(1) = 10110000 (176)
A2(1) = 00000111 (7)
B1(1) = 11010000 (208)
B2(1) = 00001010 (10)

Here A(1) and B(1) are each separated so that the lower four digits of A1(1) and B1(1) are 0, but they may instead be separated so that the lower three digits, the lower two digits, and so on are 0. Alternatively, only one of A(1) and B(1) may be separated.
In either case, separating A(1) and B(1) into two or more numbers so that the most significant 1 bit and the least significant 1 bit fall into different parts reduces the error in the computations that follow.

After S501, S502 sets the variables S1, S2, S3, and S4 as follows.
S1 = 10001111 * 2⁸ (36608)
S2 = 11011100 * 2³ (1760)
S3 = 10110110 * 2³ (1456)
S4 = 01000110 (70)

After S502, S503 sets the variables T1 and T2 as follows.
T1 = 10010101 * 2⁸ (38144)
T2 = 11100000 (224)

S504 sets the variables T3 and T4 as follows.
T3 = 10111110 * 2³ (1520)
T4 = 00000110 (6)

S505 sets the variables T5 and T6 as follows.
T5 = 10011010 * 2⁸ (39424)
T6 = 11110000 (240)

S506 sets the variable T7 as follows.
T7 = 11101011 * 2¹ (470)

S507 sets the variables D and E as follows.
D = 10011011 * 2⁸ (39680)
E = 11010110 (214)

Next, S205 is executed with i = 1. At this point SUM and DSUM still hold their initial values, so both are 0.

First, S601 sets the variables S11 and S12 as follows.
S11 = 10011011 * 2⁸ (39680)
S12 = 00000000 (0)

S602 sets the variables S13 and S14 as follows.
S13 = 11010110 (214)
S14 = 00000000 (0)

S603 sets the variables S15 and S16 as follows.
S15 = 10011011 * 2⁸ (39680)
S16 = 11010110 (214)

S604 sets the variable S17 as follows.
S17 = 11010110 (214)

S605 sets the variables SUM and DSUM as follows.
SUM = 10011011 * 2⁸ (39680)
DSUM = 11010110 (214)

Next, S204 is executed with i = 2. To correct the error in the product A(2)*B(2), S501 separates the vector components as follows.
A(2) = A1(2) + A2(2)
B(2) = B1(2) + B2(2)
A1(2) = 10010000 (144)
A2(2) = 00000101 (5)
B1(2) = 11100000 (224)
B2(2) = 00000011 (3)

After S501, S502 sets the variables S1, S2, S3, and S4 as follows.
S1 = 11111100 * 2⁷ (32256)
S2 = 11011000 * 2¹ (432)
S3 = 10001100 * 2³ (1120)
S4 = 00001111 (15)

After S502, S503 sets the variables T1 and T2 as follows.
T1 = 11111111 * 2⁷ (32640)
T2 = 00110000 (48)

S504 sets the variables T3 and T4 as follows.
T3 = 10001101 * 2³ (1128)
T4 = 00000111 (7)

S505 sets the variables T5 and T6 as follows.
T5 = 10000011 * 2⁸ (33536)
T6 = 11101000 (232)

S506 sets the variable T7 as follows.
T7 = 10001111 * 2¹ (286)

S507 sets the variables D and E as follows.
D = 10000100 * 2⁸ (33792)
E = 00011110 (30)

Next, S205 is executed with i = 2. At this point SUM and DSUM hold the following values.
SUM = 10011011 * 2⁸ (39680)
DSUM = 11010110 (214)

First, S601 sets the variables S11 and S12 as follows.
S11 = 10001111 * 2⁹ (73216)
S12 = 10000000 * 2¹ (256)

S602 sets the variables S13 and S14 as follows.
S13 = 11110100 (244)
S14 = 00000000 (0)

S603 sets the variables S15 and S16 as follows.
S15 = 10001111 * 2⁹ (73216)
S16 = 11110100 (244)

S604 sets the variable S17 as follows.
S17 = 11111010 * 2¹ (500)

S605 sets the variables SUM and DSUM as follows.
SUM = 10001111 * 2⁹ (73216)
DSUM = 11111010 * 2¹ (500)

Finally, in S207, the CPU 103 assigns SUM + DSUM to SUM and obtains SUM = 73716.
In S207, the values of SUM and DSUM are in practice stored separately in the memory 102, so no rounding error or the like arises.

In the output process S208, the CPU 103 outputs the result, namely A·B (present invention) = 10001111111110100 (73716).

Listing the decimal results of A·B (exact), A·B (conventional), and A·B (present invention) side by side shows that the error of A·B (present invention) is far smaller than that of A·B (conventional).
A·B (exact) = 73717
A·B (conventional) = 73216 (error 501)
A·B (present invention) = 73716 (error 1)

As another specific example, the accuracy improvement for a three-dimensional inner product A·B using the double-precision format defined in IEEE 754 is shown below. Numerical values in this example are written in decimal.
With the three-dimensional vectors A and B set as follows, the exact inner product A·B (exact) is as shown.
A = (a00, a01, a02), B = (b00, b10, b20)
a00 = 1738663799
a01 = 773694423
a02 = 112614455
b00 = 1506009561
b10 = 2117293945
b20 = 421597465
A·B (exact) = a00*b00 + a01*b10 + a02*b20 = 4304060790507107549
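The exact value above can be checked with arbitrary-precision integers, and an uncorrected double-precision accumulation can be computed next to it for comparison. This is an illustrative sketch; the variable names are assumptions, and the size of the uncorrected error is printed rather than stated here because it depends on how each step rounds.

```python
a = [1738663799, 773694423, 112614455]
b = [1506009561, 2117293945, 421597465]

# Python integers have arbitrary precision, so this sum is exact.
exact = sum(x * y for x, y in zip(a, b))
print(exact)   # 4304060790507107549

# Uncorrected accumulation in IEEE 754 double precision.
naive = 0.0
for x, y in zip(a, b):
    naive += float(x) * float(y)

# Difference between the uncorrected double result and the exact value.
print(int(naive) - exact)
```

The exact result needs 62 bits, more than the 53 significant bits of a double, so a single double cannot hold it without rounding.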

First, when the present invention is not used under the above conditions, that is, when the errors caused by rounding, information loss (absorption), and digit loss (cancellation) are not corrected, the inner product A · B (conventional) is as follows.
A · B (conventional) = 0.43040607905071073 * 10¹⁹
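The gap between the exact and the conventional result can be reproduced with a short Python sketch. It assumes that Python's `float` is an IEEE 754 double and that the products are accumulated from left to right; the variable names are illustrative and do not come from the patent.

```python
# Components of the three-dimensional vectors from the example above.
A = [1738663799, 773694423, 112614455]
B = [1506009561, 2117293945, 421597465]

# Exact reference: Python integers have arbitrary precision.
exact = sum(a * b for a, b in zip(A, B))

# Conventional evaluation: every product and every partial sum is
# rounded to a double (assumption: float is IEEE 754 binary64).
naive = 0.0
for a, b in zip(A, B):
    naive += float(a) * float(b)

error = exact - int(naive)
print(exact)       # exact integer inner product
print(int(naive))  # double result converted back to an integer
print(error)       # rounding error of the conventional evaluation
```

Under these assumptions the double result misses the exact value by a few hundred, the same order of magnitude as the error quoted for the conventional method; the precise figure depends on how many digits of the double are printed before the comparison.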

On the other hand, when the present invention is used under the above conditions, the result is as follows. First, in the input process S201, data is input from the input device 104 as in S301. As a result, the arrays are initialized as follows.
A(1) = 1738663799
A(2) = 773694423
A(3) = 112614455
B(1) = 1506009561
B(2) = 2117293945
B(3) = 421597465

Next, the initialization process S202 initializes the variables SUM and DSUM to 0, as in S401. Because this example performs a three-dimensional inner product operation, the variable N in S203 is initialized to 3.

After the initialization process is executed, S204 is executed with i = 1. First, to correct the error in the product A(1) * B(1), S501 separates the vector components as follows. Here, the letter D indicates a decimal number and is unrelated to the variable D. The two-digit number at the end of each right-hand side is the exponent of the power of 10.
A1(1) = 0.173866377600000000000000000000D+10
A2(1) = 0.230000000000000000000000000000D+02
B1(1) = 0.150600953600000000000000000000D+10
B2(1) = 0.250000000000000000000000000000D+02
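The separation in S501 can be sketched as follows. The function name `split_upper_lower` and the choice of 26 leading bits for the upper part are assumptions: the width is inferred from the values listed above, not stated in this passage, and the sketch handles non-negative integers only.

```python
def split_upper_lower(x: int, upper_bits: int = 26) -> tuple[float, float]:
    """Split x into an upper part (its leading bits) and the remainder.

    upper_bits=26 is an assumption inferred from the listed values.
    """
    shift = max(x.bit_length() - upper_bits, 0)
    upper = (x >> shift) << shift  # keep the leading bits, zero the rest
    lower = x - upper              # remaining low-order bits
    return float(upper), float(lower)


a1, a2 = split_upper_lower(1738663799)
b1, b2 = split_upper_lower(1506009561)
print(a1, a2)
print(b1, b2)
```

Because the upper and lower parts sum exactly to the original value, no information is lost by the separation itself.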

After S501 is executed, S502 sets the variables S1, S2, S3, and S4 as follows.
S1 = 0.261844422655376793600000000000D+19
S2 = 0.434665944000000000000000000000D+11
S3 = 0.346382193280000000000000000000D+11
S4 = 0.575000000000000000000000000000D+03
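The following sketch checks that the four partial products listed above are exact in double precision. It assumes Python's `float` is an IEEE 754 double; the lowercase names mirror the patent's variables but are otherwise illustrative. Each factor has at most 26 significant bits, so each product needs at most 52 bits and fits in the 53-bit significand without rounding.

```python
# Upper and lower parts of A(1) and B(1), taken from the values above.
a1, a2 = 1738663776, 23
b1, b2 = 1506009536, 25

# Partial products computed in double precision.
s1 = float(a1) * float(b1)
s2 = float(a1) * float(b2)
s3 = float(a2) * float(b1)
s4 = float(a2) * float(b2)

# Compare each double product with the exact integer product.
exact_products = [a1 * b1, a1 * b2, a2 * b1, a2 * b2]
all_exact = [int(s) for s in (s1, s2, s3, s4)] == exact_products
print(int(s1), int(s2), int(s3), int(s4))
print(all_exact)
```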

After S502 is executed, S503 sets the variables T1 and T2 as follows.
T1 = 0.261844427002036224000000000000D+19
T2 = 0.960000000000000000000000000000D+02

After S503 is executed, S504 sets the variables T3 and T4 as follows.
T3 = 0.346382199030000000000000000000D+11
T4 = 0.000000000000000000000000000000D+00

After S504 is executed, S505 sets the variables T5 and T6 as follows.
T5 = 0.261844430465858201600000000000D+19
T6 = 0.127000000000000000000000000000D+03

After S505 is executed, S506 sets the variable T7 as follows.
T7 = 0.223000000000000000000000000000D+03

After S506 is executed, S507 sets the variables D and E as follows.
D = 0.261844430465858201600000000000D+19
E = 0.223000000000000000000000000000D+03
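The three error-tracking additions of S503 to S507 can be sketched as below. The error of each addition is recovered here with Knuth's TwoSum, which is a substituted technique: this passage does not say how the patent extracts the error, and the helper name `two_sum` is hypothetical. The sketch assumes Python's `float` is an IEEE 754 double.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, e) where s is the rounded sum and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


# Partial products S1..S4 for i = 1, as listed above.
s1, s2, s3, s4 = 2618444226553767936.0, 43466594400.0, 34638219328.0, 575.0

t1, t2 = two_sum(s1, s2)  # S503: first addition and its error
t3, t4 = two_sum(s3, s4)  # S504: second addition and its error
t5, t6 = two_sum(t1, t3)  # S505: third addition and its error
t7 = t2 + t4 + t6         # S506: total of the three errors
d, e = two_sum(t5, t7)    # S507: product D and its residual error E

print(int(t1), int(t2))
print(int(t5), int(t6), int(t7))
print(int(d), int(e))
```

Treating S507 as one more error-tracking addition of T5 and T7 is an inference from the listed values for D and E, not something the text states.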

After S204 is executed with i = 1, S205 is executed with i = 1.
First, S601 sets the variables S11 and S12 as follows.
S11 = 0.261844430465858201600000000000D+19
S12 = 0.000000000000000000000000000000D+00

After S601 is executed, S602 sets the variables S13 and S14 as follows.
S13 = 0.223000000000000000000000000000D+03
S14 = 0.000000000000000000000000000000D+00

After S602 is executed, S603 sets the variables S15 and S16 as follows.
S15 = 0.261844430465858201600000000000D+19
S16 = 0.223000000000000000000000000000D+03

After S603 is executed, S604 sets the variable S17 as follows.
S17 = 0.223000000000000000000000000000D+03

After S604 is executed, S605 sets the variables SUM and DSUM as follows.
SUM = 0.261844430465858201600000000000D+19
DSUM = 0.223000000000000000000000000000D+03
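One pass of the accumulation step S205 (S601 to S605) might look like the sketch below. The mapping of S601, S602, and S603 to error-tracking additions is inferred from the listed values, and both `two_sum` (Knuth's TwoSum) and `accumulate` are hypothetical helper names. Python's `float` is assumed to be an IEEE 754 double.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, e) where s is the rounded sum and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def accumulate(total: float, dsum: float, d: float, e: float) -> tuple[float, float]:
    """Add the product D and its error E to the running SUM and DSUM."""
    s11, s12 = two_sum(total, d)  # S601: SUM + D and its error
    s13, s14 = two_sum(dsum, e)   # S602: DSUM + E and its error
    s15, s16 = two_sum(s11, s13)  # S603: combine, keeping the error
    s17 = s12 + s14 + s16         # S604: total of the three errors
    return s15, s17               # S605: new SUM and DSUM


# First iteration (i = 1), starting from SUM = DSUM = 0.
total, dsum = accumulate(0.0, 0.0, 2618444304658582016.0, 223.0)
print(int(total), int(dsum))
```

Keeping DSUM separate from SUM means that small corrections are not absorbed by the much larger running total.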

After S205 is executed with i = 1, S204 is executed with i = 2. First, to correct the error in the product A(2) * B(2), S501 separates the vector components as follows.
A1(2) = 0.773694416000000000000000000000D+09
A2(2) = 0.700000000000000000000000000000D+01
B1(2) = 0.211729392000000000000000000000D+10
B2(2) = 0.250000000000000000000000000000D+02

After S501 is executed, S502 sets the variables S1, S2, S3, and S4 as follows.
S1 = 0.163813848293475072000000000000D+19
S2 = 0.193423604000000000000000000000D+11
S3 = 0.148210574400000000000000000000D+11
S4 = 0.175000000000000000000000000000D+03

After S502 is executed, S503 sets the variables T1 and T2 as follows.
T1 = 0.163813850227711104000000000000D+19
T2 = 0.800000000000000000000000000000D+02

After S504 is executed, S505 sets the variables T5 and T6 as follows.
T5 = 0.163813851709816857600000000000D+19
T6 = 0.790000000000000000000000000000D+02

After S505 is executed, S506 sets the variable T7 as follows.
T7 = 0.159000000000000000000000000000D+03

After S506 is executed, S507 sets the variables D and E as follows.
D = 0.163813851709816883200000000000D+19
E = -0.970000000000000000000000000000D+02

After S204 is executed with i = 2, S205 is executed with i = 2. First, S601 sets the variables S11 and S12 as follows.
S11 = 0.425658282175675084800000000000D+19
S12 = 0.000000000000000000000000000000D+00

After S601 is executed, S602 sets the variables S13 and S14 as follows.
S13 = 0.126000000000000000000000000000D+03
S14 = 0.000000000000000000000000000000D+00

After S602 is executed, S603 sets the variables S15 and S16 as follows.
S15 = 0.425658282175675084800000000000D+19
S16 = 0.126000000000000000000000000000D+03

After S603 is executed, S604 sets the variable S17 as follows.
S17 = 0.126000000000000000000000000000D+03

After S604 is executed, S605 sets the variables SUM and DSUM as follows.
SUM = 0.425658282175675084800000000000D+19
DSUM = 0.126000000000000000000000000000D+03

After S205 is executed with i = 2, S204 is executed with i = 3. First, to correct the error in the product A(3) * B(3), S501 separates the vector components as follows.
A1(3) = 0.112614454000000000000000000000D+09
A2(3) = 0.100000000000000000000000000000D+01
B1(3) = 0.421597464000000000000000000000D+09
B2(3) = 0.100000000000000000000000000000D+01

After S501 is executed, S502 sets the variables S1, S2, S3, and S4 as follows.
S1 = 0.474779682161446560000000000000D+17
S2 = 0.112614454000000000000000000000D+09
S3 = 0.421597464000000000000000000000D+09
S4 = 0.100000000000000000000000000000D+01

After S502 is executed, S503 sets the variables T1 and T2 as follows.
T1 = 0.474779683287591120000000000000D+17
T2 = -0.200000000000000000000000000000D+01

After S503 is executed, S504 sets the variables T3 and T4 as follows.
T3 = 0.421597465000000000000000000000D+09
T4 = 0.000000000000000000000000000000D+00

After S504 is executed, S505 sets the variables T5 and T6 as follows.
T5 = 0.474779687503565760000000000000D+17
T6 = -0.100000000000000000000000000000D+01

After S506 is executed, S507 sets the variables D and E as follows.
D = 0.474779687503565760000000000000D+17
E = -0.100000000000000000000000000000D+01

After S204 is executed with i = 3, S205 is executed with i = 3. First, S601 sets the variables S11 and S12 as follows.
S11 = 0.430406079050710732800000000000D+19
S12 = 0.960000000000000000000000000000D+02

After S601 is executed, S602 sets the variables S13 and S14 as follows.
S13 = 0.125000000000000000000000000000D+03
S14 = 0.000000000000000000000000000000D+00

After S602 is executed, S603 sets the variables S15 and S16 as follows.
S15 = 0.430406079050710732800000000000D+19
S16 = 0.125000000000000000000000000000D+03

After S603 is executed, S604 sets the variable S17 as follows.
S17 = 0.221000000000000000000000000000D+03

After S604 is executed, S605 sets the variables SUM and DSUM as follows.
SUM = 0.430406079050710732800000000000D+19
DSUM = 0.221000000000000000000000000000D+03

Finally, in S206, the sum of the variables SUM and DSUM is assigned to SUM, which gives the following value.
SUM = 0.430406079050710754900000000000D+19

From the above, when the present invention is used, the inner product A · B (present application) is as follows.
A · B (present application) = 0.4304060790507107549 * 10¹⁹

In other words, listing the results of A · B (correct answer), A · B (conventional), and A · B (present application) side by side shows that the accuracy of A · B (present application) is extremely high.
A · B (correct answer) = 4304060790507107549
A · B (conventional) = 0.43040607905071073 * 10¹⁹ (error 0.249 * 10³)
A · B (present application) = 0.4304060790507107549 * 10¹⁹ (error 0)
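The whole worked example can be condensed into one Python sketch. It is an illustration under stated assumptions, not the patent's implementation: `float` is taken to be an IEEE 754 double, the 26-bit upper part is inferred from the listed values, addition errors are recovered with Knuth's TwoSum, inputs are non-negative integers below 2^52, and all function names are hypothetical. Since a single double cannot hold every digit of the final value, the last addition of SUM and DSUM is done with Python integers.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, e) where s is the rounded sum and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)


def split(x: int, upper_bits: int = 26) -> tuple[float, float]:
    """S501: separate x into its leading bits and the remaining low bits."""
    shift = max(x.bit_length() - upper_bits, 0)
    upper = (x >> shift) << shift
    return float(upper), float(x - upper)


def product_with_error(a: int, b: int) -> tuple[float, float]:
    """S204: return (D, E), the rounded product and its error."""
    a1, a2 = split(a)
    b1, b2 = split(b)
    s1, s2, s3, s4 = a1 * b1, a1 * b2, a2 * b1, a2 * b2  # S502
    t1, t2 = two_sum(s1, s2)                             # S503
    t3, t4 = two_sum(s3, s4)                             # S504
    t5, t6 = two_sum(t1, t3)                             # S505
    return two_sum(t5, t2 + t4 + t6)                     # S506, S507


def dot_with_error(A: list[int], B: list[int]) -> tuple[float, float]:
    """Return (SUM, DSUM) after processing every dimension."""
    total, dsum = 0.0, 0.0
    for a, b in zip(A, B):
        d, e = product_with_error(a, b)
        s11, s12 = two_sum(total, d)     # S601
        s13, s14 = two_sum(dsum, e)      # S602
        total, s16 = two_sum(s11, s13)   # S603
        dsum = s12 + s14 + s16           # S604, S605
    return total, dsum


A = [1738663799, 773694423, 112614455]
B = [1506009561, 2117293945, 421597465]
total, dsum = dot_with_error(A, B)

# S206, done in integers because the result needs more than 53 bits.
result = int(total) + int(dsum)
exact = sum(a * b for a, b in zip(A, B))
print(result, exact, result == exact)
```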

This concludes the description of the embodiment, but aspects of the present invention are not limited to it.
For example, the calculation may be performed using the single-precision format defined in IEEE 754.
Although the inner product operation has been described, the invention may also be applied to addition or multiplication alone.
Furthermore, the number of dimensions of the vectors may be three or more.

When a numerical value is brought within the predetermined number of digits, this need not be done by rounding down; rounding up or the like may be used instead.
Alternatively, another method may be used, such as rounding half up after the value is converted to decimal.

For subtraction as well, a more accurate result can be obtained by performing the same processing as for addition.
In addition, specific configurations such as the hardware configuration and the processing procedure can be changed as appropriate without departing from the gist of the present invention. For example, the invention may be applied when two or more arithmetic devices are used in combination.

A block diagram of the data processing device.
A flowchart showing the processing of the data processing device.
A detailed view of the input process S201.
A detailed view of the initialization process S202.
A detailed view of the process S204.
A detailed view of the process S205.

Explanation of symbols

101 Auxiliary storage unit
102 Memory
103 CPU
104 Input unit
105 Output unit

Claims

An arithmetic device that computes, using binary floating point, the product of a first number and a second number, the arithmetic device comprising:
a storage unit that stores the first number, the second number, and the numbers generated by the operation as numbers within a predetermined number of digits, obtained by rounding down or rounding up the bits below the predetermined number of digits; and
a processing unit that:
separates the first number stored in the storage unit into a first upper number consisting of a predetermined number of upper digits and a first lower number consisting of the remaining lower digits;
separates the second number stored in the storage unit into a second upper number consisting of the predetermined number of upper digits and a second lower number consisting of the remaining lower digits;
multiplies the first upper number and the second upper number to obtain a first multiplication number;
multiplies the first upper number and the second lower number to obtain a second multiplication number;
multiplies the first lower number and the second upper number to obtain a third multiplication number;
multiplies the first lower number and the second lower number to obtain a fourth multiplication number; and
when calculating a sum by adding the first multiplication number, the second multiplication number, the third multiplication number, and the fourth multiplication number three times in an arbitrary order, stores in the storage unit, separately from the sum at each addition, the error of each addition, calculates a total error by adding all of those errors, adds the total error to the sum after the three additions, and also calculates a total error that is the error at that time.

The arithmetic device according to claim 1, wherein,
  when performing an inner product operation of two vectors of one or more dimensions,
  the processing unit calculates the sum and the error of the sum for each dimension and adds them to perform the inner product operation of the two vectors.

An arithmetic method in an arithmetic device that computes, using binary floating point, the product of a first number and a second number, wherein
  the arithmetic device comprises a processing unit and a storage unit that stores the first number, the second number, and the numbers generated by the operation as numbers within a predetermined number of digits, obtained by rounding down or rounding up the bits below the predetermined number of digits, and
  the processing unit:
  separates the first number stored in the storage unit into a first upper number consisting of a predetermined number of upper digits and a first lower number consisting of the remaining lower digits;
  separates the second number stored in the storage unit into a second upper number consisting of the predetermined number of upper digits and a second lower number consisting of the remaining lower digits;
  multiplies the first upper number and the second upper number to obtain a first multiplication number;
  multiplies the first upper number and the second lower number to obtain a second multiplication number;
  multiplies the first lower number and the second upper number to obtain a third multiplication number;
  multiplies the first lower number and the second lower number to obtain a fourth multiplication number; and
  when calculating a sum by adding the first multiplication number, the second multiplication number, the third multiplication number, and the fourth multiplication number three times in an arbitrary order, stores in the storage unit, separately from the sum at each addition, the error of each addition, calculates a total error by adding all of those errors, adds the total error to the sum after the three additions, and also calculates a total error that is the error at that time.

The arithmetic method according to claim 3, wherein,
  when performing an inner product operation of two vectors of one or more dimensions,
  the processing unit calculates the sum and the error of the sum for each dimension and adds them to perform the inner product operation of the two vectors.