JP5301675B2

JP5301675B2 - Method and apparatus for performing RAID processing

Info

Publication number: JP5301675B2
Application number: JP2011533440A
Authority: JP
Inventors: Gueron, Shay
Original assignee: Intel Corporation
Priority date: 2008-12-19
Filing date: 2009-12-04
Publication date: 2013-09-25
Anticipated expiration: 2029-12-04
Also published as: KR101245056B1; US20100158241A1; EP2359234A4; CN102171646A; WO2010080263A3; WO2010080263A2; US8150031B2; CN102171646B; KR20110050723A; EP2359234A2; JP2012507212A; EP2359234B1

Description

The present disclosure relates to RAID (Redundant Array of Independent Disks), and more particularly to level 6 RAID.

RAID combines multiple physical hard disk drives into a logical drive for reliability, capacity, or performance. Thus, instead of multiple physical hard disk drives, the operating system sees a single logical drive. As is well known to those skilled in the art, there are a number of standard methods, called RAID levels, for distributing data across the physical hard disk drives of a RAID system.

For example, in a level 0 RAID system, data is striped across an array of physical hard disk drives by dividing the data into blocks and writing each block to a separate hard disk drive. Input/output (I/O) performance improves because the load is distributed across many hard disk drives. Level 0 RAID improves I/O performance but provides no redundancy: if one hard disk drive fails, all data is lost.

A level 5 RAID system provides a high level of redundancy by striping both data and parity information across at least three hard disk drives. Data striping is combined with distributed parity to provide a recovery path in case of failure.

Level 6 RAID (RAID-6) provides an even higher level of redundancy than level 5 RAID by allowing recovery from the failure of two disks. In a level 6 RAID system, two syndromes, called the P syndrome and the Q syndrome, are generated for the data and stored on the hard disk drives of the RAID system.

The P syndrome is generated by computing parity information for the data in a stripe (data blocks (strips), P syndrome block, and Q syndrome block). Generating the Q syndrome requires Galois field multiplication and is complex in the event of a disk drive failure. Arithmetic in the Galois field (finite field) GF(2⁸) is defined via the reduction polynomial x⁸ + x⁴ + x³ + x + 1 (that is, 11B in hexadecimal).

The regeneration scheme used to recover data, the P syndrome, and/or the Q syndrome during disk recovery requires both Galois field multiplication and inversion.

For example, in a RAID array with n data disks D0, D1, D2, ..., Dn−1 (n ≤ 255), two quantities, the parity (P) and the Reed-Solomon code (Q), are required to recover from the loss of two disks.

P and Q are defined by

$$P = D_0 + D_1 + D_2 + \cdots + D_{n-1}$$

$$Q = g^{0} \cdot D_0 + g^{1} \cdot D_1 + g^{2} \cdot D_2 + \cdots + g^{n-1} \cdot D_{n-1}$$

where g = {02} is an element of the Galois field (finite field) GF(2⁸), and “+” and “·” are the operations in this field.

The computational bottleneck of a RAID-6 system is the cost of computing Q. The difficulty stems from the fact that conventional processors (central processing units (CPUs)) perform poorly on computations in the Galois field (finite field) GF(2⁸). Table-lookup-based algorithms are therefore typically used to improve performance. However, table lookups result in an inherently slow serial process.

Features of embodiments of the claimed subject matter will become apparent from the following detailed description, taken with reference to the drawings, in which like numerals denote like parts.
FIG. 1 is a block diagram illustrating an embodiment of a RAID-6 array, showing a plurality of stripes, each stripe including data blocks (strips), a P syndrome, and a Q syndrome striped across an array of hard disks.
FIG. 2 is a block diagram of a system that includes instructions for performing AES encryption and decryption in a general-purpose processor.
FIG. 3 is a block diagram of an embodiment of the processor shown in FIG. 2.
FIG. 4 is a flowchart of an embodiment of a method for performing Galois field multiplication in accordance with the principles of the present invention.
FIGS. 5A to 5C illustrate the use of the packed shuffle bytes (PSHUFB) instruction.
FIGS. 6A to 6C are sample code that allows Galois field multiplication to be performed on multiple 16-byte data blocks simultaneously.
Although the following detailed description proceeds with reference to illustrative embodiments of the claimed subject matter, many alternatives, modifications, and variations will be apparent to those skilled in the art. Accordingly, the claimed subject matter is to be viewed broadly and is defined only as set forth in the appended claims.

FIG. 1 is a block diagram illustrating an embodiment of a RAID-6 array 100, showing a plurality of stripes, each stripe including data blocks (strips), a P syndrome, and a Q syndrome striped across an array of hard disks 150. In the illustrated embodiment, the RAID array 100 has five hard disks 150. Data is written to the RAID-6 array using block-level striping, with the P and Q syndromes distributed across the member hard disks in a round-robin fashion. Sequential data, such as a file segmented into blocks, may be distributed across a stripe, for example horizontal stripe 0, with one block stored in each of the three data blocks 102, 104, 106 on the data disks 150. In one embodiment, each block of a stripe holds 512 bytes.

The P and Q syndromes computed for data blocks 102, 104, 106 of horizontal stripe 0 are stored in P block 130 and Q block 132 of stripe 0, respectively. The P and Q syndrome blocks are stored on different hard disks 150 in each stripe.

The P syndrome may be generated by performing an exclusive OR (XOR) operation. XOR is a logical operation on two operands that yields a logical value of “1” if and only if exactly one of the operands has a logical value of “1”. For example, the XOR of a first operand with the value “11001010” and a second operand with the value “10000011” yields the result “01001001”. If the hard disk storing the first operand fails, the first operand can be recovered by performing an XOR operation on the second operand and the result.
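The XOR recovery described above can be checked with a few lines of Python. This is only an illustrative sketch: the operand values come from the paragraph above, and the variable names are chosen here for readability.

```python
# Operand values taken from the example in the text.
first = 0b11001010
second = 0b10000011

# Parity of the two operands (the value a P block would hold for them).
parity = first ^ second

# If the disk holding `first` is lost, XOR the surviving operand with the parity.
recovered = second ^ parity

print(f"{parity:08b}")     # parity bits
print(f"{recovered:08b}")  # recovered first operand
```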

The P syndrome is a simple parity of the data (D), computed across the stripe using ⊕ (XOR) operations. In a system with n data disks, generation of the P syndrome is expressed by Equation 1 below.

$$P = D_0 \oplus D_1 \oplus D_2 \oplus \cdots \oplus D_{n-1} \qquad \text{(Equation 1)}$$

Computing the Q syndrome requires multiplication (·) using a Galois field polynomial (g). Arithmetic operations are performed on 8-bit (byte) Galois field polynomials at very high performance. A polynomial is an expression in which a finite number of constants and variables are combined using only addition, subtraction, multiplication, and non-negative whole-number exponents. One primitive polynomial is x⁸ + x⁴ + x³ + x + 1. Galois field (GF) operations on polynomials are also referred to as GF(2⁸) arithmetic. In a system with n data disks, generation of the Q syndrome is expressed by Equation 2 below.

$$Q = g^{0} \cdot D_0 \oplus g^{1} \cdot D_1 \oplus g^{2} \cdot D_2 \oplus \cdots \oplus g^{n-1} \cdot D_{n-1} \qquad \text{(Equation 2)}$$

Byte-wise Galois field operations are performed on a stripe basis, where each byte in a block is computationally independent of the other bytes. Byte-wise Galois field operations can accommodate up to 255 (2⁸ − 1) data disks.
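The byte-wise P and Q computations can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names (`gf_mul`, `gf_pow`, `p_syndrome`, `q_syndrome`) and the sample stripe are hypothetical, while g = {02} and the reduction polynomial 0x11b are taken from the text.

```python
from functools import reduce

REDUCTION_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, as stated in the text

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo REDUCTION_POLY (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # degree 8 reached: reduce
            a ^= REDUCTION_POLY
        b >>= 1
    return result

def gf_pow(base: int, exponent: int) -> int:
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exponent):
        result = gf_mul(result, base)
    return result

def p_syndrome(blocks):
    """P: byte-wise XOR of all data blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def q_syndrome(blocks, g=0x02):
    """Q: byte-wise XOR of g^i * D_i over all data blocks."""
    out = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        coeff = gf_pow(g, i)
        for pos, byte in enumerate(block):
            out[pos] ^= gf_mul(coeff, byte)
    return bytes(out)

# Hypothetical two-byte blocks D0, D1, D2 of one stripe.
stripe = [bytes([0x01, 0x80]), bytes([0x02, 0x57]), bytes([0x04, 0xFF])]
P = p_syndrome(stripe)
Q = q_syndrome(stripe)
print(P.hex(), Q.hex())
```

Each byte position is processed independently, which is what allows the computation to be vectorized across a block.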

The performance of Q syndrome generation may be improved by expressing Q in its Horner representation, as shown in Equation 3 below.

$$Q = (\cdots((D_{n-1} \cdot g \oplus D_{n-2}) \cdot g \oplus D_{n-3}) \cdot g \oplus \cdots \oplus D_1) \cdot g \oplus D_0 \qquad \text{(Equation 3)}$$

Accordingly, two operations are used to compute Q.
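A Horner-style evaluation of Q can be sketched as below, using only a multiplication by {02} and an XOR per step. The names `xtime` and `q_horner` and the sample stripe are illustrative assumptions; the reduction constant 0x1b follows from the polynomial 0x11b given in the text.

```python
def xtime(byte: int) -> int:
    """Multiply one byte by {02} in GF(2^8) with reduction polynomial 0x11b."""
    shifted = (byte << 1) & 0xFF
    return shifted ^ 0x1B if byte & 0x80 else shifted

def q_horner(blocks):
    """Start from D_{n-1}; then repeatedly multiply by g = {02} and XOR in the next block."""
    acc = bytearray(blocks[-1])
    for block in reversed(blocks[:-1]):
        for pos in range(len(acc)):
            acc[pos] = xtime(acc[pos]) ^ block[pos]
    return bytes(acc)

# Hypothetical two-byte blocks D0, D1, D2 of one stripe.
stripe = [bytes([0x01, 0x80]), bytes([0x02, 0x57]), bytes([0x04, 0xFF])]
Q = q_horner(stripe)
print(Q.hex())
```

For this stripe the result equals g⁰·D0 ⊕ g¹·D1 ⊕ g²·D2 computed term by term, but no general field multiplication is needed.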

In contrast to the computation shown in Equation 2, the computation in Equation 3 does not require general multiplication in GF(2⁸). Instead, the only multiplication is by g = {02}. For a single byte, multiplication by g = {02} can be performed by shifting the value left by one bit and then performing a conditional exclusive OR (XOR) of the shifted result with a constant, depending on the state of the most significant bit of the value before the shift. To process 4 bytes in parallel, multiplication by {02} shifts the value stored in the 4 bytes left by one bit and performs four conditional XOR operations, one per byte, as described below.

“& 0xfefefefe” is a mask that prevents unwanted carries between bytes. However, the conditional XOR operations are not very efficient. The computation time may be reduced by processing 8 bytes in parallel instead of 4 and using a mask based on the most significant bit (MSB) of each of the 8 bytes.
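The word-parallel multiplication by {02} can be sketched as follows. The function names `mul2_x4` and `mul2_x8` are illustrative; the constant 0x1b follows from the reduction polynomial 0x11b stated earlier, and the branch-free 8-byte mask construction is one common way to derive a mask from the MSBs, assumed here rather than taken from the source.

```python
def mul2_x4(v: int) -> int:
    """Multiply four packed bytes (one 32-bit word) by {02} in GF(2^8), polynomial 0x11b."""
    shifted = (v << 1) & 0xFEFEFEFE  # mask drops bits carried across byte boundaries
    # One conditional XOR per byte, keyed on that byte's MSB before the shift.
    if v & 0x00000080:
        shifted ^= 0x0000001B
    if v & 0x00008000:
        shifted ^= 0x00001B00
    if v & 0x00800000:
        shifted ^= 0x001B0000
    if v & 0x80000000:
        shifted ^= 0x1B000000
    return shifted

def mul2_x8(v: int) -> int:
    """Branch-free variant on eight packed bytes (one 64-bit word)."""
    msb = v & 0x8080808080808080
    mask = (msb << 1) - (msb >> 7)  # 0xFF in every byte whose MSB was set
    shifted = (v << 1) & 0xFEFEFEFEFEFEFEFE
    return shifted ^ (mask & 0x1B1B1B1B1B1B1B1B)

print(hex(mul2_x4(0x80FF5701)))
print(hex(mul2_x8(0x80FF570180FF5701)))
```

The 8-byte variant replaces the per-byte conditionals with a single AND and XOR, which is the kind of saving the paragraph above refers to.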

The Q syndrome computation of the RAID-6 algorithm uses the same representation of GF(2⁸) that is used for the Advanced Encryption Standard (AES), published by the National Institute of Standards and Technology (NIST) as Federal Information Processing Standard (FIPS) 197. AES is a symmetric block cipher that can encrypt and decrypt information.

In one embodiment, AES instructions, which use GF(2⁸), are used to perform the Galois field multiplication operations needed to compute the Q syndrome required for RAID level 6, in accordance with the principles of the present invention.

FIG. 2 is a block diagram of a system 200 that includes instructions for performing AES encryption and decryption in a general-purpose processor. The system 200 includes a processor 201, a Memory Controller Hub (MCH) or Graphics Memory Controller Hub (GMCH) 202, and an Input/Output (I/O) Controller Hub (ICH) 204. The MCH 202 includes a memory controller 206 that controls communication between the processor 201 and memory 208. The processor 201 and the MCH 202 communicate over a system bus 216.

The processor 201 may be any of a single-core Intel® Pentium® IV processor, a single-core Intel Celeron processor, an Intel® XScale processor, a multi-core processor such as an Intel® Pentium® D, Intel® Xeon® processor, or Intel® Core® Duo processor, or any other type of processor.

The memory 208 may be Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Synchronized Dynamic Random Access Memory (SDRAM), Double Data Rate 2 (DDR2) RAM, Rambus Dynamic Random Access Memory (RDRAM), or any other type of memory.

The ICH 204 may be coupled to the MCH 202 using a high-speed chip-to-chip interconnect 214 such as Direct Media Interface (DMI). DMI supports concurrent transfer rates of 2 gigabits per second over two unidirectional lanes.

The ICH 204 may include a storage I/O controller 210 for controlling communication with at least one storage device 212, such as the RAID array 100 (FIG. 1). The ICH 204 may communicate with the storage device 212 over a storage protocol interconnect 218 using a serial storage protocol such as Serial Attached Small Computer System Interface (SAS) or Serial Advanced Technology Attachment (SATA).

The processor 201 includes an AES function 203 that performs AES encryption and decryption operations. The AES function 203 may be used to encrypt or decrypt information stored in the memory 208 and/or the storage device 212.

Encryption performs a series of transformations that use a secret key (cipher key) to convert intelligible data, referred to as “plaintext”, into an unintelligible form, referred to as “ciphertext”. The transformations in the cipher include: (1) adding a round key (a value derived from the cipher key) to the state (a two-dimensional array of bytes) using an exclusive OR (XOR) operation; (2) processing the state using a non-linear byte substitution table (S-Box); (3) cyclically shifting the last three rows of the state by different offsets; and (4) taking all of the columns of the state and mixing their data (independently of one another) to produce new columns, referred to as the MixColumns transformation. These four transformations are performed by a single AES instruction, described later.

In the MixColumns transformation, the data from all of the columns of the state is mixed (independently of one another) to produce new columns. MixColumns is a 128-bit to 128-bit transformation that operates on the columns of the 4×4 matrix representation of the 128-bit (16-byte) input. The transformation treats each column as a third-degree polynomial with coefficients in the AES Galois field 256. Each column of the 4×4 matrix representation of the state is multiplied by the polynomial a(x) = {03}x³ + {01}x² + {01}x + {02} and reduced modulo x⁴ + 1. The 128-bit to 128-bit MixColumns transformation is thus a 16-byte to 16-byte transformation. For example, the 16 bytes (the state) may be denoted [p, o, n, m, l, k, j, i, h, g, f, e, d, c, b, a], where a is the least significant byte; the state has four columns, each of which is a 32-bit doubleword (4 bytes).

The MixColumns transformation is a matrix multiplication based on GF(2⁸) arithmetic (modulo x⁸ + x⁴ + x³ + x + 1). The MixColumns transformation may therefore be used by the Galois field multiplication function 250 to compute the Q syndrome for a level 6 RAID system, as described later. To make use of the MixColumns multiplication, the MixColumns transformation is isolated from the AES instructions.

The MixColumns transformation is performed separately on each of the four columns of the state. The four columns are (1) [p, o, n, m], (2) [l, k, j, i], (3) [h, g, f, e], and (4) [d, c, b, a].

The result of the MixColumns transformation on [p, o, n, m, l, k, j, i, h, g, f, e, d, c, b, a] is [p′, o′, n′, m′, l′, k′, j′, i′, h′, g′, f′, e′, d′, c′, b′, a′], as shown in Table 1.

As shown in Table 1, the same operation is performed on each of the four columns.

Therefore, given that the operations are the same for each doubleword (column), a simplified notation may be used to describe the MixColumns transformation on one of the four columns (for example, column 4, the least significant doubleword).

For column 4, the doubleword (dword) is [d, c, b, a], and the MixColumns transformation in the simplified notation is written as shown below.
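As a rough illustration of the per-column operation, the sketch below applies the standard FIPS 197 MixColumns matrix to one column [d, c, b, a]. It assumes that a (the least significant byte) corresponds to row 0 of the column; the function names `xtime` and `mix_column` are chosen here for illustration and do not come from the source.

```python
def xtime(byte: int) -> int:
    """Multiply one byte by {02} in GF(2^8) with reduction polynomial 0x11b."""
    shifted = (byte << 1) & 0xFF
    return shifted ^ 0x1B if byte & 0x80 else shifted

def mix_column(d: int, c: int, b: int, a: int):
    """MixColumns on one column [d, c, b, a]; assumes a is row 0 of the column."""
    a_new = xtime(a) ^ (xtime(b) ^ b) ^ c ^ d   # {02}a + {03}b + c + d
    b_new = a ^ xtime(b) ^ (xtime(c) ^ c) ^ d   # a + {02}b + {03}c + d
    c_new = a ^ b ^ xtime(c) ^ (xtime(d) ^ d)   # a + b + {02}c + {03}d
    d_new = (xtime(a) ^ a) ^ b ^ c ^ xtime(d)   # {03}a + b + c + {02}d
    return d_new, c_new, b_new, a_new

# Column (row 0 to row 3) = d4, bf, 5d, 30, a commonly used MixColumns check value.
print([hex(x) for x in mix_column(0x30, 0x5D, 0xBF, 0xD4)])
```

Multiplication by {03} is written as `xtime(x) ^ x`, so the whole column update uses only multiplication by {02} and XOR.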

FIG. 3 is a block diagram of an embodiment of the processor 201 shown in FIG. 2. The processor 201 includes a fetch and decode unit 306 for decoding processor instructions received from a level 1 (L1) instruction cache 302. Data to be used in executing the processor instructions may be stored in a register file 308. In one embodiment, the register file 308 includes a plurality of 128-bit registers, which are used by the AES instructions to store the data they operate on.

In one embodiment, the register file 308 is a group of 128-bit registers similar to the 128-bit MMX registers provided in Intel Pentium® MMX processors that have the Streaming SIMD (Single Instruction Multiple Data) Extensions (SSE) instruction set. In a SIMD processor, data is processed in 128-bit blocks, with one 128-bit block loaded at a time.

The fetch and decode unit 306 fetches macroinstructions from the L1 instruction cache 302, decodes them, and breaks them into simple operations called micro-operations (μops), which may be stored in a microcode Read Only Memory (ROM) 314. A pipelined execution unit 310 schedules and executes the micro-operations. In the illustrated embodiment, the AES function 203 in the execution unit 310 includes the micro-operations for the AES instructions. The AES instructions are fully pipelined, so that the processor (CPU) may issue an instruction on every cycle if data is ready to be processed. A retirement unit 312 writes the results of the executed AES instructions to registers or memory. A round key 316 used by an AES instruction may be stored in the L1 data cache 304 and loaded into the execution unit 310 for use by the micro-operations that execute any of the AES instructions.

After an AES instruction has been decoded by the fetch and decode unit 306, execution of the AES instruction by the execution unit 310 involves executing the micro-operations associated with the AES instruction that are stored in the microcode ROM 314.

The AES instruction set includes separate AES instructions for performing an encryption round, a decryption round, an encryption last round, and a decryption last round. In one embodiment, each AES instruction has a unique operation code (opcode).

As shown in Table 2, the AES instruction set includes four AES instructions (encrypt, decrypt, encrypt last round, and decrypt last round). The AES instructions in the AES instruction set include single-round operations that perform the encryption and decryption round operations used for all rounds except the last round.

For example, in the AESENC single-round instruction in Table 2, the input data is stored in a 128-bit register (xmmsrcdst) and the round key is stored in another 128-bit register (xmm). The instruction performs the sequence of four transformations of one AES encryption round on the input data (source) stored in the 128-bit xmmsrcdst register and overwrites that input data with the result of the round operation. Thus, xmmsrcdst first holds the input data and afterwards holds the result of the AES round operation.

As shown in Table 2, the corresponding sequence of 128-bit to 128-bit transformations is described using the terminology of FIPS Publication 197. The transformation sequence of an encryption round includes the following:
(1) AddRoundKey transformation: A round key (a value derived from the cipher key) is added to the state (a two-dimensional 128-bit array of bytes) using an exclusive OR (XOR) operation. AddRoundKey is a (128-bit, 128-bit) to 128-bit transformation defined as the bitwise XOR of its two arguments. In the AES flow, these arguments are the state and the round key.
(2) SubBytes transformation: The state is processed using a non-linear byte substitution table (S-Box). SubBytes is a 16-byte to 16-byte (byte-wise) transformation defined by applying the S-box transformation to each of the 16 input bytes. The S-box transformation can be represented by a lookup table. The input to the lookup table is a byte B[7:0], where x and y denote the high and low nibbles, x[3:0] = B[7:4] and y[3:0] = B[3:0]. The output byte is encoded in the table as a two-digit number in hexadecimal (H) notation. For example, input 85H yields output 97H.
(3) ShiftRows transformation: The last three rows of the state are cyclically shifted by different offsets. ShiftRows is a byte-wise permutation, viewed as an operation on the 4×4 matrix representation of the state. The first row of the 4×4 matrix is unchanged. The second row is rotated left by one byte position. The third row is rotated left by two byte positions. The fourth row is rotated left by three byte positions.
(4) MixColumns transformation: The data from all of the columns of the state is mixed (independently of one another) to produce new columns. MixColumns is a 128-bit to 128-bit transformation that operates on the columns of the 4×4 matrix representation of the 128-bit (16-byte) input. The transformation treats each column as a third-degree polynomial with coefficients in the AES Galois field 256. Each column of the 4×4 matrix representation of the state is multiplied by the polynomial a(x) = {03}x³ + {01}x² + {01}x + {02} and reduced modulo x⁴ + 1.
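The first three transformations in the list above can be sketched in Python as follows. This is an illustrative model only: the function names are chosen here, the S-box is computed from its FIPS 197 definition (field inverse followed by an affine map) rather than read from a table, and the state layout is an assumption, with index 0 as the least significant byte and `state[r + 4*c]` holding row r of column c.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11b)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def sbox(byte: int) -> int:
    """AES S-box: multiplicative inverse in GF(2^8), then the FIPS 197 affine map."""
    inv = 0
    if byte:
        inv = 1
        for _ in range(254):      # byte^254 is the inverse of a non-zero byte
            inv = gf_mul(inv, byte)
    result = 0x63
    for shift in range(5):        # inv ^ rotl(inv,1) ^ ... ^ rotl(inv,4) ^ 0x63
        result ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return result

def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """Bitwise XOR of the 16-byte state with the 16-byte round key."""
    return bytes(s ^ k for s, k in zip(state, round_key))

def sub_bytes(state: bytes) -> bytes:
    """Apply the S-box to each of the 16 bytes."""
    return bytes(sbox(b) for b in state)

def shift_rows(state: bytes) -> bytes:
    """Rotate row r left by r positions; assumes state[r + 4*c] is row r, column c."""
    return bytes(state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4))

print(hex(sbox(0x85)))
print(shift_rows(b"ABCDEFGHIJKLMNOP"))
```

With a round key of all zeros, `add_round_key` leaves the state unchanged, which is the property the zero-key instruction sequences described later in the text rely on.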

As shown in Table 2, the AES last encryption round instruction, AESENCLAST, does not perform the MixColumns transformation.

Decryption (the inverse cipher) performs a series of transformations that use the cipher key to convert “ciphertext” into “plaintext” of the same size. The transformations in the inverse cipher are the inverses of the transformations in the cipher.

As shown in Table 2, the transformation sequence of a decryption round described above is performed by a single AES decryption round instruction, AESDEC, and, for the last decryption round, by a single AES last decryption round instruction, AESDECLAST.

Combinations of instructions that include the AES encryption and decryption instructions may be used to obtain the sub-steps (transformations) of the AES algorithm as isolated transformations. The isolated transformations include ShiftRows, SubstituteBytes, and MixColumns, which are used by the AES encryption instructions (AESENC, AESENCLAST).

An embodiment of the present invention computes the Q syndrome for level 6 RAID using the isolated AES MixColumns transformation, obtained using a combination of instructions that includes the AES encryption and decryption instructions.

FIG. 4 is a flowchart of an embodiment of a method for performing Galois field (GF) multiplication in accordance with the principles of the present invention.

The micro-operations for the MixColumns transformation are used in both the AESENC and AESDEC instructions. As shown in Table 2, the AESDEC instruction includes the inverses of the transformations in the AESENC instruction. The micro-operations for the MixColumns transformation may therefore be isolated by executing the following instruction sequence: (1) an AESENC instruction with the round key set to zero, followed by (2) an AESDECLAST instruction with the round key set to zero.

Referring to the transformation sequence for each AES instruction, this instruction sequence isolates the MixColumns transformation because the AddRoundKey micro-operation performs no operation (NOP) and the other micro-operations (ShiftRows, SubstituteBytes) are undone by their inverse micro-operations (InverseShiftRows, InverseSubstituteBytes).

Thus, executing the sequence of AES instructions (AESENC, AESDECLAST) isolates the MixColumns (State) transformation, as shown below.

Y = InverseMixColumns(InverseSubstituteBytes(InverseShiftRows(SubstituteBytes(ShiftRows(State)))))
The isolated MixColumns transformation is used to multiply 16 bytes by {02} in the AES Galois field, in accordance with an embodiment of the present invention. An example is described in which 4 bytes (d, c, b, a) of the 16 bytes (p, o, n, m, l, k, j, i, h, g, f, e, d, c, b, a) are multiplied by {02}.

本実施例では、有限体は、還元多項式（ｒｅｄｕｃｔｉｏｎｐｏｌｙｎｏｍｉａｌ）０ｘ１１ｂにより規定される。他の実施例では、体表現の選択は設定可能であってもよい。 In this embodiment, the finite field is defined by the reduction polynomial 0x11b. In other embodiments, the choice of field representation may be configurable.
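As a point of reference for what the instruction sequence has to compute, multiplication by {02} in the field defined by 0x11b can be written directly in software. This is a minimal sketch, not code from the patent; the names `gf_mul2` and `gf_mul` are chosen here for illustration.

```python
REDUCTION_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1

def gf_mul2(byte: int) -> int:
    """Multiply one byte by {02}: shift left, then reduce if bit 8 is set."""
    shifted = byte << 1
    if shifted & 0x100:
        shifted ^= REDUCTION_POLY
    return shifted

def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) product built from repeated multiplication by {02}."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = gf_mul2(a)
        b >>= 1
    return result

print(hex(gf_mul2(0x80)))       # top bit set, so the reduction is applied
print(hex(gf_mul(0x57, 0x83)))  # worked example from the AES specification
```

Addition in this field is XOR, which is why the expressions such as 3a + c in the text can be read as `gf_mul(3, a) ^ c`.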

図４を参照して、ブロック４００において、（０，ｃ，０，ａ）を提供するため、入力データ（ｄ，ｃ，ｂ，ａ）の奇数バイトポジションがゼロに、すなわち、ｂ＝ｄ＝０に設定される。一実施例では、Ｐａｃｋｅｄシャッフルバイト（ＰＳＨＵＦＢ）命令が、奇数バイトポジションをゼロに設定するのに用いられる。 Referring to FIG. 4, in block 400 the odd byte positions of the input data (d, c, b, a) are set to zero, that is, b = d = 0, to provide (0, c, 0, a). In one embodiment, a packed shuffle bytes (PSHUFB) instruction is used to set the odd byte positions to zero.

図５Ａ〜５Ｃは、Ｐａｃｋｅｄシャッフルバイト（ＰＳＨＵＦＢ）命令の利用を示す。ＰＳＨＵＦＢ命令は、第２オペランドに格納されたシャッフル制御マスクに基づき第１オペランドのバイトをシャッフルする（バイトのインプレースシャッフルを実行する）。シャッフル制御マスクのバイトの最上位ビットが設定されている場合、第１オペランドの対応するバイトにゼロが書き込まれる。 5A-5C illustrate the use of a packed shuffle byte (PSHUFB) instruction. The PSHUFB instruction shuffles the bytes of the first operand based on the shuffle control mask stored in the second operand (executes in-place shuffling of bytes). If the most significant bit of the shuffle control mask byte is set, zero is written to the corresponding byte of the first operand.

ＰＳＨＵＦＢ命令は、バイトＡ及びＢの２つのレジスタと呼ばれる２つの１２８ビット入力を有する。ＰＳＨＵＦＢは、バイトＡ＝［ａ_１５ａ_１４ａ_１３．．．ａ_０］及びＢ＝［ｂ_１５ｂ_１４ｂ_１３．．．ｂ_０］の２つの１２８ビットレジスタをとりあげ、レジスタＡを［ａ_ｂ１５ａ_ｂ１４ａ_ｂ１３．．．ａ_ｂ０］と置換する。ｂ_ｉの先頭ビットが１に設定されている場合、その結果の第ｉエントリは０となる。 The PSHUFB instruction has two 128-bit inputs, referred to as two registers of bytes, A and B. PSHUFB takes two 128-bit registers of bytes A = [a_15 a_14 a_13 ... a_0] and B = [b_15 b_14 b_13 ... b_0] and replaces register A with [a_b15 a_b14 a_b13 ... a_b0]. If the top bit of b_i is set to 1, the i-th entry of the result is 0.
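The behaviour just described can be modelled in a few lines. The sketch below is a software stand-in for PSHUFB, with index 0 taken as the least significant byte; the function name `pshufb` and the 16-byte mask `ZERO_ODD_MASK` (a per-byte extension of the zero-the-odd-positions idea) are assumptions for illustration, not values taken from Table 4.

```python
def pshufb(a: bytes, b: bytes) -> bytes:
    """Model of PSHUFB on 16-byte registers; index 0 is the lowest byte.

    Result byte i is a[b[i] & 0x0F], or 0 when the top bit of b[i] is set.
    """
    if len(a) != 16 or len(b) != 16:
        raise ValueError("both operands must be 16 bytes")
    return bytes(0 if ctrl & 0x80 else a[ctrl & 0x0F] for ctrl in b)

# Hypothetical mask: keep every even byte in place, zero every odd byte.
ZERO_ODD_MASK = bytes(i if i % 2 == 0 else 0xFF for i in range(16))

data = bytes(range(0x10, 0x20))
zeroed = pshufb(data, ZERO_ODD_MASK)
print(zeroed.hex())
```

An identity mask (bytes 0 through 15) returns the input unchanged, and a mask whose bytes all have the top bit set returns sixteen zero bytes.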

図５Ａを参照して、ブロック５００は、１２８ビット第１レジスタの下位４バイトの初期的な内容を示し、ブロック５０２は、ＰＳＨＵＦＢ命令が“ｆｆ０２ｆｆ００ｈ”のシャッフル制御マスクにより実行された後、第１レジスタの下位４バイトの内容を示す。図示されるように、２つの奇数バイト（バイト１とバイト３）は、ＭＳＢが“１”に設定されているため、“０”に設定されている。 Referring to FIG. 5A, block 500 shows the initial contents of the lower 4 bytes of the 128-bit first register, and block 502 shows the contents of the lower 4 bytes of the first register after the PSHUFB instruction has been executed with the shuffle control mask “ff02ff00h”. As shown, the two odd bytes (byte 1 and byte 3) are set to “0” because the MSB of the corresponding mask bytes is set to “1”.

図４を参照して、奇数バイトが“０”に設定された後、ミックスカラム変換が、第１レジスタの内容を用いて、ＡＥＳＤＥＣＬＡＳＴとその後のＡＥＳＥＮＣの命令シーケンスを実行することによって実行される。この命令シーケンスは、（ｄ，ｃ，ｂ，ａ）→（３ａ＋ｂ＋ｃ＋２ｄ，ａ＋ｂ＋２ｃ＋３ｄ，ａ＋２ｂ＋３ｃ＋ｄ，２ａ＋３ｂ＋ｃ＋ｄ）という変換を実行する。 Referring to FIG. 4, after the odd bytes are set to “0”, the mix column transformation is performed by executing the instruction sequence of AESDECLAST followed by AESENC on the contents of the first register. This instruction sequence performs the transformation (d, c, b, a) → (3a + b + c + 2d, a + b + 2c + 3d, a + 2b + 3c + d, 2a + 3b + c + d).

ｄとｂの両方がゼロであるため、“ｄ＝０，ｃ，ｂ＝０，ａ”の命令シーケンスの結果は、３ａ＋ｃ，ａ＋２ｃ，ａ＋３ｃ，２ａ＋ｃとなる。 Since both d and b are zero, the result of the instruction sequence “d = 0, c, b = 0, a” is 3a + c, a + 2c, a + 3c, 2a + c.
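The effect of zeroing b and d can be reproduced with a one-column model of the mix column transformation. This is an illustrative sketch: `mix_column` takes the bytes in the order (a, b, c, d), lowest position first, and returns the new bytes in the same order, so the tuple reads in the reverse order of the (d, c, b, a) notation used in the text. The function names are assumptions.

```python
def mul2(x: int) -> int:
    """Multiply a byte by {02} modulo 0x11b."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def mul3(x: int) -> int:
    """Multiply a byte by {03}, i.e. {02}*x + x."""
    return mul2(x) ^ x

def mix_column(a: int, b: int, c: int, d: int) -> tuple:
    """AES mix column on one 4-byte column, positions 0..3 = a..d."""
    return (
        mul2(a) ^ mul3(b) ^ c ^ d,   # 2a + 3b + c + d
        a ^ mul2(b) ^ mul3(c) ^ d,   # a + 2b + 3c + d
        a ^ b ^ mul2(c) ^ mul3(d),   # a + b + 2c + 3d
        mul3(a) ^ b ^ c ^ mul2(d),   # 3a + b + c + 2d
    )

a, c = 0x57, 0x83
odd_zeroed = mix_column(a, 0, c, 0)
# Positions 0..3 hold 2a+c, a+3c, a+2c, 3a+c.
print([hex(v) for v in odd_zeroed])
```

Only positions 0 and 2 of this result (2a + c and a + 2c) are kept by the following shuffle, which is why the {03} terms never reach the final product.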

次に、結果（３ａ＋ｃ，ａ＋２ｃ，ａ＋３ｃ，２ａ＋ｃ）の奇数バイトは、Ｐａｃｋｅｄシャッフルバイト（ＰＳＨＵＦＢ）を用いてゼロに設定され、第２ＰＳＨＵＦＢ命令の結果である（０，ａ＋２ｃ，０，２ａ＋ｃ）が第１レジスタに格納される。 Next, the odd bytes of the result (3a + c, a + 2c, a + 3c, 2a + c) are set to zero using a packed shuffle bytes (PSHUFB) instruction, and the result of this second PSHUFB instruction, (0, a + 2c, 0, 2a + c), is stored in the first register.

ブロック４０４において、（ｄ，０，ｂ，０）を提供するため、入力データ（ｄ，ｃ，ｂ，ａ）の偶数バイトポジションがゼロに、すなわち、ａ＝ｃ＝０に設定される。一実施例では、Ｐａｃｋｅｄシャッフルバイト（ＰＳＨＵＦＢ）命令が、偶数バイトポジションをゼロに設定するため用いられる。 At block 404, the even byte position of the input data (d, c, b, a) is set to zero, i.e., a = c = 0, to provide (d, 0, b, 0). In one embodiment, a packed shuffle byte (PSHUFB) instruction is used to set the even byte position to zero.

図５Ｂを参照すると、５０２は、第１レジスタの初期的な内容を示し、５０４は、ＰＳＨＵＦＢ命令が“０３ｆｆ０１ｆｆｈ”のシャッフル制御マスクにより実行された後の第１レジスタの内容を示す。図示されるように、すべての偶数バイトポジションは“０”に設定されている。 Referring to FIG. 5B, reference numeral 502 indicates the initial contents of the first register, and reference numeral 504 indicates the contents of the first register after the PSHUFB instruction is executed with the shuffle control mask of “03ff01ffh”. As shown, all even byte positions are set to “0”.

図４を参照して、偶数バイトポジションが“０”に設定された後、ミックスカラム変換が、第１レジスタの内容を用いてＡＥＳＤＥＣＬＡＳＴの後にＡＥＳＥＮＣとの命令シーケンスを実行することによって実行される。この命令シーケンスは、（ｄ，ｃ，ｂ，ａ）→（３ａ＋ｂ＋ｃ＋２ｄ，ａ＋ｂ＋２ｃ＋３ｄ，ａ＋２ｂ＋３ｃ＋ｄ，２ａ＋３ｂ＋ｃ＋ｄ）という変換を実行する。 Referring to FIG. 4, after the even byte positions are set to “0”, the mix column transformation is performed by executing the instruction sequence of AESDECLAST followed by AESENC on the contents of the first register. This instruction sequence performs the transformation (d, c, b, a) → (3a + b + c + 2d, a + b + 2c + 3d, a + 2b + 3c + d, 2a + 3b + c + d).

ｃとａの両方がゼロであるため、“ｄ，ｃ＝０，ｂ，ａ＝０”の命令シーケンスの結果は、ｂ＋２ｄ，ｂ＋３ｄ，２ｂ＋ｄ，３ｂ＋ｄとなる。 Since both c and a are zero, the result of the instruction sequence “d, c = 0, b, a = 0” is b + 2d, b + 3d, 2b + d, 3b + d.

次に、結果（ｂ＋２ｄ，ｂ＋３ｄ，２ｂ＋ｄ，３ｂ＋ｄ）の偶数バイトが、Ｐａｃｋｅｄシャッフルバイト（ＰＳＨＵＦＢ）を用いてゼロに設定され、第４ＰＳＨＵＦＢ命令の結果（ｂ＋２ｄ，０，２ｂ＋ｄ，０）が第２レジスタに格納される。 Next, the even bytes of the result (b + 2d, b + 3d, 2b + d, 3b + d) are set to zero using a packed shuffle bytes (PSHUFB) instruction, and the result of this fourth PSHUFB instruction, (b + 2d, 0, 2b + d, 0), is stored in the second register.

ブロック４０８において、第１レジスタ（ブロック４０２）に格納された結果と、第２レジスタ（ブロック４０６）に格納された結果とが、両方のミックスカラム変換の結果（ｂ＋２ｄ，ａ＋２ｃ，２ｂ＋ｄ，２ａ＋ｃ）を提供するためにＸＯＲ演算される。実施例では、この結果は、ＰＸＯＲ命令を用いてＸＯＲ演算される。ＰＸＯＲ命令は、２つのレジスタの内容に対してＸＯＲ演算を実行し、レジスタの１つにその結果を格納する。 In block 408, the result stored in the first register (block 402) and the result stored in the second register (block 406) are XORed to provide the combined result of both mix column transformations, (b + 2d, a + 2c, 2b + d, 2a + c). In an embodiment, the results are XORed using the PXOR instruction, which performs an XOR operation on the contents of two registers and stores the result in one of them.

ブロック４１０において、Ｐａｃｋｅｄシャッフルバイト（ＰＳＨＵＦＢ）命令が、マスクに基づき入力データ（ｄ，ｃ，ｂ，ａ）のバイトをシャッフルするのに利用される。 In block 410, a packed shuffle byte (PSHUFB) instruction is used to shuffle the bytes of input data (d, c, b, a) based on the mask.

図５Ｃを参照して、５０６は、第３レジスタの初期的な内容を示し、５０８は、ＰＳＨＵＦＢ命令が“０００３０２ｈ”のシャッフル制御マスクにより実行された後の第３レジスタの内容を示す。図示されるように、入力データ（ｄ，ｃ，ｂ，ａ）のバイトは、第３レジスタに格納される結果（ｂ，ａ，ｄ，ｃ）５１０を提供するためシャッフルされる。 Referring to FIG. 5C, 506 shows the initial contents of the third register, and 508 shows the contents of the third register after the PSHUFB instruction is executed with the shuffle control mask of “000302h”. As shown, the bytes of input data (d, c, b, a) are shuffled to provide the result (b, a, d, c) 510 stored in the third register.

図４に続いて、ブロック４１２において、乗算の結果、すなわち、（２ｄ，２ｃ，２ｂ，２ａ）を提供するため、ブロック４０８のＸＯＲ結果（ｂ＋２ｄ，ａ＋２ｃ，２ｂ＋ｄ，２ａ＋ｃ）と第３レジスタ（ｂ，ａ，ｄ，ｃ）の内容に対してＸＯＲ演算が実行される。 Continuing with FIG. 4, in block 412 an XOR operation is performed on the XOR result of block 408 (b + 2d, a + 2c, 2b + d, 2a + c) and the contents of the third register (b, a, d, c) to provide the result of the multiplication, that is, (2d, 2c, 2b, 2a). Each unwanted term cancels under XOR, for example (b + 2d) + b = 2d.
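The whole FIG. 4 flow can be traced on a single 4-byte column in software. The sketch below is an illustration under stated assumptions, not the patent's Table 4 code: the mix column step is computed arithmetically rather than with AES instructions, the byte swap is written with plain tuple indexing instead of a PSHUFB mask, and all function names are hypothetical. Tuples are in position order (a, b, c, d), lowest byte first.

```python
def mul2(x: int) -> int:
    """Multiply a byte by {02} modulo 0x11b."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def mix_column(a: int, b: int, c: int, d: int) -> tuple:
    """AES mix column on one column, positions 0..3 = a..d."""
    m3 = lambda v: mul2(v) ^ v
    return (mul2(a) ^ m3(b) ^ c ^ d, a ^ mul2(b) ^ m3(c) ^ d,
            a ^ b ^ mul2(c) ^ m3(d), m3(a) ^ b ^ c ^ mul2(d))

def mul2_via_mix_column(a: int, b: int, c: int, d: int) -> tuple:
    # Blocks 400/402: zero the odd positions, transform, keep even positions.
    r1 = mix_column(a, 0, c, 0)
    first = (r1[0], 0, r1[2], 0)            # (2a+c, 0, a+2c, 0)
    # Blocks 404/406: zero the even positions, transform, keep odd positions.
    r2 = mix_column(0, b, 0, d)
    second = (0, r2[1], 0, r2[3])           # (0, 2b+d, 0, b+2d)
    # Block 408: XOR the two partial results.
    third = tuple(x ^ y for x, y in zip(first, second))
    # Block 410: swap the lower and upper byte pairs of the input.
    fourth = (c, d, a, b)
    # Block 412: XOR removes the leftover c, d, a, b terms.
    return tuple(x ^ y for x, y in zip(third, fourth))

print([hex(v) for v in mul2_via_mix_column(0xD4, 0xBF, 0x5D, 0x30)])
```

Comparing the output against a direct `mul2` of each byte confirms that the five shuffles and two XORs leave exactly (2a, 2b, 2c, 2d).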

４バイトデータブロックに対してｇ＝｛０２｝による乗算を実行する実施例が説明された。以下のテーブル４は、１つの１６バイトデータブロックに対して実行される機能的な正確な最適化されたに実施例のコードサンプル（アセンブラ）を示す。 An embodiment has been described in which multiplication by g = {02} is performed on a 4-byte data block. Table 4 below shows a functional accurate optimized code sample (assembler) executed on one 16-byte data block.

テーブル４のコードサンプルに示されるように、ガロア体乗算は、１１個の命令（５つのＰＳＨＵＦＢ命令、２つのＰＸＯＲ命令、２つのＡＥＳＥＮＣ命令、２つのＡＥＳＤＥＣＬＡＳＴ命令）、３個のマスク（マスク１，マスク２，マスク３）及び３個のｘｍｍレジスタ（ｘｍｍ１，ｘｍｍ２，ｘｍｍ３）を用いて、１６バイトのデータに対して実行される。

As shown in the code sample in Table 4, the Galois field multiplication is performed on 16 bytes of data using 11 instructions (5 PSHUFB instructions, 2 PXOR instructions, 2 AESENC instructions, 2 AESDECLAST instructions), 3 masks (mask 1, mask 2, mask 3), and 3 xmm registers (xmm1, xmm2, xmm3).

例えば、入力データ“ｅ５９８２７１ｅｆ１１１４１ｂ８ａｅ５２ｂ４ｅ０３０５ｄｂｆｄ４”に対するガロア体乗算の実行結果は、出力“ｄ１２ｂ４ｅ３ｃｆ９２２８２６ｂ４７ａ４７３ｄｂ６０ｂａ６５ｂ３”をもたらす。 For example, the execution result of Galois field multiplication for the input data “e598271ef11141b8ae52b4e0305dbfd4” yields the output “d12b4e3cf922826b47a473db60ba65b3”.
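The quoted input and output can be cross-checked without any AES instructions, since the operation is a byte-wise multiplication by {02} modulo 0x11b. A minimal sketch follows; the function name `gf_mul2_block` is an assumption.

```python
def gf_mul2_block(block: bytes) -> bytes:
    """Multiply every byte of a block by {02} in GF(2^8) modulo 0x11b."""
    return bytes(((b << 1) ^ (0x11B if b & 0x80 else 0)) & 0xFF for b in block)

data_in = bytes.fromhex("e598271ef11141b8ae52b4e0305dbfd4")
data_out = gf_mul2_block(data_in)
print(data_out.hex())  # d12b4e3cf922826b47a473db60ba65b3
```

Because each byte is handled independently, the check does not depend on which end of the hexadecimal string maps to the lowest byte of the register.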

このコードサンプルでは、命令が逐次処理される場合、スループットは、ＡＥＳ命令の遅延により遅くなる。例えば、実施例では、ＰＳＨＵＦＢとＰＸＯＲ命令の遅延は１サイクルであり、ＡＥＳ命令の遅延は６サイクルである。従って、ＡＥＳ命令のペアが逐次処理される場合、１２サイクルの遅延が生じる。他の実施例では、ＡＥＳ命令のペアの第２ＡＥＳ命令が、第１ＡＥＳ命令がスケジューリングされた６サイクル後にスケジューリングされるように、全体的な遅延は、インタリーブされた命令による複数の１６バイトの入力データを同時に処理することによって減少されてもよい。テーブル４に示されるサンプルコードの命令の順序は、図６Ａ〜６Ｃに示される例に示されるように変更されてもよい。この命令順序は、複数の１６バイトデータブロックが同時に処理することを可能にする。これは、ＡＥＳ命令の遅延はＰＸＯＲ及びＰＳＨＵＦＢ命令の遅延より大きいためである。 In this code sample, if the instructions are processed sequentially, throughput is limited by the latency of the AES instructions. For example, in an embodiment the latency of the PSHUFB and PXOR instructions is 1 cycle and the latency of the AES instructions is 6 cycles, so processing a pair of AES instructions sequentially incurs a 12-cycle delay. In another embodiment, the overall delay may be reduced by processing several 16-byte inputs concurrently with interleaved instructions, so that the second AES instruction of a pair is scheduled 6 cycles after the first. The order of the instructions in the sample code of Table 4 may be changed as shown in the example of FIGS. 6A-6C. Because the latency of the AES instructions is greater than that of the PXOR and PSHUFB instructions, this instruction order allows multiple 16-byte data blocks to be processed concurrently.

図６Ａ〜６Ｃは、ガロア体乗算が複数の１６バイトデータブロックに対して同時に実行されることを可能にするサンプルコードである。このコードは、利用されうるコードの単なる一例である。他の多数の変形があってもよく、例えば、コードは特定のコンパイラによる利用に最適化されてもよい。図６Ａ〜６Ｃは、ＮＢＬＯＣＫＳデータブロック（各ブロックは１６バイト（１６Ｂ）を有する）のデータバッファに対して｛０２｝による乗算を実行する機能（インラインアセンブラ）を示す。４つの１６バイトブロックがパラレルに処理され、２５６バイトデータバッファを利用するため、当該処理（パラレルな４つの１６バイトブロック）が４回繰り返される（パラレルに４ブロック）。１２個のｘｍｍレジスタ（ｘｍｍ０〜ｘｍｍ１１）が、入力データとその演算結果とを格納するのに用いられる。３つのマスクレジスタ（マスク１，マスク２，マスク３）が、テーブル４に示されるサンプルコードと同じマスクを格納する。 FIGS. 6A-6C are sample code that allows the Galois field multiplication to be performed on multiple 16-byte data blocks concurrently. This code is just one example of code that may be used; many other variants are possible, for example code optimized for use with a particular compiler. FIGS. 6A-6C show a function (inline assembler) that performs multiplication by {02} on a data buffer of NBLOCKS data blocks, each of 16 bytes (16B). Four 16-byte blocks are processed in parallel, and to cover a 256-byte data buffer this processing is repeated four times. Twelve xmm registers (xmm0 to xmm11) are used to store the input data and the results of operations on it. Three mask registers (mask 1, mask 2, mask 3) hold the same masks as the sample code shown in Table 4.

図６Ａを参照して、ブロック６００の命令は、ｘｍｍレジスタ（ｘｍｍ１，ｘｍｍ４，ｘｍｍ７，ｘｍｍ１０）に格納されている入力データの奇数バイトポジションをゼロに設定する。図示された例では、Ｐａｃｋｅｄシャッフルバイト（ＶＰＳＨＵＦＢ）命令が、奇数バイトポジションをゼロに設定するため用いられる。ＶＰＳＨＵＦＢ命令は、ｍｏｖｅの後にＰＳＨＵＦＢなどを実行し、“ｖｐｓｈｕｆｂｘｍｍ１，ｘｍｍ０，ｍａｓｋ１”の命令に対して、ｘｍｍ０の内容はｘｍｍ１に移され、ｘｍｍ１の内容は、ｍａｓｋ１に格納されている制御マスクに基づきシャッフルされる。 Referring to FIG. 6A, the instructions in block 600 set the odd byte positions of the input data stored in the xmm registers (xmm1, xmm4, xmm7, xmm10) to zero. In the illustrated example, the packed shuffle bytes (VPSHUFB) instruction is used to set the odd byte positions to zero. The VPSHUFB instruction performs a move followed by a PSHUFB: for the instruction “vpshufb xmm1, xmm0, mask1”, the contents of xmm0 are moved to xmm1 and the contents of xmm1 are then shuffled based on the control mask stored in mask1.

次に、ブロック６０２のＡＥＳＤＥＣＬＡＳＴ命令が、ｘｍｍ１，ｘｍｍ４，ｘｍｍ７及びｘｍｍ１０レジスタに格納されたゼロに設定されている奇数バイトポジションを有する入力データに対して実行される。 Next, the AESDECLAST instruction in block 602 is executed on the input data having odd byte positions set to zero stored in the xmm1, xmm4, xmm7 and xmm10 registers.

ブロック６０４の命令は、図５Ｃに関して説明されたように、入力データをｘｍｍレジスタ（ｘｍｍ２，ｘｍｍ５）に移し、偶数バイトポジションをゼロに設定し、ｘｍｍレジスタ（ｘｍｍ０，ｘｍｍ３）において入力バイトをリシャッフルする。 The instructions in block 604 move the input data into xmm registers (xmm2, xmm5) with the even byte positions set to zero, and reshuffle the input bytes in xmm registers (xmm0, xmm3) as described with respect to FIG. 5C.

図６Ｂを参照して、ブロック６０６のＡＥＳＥＮＣ命令は、ミックスカラム変換を隔離し、その結果をｘｍｍレジスタ（ｘｍｍ１，ｘｍｍ４，ｘｍｍ７，ｘｍｍ１０）に格納する。 Referring to FIG. 6B, the AESENC instructions in block 606 complete the isolation of the mix column transformation and store the results in the xmm registers (xmm1, xmm4, xmm7, xmm10).

ブロック６０８の命令は、図５Ｃに関して説明されたように、入力データをｘｍｍレジスタ（ｘｍｍ８，ｘｍｍ１１）に移動し、偶数バイトポジションをゼロに設定し、ｘｍｍレジスタ（ｘｍｍ６，ｘｍｍ９）の入力バイトをリシャッフルする。 The instructions in block 608 move the input data into xmm registers (xmm8, xmm11) with the even byte positions set to zero, and reshuffle the input bytes in xmm registers (xmm6, xmm9) as described with respect to FIG. 5C.

ブロック６１０の命令は、ｘｍｍレジスタ（ｘｍｍ２，ｘｍｍ５，ｘｍｍ８，ｘｍｍ１１）に格納されているゼロに設定された偶数バイトポジションを有する入力データに対してＡＥＳＤＥＣＬＡＳＴ命令を実行する。 The instruction in block 610 executes the AESDECLAST instruction on input data having an even byte position set to zero stored in the xmm registers (xmm2, xmm5, xmm8, xmm11).

ブロック６１２の命令は、ｘｍｍレジスタ（ｘｍｍ１，ｘｍｍ４，ｘｍｍ７，ｘｍｍ１０）に格納されているデータの奇数ポジションのバイトをゼロにする。 The instruction in block 612 zeroes out the odd position bytes of the data stored in the xmm registers (xmm1, xmm4, xmm7, xmm10).

ブロック６１４の命令は、ｘｍｍレジスタ（ｘｍｍ２，ｘｍｍ５，ｘｍｍ８，ｘｍｍ１１）に格納されているデータに対してＡＥＳＥＮＣ命令を実行し、ｘｍｍレジスタ（ｘｍｍ２，ｘｍｍ５，ｘｍｍ８，ｘｍｍ１１）に格納されている結果において偶数バイトポジションをゼロに設定する。 The instruction in block 614 executes an AESENC instruction on the data stored in the xmm registers (xmm2, xmm5, xmm8, xmm11), and in the result stored in the xmm registers (xmm2, xmm5, xmm8, xmm11) Set even byte position to zero.

ブロック６１６の命令は、ｘｍｍレジスタ（ｘｍｍ０，ｘｍｍ３，ｘｍｍ６，ｘｍｍ９）において乗算の結果を提供するため、ｘｍｍレジスタ（ｘｍｍ０〜ｘｍｍ１１）の内容に対してＸＯＲ演算を実行する。 The instruction in block 616 performs an XOR operation on the contents of the xmm registers (xmm0 to xmm11) to provide the result of the multiplication in the xmm registers (xmm0, xmm3, xmm6, xmm9).

ブロック６１８の命令は、ｘｍｍレジスタ（ｘｍｍ０，ｘｍｍ３，ｘｍｍ６，ｘｍｍ９）に格納されている乗算の結果をｒｂｘレジスタに移動する。 The instruction in block 618 moves the result of the multiplication stored in the xmm register (xmm0, xmm3, xmm6, xmm9) to the rbx register.

ブロック６２０の命令は、乗算対象の次の１６バイトブロックの位置へのポインタを計算する。 The instruction in block 620 calculates a pointer to the location of the next 16 byte block to be multiplied.

他の実施例では、ＲＡＩＤ−６計算がＧＦ（２^８）の他の表現により実行される場合、ＡＥＳ命令が適用可能な“好適な”表現（還元多項式１１Ｂにより）に入力を変換することによって、上述した技術を利用することが可能である。当初の表現への最終的な変換が必要とされる（しかしながら、リカバリが実際に要求されるケースに留保可能である）。この変換は、予め計算されたテーブルを用いて実行可能である。 In another embodiment, if the RAID-6 computations are performed in a different representation of GF(2^8), the technique described above can still be used by converting the input to the “preferred” representation (with reduction polynomial 11B) to which the AES instructions apply. A final conversion back to the original representation is required (although it can be deferred to the cases where recovery is actually needed). This conversion can be performed using pre-computed tables.
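One way such pre-computed tables could be built is sketched below. The text does not name the other representation, so the polynomial 0x11D used here is purely an assumption for illustration, as are all function and table names. The idea is to find an element of the 0x11b field that is a root of the other polynomial and map each byte, read as a polynomial, to its value at that root; the resulting table preserves both addition and multiplication.

```python
AES_POLY = 0x11B
OTHER_POLY = 0x11D  # assumed example of a different GF(2^8) representation

def gf_mul(a: int, b: int, poly: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the given reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def eval_poly(coeffs: int, x: int, field_poly: int) -> int:
    """Evaluate a binary polynomial (bits of coeffs) at x using Horner's rule."""
    acc = 0
    for bit in range(coeffs.bit_length() - 1, -1, -1):
        acc = gf_mul(acc, x, field_poly) ^ ((coeffs >> bit) & 1)
    return acc

def build_tables() -> tuple:
    # A root of OTHER_POLY inside the AES field plays the role of "x".
    beta = next(x for x in range(2, 256)
                if eval_poly(OTHER_POLY, x, AES_POLY) == 0)
    to_aes = [eval_poly(v, beta, AES_POLY) for v in range(256)]
    from_aes = [0] * 256
    for value, image in enumerate(to_aes):
        from_aes[image] = value
    return to_aes, from_aes

TO_AES, FROM_AES = build_tables()

# A product taken in the AES representation maps back to the product
# in the original representation.
a, b = 0x57, 0x83
via_aes = FROM_AES[gf_mul(TO_AES[a], TO_AES[b], AES_POLY)]
print(via_aes == gf_mul(a, b, OTHER_POLY))
```

Note that the element {02} of one representation does not in general map to {02} of the other, so a design that converts inputs this way has to account for which generator the resulting syndrome corresponds to.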

本発明の他の実施例はまた、本発明の処理を実行するための命令を含むマシーンアクセス可能な媒体を含む。このような実施例はまた、プログラムプロダクトと呼ばれてもよい。このようなマシーンアクセス可能な媒体は、限定することなく、フロッピー（登録商標）ディスク、ハードディスク、ＣＤ−ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｋ−ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）などの記憶媒体と、マシーン又はデバイスにより形成又は製造された粒子の他の有形な構成とを含むものであってもよい。命令はまた分散環境で利用されてもよく、シングル又はマルチプロセッサマシーンによるアクセスのため、ローカル及び／又はリモートに格納されてもよい。 Other embodiments of the present invention also include a machine accessible medium containing instructions for performing the processing of the present invention. Such an embodiment may also be referred to as a program product. Such machine-accessible media include, but are not limited to, floppy (registered trademark) disks, hard disks, CD-ROM (Compact Disk-Read Only Memory), ROM (Read Only Memory), RAM (Random Access Memory), and the like. Storage media and other tangible configurations of particles formed or manufactured by a machine or device. The instructions may also be utilized in a distributed environment and stored locally and / or remotely for access by a single or multiprocessor machine.

本発明の実施例がそれの実施例を参照して図示及び説明されたが、当業者は、添付した請求項により包含される本発明の実施例の範囲から逸脱することなく、形式及び詳細について各種変更が可能であることを理解するであろう。 While embodiments of the invention have been illustrated and described with reference to such embodiments, those skilled in the art will recognize the form and details without departing from the scope of the embodiments of the invention encompassed by the appended claims. It will be understood that various changes are possible.

Claims

A method comprising:
performing a Galois field multiplication operation on each of a plurality of bytes of a block of bytes,
wherein performing the Galois field multiplication operation comprises:
performing an Advanced Encryption Standard (AES) mix column transformation on the block of bytes in which all even position bytes are set to zero to provide a first result;
performing the AES mix column transformation on the block of bytes in which all odd position bytes are set to zero to provide a second result; and
combining the first result and the second result to provide a result of the Galois field multiplication operation.

The method according to claim 1, wherein the finite field in the Galois field multiplication operation is defined by a reduction polynomial 0x11B.

The method of claim 1, wherein performing the AES mix column transformation comprises performing an AESDECLAST round instruction followed by an AESENC round instruction.

The method according to claim 3, wherein the transformation sequence executed by the AESDECLAST round instruction includes an Inverse Shift Rows transformation and an Inverse Substitute Bytes transformation, and the transformation sequence executed by the AESENC round instruction includes a Shift Rows transformation, a Substitute Bytes transformation, and a mix column transformation.

The method of claim 1, further comprising:
performing an exclusive OR (XOR) operation on the first result and the second result to provide a third result;
shuffling the data stored in the block of bytes to swap the lower 2 bytes and the upper 2 bytes of each 4-byte block in the block of bytes to provide a fourth result; and
performing an XOR operation on the third result and the fourth result.

The method according to claim 1, wherein the AES mix column conversion converts a 4-byte block sequence d, c, b, a into another 4-byte block sequence 3a + b + c + 2d, a + b + 2c + 3d, a + 2b + 3c + d, 2a + 3b + c + d.

The method of claim 1, wherein the combining is utilized to calculate a Q syndrome for a level 6 RAID system.

An apparatus comprising:
a memory to store a plurality of instructions for performing, in parallel, a Galois field multiplication operation on each of a plurality of bytes of a block of bytes; and
a processor including an execution unit,
wherein the instructions, when executed by the execution unit, cause the execution unit to: perform an AES (Advanced Encryption Standard) mix column transformation on the block of bytes in which all even position bytes are set to zero to provide a first result; perform the AES mix column transformation on the block of bytes in which all odd position bytes are set to zero to provide a second result; and combine the first result and the second result to provide a result of the Galois field multiplication operation.

The apparatus according to claim 8, wherein the finite field in the Galois field multiplication operation is defined by a reduction polynomial 0x11B.

The apparatus of claim 8, wherein the execution unit performs the AES mix column transformation by executing an AESDECLAST round instruction followed by an AESENC round instruction.

The apparatus according to claim 10, wherein the transformation sequence executed by the AESDECLAST round instruction includes an Inverse Shift Rows transformation and an Inverse Substitute Bytes transformation, and the transformation sequence executed by the AESENC round instruction includes a Shift Rows transformation, a Substitute Bytes transformation, and a mix column transformation.

The apparatus according to claim 8, wherein the AES mix column transformation converts a 4-byte block sequence d, c, b, a to another 4-byte block sequence 3a + b + c + 2d, a + b + 2c + 3d, a + 2b + 3c + d, 2a + 3b + c + d.

The apparatus of claim 8, wherein the combining is utilized to calculate a Q syndrome for a level 6 RAID system.

A machine-accessible storage medium having associated information,
wherein the information, when accessed, causes a machine to perform a Galois field multiplication operation on each of a plurality of bytes of a block of bytes,
wherein performing the Galois field multiplication operation comprises:
performing an Advanced Encryption Standard (AES) mix column transformation on the block of bytes in which all even position bytes are set to zero to provide a first result;
performing the AES mix column transformation on the block of bytes in which all odd position bytes are set to zero to provide a second result; and
combining the first result and the second result to provide a result of the Galois field multiplication operation.

The storage medium according to claim 14, wherein the finite field in the Galois field multiplication operation is defined by a reduction polynomial 0x11B.

The storage medium of claim 14, wherein performing the AES mix column conversion includes executing an AESDECLAST round instruction followed by an AESENC round instruction.

The storage medium according to claim 14, wherein the transformation sequence executed by the AESDECLAST round instruction includes an Inverse Shift Rows transformation and an Inverse Substitute Bytes transformation, and the transformation sequence executed by the AESENC round instruction includes a Shift Rows transformation, a Substitute Bytes transformation, and a mix column transformation.

A system comprising:
a processor; and
a storage device accessible by the processor and storing a plurality of instructions,
wherein at least one of the instructions performs a transformation sequence, and the instructions, when executed by the processor, cause the processor to: perform an AES (Advanced Encryption Standard) mix column transformation on a block of bytes in which all even position bytes are set to zero to provide a first result; perform the AES mix column transformation on the block of bytes in which all odd position bytes are set to zero to provide a second result; and combine the first result and the second result to provide a result of a Galois field multiplication operation.

The system according to claim 18, wherein the finite field in the Galois field multiplication operation is defined by a reduction polynomial 0x11B.

The system of claim 18, wherein an AESDECLAST round instruction followed by an AESENC round instruction performs the AES mix column transformation.