JP2016218528A - Data processing device and data processing method

Info

Publication number: JP2016218528A
Application number: JP2015099446A
Authority: JP
Inventors: 毅葛; Ge Yi
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-05-14
Filing date: 2015-05-14
Publication date: 2016-12-22

Abstract

[Problem] To reduce the number of data loads.
[Solution] A conversion unit 111 receives matrix A, a first matrix acquired from a memory 101 by a target operation instruction. The conversion unit 111 outputs whichever of matrix A (the first matrix) and matrix A^T (the second matrix, the transpose of matrix A) the operation instruction designates as the operand. The second matrix contains the same elements as the first matrix, with the order of at least two of those elements swapped. An operation unit 112 performs the operation designated by the operation instruction on the operand matrix output by the conversion unit 111. The operation is, for example, addition, subtraction, or multiplication. Matrix conversion and the operation can thus both be carried out by a single operation instruction.
[Selected drawing] Figure 1

Description

The present invention relates to a data processing device and a data processing method.

Conventionally, wireless communication baseband processing and the like may involve a large amount of matrix computation. Array processing architectures are well suited to handling array-type data such as that used in matrix computation. Processors that handle array-type data include, for example, SIMD (Single Instruction Multiple Data) processors and vector processors.

Matrix operations in wireless communication baseband processing also frequently require matrix conversions such as taking the transpose or the complex conjugate of an input matrix. For example, a technique for obtaining the transposes of a plurality of matrices as one function of SIMD matrix operations is known (see, for example, Patent Document 1 below).

Patent Document 1: JP 2005-174292 A

Conventionally, a processor that handles array-type data supplies data continuously from memory in order to apply the same operation to consecutive units of data. Because matrix operations in baseband processing frequently require matrix conversions such as taking the transpose or the complex conjugate of an input matrix, the number of data loads from the data memory increases, which is a problem.

In one aspect, an object of the present invention is to provide a data processing device and a data processing method capable of reducing the number of data loads.

According to one aspect of the present invention, a data processing device and a data processing method are proposed. The device includes a conversion unit and an operation unit. The conversion unit receives a first matrix acquired from a memory by a target operation instruction and outputs, as the operand designated by the operation instruction, either the received first matrix or a second matrix that contains the same elements as the first matrix with the order of at least two of those elements swapped. The operation unit performs the operation designated by the operation instruction on the operand matrix output by the conversion unit.

According to one aspect of the present invention, the number of data loads can be reduced.

FIG. 1 is an explanatory diagram illustrating an operation example of the data processing device.
FIG. 2 is an explanatory diagram illustrating an application example of the data processing device.
FIG. 3 is an explanatory diagram illustrating a configuration example of the data processing device and peripheral circuits.
FIG. 4 is an explanatory diagram illustrating a calculation example by the operation data path.
FIG. 5 is an explanatory diagram illustrating an example of conversion into a transposed matrix.
FIG. 6 is an explanatory diagram illustrating a functional configuration example of the data processing device.
FIG. 7 is an explanatory diagram illustrating an example of matrix conversion.
FIG. 8 is an explanatory diagram illustrating an example of a data supply circuit.
FIG. 9 is an explanatory diagram illustrating an example of an index table.
FIG. 10 is an explanatory diagram illustrating an example of a conjugate circuit.
FIG. 11 is an explanatory diagram illustrating an output example of the conjugate control signal.
FIG. 12 is a flowchart illustrating an example of an instruction execution procedure.
FIG. 13 is an explanatory diagram illustrating a configuration example of a data supply circuit including a wraparound processing circuit.
FIG. 14 is an explanatory diagram illustrating an example of element selection by the WM processing circuit and the matrix conversion circuit.
FIG. 15 is an explanatory diagram illustrating another configuration example of a data supply circuit including a wraparound processing circuit.
FIG. 16 is an explanatory diagram illustrating an example of matrix conversion by the data supply circuit shown in FIG. 15.
FIG. 17 is a diagram (part 1) schematically illustrating the processing of the memory access unit and the data supply circuit.
FIG. 18 is a diagram (part 2) schematically illustrating the processing of the memory access unit and the data supply circuit.
FIG. 19 is a diagram illustrating a detailed example of the wraparound processing circuit.
FIG. 20 is a diagram illustrating an example of the selection operation by the WM control circuit.
FIG. 21 is a diagram illustrating another example of the selection operation by the WM control circuit.
FIG. 22 is a diagram illustrating still another example of the selection operation by the WM control circuit.
FIG. 23 is a diagram illustrating an example of the configuration of the WM control circuit.
FIG. 24 is a diagram illustrating an example of the configuration of the SEL_WRAP circuit.
FIG. 25 is a diagram illustrating an example of the configuration of the ADD_OFFSET circuit.
FIG. 26 is a diagram illustrating the generation algorithm for each signal when SLS ≤ M.
FIG. 27 is a diagram illustrating the generation algorithm for each signal when SLS > M.
FIG. 28 is a diagram illustrating another example of the configuration of the WM control circuit.
FIG. 29 is a diagram illustrating an example of data in the SLS_MOD table.
FIG. 30 is a diagram illustrating another example of the configuration of the data processing device.

Embodiments of a data processing device and a data processing method according to the present invention are described below in detail with reference to the accompanying drawings.

FIG. 1 is an explanatory diagram illustrating an operation example of the data processing device. The data processing device 100 is a computer that performs stream-type processing. Stream-type processing sequentially reads a series of data (array-type data) from the memory 101, performs operations on it, and sequentially writes the series of results to the data memory. An application example of the data processing device 100 is shown in FIG. 2. Devices to which it can be applied include, for example, smartphones, mobile phones, tablet terminals, tablet PCs (Personal Computers), and PHS (Personal Handy-phone System) handsets.

The data processing device 100 also has, as its instruction set, a group of instructions that handle unit data. The unit data is, for example, data in matrix form. The instructions are operation instructions such as a multiplication instruction, an addition instruction, and an inverse matrix (division) instruction.

In wireless communication signal processing and the like, a large amount of matrix computation may be required. When computation-heavy processing such as matrix operations must be executed at high speed, a dedicated circuit for matrix operations is provided. As described above, processors that handle array-type data include, for example, SIMD processors and vector processors.

A SIMD processor handles, for example, 128-bit data as the unit data. Here, each element of a matrix is represented by 32 bits of data. For example, a 2×2 matrix has a unit data length UL of 128 bits, and when the SIMD width is 4, the data processing width P is UL × SIMD width = 512 bits.
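
The arithmetic above can be checked with a short sketch. The function and parameter names are illustrative assumptions and do not appear in the specification.

```python
def data_processing_width(element_bits, rows, cols, simd_width):
    """Return (UL, P) in bits for a matrix treated as one unit of data."""
    unit_length = element_bits * rows * cols      # UL: bits in one matrix
    return unit_length, unit_length * simd_width  # P = UL x SIMD width


ul, p = data_processing_width(element_bits=32, rows=2, cols=2, simd_width=4)
print(ul, p)  # 128 512
```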

Conventionally, a processor that handles array-type data supplies data continuously from the memory 101 in order to apply the same operation to consecutive units of data. Because matrix operations in baseband processing frequently require matrix conversions such as taking the transpose or the complex conjugate of an input matrix, the number of loads from the data memory increases, which is a problem.

Therefore, in the present embodiment, the operation is performed on whichever of the input matrix and a converted matrix, in which the order of a plurality of elements of the input matrix has been rearranged, is the operand. This allows the matrix conversion and the operation to be performed by a single instruction, which reduces the number of data loads and therefore shortens the time required for matrix computation.

First, the data processing device 100 includes a conversion unit 111 and an operation unit 112. The memory 101, which is a data memory, stores the elements of a matrix in a specific order. The specific order is, for example, a one-dimensional array in row-major order. Matrix A in FIG. 1 is stored in the order {a, b, c, d}. The data processing device 100 also acquires the matrix from the memory 101 in the order {a, b, c, d}.
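
As a minimal sketch of the row-major layout described above, the following flattens a 2×2 matrix into the one-dimensional order {a, b, c, d}. The helper name `flatten_row_major` is an assumption made for illustration.

```python
def flatten_row_major(matrix):
    """Flatten a list-of-rows matrix into a one-dimensional list, row by row."""
    return [element for row in matrix for element in row]


matrix_a = [["a", "b"],
            ["c", "d"]]
print(flatten_row_major(matrix_a))  # ['a', 'b', 'c', 'd']
```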

The conversion unit 111 receives the first matrix acquired from the memory 101 by the target operation instruction. In other words, the first matrix is the input matrix; in the example of FIG. 1, it is matrix A. The conversion unit 111 outputs whichever of the first matrix and the second matrix the operation instruction designates as the operand. The second matrix contains the same elements as the first matrix, with the order of at least two of those elements swapped. In other words, the second matrix is, for example, a converted matrix obtained by converting the first matrix, such as the transpose or the complex conjugate of the first matrix. In the example of FIG. 1, the second matrix is matrix A^T, the transpose of matrix A.

For example, to obtain the transpose of a 2×2 matrix, the data processing device 100 may provide a selection circuit for each of the second and third elements of the one-dimensional array representing the matrix and control which of the second and third elements each circuit outputs. A specific example of the conversion method is described later. A selection circuit is also referred to as a multiplexer.

The operation unit 112 performs the operation designated by the operation instruction on the operand matrix output by the conversion unit 111. Examples of the operation include addition, subtraction, and multiplication, but the operation is not particularly limited. Specifically, the operation unit 112 may, for example, multiply the output operand matrix by another matrix, or multiply the output operand matrix by an integer.

As a result, the matrix conversion and the operation can be performed together by a single operation instruction, so the number of data loads can be reduced.

(Application example of the data processing device 100)
Here, a case is described in which the data processing device 100 according to the present embodiment is applied to a baseband processing LSI (Large Scale Integrated circuit) of a mobile phone.

FIG. 2 is an explanatory diagram illustrating an application example of the data processing device. In FIG. 2, the baseband processing LSI 200 includes an RF (Radio Frequency) unit 201, dedicated hardware 202, and DSPs (Digital Signal Processors) 203-1 to 203-3.

The RF unit 201 down-converts the frequency of a radio signal received via the antenna 205, converts it into a digital signal, and outputs it to the bus 204. The RF unit 201 also converts a digital signal output to the bus 204 into an analog signal, up-converts it to a radio frequency, and outputs it to the antenna 205.

The dedicated hardware 202 includes, for example, turbo, which handles error correction codes; viterbi, which executes the Viterbi algorithm; and MIMO (Multi Input Multi Output), for transmitting and receiving data with a plurality of antennas.

Each of the DSPs 203-1 to 203-3 (hereinafter simply "DSP 203") includes a processor 211, an instruction memory 213, a peripheral circuit 212, and a data memory 214. The processor 211 includes a CPU 221 and the data processing device 100. Each DSP 203 is assigned one of the element processes of wireless communication signal processing, such as Searcher (synchronization), Demodulator (demodulation), Decoder (decoding), Codec (encoding), and Modulator (modulation).

The instruction memory 213 stores instructions such as programs. The data memory 214 corresponds to the memory 101 and stores data to be operated on, operation results, and the like.

FIG. 3 is an explanatory diagram illustrating a configuration example of the data processing device and peripheral circuits. FIG. 3 shows a simple connection example between the data processing device 100 and the instruction memory 213 and data memory 214 around it.

The data processing device 100 includes an instruction decoder 304, a data supply circuit 301, an operation data path 302, and a data store circuit 303. The data supply circuit 301 is connected to the data memory 214 and reads data from it. The operation data path 302 is connected to the data supply circuit 301 and corresponds to the operation unit 112: it performs the operation designated by the operation instruction on the matrix elements supplied from the data supply circuit 301. Examples of the operation include multiplication, addition, and subtraction.

The data store circuit 303 is connected to the operation data path 302 and the data memory 214, and writes the operation results supplied from the operation data path 302 to the data memory 214. The instruction memory 213 stores an instruction sequence made up of a plurality of instructions, and the instructions are supplied to the instruction decoder 304 in order. The instruction decoder 304 decodes each supplied instruction and controls the data supply circuit 301, the operation data path 302, and the data store circuit 303 according to the decoding result. In this way, the instruction decoder 304 carries out access to the data memory 214 and the operations performed by the operation data path 302.

From the standpoint of versatility, a SIMD architecture is suitable for handling array data such as that used in matrix operations. A SIMD architecture handles, for example, 32-bit scalar data, matrices, or vectors as unit data. In a system with a SIMD width of 4, for example, operations are executed at high speed by taking a length-4 vector of four data items and processing its four elements in parallel. In such a SIMD architecture, the parameters are generally fixed, for example at a unit data length UL of 32 bits, a SIMD width of 4, and a data processing width P of 128 (= 4 × 32) bits.

In the present embodiment, the processor, which has a stream processing architecture, uses a hardware configuration in which the unit data length, the SIMD width, and so on are variable parameters. This makes it possible to define instructions with various unit data lengths. In such a hardware configuration, the data processing width P = UL × SIMD width, which is determined by the unit data length UL and the SIMD width, differs from one operation instruction to another.

FIG. 4 is an explanatory diagram illustrating a calculation example by the operation data path. When the operation data path 302 receives the elements of a matrix from the data supply circuit 301, for example, it performs a matrix operation based on them. The elements received from the data supply circuit 301 are, for example, first source data src0 and second source data src1. As shown in FIG. 4, a wraparound circuit 401 and a matrix conversion circuit 402 in the data supply circuit 301 may be provided for each source data. In the example of FIG. 4, each of src0 and src1 is a 2×2 matrix. Each matrix element is, for example, 16 bits for a real matrix and 32 bits for a complex matrix, but is not particularly limited.

The operation data path 302 performs operations between matrices according to the result of decoding the operation instruction by the instruction decoder 304. In the example of FIG. 4, the operation is multiplication. The operation data path 302 has a SIMD architecture and executes the operation of one instruction on a plurality of matrices. The operation data path 302 shown in FIG. 4 can process four operations in parallel.

As described above, when, for example, the transpose of an input matrix is first computed by the operation data path 302 and then used in an operation, the final result is obtained by executing multiple instructions: one to obtain the transpose of the input matrix and another to perform the operation that uses the transpose. Specifically, the instruction that obtains the transpose loads the input matrix from the data memory 214, computes the transpose, and stores it in the data memory 214. The instruction that performs the operation then loads the transpose from the data memory 214 and operates on it. When the operation and the matrix conversion are executed by separate instructions in this way, the number of loads increases and the overall processing takes longer.
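
The difference in load counts can be illustrated with a toy model. This is only a sketch: the `CountingMemory` class, the matrix names, and the choice of integer scaling as the operation are assumptions made for illustration, not the circuit described in the specification.

```python
class CountingMemory:
    """Toy data memory that counts loads and stores."""

    def __init__(self, contents):
        self.contents = {name: list(data) for name, data in contents.items()}
        self.loads = 0
        self.stores = 0

    def load(self, name):
        self.loads += 1
        return list(self.contents[name])

    def store(self, name, data):
        self.stores += 1
        self.contents[name] = list(data)


def transpose_2x2(m):
    """Row-major [a, b, c, d] -> [a, c, b, d]."""
    return [m[0], m[2], m[1], m[3]]


def scale(m, k):
    """Multiply every element by the integer k."""
    return [k * x for x in m]


def separate_instructions(mem):
    # Instruction 1: load A, transpose it, store the transpose.
    mem.store("At", transpose_2x2(mem.load("A")))
    # Instruction 2: load the transpose again, operate, store the result.
    mem.store("R", scale(mem.load("At"), 3))


def single_instruction(mem):
    # One instruction: load A, convert on the way to the operation, store.
    mem.store("R", scale(transpose_2x2(mem.load("A")), 3))


two = CountingMemory({"A": [1, 2, 3, 4]})
separate_instructions(two)
one = CountingMemory({"A": [1, 2, 3, 4]})
single_instruction(one)
print(two.loads, one.loads)  # 2 1
```

Both paths produce the same result in this model; only the number of memory accesses differs.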

FIG. 5 is an explanatory diagram illustrating an example of conversion into a transposed matrix. In the present embodiment, as described above, a matrix represented as a one-dimensional array is stored in row-major order. An example of obtaining the transpose of a 2×2 input matrix is shown here. As shown in FIG. 5(1), the transpose A^T {a, c, b, d} is obtained from the input matrix A {a, b, c, d} by swapping the positions of elements "b" and "c".

Therefore, as shown in FIG. 5(2), the matrix conversion circuit 402 shown in FIG. 4 is provided with multiplexers that swap elements "b" and "c" on output according to a transpose on/off signal, so that either the transpose or the input matrix itself can be obtained from a 2×2 input matrix.

For example, the multiplexer 501-1 and the multiplexer 501-2 both take b and c as inputs. When the transpose on/off signal indicates on, the multiplexer 501-1 selects and outputs c, and the multiplexer 501-2 selects and outputs b, which yields the transpose A^T. When the transpose on/off signal indicates off, the multiplexer 501-1 selects and outputs b, and the multiplexer 501-2 selects and outputs c, which yields the input matrix A unchanged.
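
The behavior of the two multiplexers can be modeled in software as follows. This is a behavioral sketch only; the function names `mux` and `matrix_convert_2x2` are assumptions, and the real circuit operates on bit fields rather than Python objects.

```python
def mux(select_on, when_off, when_on):
    """Two-input multiplexer: pass when_on if the control signal is on."""
    return when_on if select_on else when_off


def matrix_convert_2x2(elements, transpose_on):
    """Model of the 2x2 matrix conversion: elements are row-major [a, b, c, d]."""
    a, b, c, d = elements
    second = mux(transpose_on, b, c)  # multiplexer 501-1: b when off, c when on
    third = mux(transpose_on, c, b)   # multiplexer 501-2: c when off, b when on
    return [a, second, third, d]


print(matrix_convert_2x2(["a", "b", "c", "d"], transpose_on=True))   # ['a', 'c', 'b', 'd']
print(matrix_convert_2x2(["a", "b", "c", "d"], transpose_on=False))  # ['a', 'b', 'c', 'd']
```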

(Functional configuration example of the data processing device 100)
FIG. 6 is an explanatory diagram illustrating a functional configuration example of the data processing device. The data processing device 100 includes, for example, a buffer 601, a conversion unit 111, and an operation unit 112.

Each unit may be formed from elements such as AND gates (logical product circuits), INVERTERs (logical negation circuits), OR gates (logical sum circuits), and FFs (flip-flops), which are latch circuits. Each unit may also be implemented with an application-specific IC (hereinafter simply "ASIC") such as a standard cell or structured ASIC (Application Specific Integrated Circuit), or with a PLD (Programmable Logic Device) such as an FPGA. Specifically, for example, each unit may be implemented by defining the functions of the units described above as a netlist using a hardware description language or the like, logically synthesizing the netlist, and applying it to an ASIC or PLD.

The conversion unit 111 receives the first matrix acquired from the memory 101 by the target operation instruction. The memory 101 is, for example, the data memory 214 described above. The conversion unit 111 then outputs whichever of the received first matrix and the second matrix the operation instruction designates as the operand. The second matrix contains the same elements as the first matrix, with the order of at least two of those elements swapped.

The operation unit 112 performs the operation designated by the operation instruction on the operand matrix output by the conversion unit 111. Specifically, the operation unit 112 is, for example, the operation data path 302 described above.

The conversion unit 111 includes a selection unit 611, a selection control unit 612, and a conjugate unit 613. The selection unit 611 outputs either the first matrix or the second matrix. The selection control unit 612 controls the selection unit 611, according to the operation instruction, so that it outputs either the first matrix or the second matrix.

The selection control unit 612 includes a storage unit 621. The storage unit 621 stores, for each combination of the number of row elements and the number of column elements, the values of a control signal that specifies the order in which the selection unit 611 is to output the elements of the first matrix, which are held in the specific order. Based on the numbers of row elements and column elements of the operand matrix designated by the operation instruction, the selection control unit 612 reads the control signal values from the storage unit 621 and outputs them to the selection unit 611. The storage unit 621 is, for example, the index table shown in FIG. 9, described later.

The selection unit 611 outputs the operand matrix, whose elements are the elements of the first matrix arranged in the order specified by the control signal output from the selection control unit 612.

The buffer 601 can store, for example, a predetermined number of elements that is at least the number of elements in the first matrix. The buffer 601 stores the elements of the first matrix acquired from the memory 101 by the target operation instruction in the specific order. In the present embodiment, the elements of the first matrix are stored in row-major order.

選択部６１１は、バッファ６０１に格納された所定数の要素を入力可能な所定数の選択回路の各々が、バッファ６０１に格納された第１行列に含まれる複数の要素のうちのいずれかを選択して出力する。選択制御部６１２は、バッファ６０１に格納された所定数の要素のうち、選択回路の各々に選択させる要素を指示する。 The selection unit 611 has a predetermined number of selection circuits, each of which can receive the predetermined number of elements stored in the buffer 601; each selection circuit selects and outputs one of the elements of the first matrix stored in the buffer 601. The selection control unit 612 specifies, from among the predetermined number of elements stored in the buffer 601, the element that each selection circuit is to select.

また、共役部６１３は、演算対象の行列に含まれる複数の要素の各々が虚部と実部とを有する場合に、演算対象の行列に含まれる複数の要素の各々が有する虚部の符号を入れ替えて、入れ替えた後の前記演算対象の行列を出力する。 In addition, when each of the elements included in the calculation target matrix has an imaginary part and a real part, the conjugate unit 613 inverts the sign of the imaginary part of each element and outputs the resulting calculation target matrix.

また、入力行列である第１行列は、行および列の要素の数が互いに同じ複数の第１行列であってもよい。また、変換後の行列である第２行列は、行および列の要素の数が互いに同じ複数の第２行列であってもよい。また、複数の第２行列の各々は、複数の第２行列の各々に対応する複数の第１行列の各々について第１行列に含まれる複数の要素のうちの少なくともいずれか２つの要素の並びを入れ替えた行列である。 The first matrix, which is the input matrix, may be a plurality of first matrices that all have the same number of row elements and the same number of column elements. Likewise, the second matrix, which is the matrix after conversion, may be a plurality of second matrices that all have the same number of row elements and the same number of column elements. Each of the second matrices is obtained from its corresponding first matrix by exchanging the positions of at least two of the elements included in that first matrix.

以上を踏まえて、各部に対応する回路を用いて詳細に動作を説明する。 Based on the above, the operation will be described in detail using a circuit corresponding to each part.

図７は、行列変換例を示す説明図である。ここでは、１または複数の入力行列を扱えるようにし、バッファに納まる要素数であれば任意の要素数の行列を入力可能であり、かつ様々な行列変換が可能となるようにする。ここでは、１度に処理可能な要素の数が１６個の場合を例に挙げる。図７の左側では、２×２の第１行列を４個入力し、２×２の転置行列である第２行列を４個得る例である。図７の右側では、２×３の第１行列を２個入力し、３×２の転置行列である第２行列を２個得る例である。入力行列と変換後の行列は複数であっても、行および列の要素の数は互いに同一である。 FIG. 7 is an explanatory diagram illustrating an example of matrix transformation. Here, one or a plurality of input matrices can be handled, a matrix having an arbitrary number of elements can be input as long as the number of elements can be stored in the buffer, and various matrix transformations can be performed. Here, a case where the number of elements that can be processed at one time is 16 is taken as an example. The left side of FIG. 7 is an example in which four 2 × 2 first matrices are input and four second matrices that are 2 × 2 transposed matrices are obtained. The right side of FIG. 7 is an example in which two 2 × 3 first matrices are input to obtain two second matrices that are 3 × 2 transposed matrices. Even if there are a plurality of input matrices and transformed matrices, the number of elements in the rows and columns is the same.

（データ供給回路３０１例） (Example of data supply circuit 301)
図８は、データ供給回路例を示す説明図である。データ供給回路３０１は、例えば、入力バッファ８０１と、マルチプレクサ８０２と、行列変換制御回路８０３と、共役回路８０４と、を有する。 FIG. 8 is an explanatory diagram illustrating an example of a data supply circuit. The data supply circuit 301 includes, for example, an input buffer 801, a multiplexer 802, a matrix conversion control circuit 803, and a conjugate circuit 804.

ここで、入力バッファ８０１は、入力行列に含まれる要素を格納可能なレジスタである。入力バッファ８０１のサイズは、例えば、演算データパス３０２での演算される行列のサイズや演算データパス３０２の構成などに応じて決定される。ここでは、入力バッファ８０１は、１６個の要素を格納可能である。入力バッファ８０１は、例えば、行列変換プロセッサにおいてパイプライン処理を行うために設けられる。１個の要素は３２ビットのデータである。 Here, the input buffer 801 is a register capable of storing elements included in the input matrix. The size of the input buffer 801 is determined according to, for example, the size of the matrix to be calculated in the calculation data path 302, the configuration of the calculation data path 302, and the like. Here, the input buffer 801 can store 16 elements. The input buffer 801 is provided, for example, for performing pipeline processing in the matrix conversion processor. One element is 32-bit data.

出力バッファ８０５は、変換後の行列に含まれる要素を格納可能なレジスタである。出力バッファ８０５に格納可能な要素の数は、入力バッファ８０１に格納可能な要素の数に応じた数であり、例えば入力バッファ８０１に格納可能な要素の数と同一である。 The output buffer 805 is a register capable of storing elements included in the converted matrix. The number of elements that can be stored in the output buffer 805 is a number corresponding to the number of elements that can be stored in the input buffer 801, and is the same as the number of elements that can be stored in the input buffer 801, for example.

マルチプレクサ８０２と行列変換制御回路８０３とは変換部１１１である。マルチプレクサ８０２は、第１行列と、第１行列に含まれる複数の要素のうちの少なくとも２つの要素の並びを入れ替えた第２行列と、のいずれかを出力する。マルチプレクサ８０２は、入力バッファ８０１に格納された複数の要素を入れ替えて出力する選択部６１１である。 The multiplexer 802 and the matrix conversion control circuit 803 are the conversion unit 111. The multiplexer 802 outputs either the first matrix or the second matrix in which the arrangement of at least two elements among the plurality of elements included in the first matrix is exchanged. The multiplexer 802 is a selection unit 611 that outputs by exchanging a plurality of elements stored in the input buffer 801.

マルチプレクサ８０２は、所定数のマルチプレクサＭＵＸを有する。所定数は、上述したように、入力バッファ８０１に格納可能な要素の数である。ここでは、マルチプレクサ８０２は、マルチプレクサＭＵＸ＿０〜ＭＵＸ＿１５までの１６個のマルチプレクサＭＵＸを有する。マルチプレクサＭＵＸ＿０〜ＭＵＸ＿１５の各々は、入力バッファ８０１に格納された１６個の要素をすべて入力可能である。マルチプレクサＭＵＸ＿０〜ＭＵＸ＿１５の各々は、マルチプレクサＭＵＸ＿０〜ＭＵＸ＿１５の各々に対応するデータ選択制御信号ｓｅｌ０〜ｓｅｌ１５に応じて一つの要素を選択して、対応する出力バッファ８０５に出力する。 The multiplexer 802 has a predetermined number of multiplexers MUX. The predetermined number is the number of elements that can be stored in the input buffer 801 as described above. Here, the multiplexer 802 includes 16 multiplexers MUX from multiplexers MUX_0 to MUX_15. Each of the multiplexers MUX_0 to MUX_15 can input all 16 elements stored in the input buffer 801. Each of the multiplexers MUX_0 to MUX_15 selects one element according to the data selection control signals sel0 to sel15 corresponding to each of the multiplexers MUX_0 to MUX_15, and outputs the selected element to the corresponding output buffer 805.

例えば、マルチプレクサＭＵＸ＿０は、入力バッファ８０１に格納された１６個の要素の中から、データ選択制御信号ｓｅｌ０に基づきいずれか一つの要素を選択して出力する。図８の例では、マルチプレクサＭＵＸ＿０〜ＭＵＸ＿１５の各々は、共役回路８０４に出力する。ただし、共役回路８０４がオフ状態である場合、マルチプレクサＭＵＸ＿０〜ＭＵＸ＿１５によって選択された各要素は、マルチプレクサＭＵＸ＿０〜ＭＵＸ＿１５の各々に対応する出力バッファ８０５に出力される。例えば、マルチプレクサＭＵＸ＿０によって選択された要素は、出力バッファ８０５−０に格納される。 For example, the multiplexer MUX_0 selects and outputs one of the 16 elements stored in the input buffer 801 based on the data selection control signal sel0. In the example of FIG. 8, each of the multiplexers MUX_0 to MUX_15 outputs to the conjugate circuit 804. However, when the conjugate circuit 804 is in the OFF state, each element selected by the multiplexers MUX_0 to MUX_15 is output to the output buffer 805 corresponding to each of the multiplexers MUX_0 to MUX_15. For example, the element selected by the multiplexer MUX_0 is stored in the output buffer 805-0.
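The selection performed by the 16 multiplexers can be sketched in software as a simple gather: output slot i receives the input-buffer element whose index is given by sel[i]. The following Python is only an illustrative model under that assumption, not the circuit itself; the function name `mux_bank` and the use of `None` for a don't-care select value are hypothetical and do not come from the specification.

```python
from typing import List, Optional, Sequence


def mux_bank(input_buffer: Sequence[int],
             sel: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Model a bank of selectors: output i takes input_buffer[sel[i]].

    A select value of None models a don't-care entry (hypothetical
    convention); the corresponding output slot is left as None.
    """
    out: List[Optional[int]] = []
    for s in sel:
        out.append(None if s is None else input_buffer[s])
    return out


# 16 elements in the input buffer, identified here by arbitrary values.
elements = [100 + i for i in range(16)]

# Identity select values pass the elements through in their stored order.
identity = list(range(16))
passed = mux_bank(elements, identity)
```

With identity select values the output order equals the input order, which corresponds to the case where no rearrangement is requested.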

行列変換制御回路８０３は、例えば、インデックステーブル８１１と、共役制御回路８１２と、を有する。インデックステーブル８１１は、変換後の行の要素の数および列の要素の数が指定されると、指定された行の要素の数および列の要素の数に対応する要素を選択可能なデータ選択制御信号ｓｅｌ０〜ｓｅｌ１５を出力する。 The matrix conversion control circuit 803 includes, for example, an index table 811 and a conjugate control circuit 812. When the number of row elements and the number of column elements after conversion are specified, the index table 811 outputs the data selection control signals sel0 to sel15 that select the elements corresponding to the specified numbers of row and column elements.

図９は、インデックステーブル例を示す説明図である。図９に示すインデックステーブル８１１は、入力行列に含まれる複数の要素から転置行列を得るためのデータ選択信号の値が設定されてある。Ｍは変換後の行の要素の数であり、Ｎは、変換後の列の要素の数である。後述する行列変換モードなどを追加してインデックステーブル８１１の行を追加することによって、転置行列とともに他の行列変換などを行うことも可能である。 FIG. 9 is an explanatory diagram illustrating an example of an index table. In the index table 811 shown in FIG. 9, values of data selection signals for obtaining a transposed matrix from a plurality of elements included in the input matrix are set. M is the number of elements in the row after conversion, and N is the number of elements in the column after conversion. It is also possible to perform other matrix transformations together with the transposed matrix by adding a row of the index table 811 by adding a matrix transformation mode to be described later.

例えば、入力バッファ８０１に格納される１６個の要素のうち、データ選択制御信号ｓｅｌ０〜ｓｅｌ１５の各々に対応する入力バッファ８０１に格納された要素が出力されることとなる。例えばデータ選択制御信号ｓｅｌ０は、図８に示すマルチプレクサＭＵＸ＿０の制御信号となる。 For example, among the 16 elements stored in the input buffer 801, the elements stored in the input buffer 801 corresponding to each of the data selection control signals sel0 to sel15 are output. For example, the data selection control signal sel0 is a control signal for the multiplexer MUX_0 shown in FIG.

ここでは、共役回路８０４について省略して説明する。データ選択制御信号ｓｅｌ０が０であると、マルチプレクサＭＵＸ＿０は、入力バッファ８０１−０が付されたレジスタに格納された要素を出力バッファ８０５−０に出力する。インデックステーブル８１１において×印はドントケアである。入力バッファ８０１−０とは、図８において入力バッファ８０１のＩｎｄｅｘに０が付されたレジスタである。出力バッファ８０５−０とは、図８において出力バッファ８０５のＩｎｄｅｘに０が付されたレジスタである。 The conjugate circuit 804 is left out of the following description. When the data selection control signal sel0 is 0, the multiplexer MUX_0 outputs the element stored in the register labeled input buffer 801-0 to the output buffer 805-0. In the index table 811, an × mark denotes a don't-care value. The input buffer 801-0 is the register of the input buffer 801 whose Index is 0 in FIG. 8. The output buffer 805-0 is the register of the output buffer 805 whose Index is 0 in FIG. 8.

例えば、Ｍ＝１、Ｎ＝１の場合、インデックステーブル８１１によれば、入力バッファ８０１に入力される要素の並び通りに出力バッファ８０５に要素が出力される。 For example, when M = 1 and N = 1, according to the index table 811, elements are output to the output buffer 805 in the order of elements input to the input buffer 801.

図７の右側に示した２個の２×３行列から２個の３×２の転置行列を得るためには、Ｍ＝３、Ｎ＝２となる。Ｍ＝３、Ｎ＝２の場合、データ選択制御信号ｓｅｌ０は０であり、データ選択制御信号ｓｅｌ１は３である。そして、データ選択制御信号ｓｅｌ２は１であり、データ選択制御信号ｓｅｌ３は４であり、データ選択制御信号ｓｅｌ４は２であり、データ選択制御信号ｓｅｌ５は５である。また、データ選択制御信号ｓｅｌ６は６であり、データ選択制御信号ｓｅｌ７は９であり、データ選択制御信号ｓｅｌ８は７であり、データ選択制御信号ｓｅｌ９は１０であり、データ選択制御信号ｓｅｌ１０は８である。また、データ選択制御信号ｓｅｌ１１は１１であり、データ選択制御信号ｓｅｌ１２〜ｓｅｌ１５はドントケアである。 To obtain two 3 × 2 transposed matrices from the two 2 × 3 matrices shown on the right side of FIG. 7, M = 3 and N = 2. When M = 3 and N = 2, the data selection control signals sel0 to sel5 are 0, 3, 1, 4, 2, and 5, respectively, and the data selection control signals sel6 to sel11 are 6, 9, 7, 10, 8, and 11, respectively. The data selection control signals sel12 to sel15 are don't cares.

図７の右側に示すように、２個の２×３行列の場合、要素の数は１２であり、入力バッファ８０１に格納される１６個の要素のうちの４個の要素は使用しないことになるため、データ選択制御信号ｓｅｌ１２〜ｓｅｌ１５はドントケアとなる。このように、簡単な制御処理によって入力行列を演算対象の行列に変換することができる。 As shown on the right side of FIG. 7, in the case of two 2 × 3 matrices, the number of elements is 12, and four of the 16 elements stored in the input buffer 801 are not used. Therefore, the data selection control signals sel12 to sel15 are don't care. In this way, the input matrix can be converted into a matrix to be calculated by a simple control process.
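The select values listed for M = 3, N = 2 follow from ordinary row-major indexing, and the sketch below shows one way such a row of the index table could be computed. It assumes that M is the number of rows and N the number of columns of the converted matrix (consistent with the 3 × 2 example), that the input matrices are packed back to back in row-major order, and that unused slots are marked `None` for don't care. The function name `transpose_select` and the `capacity` parameter are hypothetical.

```python
from typing import List, Optional


def transpose_select(m: int, n: int, capacity: int = 16) -> List[Optional[int]]:
    """Select values that transpose every N x M input matrix that fits
    into `capacity` slots, producing M x N outputs in row-major order.

    m: rows of the converted matrix, n: columns of the converted matrix.
    Unused trailing slots are returned as None (don't care).
    """
    size = m * n
    count = capacity // size          # number of whole matrices that fit
    sel: List[Optional[int]] = []
    for k in range(count):
        base = k * size               # first buffer index of the k-th matrix
        for r in range(m):            # row of the converted matrix
            for c in range(n):        # column of the converted matrix
                # Output element (r, c) is input element (c, r); the input
                # has m columns, so its row-major index is c * m + r.
                sel.append(base + c * m + r)
    sel += [None] * (capacity - len(sel))
    return sel


sel_3x2 = transpose_select(3, 2)   # two 2x3 inputs -> two 3x2 outputs
sel_2x2 = transpose_select(2, 2)   # four 2x2 inputs -> four 2x2 outputs
sel_1x1 = transpose_select(1, 1)   # pass-through order
```

Under these assumptions `sel_3x2` reproduces the values given above (0, 3, 1, 4, 2, 5, 6, 9, 7, 10, 8, 11, followed by four don't cares), and `sel_1x1` is the identity order described for M = 1, N = 1.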

図８の説明に戻って、共役回路８０４は、共役制御信号に応じてマルチプレクサ８０２から出力された各要素の虚部の符号を入れ替える共役部６１３である。虚部の符号を入れ替えることは、共役複素数に取り替えるとも称する。 Returning to the description of FIG. 8, the conjugate circuit 804 is a conjugate unit 613 that exchanges the signs of the imaginary parts of the elements output from the multiplexer 802 in accordance with the conjugate control signal. Replacing the sign of the imaginary part is also referred to as replacing it with a conjugate complex number.

図１０は、共役回路例を示す説明図である。ここでは、１つの要素が例えば３２ビットのデータであり、３２ビットのデータのうち上位１６ビットのデータが実部であり、下位１６ビットのデータが虚部である。共役回路８０４は、３２ビットのデータのうち虚部の１６ビットのデータを虚部の符号を入れ替える回路ｃｏｍｐ２に入力する。回路ｃｏｍｐ２は、共役制御信号の値に応じて２の補数を計算する回路である。 FIG. 10 is an explanatory diagram illustrating an example of a conjugate circuit. Here, one element is, for example, 32-bit data. Among the 32-bit data, upper 16-bit data is a real part, and lower 16-bit data is an imaginary part. The conjugate circuit 804 inputs the 16-bit data of the imaginary part of the 32-bit data to the circuit comp2 that replaces the sign of the imaginary part. The circuit comp2 is a circuit that calculates a 2's complement according to the value of the conjugate control signal.

回路ｃｏｍｐ２は、例えば、共役制御信号の値が０の場合、２の補数を計算せずに入力された１６ビットのデータをそのまま出力し、共役制御信号の値が１の場合、２の補数を計算して計算後の１６ビットのデータを出力する。 For example, when the value of the conjugate control signal is 0, the circuit comp2 outputs the input 16-bit data unchanged without calculating the 2's complement; when the value of the conjugate control signal is 1, it calculates the 2's complement and outputs the resulting 16-bit data.
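The behavior described for the conjugate circuit can be modeled with integer bit operations. This is a minimal sketch assuming the layout stated above (upper 16 bits real part, lower 16 bits imaginary part, two's-complement encoding); the function names `comp2` and `conjugate_element` are chosen here for illustration only.

```python
def comp2(imag16: int, conj_ctrl: int) -> int:
    """Return the 16-bit imaginary part unchanged when conj_ctrl is 0,
    or its two's complement (sign inverted) when conj_ctrl is 1."""
    if conj_ctrl == 0:
        return imag16 & 0xFFFF
    return (-imag16) & 0xFFFF


def conjugate_element(element32: int, conj_ctrl: int) -> int:
    """Apply comp2 to the lower 16 bits of a 32-bit element and leave
    the upper 16 bits (real part) untouched."""
    real = (element32 >> 16) & 0xFFFF
    imag = element32 & 0xFFFF
    return (real << 16) | comp2(imag, conj_ctrl)


# Real part 3, imaginary part 4 becomes real part 3, imaginary part -4.
before = 0x00030004
after = conjugate_element(before, 1)
```

One property of 16-bit two's complement worth keeping in mind: the most negative value 0x8000 has no positive counterpart, so this model maps it to itself.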

図１１は、共役制御信号の出力例を示す説明図である。ここでは、例えば、命令デコーダ３０４から共役制御回路８１２に入力されるＴＦｍｏｄｅの値は、例えば、０であれば変換しないことを示し、１であれば転置行列に変換することを示し、２であればエルミート行列に変換することを示す。共役制御回路８１２は、例えば、ＴＦｍｏｄｅの値が０または１であれば、共役制御信号の値を０に設定して共役回路８０４に出力する。 FIG. 11 is an explanatory diagram illustrating an output example of the conjugate control signal. Here, the value of TFmode input from the instruction decoder 304 to the conjugate control circuit 812 indicates, for example, no conversion if it is 0, conversion to the transposed matrix if it is 1, and conversion to the Hermitian matrix if it is 2. For example, if the value of TFmode is 0 or 1, the conjugate control circuit 812 sets the value of the conjugate control signal to 0 and outputs it to the conjugate circuit 804.

共役制御回路８１２は、例えば、ＴＦｍｏｄｅの値が２であれば、共役制御信号の値を１に設定して共役回路８０４に出力する。これにより、入力行列のエルミート行列を求める場合に、共役回路８０４がオン状態となり、変換なしや転置行列の場合には、共役回路８０４がオフ状態となる。 For example, if the value of TFmode is 2, the conjugate control circuit 812 sets the value of the conjugate control signal to 1 and outputs it to the conjugate circuit 804. As a result, the conjugate circuit 804 is turned on when obtaining the Hermitian matrix of the input matrix, and the conjugate circuit 804 is turned off when there is no conversion or a transposed matrix.
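The mapping from TFmode to the conjugate control signal, and its effect on a matrix, can be sketched as follows. The sketch uses Python complex numbers rather than the 32-bit packed elements, and the function names `conjugate_control` and `transform` are hypothetical; only the TFmode values 0, 1, and 2 are taken from the description.

```python
from typing import List


def conjugate_control(tf_mode: int) -> int:
    """Return the conjugate control signal for a TFmode value:
    0 (no conversion) and 1 (transpose) give 0, 2 (Hermitian) gives 1."""
    if tf_mode not in (0, 1, 2):
        raise ValueError("unsupported TFmode in this sketch")
    return 1 if tf_mode == 2 else 0


def transform(matrix: List[List[complex]], tf_mode: int) -> List[List[complex]]:
    """Apply the conversion selected by tf_mode to a matrix given as rows."""
    if tf_mode == 0:
        return [list(row) for row in matrix]          # output unchanged
    result = [list(col) for col in zip(*matrix)]      # transpose
    if conjugate_control(tf_mode) == 1:
        # Conjugate circuit on: invert the sign of every imaginary part.
        result = [[x.conjugate() for x in row] for row in result]
    return result


a = [[1 + 2j, 3 - 1j],
     [0 + 1j, 4 + 0j]]
a_t = transform(a, 1)
a_h = transform(a, 2)
```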

図１２は、命令実行処理手順例を示すフローチャートである。命令実行が開始されると、命令デコーダ３０４が命令メモリ２１３から命令を取得してデコードする（ステップＳ１２０１）。つぎに、データ供給回路３０１が、デコード結果に基づきデータをロードする（ステップＳ１２０２）。データ供給回路３０１が、デコード結果に基づき、入力行列をそのまま出力または変換して出力する（ステップＳ１２０３）。 FIG. 12 is a flowchart illustrating an example of an instruction execution processing procedure. When instruction execution starts, the instruction decoder 304 fetches an instruction from the instruction memory 213 and decodes it (step S1201). Next, the data supply circuit 301 loads data based on the decoding result (step S1202). Based on the decoding result, the data supply circuit 301 then either outputs the input matrix unchanged or converts it and outputs the converted matrix (step S1203).

演算データパス３０２が演算を行い、データストア回路３０３が演算結果をメモリ１０１に書き戻す（ステップＳ１２０４）。そして、データ供給回路３０１が、ストリームの全データの処理が終了したか否かを判断する（ステップＳ１２０５）。ストリームの全データの処理が終了していないと判断した場合（ステップＳ１２０５：Ｎｏ）、データ供給回路３０１が、ステップＳ１２０２へ戻って、データをロードする。ストリームの全データの処理が終了したと判断した場合（ステップＳ１２０５：Ｙｅｓ）、命令実行が終了する。 The operation data path 302 performs the operation, and the data store circuit 303 writes the operation result back to the memory 101 (step S1204). Then, the data supply circuit 301 determines whether or not processing of all the data of the stream has been completed (step S1205). If it is determined that the processing of all the data of the stream has not been completed (step S1205: No), the data supply circuit 301 returns to step S1202 and loads the data. If it is determined that the processing of all the data in the stream has been completed (step S1205: Yes), the command execution ends.
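The control flow of steps S1201 to S1205 can be summarized as a loop that repeats load, convert, compute, and write-back until the whole stream has been consumed. The sketch below is a software analogy only. The dictionary-based instruction, the `convert` and `compute` callables, and the `chunk` parameter are all hypothetical stand-ins, and the stream is assumed to be non-empty with a length that is a multiple of `chunk`.

```python
from typing import Callable, Dict, List


def execute(instruction: Dict[str, Callable[[List[int]], List[int]]],
            source: List[int], chunk: int) -> List[int]:
    """Run one streamed instruction over `source`, `chunk` elements at a time."""
    decoded = dict(instruction)                      # S1201: fetch and decode
    results: List[int] = []
    pos = 0
    while True:
        data = source[pos:pos + chunk]               # S1202: load data
        pos += chunk
        operand = decoded["convert"](data)           # S1203: pass through or convert
        results.extend(decoded["compute"](operand))  # S1204: compute and write back
        if pos >= len(source):                       # S1205: all stream data processed?
            break
    return results


# Hypothetical instruction: transpose each 2x2 matrix, then double every element.
instr = {
    "convert": lambda d: [d[i] for i in (0, 2, 1, 3)],
    "compute": lambda d: [2 * x for x in d],
}
out = execute(instr, list(range(8)), chunk=4)
```

The point of the sketch is that conversion and computation happen inside the same iteration, so no separate pass over memory is needed for the transposition.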

図１３は、ラップアラウンド処理回路を含むデータ供給回路の一構成例を示す説明図である。入力ＦＩＦＯ１３０１は、幅Ｍのデータを複数個格納可能なバッファキューである。Ｍは正の整数である。ここでは、入力バッファ８０１は、１６個の要素を２つ格納可能である。図１３の例では、１個の要素のデータ幅は３２ビットまでである。 FIG. 13 is an explanatory diagram showing a configuration example of a data supply circuit including a wraparound processing circuit. The input FIFO 1301 is a buffer queue capable of storing a plurality of pieces of data of width M, where M is a positive integer. Here, the input buffer 801 can store two sets of 16 elements. In the example of FIG. 13, the data width of one element is at most 32 bits.

図示省略するが、メモリアクセスユニットは、データメモリ２１４に格納されたソースデータのデータ長のデータを読み出して１つ又は複数個の幅Ｍのデータとして入力ＦＩＦＯ１３０１に格納する。ソースデータのデータ長、即ち、演算対象となるソースデータの全体の長さを、ストリーム長ＳＬＳとも呼ぶ。例えば、演算単位が２×２の実数行列（演算ユニット長ＵＬ＝４ｓｈｏｒｔ）であり、１０００個の行列が演算対象となる場合、ストリーム長ＳＬＳは４０００ｓｈｏｒｔである。 Although not illustrated, the memory access unit reads data amounting to the data length of the source data stored in the data memory 214 and stores it in the input FIFO 1301 as one or more pieces of data of width M. The data length of the source data, that is, the total length of the source data to be operated on, is also called the stream length SLS. For example, when the operation unit is a 2 × 2 real matrix (operation unit length UL = 4 short) and 1000 matrices are to be operated on, the stream length SLS is 4000 short.

具体的には、メモリアクセスユニットは、データメモリ２１４に格納されたデータ長ＳＬＳのデータの先頭から、データメモリ２１４の１ラインに等しいＭ個のデータを読み出す。データメモリ２１４の１ラインは、データメモリ２１４とデータ供給回路３０１との間のバスの幅に等しい。メモリアクセスユニットは、データメモリ２１４とデータ供給回路３０１との間の幅Ｍのバスを介して受け取った幅Ｍのデータを、入力ＦＩＦＯ１３０１に書き込む。入力ＦＩＦＯ１３０１は、幅Ｍのデータを順次格納することができ、先に格納された幅Ｍのデータから順番に読み出すことができる。 Specifically, the memory access unit reads M data equal to one line of the data memory 214 from the head of the data having the data length SLS stored in the data memory 214. One line of the data memory 214 is equal to the width of the bus between the data memory 214 and the data supply circuit 301. The memory access unit writes the data of width M received via the bus of width M between the data memory 214 and the data supply circuit 301 to the input FIFO 1301. The input FIFO 1301 can sequentially store the data of the width M, and can sequentially read from the data of the width M stored previously.

ラップアラウンド処理回路４０１は、入力ＦＩＦＯ１３０１からＰ（≦Ｍ）個（ｓｈｏｒｔ）の連続した単位データを選択することにより幅Ｐのデータを読み出す動作を、複数回繰り返すことにより、入力ＦＩＦＯ１３０１から複数個の幅Ｐのデータを隙間なく順番に読み出す。具体的には、ラップアラウンド処理回路４０１は、最初に、入力ＦＩＦＯ１３０１の最も先に格納された幅ＭのデータのＭ個の単位データのうちで、先頭からＰ（≦Ｍ）個の連続した単位データを選択する。ラップアラウンド処理回路４０１は、選択したＰ個の単位データを、行列変換回路４０２に供給してもよい。ただし、ラップアラウンド処理回路４０１と行列変換回路４０２と、の間のデータ転送幅を固定（例えば幅Ｍ）とした場合、ラップアラウンド処理回路４０１は、選択したＰ個の単位データを含む例えば幅Ｍのデータを、行列変換回路４０２に供給してよい。このとき、選択したＰ個の単位データ以外のＭ−Ｐ個の単位データについては、どのような値であってもよい。 The wraparound processing circuit 401 repeats, a plurality of times, the operation of reading data of width P by selecting P (≦ M) consecutive pieces of unit data (short) from the input FIFO 1301, thereby reading a plurality of pieces of data of width P from the input FIFO 1301 in order and without gaps. Specifically, the wraparound processing circuit 401 first selects P (≦ M) consecutive pieces of unit data, starting from the head, from among the M pieces of unit data of the width-M data stored earliest in the input FIFO 1301. The wraparound processing circuit 401 may supply the selected P pieces of unit data to the matrix conversion circuit 402. However, when the data transfer width between the wraparound processing circuit 401 and the matrix conversion circuit 402 is fixed (for example, at width M), the wraparound processing circuit 401 may supply data of, for example, width M that contains the selected P pieces of unit data to the matrix conversion circuit 402. In that case, the M − P pieces of unit data other than the selected P pieces may have any value.

Ｐ個の連続した単位データを選択した後、ラップアラウンド処理回路４０１は既に選択した最後の単位データのつぎの単位データからＰ個の連続した単位データを選択し、選択したＰ個の単位データを、演算データパス３０２に供給してよい。これを繰り返すことにより、ラップアラウンド処理回路４０１は複数個の幅Ｐのデータを隙間なく順番に入力ＦＩＦＯ１３０１から読み出す。なお、ラップアラウンド処理回路４０１により選択する単位データが幅Ｍのデータの終端の単位データになる場合には、ラップアラウンド処理回路４０１は、つぎの順番の幅Ｍのデータを入力ＦＩＦＯ１３０１から読み出す。そして、ラップアラウンド処理回路４０１は、この新たな幅Ｍのデータの先頭の単位データおよびそれに続く単位データを選択し続ければよい。 After selecting P consecutive unit data, the wraparound processing circuit 401 selects P consecutive unit data from the next unit data after the last selected unit data, and selects the selected P unit data. May be supplied to the operation data path 302. By repeating this, the wrap-around processing circuit 401 reads a plurality of data of width P from the input FIFO 1301 in order without any gaps. Note that when the unit data selected by the wraparound processing circuit 401 becomes the unit data at the end of the data of width M, the wraparound processing circuit 401 reads the data of the next width M from the input FIFO 1301. Then, the wrap-around processing circuit 401 may continue to select the head unit data and the subsequent unit data of the new width M data.
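The gap-free reading of width-P data across width-M boundaries can be sketched with a queue of width-M lines and a running offset. This is a simplified software model, not the circuit: the function name `read_windows` and its parameters are hypothetical, and the sketch assumes the FIFO already holds enough lines for the requested number of reads.

```python
from collections import deque
from typing import List, Sequence


def read_windows(lines: Sequence[Sequence[int]], p: int, count: int) -> List[List[int]]:
    """Read `count` windows of `p` consecutive units from width-M lines,
    continuing into the next line whenever the current one is exhausted."""
    fifo = deque(lines)
    current = list(fifo.popleft())   # width-M data stored earliest
    offset = 0                       # next unread unit within `current`
    windows: List[List[int]] = []
    for _ in range(count):
        window: List[int] = []
        while len(window) < p:
            if offset == len(current):
                # End of the current width-M data: take the next one.
                current = list(fifo.popleft())
                offset = 0
            window.append(current[offset])
            offset += 1
        windows.append(window)
    return windows


# Two lines of width M = 8, read in windows of P = 3.
lines = [list(range(0, 8)), list(range(8, 16))]
windows = read_windows(lines, p=3, count=4)
```

In this example the third window starts in the first line and finishes in the second, so no units are skipped or padded at the boundary.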

具体的に、マルチプレクサ１３１１は、１６個のマルチプレクサ８０２を有する。マルチプレクサ８０２は、入力バッファ８０１の出力する幅６４のデータＢＵＦＯＵＴから、ＷＭ制御回路１３１２が供給するＷＭ制御信号の指定するＰ個の連続した単位データを選択する。 Specifically, the multiplexer 1311 has 16 multiplexers 802. The multiplexer 802 selects P consecutive unit data designated by the WM control signal supplied from the WM control circuit 1312 from the data BUFOUT having a width of 64 output from the input buffer 801.

マルチプレクサ８０２は、上述した、行列変換回路４０２に含まれる選択部である。行列変換制御回路８０３は、行列変換回路４０２に含まれる選択制御部である。 The multiplexer 802 is a selection unit included in the matrix conversion circuit 402 described above. The matrix conversion control circuit 803 is a selection control unit included in the matrix conversion circuit 402.

図１４は、ＷＭ処理回路と行列変換回路とによる要素の選択例を示す説明図である。ＷＭ制御回路１３１２は、入力バッファ８０１−４〜入力バッファ８０１−１９に格納された要素を選択するようにＷＭ制御信号を出力する。 FIG. 14 is an explanatory diagram showing an example of element selection by the WM processing circuit and the matrix conversion circuit. The WM control circuit 1312 outputs a WM control signal so as to select elements stored in the input buffer 801-4 to the input buffer 801-19.

そして、行列変換制御回路８０３は、２×２の転置行列を得るためのＴＦ制御信号を出力する。具体的に、例えば、２×２の転置行列を得る場合、行列変換制御回路８０３は、左のマルチプレクサ８０２から順に０，２，１，３，４，６，５，７，８，１０，９，１１，１２，１４，１３，１５を選択するようにＴＦ制御信号を出力する。 Then, the matrix conversion control circuit 803 outputs a TF control signal for obtaining 2 × 2 transposed matrices. Specifically, to obtain 2 × 2 transposed matrices, for example, the matrix conversion control circuit 803 outputs the TF control signal so that the multiplexers 802 select, in order from the leftmost one, 0, 2, 1, 3, 4, 6, 5, 7, 8, 10, 9, 11, 12, 14, 13, and 15.

ここで、各マルチプレクサ８０２は回路規模が大きいため、データ処理装置１００の規模が大きくなる。そこで、本実施の形態では、ラップアラウンド処理回路４０１におけるマルチプレクサ１３１１と、行列変換回路４０２におけるマルチプレクサ８０２と、を共有して要素の選択処理をマージする。これにより、回路規模の縮小を図ることができる。 Here, since each multiplexer 802 has a large circuit scale, the scale of the data processing apparatus 100 increases. Therefore, in this embodiment, the multiplexer 1311 in the wrap-around processing circuit 401 and the multiplexer 802 in the matrix conversion circuit 402 are shared to merge element selection processing. As a result, the circuit scale can be reduced.

選択部６１１は、入力バッファ８０１に格納された所定数の要素のうち第１行列に含まれる複数の要素が格納された位置データと、第１制御信号と、に基づいて、第２制御信号を生成する。第１制御信号は、行列変換制御回路８０３によって出力される信号である。第２制御信号は、入力バッファ８０１に格納された所定数の要素のうち選択回路の各々に選択させる要素を指示する信号である。そして、選択部６１１は、選択回路の各々が生成した第２制御信号に基づいてバッファに格納された所定数の要素のうちのいずれかを選択して出力する。 Based on a first control signal and on position data indicating where, among the predetermined number of elements stored in the input buffer 801, the elements of the first matrix are stored, the selection unit 611 generates a second control signal. The first control signal is the signal output by the matrix conversion control circuit 803. The second control signal specifies which of the predetermined number of elements stored in the input buffer 801 each selection circuit is to select. The selection circuits of the selection unit 611 then each select and output one of the predetermined number of elements stored in the buffer based on the generated second control signal.

図１５は、ラップアラウンド処理回路を含むデータ供給回路の別の構成例を示す説明図である。データ供給回路３０１は、入力ＦＩＦＯ１３０１と、入力バッファ８０１と、ＷＭ制御回路１３１２と、行列変換制御回路８０３と、インデックス選択マルチプレクサ１５０１と、データ選択マルチプレクサ１５０２と、共役回路８０４と、を有する。 FIG. 15 is an explanatory diagram illustrating another configuration example of the data supply circuit including the wraparound processing circuit. The data supply circuit 301 includes an input FIFO 1301, an input buffer 801, a WM control circuit 1312, a matrix conversion control circuit 803, an index selection multiplexer 1501, a data selection multiplexer 1502, and a conjugate circuit 804.

インデックス選択マルチプレクサ１５０１とデータ選択マルチプレクサ１５０２とは、選択部６１１である。インデックス選択マルチプレクサ１５０１は、バッファ６０１に格納された所定数の要素のうち第１行列に含まれる複数の要素が格納された位置データと、第１制御信号と、に基づいて、第２制御信号を生成する。第２制御信号は、バッファ６０１に格納された所定数の要素のうち選択回路の各々に選択させる要素を指示する信号である。 The index selection multiplexer 1501 and the data selection multiplexer 1502 correspond to the selection unit 611. The index selection multiplexer 1501 generates the second control signal based on the first control signal and on position data indicating where, among the predetermined number of elements stored in the buffer 601, the elements of the first matrix are stored. The second control signal specifies which of the predetermined number of elements stored in the buffer 601 each selection circuit is to select.

具体的に、インデックス選択マルチプレクサ１５０１は、例えば、１６個のマルチプレクサを有する。１６個のマルチプレクサの各々は、位置データであるＷＭ制御信号を入力とし、第１制御信号であるＴＦ制御信号に基づいていずれかのデータ選択制御信号を出力する。インデックス選択マルチプレクサ１５０１から出力されるデータ選択制御信号が第２制御信号である。 Specifically, the index selection multiplexer 1501 has, for example, 16 multiplexers. Each of the 16 multiplexers receives a WM control signal that is position data, and outputs any data selection control signal based on a TF control signal that is a first control signal. The data selection control signal output from the index selection multiplexer 1501 is the second control signal.

そして、データ選択マルチプレクサ１５０２は、インデックス選択マルチプレクサ１５０１によって出力されたデータ選択制御信号に基づいて、入力バッファ８０１に格納された要素からいずれかの要素を選択して出力する。具体的に、インデックス選択マルチプレクサ１５０１は、入力バッファ８０１に格納された３２個の要素から、データ選択制御信号に基づいて、１６個の要素を選択して出力する。 The data selection multiplexer 1502 selects and outputs one of the elements stored in the input buffer 801 based on the data selection control signal output by the index selection multiplexer 1501. Specifically, the index selection multiplexer 1501 selects and outputs 16 elements from 32 elements stored in the input buffer 801 based on the data selection control signal.

図１３に示すマルチプレクサ１３１１およびマルチプレクサ８０２に対して図１５に示すインデックス選択マルチプレクサ１５０１およびデータ選択マルチプレクサ１５０２では、３２ビットのマルチプレクサから５ビットのマルチプレクサに削減している。図１３と図１５とを比較すると、インデックス選択マルチプレクサ１５０１およびデータ選択マルチプレクサ１５０２とによって、回路規模を約１／６倍に縮小することができる。 In contrast to the multiplexer 1311 and the multiplexer 802 shown in FIG. 13, the index selection multiplexer 1501 and the data selection multiplexer 1502 shown in FIG. 15 are reduced from a 32-bit multiplexer to a 5-bit multiplexer. Comparing FIG. 13 with FIG. 15, the index selection multiplexer 1501 and the data selection multiplexer 1502 can reduce the circuit scale to about 1/6.

図１６は、図１５に示すデータ供給回路による行列変換例を示す説明図である。ここでは、図７に示したように２×２の４つの入力行列から２×２の４つの転置行列に変換する例である。また、ここでは、２×２の４つの入力行列に含まれる要素は、入力バッファ８０１−４〜入力バッファ８０１−１９に格納されてある。 FIG. 16 is an explanatory diagram showing an example of matrix conversion by the data supply circuit shown in FIG. 15. This is an example of converting four 2 × 2 input matrices into four 2 × 2 transposed matrices, as shown in FIG. 7. Here, the elements included in the four 2 × 2 input matrices are stored in the input buffers 801-4 to 801-19.

行列変換制御回路８０３は、左のデータマルチプレクサ１５０１から順に０，２，１，３，４，６，５，７，８，１０，９，１１，１２，１４，１３，１５を選択するようにＴＦ制御信号を出力する。ＷＭ制御回路１３１２は、入力バッファ８０１−４〜入力バッファ８０１−１９を選択するようにＷＭ制御信号を出力する。 The matrix conversion control circuit 803 selects 0, 2, 1, 3, 4, 6, 5, 7, 8, 10, 9, 11, 12, 14, 13, 15 in order from the left data multiplexer 1501. A TF control signal is output. The WM control circuit 1312 outputs a WM control signal so as to select the input buffer 801-4 to the input buffer 801-19.
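The merged selection can be checked against the two-stage selection of FIG. 13 with a small model: first permute the narrow buffer indices with the TF control values, then use the resulting indices to pick the wide data elements once. The sketch below assumes a 32-entry input buffer, WM positions 4 to 19, and the 2 × 2 transposition order given above; the function names `two_stage` and `merged` are hypothetical.

```python
from typing import List, Sequence


def two_stage(buffer: Sequence[int], wm: Sequence[int], tf: Sequence[int]) -> List[int]:
    """Select the data by WM position first, then rearrange the data by TF order."""
    stage1 = [buffer[i] for i in wm]        # wide data passes through stage 1
    return [stage1[j] for j in tf]          # wide data passes through stage 2


def merged(buffer: Sequence[int], wm: Sequence[int], tf: Sequence[int]) -> List[int]:
    """Rearrange the narrow indices first, then select the wide data once."""
    data_sel = [wm[j] for j in tf]          # index selection: only indices move
    return [buffer[i] for i in data_sel]    # data selection: one pass over the data


buffer = [1000 + i for i in range(32)]      # 32 stored elements
wm = list(range(4, 20))                     # elements sit in positions 4 to 19
tf = [0, 2, 1, 3, 4, 6, 5, 7, 8, 10, 9, 11, 12, 14, 13, 15]

data_sel = [wm[j] for j in tf]
same = two_stage(buffer, wm, tf) == merged(buffer, wm, tf)
```

Both paths give the same output, but in the merged form the first stage only moves buffer indices, which for a 32-entry buffer fit in 5 bits, instead of moving full 32-bit elements.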

＜ラップアラウンド処理回路４０１＞ <Wrap-around processing circuit 401>
ここで、ラップアラウンド処理回路４０１の構成および動作例について詳細に説明する。 Here, a configuration and an operation example of the wraparound processing circuit 401 will be described in detail.

図１７は、メモリアクセスユニットおよびデータ供給回路の処理を模式的に示す図（その１）である。図１７に示す処理は、ＳＬＳ＞Ｍの場合に実行される処理である。 FIG. 17 is a diagram (part 1) schematically illustrating processing of the memory access unit and the data supply circuit. The process shown in FIG. 17 is a process executed when SLS> M.

図１７（ａ）に示すようにデータメモリ２１４にはストリーム長ＳＬＳのデータが格納される。ストリーム長ＳＬＳは、幅Ｍよりも長い。このストリーム長ＳＬＳのデータが、メモリアクセスユニットにより、幅Ｍごとに読み出され、入力ＦＩＦＯ１３０１に格納される。図１７（ｂ）には、入力ＦＩＦＯ１３０１に格納されたデータ５１を示す。この入力ＦＩＦＯ１３０１に格納されたデータから、Ｐ（≦Ｍ）個の連続した単位データを選択することにより幅Ｐのデータを読み出す動作を、複数回繰り返すことにより、入力ＦＩＦＯ１３０１から複数個の幅Ｐのデータ６１〜６４を隙間なく順番に読み出す。幅Ｐのデータ６５はデータ５１の終端部分にかかってしまうので、幅Ｐのデータ６５を読み出す前までに、メモリアクセスユニットにより、ストリーム長ＳＬＳのデータをデータメモリ２１４から読み出して、入力ＦＩＦＯ１３０１にデータ５２として格納しておく。これにより、入力ＦＩＦＯ１３０１から複数個の幅Ｐのデータ６１〜６９を隙間なく順番に読み出すことができる。なお、図１７（ｃ）に示すように、幅Ｐのデータ６１〜６９の各々は、演算サイクルごとに、即ち各演算サイクルにおいて１つずつ読み出される。 As shown in FIG. 17A, data of stream length SLS is stored in the data memory 214. The stream length SLS is longer than the width M. The memory access unit reads this data of stream length SLS in units of width M and stores it in the input FIFO 1301. FIG. 17B shows data 51 stored in the input FIFO 1301. By repeating, a plurality of times, the operation of reading data of width P by selecting P (≦ M) consecutive pieces of unit data from the data stored in the input FIFO 1301, a plurality of pieces of width-P data 61 to 64 are read from the input FIFO 1301 in order and without gaps. Because the width-P data 65 extends past the end of the data 51, the memory access unit reads the data of stream length SLS from the data memory 214 and stores it in the input FIFO 1301 as data 52 before the width-P data 65 is read. This allows the plurality of pieces of width-P data 61 to 69 to be read from the input FIFO 1301 in order and without gaps. As shown in FIG. 17C, the pieces of width-P data 61 to 69 are read one per operation cycle.

なお、図１７の動作例では、ストリーム長ＳＬＳのデータをデータメモリ２１４から読み出して、入力ＦＩＦＯ１３０１にデータ５１として格納している。そして更にその後、同一のストリーム長ＳＬＳのデータをデータメモリ２１４から読み出して、入力ＦＩＦＯ１３０１にデータ５２として格納している。このような構成にする代わりに、入力ＦＩＦＯ１３０１内に既に格納されているデータ５１を使い回して、データ５２に相当するデータ部分を入力ＦＩＦＯ１３０１に配置してもよい。 In the operation example of FIG. 17, the data of the stream length SLS is read from the data memory 214 and stored as data 51 in the input FIFO 1301. After that, data having the same stream length SLS is read from the data memory 214 and stored as data 52 in the input FIFO 1301. Instead of such a configuration, the data 51 already stored in the input FIFO 1301 may be reused, and the data portion corresponding to the data 52 may be arranged in the input FIFO 1301.

図１８は、メモリアクセスユニットおよびデータ供給回路の処理を模式的に示す図（その２）である。図１８に示す処理は、ＳＬＳ≦Ｍの場合に実行される処理である。 FIG. 18 is a diagram (part 2) schematically illustrating processing of the memory access unit and the data supply circuit. The process shown in FIG. 18 is a process executed when SLS ≦ M.

図１８（ａ）に示すようにデータメモリ２１４にはストリーム長ＳＬＳのデータが格納されている。ストリーム長ＳＬＳは、幅Ｍよりも短い。このストリーム長ＳＬＳのデータが、メモリアクセスユニットにより、幅Ｍのデータとしてロードされ、入力ＦＩＦＯ１３０１に格納される。図１８（ｂ）には、入力ＦＩＦＯ１３０１に格納されたデータ７０を示す。この入力ＦＩＦＯ１３０１に格納されたデータから、Ｐ（≦Ｍ）個の連続した単位データを選択することにより幅Ｐのデータを読み出す動作を、複数回繰り返すことにより、入力ＦＩＦＯ１３０１から複数個の幅Ｐのデータ７１〜７５を隙間なく順番に読み出す。ただし、幅Ｐのデータ７３の場合、データ７０の終端部分にかかってしまうので、データ７０の先端部分に戻り、先端部分から続けてデータを選択して読み出すことになる。これは幅Ｐのデータ７５についても同様である。このようにして、入力ＦＩＦＯ１３０１から複数個の幅Ｐのデータ７１〜７５を隙間なく順番に読み出すことができる。なお、幅Ｐのデータ７１〜７５の各々は、演算サイクルごとに、即ち各演算サイクルにおいて１つずつ読み出される。 As shown in FIG. 18A, data of stream length SLS is stored in the data memory 214. The stream length SLS is shorter than the width M. The memory access unit loads this data of stream length SLS as data of width M and stores it in the input FIFO 1301. FIG. 18B shows data 70 stored in the input FIFO 1301. By repeating, a plurality of times, the operation of reading data of width P by selecting P (≦ M) consecutive pieces of unit data from the data stored in the input FIFO 1301, a plurality of pieces of width-P data 71 to 75 are read from the input FIFO 1301 in order and without gaps. However, because the width-P data 73 extends past the end of the data 70, reading returns to the head of the data 70 and continues selecting data from there. The same applies to the width-P data 75. In this way, the plurality of pieces of width-P data 71 to 75 can be read from the input FIFO 1301 in order and without gaps. The pieces of width-P data 71 to 75 are read one per operation cycle.
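When the stream is shorter than one line, the wrap-around amounts to reading the stream cyclically, which can be modeled with modular indexing. The following sketch is an illustration under that assumption; the function name `read_cyclic` and the example sizes are hypothetical and are not taken from FIG. 18.

```python
from typing import List, Sequence


def read_cyclic(stream: Sequence[int], p: int, count: int) -> List[List[int]]:
    """Read `count` windows of `p` consecutive units from a short stream,
    returning to the head of the stream whenever its end is reached."""
    sls = len(stream)                 # stream length SLS
    windows: List[List[int]] = []
    start = 0
    for _ in range(count):
        windows.append([stream[(start + k) % sls] for k in range(p)])
        start = (start + p) % sls     # the next window continues without a gap
    return windows


# Stream of length SLS = 5 read in windows of P = 2.
short_windows = read_cyclic([0, 1, 2, 3, 4], p=2, count=5)
```

Here the third window takes the last unit of the stream followed by the first one, which is the wrap-around case; after five windows every unit has been read exactly twice.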

ここで、ラップアラウンド処理回路について詳細に説明する。図１９以降では、ラップアラウンド処理回路について説明するため、行列変換回路４０２については図示などを省略する。 Here, the wrap-around processing circuit will be described in detail. In FIG. 19 and subsequent figures, illustration of the matrix conversion circuit 402 is omitted to describe the wraparound processing circuit.

図１９は、ラップアラウンド処理回路の詳細例を示す図である。ラップアラウンド処理回路は、データ選択部１９０１とＷＭ制御回路１３１２とを有する。データ選択部１９０１は、セレクタ回路１９１１と、バッファ回路１９１２と、結合回路１９１３と、マルチプレクサ１３１１と、結合回路１９１４と、を有する。マルチプレクサ１３１１は、セレクタ８４−１〜８４−３２を有する。 FIG. 19 is a diagram illustrating a detailed example of the wrap-around processing circuit. The wraparound processing circuit includes a data selection unit 1901 and a WM control circuit 1312. The data selection unit 1901 includes a selector circuit 1911, a buffer circuit 1912, a coupling circuit 1913, a multiplexer 1311, and a coupling circuit 1914. The multiplexer 1311 includes selectors 84-1 to 84-32.

The width-M data stored earliest in the input FIFO 1301 is read from the input FIFO 1301 in response to a POP signal of “1” from the WM control circuit 1312 and is stored in the buffer circuit 1912 via the selector circuit 1911. In this example, the width M is 32 short. At this time, the POP signal of “1” puts the selector circuit 1911 in the state of selecting the right-hand input in FIG. 19. With the width-32 data stored in the buffer circuit 1912, the width-32 data that the input FIFO 1301 is outputting is the data that follows the data stored in the buffer circuit 1912; it is the width-32 data stored earliest in the input FIFO 1301 at that point.

In response to a POP signal of “1”, the memory access unit may read from the data memory 214 the remainder of the stream-length-SLS data that has not yet been stored in the input FIFO 1301, and store it in the input FIFO 1301 as subsequent data. If the data read from the data memory 214 reaches the end of the stream-length-SLS data, reading may resume from the start of that data in response to the next POP signal of “1”. In that case, as shown in FIG. 18(b), the data may be stored in the input FIFO 1301 so that the start of the stream-length-SLS data follows, without a gap, the end of the previously read stream-length-SLS data.

The coupling circuit 1913 outputs width-64 data BUFOUT, formed by placing the one width-32 data held in the buffer circuit 1912 side by side with the next width-32 data output by the input FIFO 1301. The data BUFOUT is 64 short × 16 bits = 1024 bits long.

The multiplexer 1311 selects, from the width-64 data BUFOUT output by the coupling circuit 1913, the P consecutive unit data designated by the WM control signals SEL00 to SEL31 supplied by the WM control circuit 1312. Because the output of the data selection unit 1901 is actually 32 (short) wide, the selected P consecutive unit data may be placed in a contiguous part of the width-32 output data, for example the contiguous part at its left end. The operation data path 302 operates only on data of the data processing width P, so it may perform the operation on, for example, the P consecutive unit data at the left end of the width-32 data output by the data selection unit 1901.

具体的には、セレクタ８４−１が、幅６４のデータＢＵＦＯＵＴのうち、ＷＭ制御信号ＳＥＬ００の指し示す位置にある１ｓｈｏｒｔの単位データを選択して出力する。またセレクタ８４−２が、幅６４のデータＢＵＦＯＵＴのうち、ＷＭ制御信号ＳＥＬ０１の指し示す位置にある１ｓｈｏｒｔの単位データを選択して出力する。以下同様であり、セレクタ８４−３２が、幅６４のデータＢＵＦＯＵＴのうち、ＷＭ制御信号ＳＥＬ３１の指し示す位置にある１ｓｈｏｒｔの単位データを選択して出力する。 Specifically, the selector 84-1 selects and outputs 1 short unit data at the position indicated by the WM control signal SEL00 from the data BUFOUT having a width of 64. Further, the selector 84-2 selects and outputs 1 short unit data at the position indicated by the WM control signal SEL01 from the data BUFOUT having a width of 64. The same applies to the following, and the selector 84-32 selects and outputs 1 short unit data at the position indicated by the WM control signal SEL31 from the data BUFOUT having a width of 64.
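As a rough illustration of the selection performed by the selectors 84-1 to 84-32, the following Python sketch concatenates the buffer word and the FIFO output word into BUFOUT and picks one unit data per select value. The function name `select_units` and the list-based data model are assumptions made for this sketch; the hardware always produces a width-32 output, whereas the sketch simply returns one element per select value supplied.

```python
def select_units(buffer_word, fifo_word, sel):
    """Pick unit data from BUFOUT at the positions given by `sel`.

    buffer_word: the M unit data held in the buffer circuit (older word)
    fifo_word:   the M unit data currently output by the input FIFO
    sel:         one position (0 .. 2M-1) per selector, e.g. SEL00..SEL07
    """
    # BUFOUT is the buffer word on the left followed by the FIFO word.
    bufout = list(buffer_word) + list(fifo_word)
    # Each selector independently outputs the unit data its SEL value names.
    return [bufout[s] for s in sel]
```

For example, with M = 32 a select vector of 30 to 37 returns the last two unit data of the buffer word followed by the first six unit data of the FIFO output word, which is how a width-P read can straddle two FIFO entries.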

FIG. 20 is a diagram illustrating an example of the selection operation by the WM control circuit. In the example shown in FIG. 20, the width M is 32 (short), the stream length SLS is 34 (short), and the data processing width P is 8 (short). SLS_MOD and OFFSET shown in the table of FIG. 20 are described later. Since the data processing width P is 8, the following description focuses only on the WM control signals SEL00 to SEL07 supplied to the eight leftmost selectors 84-1 to 84-8 shown in FIG. 19.

First, assume that the first 32 unit data of the data whose stream length SLS is 34 are stored in the buffer circuit 1912 of FIG. 19, and that the remaining two unit data are stored at the left end of the data that the input FIFO 1301 is outputting. As described above, in the data that the input FIFO 1301 is outputting, these two unit data at the left end are followed on their right by the leading part of the stream-length-34 data (its first 30 unit data). In this way, the memory access unit continuously reads the stream-length-SLS data from the data memory 214 as needed and stores it in the input FIFO 1301 as subsequent data.

In the first cycle (cycle = 0), the WM control signals SEL00 to SEL07 are “0” to “7”, and unit data No. 0 (the leftmost) through No. 7 (the eighth from the left end) of the width-64 data BUFOUT are selected. In the next cycle (cycle = 1), the WM control signals SEL00 to SEL07 are “8” to “15”, and unit data No. 8 (the ninth from the left end) through No. 15 (the 16th from the left end) of the width-64 data BUFOUT are selected. The operation then proceeds in the same way: using the buffer circuit 1912, a plurality of width-P data are selected and read from the input FIFO 1301 in order and without gaps.

In the fifth cycle (cycle = 4), the WM control signals SEL00 to SEL07 are “32” to “39”, and unit data No. 32 through No. 39 of the width-64 data BUFOUT are selected. At this time, the POP signal becomes “1”. Consequently, in the next cycle, the last two unit data of the stream-length-34 data and the first 30 unit data that follow them are stored in the buffer circuit 1912 of FIG. 19. The data that follows, namely the last four unit data of the stream-length-34 data and the leading part of the stream-length-34 data (its first 28 unit data), are stored side by side in the output data portion of the input FIFO 1301.

In the sixth cycle (cycle = 5), the WM control signals SEL00 to SEL07 are “8” to “15”, and unit data No. 8 (the ninth from the left end) through No. 15 (the 16th from the left end) of the width-64 data BUFOUT are selected. The operation then proceeds in the same way: a plurality of width-P data are selected and read from the input FIFO 1301 in order and without gaps.
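The SLS > M case can be illustrated with a small behavioral simulation in Python. It is a sketch under stated assumptions rather than a model of the exact signal timing: the name `simulate_stream_reads` is hypothetical, the memory access unit is modeled as an endless repetition of the stream cut into M-wide words, and the FIFO is popped as soon as the read offset reaches M (the figure instead reads positions 32 to 39 in the cycle in which POP is “1”; the data delivered is the same).

```python
from itertools import cycle, islice


def simulate_stream_reads(stream, m, p, cycles):
    """Read `cycles` width-`p` windows from a stream refilled in M-wide words.

    Assumes p <= m. Returns a list of windows (lists of unit data).
    """
    source = cycle(stream)  # reading restarts at the stream start after its end

    def next_word():
        # One M-wide FIFO entry, packed without gaps across stream repeats.
        return list(islice(source, m))

    buffer_word = next_word()  # word held in the buffer circuit
    fifo_word = next_word()    # word currently at the FIFO output
    offset = 0                 # read position inside BUFOUT
    out = []
    for _ in range(cycles):
        bufout = buffer_word + fifo_word  # width 2M
        out.append(bufout[offset:offset + p])
        offset += p
        if offset >= m:
            # The buffer word is used up: pop the FIFO into the buffer.
            buffer_word, fifo_word = fifo_word, next_word()
            offset -= m
    return out
```

With a 34-element stream, M = 32 and P = 8, the word loaded into the buffer after the first pop starts with the last two stream elements followed by the first 30, matching the buffer contents described above.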

図２１は、ＷＭ制御回路による選択動作の別の一例を示す図である。図２１に示す例では、幅Ｍが３２（ｓｈｏｒｔ）、ストリーム長ＳＬＳが３４（ｓｈｏｒｔ）、データ処理幅Ｐが３２（ｓｈｏｒｔ）である。図２１の表に示すＳＬＳ＿ＭＯＤ、ＯＦＦＳＥＴについては、後程説明する。データ処理幅Ｐが３２であるので、以下の説明においては、図１９に示す３２個のセレクタ８４−１〜８４−３２に供給されるＷＭ制御信号ＳＥＬ００〜ＳＥＬ３１に着目する。 FIG. 21 is a diagram illustrating another example of the selection operation by the WM control circuit. In the example shown in FIG. 21, the width M is 32 (short), the stream length SLS is 34 (short), and the data processing width P is 32 (short). SLS_MOD and OFFSET shown in the table of FIG. 21 will be described later. Since the data processing width P is 32, the following description focuses on the WM control signals SEL00 to SEL31 supplied to the 32 selectors 84-1 to 84-32 shown in FIG.

In the first cycle (cycle = 0), the WM control signals SEL00 to SEL31 are “0” to “31”, and unit data No. 0 (the leftmost) through No. 31 (the 32nd from the left end) of the width-64 data BUFOUT are selected. At this time, the POP signal becomes “1”. Consequently, in the next cycle, the last two unit data of the stream-length-34 data and the first 30 unit data that follow them are stored in the buffer circuit 1912 of FIG. 19. The data that follows, namely the last four unit data of the stream-length-34 data and the leading part of the stream-length-34 data (its first 28 unit data), are stored side by side in the output data portion of the input FIFO 1301.

In the next cycle (cycle = 1) as well, the WM control signals SEL00 to SEL31 are “0” to “31”, and unit data No. 0 (the leftmost) through No. 31 (the 32nd from the left end) of the width-64 data BUFOUT are selected. At this time, the POP signal becomes “1”. Consequently, in the next cycle, the last four unit data of the stream-length-34 data and the first 28 unit data that follow them are stored in the buffer circuit 1912 of FIG. 19. The data that follows, namely the last six unit data of the stream-length-34 data and the leading part of the stream-length-34 data (its first 26 unit data), are stored side by side in the output data portion of the input FIFO 1301. The operation then proceeds in the same way: using the buffer circuit 1912, a plurality of width-P data are selected and read from the input FIFO 1301 in order and without gaps.

FIG. 22 is a diagram showing still another example of the selection operation by the WM control circuit. In the example shown in FIG. 22, the width M is 32 (short), the stream length SLS is 12 (short), and the data processing width P is 8 (short). SLS_MOD and OFFSET shown in the table of FIG. 22 are described later. Since the data processing width P is 8, the following description focuses only on the WM control signals SEL00 to SEL07 supplied to the eight leftmost selectors 84-1 to 84-8 shown in FIG. 19.

まず、ストリーム長ＳＬＳが１２であるデータの１２個の単位データが図１９のバッファ回路１９１２の左端側に詰めて格納された状態であるとする。 First, it is assumed that twelve unit data of data having a stream length SLS of 12 are packed and stored on the left end side of the buffer circuit 1912 in FIG.

In the first cycle (cycle = 0), the WM control signals SEL00 to SEL07 are “0” to “7”, respectively, and unit data No. 0 (the leftmost) through No. 7 (the eighth from the left end) of the width-64 data BUFOUT are selected. In the next cycle (cycle = 1), the WM control signals SEL00 to SEL07 are “8, 9, 10, 11, 0, 1, 2, 3”. Accordingly, unit data No. 8 (the ninth from the left end) through No. 11 (the 12th from the left end) of the width-64 data BUFOUT are selected, followed by unit data No. 0 (the leftmost) through No. 3 (the fourth from the left end). The operation then proceeds in the same way: using the buffer circuit 1912, a plurality of width-P data are selected and read from the input FIFO 1301 in order and without gaps. In this read operation the stream length SLS is shorter than the width M, so the POP signal never becomes “1”.

図２３は、ＷＭ制御回路の構成の一例を示す図である。図２３に示すＷＭ制御回路１３１２は、ＳＬＳ＿ＭＯＤ回路２３０１と、ＳＬＳレジスタ２３０２と、ＳＥＬ＿ＷＲＡＰ回路２３０３−１〜２３０３−３２と、を有する。また、ＷＭ制御回路１３１２は、ＯＦＦＳＥＴレジスタ２３０４と、ＡＤＤ＿ＯＦＦＳＥＴ回路２３０５と、Ｐ減算回路２３０６と、セレクタ回路２３０７と、を有する。 FIG. 23 is a diagram illustrating an example of the configuration of the WM control circuit. The WM control circuit 1312 illustrated in FIG. 23 includes an SLS_MOD circuit 2301, an SLS register 2302, and SEL_WRAP circuits 2303-1 to 2303-32. The WM control circuit 1312 includes an OFFSET register 2304, an ADD_OFFSET circuit 2305, a P subtraction circuit 2306, and a selector circuit 2307.

図２４は、ＳＥＬ＿ＷＲＡＰ回路の構成の一例を示す図である。図２４に示すＳＥＬ＿ＷＲＡＰ回路２３０３は、ＳＬＳ判定回路２４０１と、ＳＬＳ減算回路２４０２と、Ｎ加算回路２４０３と、セレクタ回路２４０４と、比較回路２４０５と、１加算回路２４０６と、セレクタ回路２４０７と、を有する。ＳＥＬ＿ＷＲＡＰ回路２３０３−１の場合、印加されるＳＬＳ＿ＭＯＤ信号は、ＳＬＳ＿ＭＯＤ回路２３０１の格納する値に等しい。それ以降のＳＥＬ＿ＷＲＡＰ回路２３０３−２〜２３０３−３２の場合、印加されるＳＬＳ＿ＭＯＤ信号は、前段のＳＥＬ＿ＷＲＡＰ回路２３０３の出力するＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号に等しい。 FIG. 24 is a diagram illustrating an example of the configuration of the SEL_WRAP circuit. The SEL_WRAP circuit 2303 illustrated in FIG. 24 includes an SLS determination circuit 2401, an SLS subtraction circuit 2402, an N addition circuit 2403, a selector circuit 2404, a comparison circuit 2405, a 1 addition circuit 2406, and a selector circuit 2407. . In the case of the SEL_WRAP circuit 2303-1, the applied SLS_MOD signal is equal to the value stored in the SLS_MOD circuit 2301. In the subsequent SEL_WRAP circuits 2303-2 to 2303-32, the applied SLS_MOD signal is equal to the SLS_MOD_NEXT signal output from the preceding SEL_WRAP circuit 2303.

図２５は、ＡＤＤ＿ＯＦＦＳＥＴ回路の構成の一例を示す図である。図２５に示すＡＤＤ＿ＯＦＦＳＥＴ回路２３０５は、加算回路２５０１と、ＯＦＦＳＥＴレジスタ２５０２と、ＯＦＦＳＥＴレジスタ２５０３と、セレクタ回路２５０４と、セレクタ回路２５０５と、を有する。 FIG. 25 is a diagram illustrating an example of the configuration of the ADD_OFFSET circuit. An ADD_OFFSET circuit 2305 illustrated in FIG. 25 includes an adder circuit 2501, an OFFSET register 2502, an OFFSET register 2503, a selector circuit 2504, and a selector circuit 2505.

ここで、図２０および図２３から図２５を用いて、ＷＭ制御回路１３１２の動作の一例を説明する。初期状態においては、ＳＬＳ＿ＭＯＤ回路２３０１の格納するＳＬＳ＿ＭＯＤ信号は「０」である。またＯＦＦＳＥＴレジスタ２３０４の格納するＯＦＦＳＥＴ信号は「０」である。 Here, an example of the operation of the WM control circuit 1312 will be described with reference to FIGS. 20 and 23 to 25. In the initial state, the SLS_MOD signal stored in the SLS_MOD circuit 2301 is “0”. The OFFSET signal stored in the OFFSET register 2304 is “0”.

図２０の例において、ＳＬＳ＞Ｍであることにより図２４に示すセレクタ回路２４０４は、ＯＦＦＳＥＴ信号の値にＮを加算した値を選択する。この値Ｎは、何番目のＳＥＬ＿ＷＲＡＰ回路２３０３であるかを示す値であり、「０」を開始番号として、０番のＳＥＬ＿ＷＲＡＰ回路２３０３−１の場合には「０」である。従って、ＳＥＬ＿ＷＲＡＰ回路２３０３−１の場合、ＯＦＦＳＥＴ信号の値に「０」を加算した「０」が、出力のＷＭ制御信号ＳＥＬの値となる。また「０」であるＳＬＳ＿ＭＯＤ信号に１加算回路２４０６により「１」を加算した値である「１」が、ＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号として出力される。 In the example of FIG. 20, when SLS> M, the selector circuit 2404 shown in FIG. 24 selects a value obtained by adding N to the value of the OFFSET signal. This value N is a value indicating what number the SEL_WRAP circuit 2303 is, and is “0” in the case of the 0th SEL_WRAP circuit 2303-1 with “0” as the start number. Therefore, in the case of the SEL_WRAP circuit 2303-1, “0” obtained by adding “0” to the value of the OFFSET signal is the value of the output WM control signal SEL. Also, “1”, which is a value obtained by adding “1” to the SLS_MOD signal that is “0” by the 1 addition circuit 2406, is output as the SLS_MOD_NEXT signal.

つぎのＳＥＬ＿ＷＲＡＰ回路２３０３−２の場合、ＯＦＦＳＥＴ信号の値に「１」を加算した「１」が、出力のＷＭ制御信号ＳＥＬの値となる。またこのＳＥＬ＿ＷＲＡＰ回路２３０３−２の場合、印加されるＳＬＳ＿ＭＯＤ信号は前段からの値「１」のＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号であるので、出力するＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号の値は「２」となる。以下同様にして、ＳＥＬ＿ＷＲＡＰ回路２３０３−ｎの場合、出力するＷＭ制御信号ＳＥＬは「ｎ−１」であり、出力するＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号は「ｎ」となる。ここでのｎは自然数である。これにより、図２０の０番のサイクルに示すようなＷＭ制御信号ＳＥＬ００〜ＳＥＬ３１が生成される。 In the case of the next SEL_WRAP circuit 2303-2, “1” obtained by adding “1” to the value of the OFFSET signal is the value of the output WM control signal SEL. In the case of the SEL_WRAP circuit 2303-2, since the SLS_MOD signal to be applied is the SLS_MOD_NEXT signal having the value “1” from the previous stage, the value of the SLS_MOD_NEXT signal to be output is “2”. Similarly, in the case of the SEL_WRAP circuit 2303-n, the WM control signal SEL to be output is “n−1”, and the SLS_MOD_NEXT signal to be output is “n”. Here, n is a natural number. As a result, WM control signals SEL00 to SEL31 as shown in cycle 0 of FIG. 20 are generated.

The selector circuit 2307 receives the SLS_MOD_NEXT signals output by the SEL_WRAP circuits 2303-1 to 2303-32. It also receives, as a selection control signal, the value obtained by subtracting 1 from the data processing width P, which is “7” in this example. The selector circuit 2307 selects the SLS_MOD_NEXT signal with the value “8” output by SEL_WRAP circuit No. 7 counting from No. 0 (that is, the eighth circuit, 2303-8) and supplies it to the SLS_MOD circuit 2301. As a result, in the next cycle the SLS_MOD signal stored in the SLS_MOD circuit 2301 is “8”.

In the ADD_OFFSET circuit 2305 shown in FIG. 25, because SLS > M, the selector circuit 2505 selects the value obtained by adding the value of the OFFSET signal to the data processing width P and outputs it as the OFFSET_NEXT signal. This OFFSET_NEXT signal is stored in the OFFSET register 2304 shown in FIG. 23 and becomes the OFFSET signal in the next cycle. The value of the OFFSET signal therefore increases by P every cycle. However, in a cycle in which the value obtained by adding P to the value of the OFFSET signal in the adder circuit 2501 is “32”, the value stored in the OFFSET register 2502 becomes 1 and the POP_NEXT signal becomes “1”.

このＰＯＰ＿ＮＥＸＴ信号が、ＷＭ制御回路１３１２からＰＯＰ信号として出力される。またＯＦＦＳＥＴ信号の値に加算回路２５０１によりＰを加算した値の下位５ビットのみをＯＦＦＳＥＴレジスタ２５０３に格納することにより、ＯＦＦＳＥＴ＿ＮＥＸＴ信号の値は、「０」から「３１」の範囲の値のみをとることになる。即ち、ＯＦＦＳＥＴレジスタ２３０４に格納されるＯＦＦＳＥＴ値は、「０」から「３１」の範囲の値を繰り返すことになる。このようにして、図２０の動作例に示すような、ＯＦＦＳＥＴ信号およびＰＯＰ信号が生成される。なお図２０では、ＯＦＦＳＥＴの値は、６ビット目も含めた値を示してあるため、値「３２」の場合が示されている。 This POP_NEXT signal is output from the WM control circuit 1312 as a POP signal. Further, only the lower 5 bits of the value obtained by adding P to the value of the OFFSET signal by the addition circuit 2501 are stored in the OFFSET register 2503, so that the value of the OFFSET_NEXT signal takes only a value in the range of “0” to “31”. It will be. That is, the OFFSET value stored in the OFFSET register 2304 repeats a value in the range of “0” to “31”. In this way, an OFFSET signal and a POP signal are generated as shown in the operation example of FIG. In FIG. 20, since the OFFSET value includes a value including the sixth bit, the value “32” is shown.
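The update of OFFSET and POP described for the ADD_OFFSET circuit can be sketched as below for M = 32. The function name `add_offset` is hypothetical, and one point is an assumption of this sketch: the text only mentions the case where the sum equals 32, while the sketch treats bit 5 of the sum as the POP_NEXT flag so that sums above 32 are handled the same way.

```python
def add_offset(offset, p, sls, m=32):
    """Return (offset_next, pop_next) for one cycle, assuming m == 32.

    offset: current OFFSET value, expected in 0..31
    p:      data processing width, expected in 1..32
    """
    if sls <= m:
        # SLS <= M: both selectors choose 0, so OFFSET and POP stay 0.
        return 0, 0
    total = offset + p               # adder: OFFSET + P (at most 63)
    pop_next = (total >> 5) & 1      # assumption: bit 5 acts as the POP flag
    offset_next = total & 0x1F       # only the lower 5 bits are kept
    return offset_next, pop_next
```

Starting from OFFSET = 0 with P = 8 and SLS = 34, successive calls give (8, 0), (16, 0), (24, 0) and then (0, 1), so POP is asserted once every four cycles.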

Another example of the operation of the WM control circuit 1312 will be described with reference to FIGS. 22 to 25. In the initial state, the SLS_MOD signal stored in the SLS_MOD circuit 2301 is “0”. The OFFSET signal stored in the OFFSET register 2304 is also “0”.

図２２の例において、ＳＬＳ≦Ｍであることにより図２４に示すセレクタ回路２４０４はＳＬＳ＿ＭＯＤ信号を選択するので、ＳＥＬ＿ＷＲＡＰ回路２３０３−１の場合、出力のＷＭ制御信号ＳＥＬは「０」である。また「０」であるＳＬＳ＿ＭＯＤ信号に「１」を加算した値である「１」が、ＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号として出力される。つぎのＳＥＬ＿ＷＲＡＰ回路２３０３−２の場合、印加されるＳＬＳ＿ＭＯＤ信号は前段からの値「１」のＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号であるので、出力のＷＭ制御信号ＳＥＬは「１」であり、且つ、出力するＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号の値は「２」となる。以下同様にして、ＳＥＬ＿ＷＲＡＰ回路２３０３−ｎの場合、出力するＷＭ制御信号ＳＥＬは「ｎ−１」であり、出力するＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号は「ｎ」となる。ここでのｎはＳＬＳより小さい自然数である。 In the example of FIG. 22, since SLS ≦ M, the selector circuit 2404 shown in FIG. 24 selects the SLS_MOD signal. Therefore, in the case of the SEL_WRAP circuit 2303-1, the output WM control signal SEL is “0”. Also, “1”, which is a value obtained by adding “1” to the SLS_MOD signal that is “0”, is output as the SLS_MOD_NEXT signal. In the case of the next SEL_WRAP circuit 2303-2, since the SLS_MOD signal to be applied is the SLS_MOD_NEXT signal with the value “1” from the previous stage, the output WM control signal SEL is “1” and the SLS_MOD_NEXT signal to be output The value is “2”. Similarly, in the case of the SEL_WRAP circuit 2303-n, the WM control signal SEL to be output is “n−1”, and the SLS_MOD_NEXT signal to be output is “n”. Here, n is a natural number smaller than SLS.

In the example of FIG. 22, the stream length SLS is 12, so in the SEL_WRAP circuit 2303-12 the output of the comparison circuit 2405 shown in FIG. 24 becomes 1. The selector circuit 2407 then selects “0”, and the value of the output SLS_MOD_NEXT signal becomes “0”. Therefore, as shown in cycle No. 0 of FIG. 22, the WM control signals SEL00 to SEL31 form a signal that repeats the values “0” to “11”.

The selector circuit 2307 receives the SLS_MOD_NEXT signals output by the SEL_WRAP circuits 2303-1 to 2303-32. It also receives, as a selection control signal, the value obtained by subtracting 1 from the data processing width P. In this example, the selector circuit 2307 receives “7” as the selection control signal. The selector circuit 2307 selects the SLS_MOD_NEXT signal with the value “8” output by SEL_WRAP circuit No. 7 counting from No. 0 (that is, the eighth circuit, 2303-8) and supplies it to the SLS_MOD circuit 2301. As a result, in the next cycle the SLS_MOD signal stored in the SLS_MOD circuit 2301 is “8”.
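The cascade of SEL_WRAP circuits and the selection of the next SLS_MOD value can be sketched as follows. This is a behavioral illustration only: the function name `sel_wrap_chain` and its return convention are hypothetical, each loop iteration stands for one SEL_WRAP stage, and the wrap to 0 is modeled as a simple comparison of the incremented value with SLS.

```python
def sel_wrap_chain(sls, p, sls_mod, offset, m=32):
    """Return (sel, sls_mod_for_next_cycle) for one cycle.

    sel holds the M select values SEL00 .. SEL(M-1).
    """
    sel = []
    nexts = []  # SLS_MOD_NEXT output of every stage
    for n in range(m):  # N = 0 .. M-1, one iteration per SEL_WRAP stage
        if sls > m:
            sel.append(offset + n)   # SLS > M: OFFSET + N
        else:
            sel.append(sls_mod)      # SLS <= M: the incoming SLS_MOD
        nxt = sls_mod + 1            # 1-adder
        if nxt == sls:               # comparison hit: wrap back to 0
            nxt = 0
        nexts.append(nxt)
        sls_mod = nxt                # the next stage receives SLS_MOD_NEXT
    # Selector 2307: take the SLS_MOD_NEXT of stage P-1 for the next cycle.
    return sel, nexts[p - 1]
```

For SLS = 12 and P = 8, starting from SLS_MOD = 0, the first call yields select values that repeat 0 to 11 and a next SLS_MOD of 8; calling again with SLS_MOD = 8 yields 8, 9, 10, 11, 0, 1, 2, 3 in the first eight positions.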

図２５に示すＡＤＤ＿ＯＦＦＳＥＴ回路２３０５において、ＳＬＳ≦Ｍであることにより、セレクタ回路２５０４およびセレクタ回路２５０５は、値「０」を選択して、値「０」のＰＯＰ＿ＮＥＸＴ信号および値「０」のＯＦＦＳＥＴ＿ＮＥＸＴ信号を出力する。これにより、図２２の動作例に示すように、ＯＦＦＳＥＴ信号およびＰＯＰ信号は、両方とも常に「０」となる。 In the ADD_OFFSET circuit 2305 shown in FIG. 25, since SLS ≦ M, the selector circuit 2504 and the selector circuit 2505 select the value “0” and the POP_NEXT signal having the value “0” and the OFFSET_NEXT signal having the value “0”. Is output. Thereby, as shown in the operation example of FIG. 22, both the OFFSET signal and the POP signal are always “0”.

図２６は、ＳＬＳ≦Ｍの場合の各信号の生成アルゴリズムを示す図である。ＳＬＳ≦Ｍの場合には、図２６に示す生成アルゴリズムにより、ＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号、ＷＭ制御信号ＳＥＬ、およびＰＯＰ信号が生成される。 FIG. 26 is a diagram illustrating a generation algorithm of each signal when SLS ≦ M. In the case of SLS ≦ M, the SLS_MOD_NEXT signal, the WM control signal SEL, and the POP signal are generated by the generation algorithm shown in FIG.

図２７は、ＳＬＳ＞Ｍの場合の各信号の生成アルゴリズムを示す図である。ＳＬＳ＞Ｍの場合には、図２７に示す生成アルゴリズムにより、ＰＯＰ信号、ＯＦＦＳＥＴ信号、およびＷＭ制御信号ＳＥＬが生成される。 FIG. 27 is a diagram illustrating a generation algorithm of each signal when SLS> M. In the case of SLS> M, the POP signal, the OFFSET signal, and the WM control signal SEL are generated by the generation algorithm shown in FIG.

図２８は、ＷＭ制御回路の構成の別の一例を示す図である。図２８に示すＷＭ制御回路１３１２は、ＳＬＳ判定回路２８０１と、セレクタ回路２８０２と、ＳＬＳ＿ＭＯＤ回路２８０３と、セレクタ回路２８０４と、１加算回路２８０５と、ＳＬＳ＿ＭＯＤテーブル（ＳＬＳ＿ＭＯＤ＿ＴＢＬ）２８０６と、シフタ回路（ｓｈｉｆｔｅｒ３８４）２８０７と、を有する。ＷＭ制御回路１３１２は、さらに、ＯＦＦＳＥＴレジスタ２３０４と、ＡＤＤ＿ＯＦＦＳＥＴ回路２３０５と、Ｐ減算回路２３０６と、セレクタ回路２３０７と、を有する。図２８において、図２３と同一又は対応する構成要素は同一又は対応する番号で参照し、その説明は適宜省略する。 FIG. 28 is a diagram illustrating another example of the configuration of the WM control circuit. The WM control circuit 1312 shown in FIG. 28 includes an SLS determination circuit 2801, a selector circuit 2802, an SLS_MOD circuit 2803, a selector circuit 2804, a 1-adder circuit 2805, an SLS_MOD table (SLS_MOD_TBL) 2806, and a shifter circuit (shifter 384). 2807. The WM control circuit 1312 further includes an OFFSET register 2304, an ADD_OFFSET circuit 2305, a P subtraction circuit 2306, and a selector circuit 2307. 28, the same or corresponding elements as those of FIG. 23 are referred to by the same or corresponding numerals, and a description thereof will be omitted as appropriate.

FIG. 29 is a diagram illustrating an example of data in the SLS_MOD table. As shown in FIG. 29, the SLS_MOD table 2806 stores 64 position data for each of 33 rows, No. 1 to No. 33. For example, position data with the value “0” selects unit data No. 0 (the leftmost) among the 64 unit data of the data BUFOUT output by the coupling circuit 1913 of FIG. 19. Likewise, position data with the value n (n: an integer from 0 to 63) selects unit data No. n among those 64 unit data. The SLS_MOD table 2806 is thus a table that stores position data indicating the selection position of each unit data to be selected from width-2M data.

また、図２８に示すシフタ回路２８０７は、ＳＬＳ＿ＭＯＤテーブル２８０６から位置データを受け取り、受け取った位置データをシフトし、シフトした位置データをマルチプレクサ１３１１にＷＭ制御信号ＳＥＬ００〜ＳＥＬ３１として供給する。この構成により、データ選択部１９０１のマルチプレクサ１３１１により、適切な単位データを選択することができる。 The shifter circuit 2807 shown in FIG. 28 receives position data from the SLS_MOD table 2806, shifts the received position data, and supplies the shifted position data to the multiplexer 1311 as the WM control signals SEL00 to SEL31. With this configuration, appropriate unit data can be selected by the multiplexer 1311 of the data selection unit 1901.

図２８において、ＳＬＳ判定回路２８０１は、ストリーム長ＳＬＳがＭ以下であるか否かを判定する。ＳＬＳ＞Ｍである場合、ＳＬＳ判定回路２８０１の出力は「０」となり、セレクタ回路２８０２は値「３３」を選択して出力する。従って、この場合、ＳＬＳ＿ＭＯＤテーブル２８０６の３３番の行が選択され、図２９の３３番の行に示すように「０」から「６３」の６４個の位置データが出力される。このときセレクタ回路２８０４は、ＯＦＦＳＥＴレジスタ２３０４に格納されるＯＦＦＳＥＴ信号の値を選択し、１加算回路２８０５が、セレクタ回路２８０４により選択された値に「１」を加算し、加算後の値をシフタ回路２８０７に供給する。シフタ回路２８０７は、ＳＬＳ＿ＭＯＤテーブル２８０６から供給された６４個の位置データを、ＯＦＦＳＥＴ信号の値に応じてシフトし、シフト後の６４個の位置データをＷＭ制御信号ＳＥＬとして出力する。これにより、上述したＷＭ制御信号ＳＥＬが生成されることになる。 In FIG. 28, the SLS determination circuit 2801 determines whether or not the stream length SLS is M or less. When SLS> M, the output of the SLS determination circuit 2801 is “0”, and the selector circuit 2802 selects and outputs the value “33”. Therefore, in this case, the 33rd row of the SLS_MOD table 2806 is selected, and 64 pieces of position data “0” to “63” are output as shown in the 33rd row of FIG. At this time, the selector circuit 2804 selects the value of the OFFSET signal stored in the OFFSET register 2304, the 1 addition circuit 2805 adds “1” to the value selected by the selector circuit 2804, and the value after the addition is shifted. This is supplied to the circuit 2807. The shifter circuit 2807 shifts the 64 position data supplied from the SLS_MOD table 2806 according to the value of the OFFSET signal, and outputs the shifted 64 position data as the WM control signal SEL. As a result, the WM control signal SEL described above is generated.

ＳＬＳ≦Ｍである場合、ＳＬＳ判定回路２８０１の出力は「１」となり、セレクタ回路２８０２は、ストリーム長ＳＬＳの値を選択して出力する。この結果、例えば、図２２に示すようにストリーム長ＳＬＳが１２である場合、ＳＬＳ＿ＭＯＤテーブル２８０６の１２番の行が選択される。即ち、図２９の１２番の行に示すように「０」から「１１」の値を繰り返す６４個の位置データが、ＳＬＳ＿ＭＯＤテーブル２８０６から出力される。このときセレクタ回路２８０４は、ＳＬＳ＿ＭＯＤ回路２８０３に格納されるＳＬＳ＿ＭＯＤ信号の値を選択し、１加算回路２８０５が、セレクタ回路２８０４により選択された値に「１」を加算し、加算後の値をシフタ回路２８０７に供給する。シフタ回路２８０７は、ＳＬＳ＿ＭＯＤテーブル２８０６から供給された６４個の位置データを、ＳＬＳ＿ＭＯＤ信号の値に応じてシフトし、シフト後の６４個の位置データをＷＭ制御信号ＳＥＬとして出力する。これにより、図１３に示すようなＷＭ制御信号ＳＥＬが生成されることになる。 When SLS ≦ M, the output of the SLS determination circuit 2801 is “1”, and the selector circuit 2802 selects and outputs the value of the stream length SLS. As a result, for example, as shown in FIG. 22, when the stream length SLS is 12, the 12th row of the SLS_MOD table 2806 is selected. That is, as shown in the twelfth line in FIG. 29, 64 pieces of position data that repeat values “0” to “11” are output from the SLS_MOD table 2806. At this time, the selector circuit 2804 selects the value of the SLS_MOD signal stored in the SLS_MOD circuit 2803, the 1 addition circuit 2805 adds “1” to the value selected by the selector circuit 2804, and the value after the addition is shifted. This is supplied to the circuit 2807. The shifter circuit 2807 shifts the 64 position data supplied from the SLS_MOD table 2806 according to the value of the SLS_MOD signal, and outputs the shifted 64 position data as the WM control signal SEL. As a result, a WM control signal SEL as shown in FIG. 13 is generated.
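The table-and-shifter variant can be sketched as a lookup followed by a shift. Several details here are assumptions of the sketch and not taken from the figures: the names `build_sls_mod_table` and `wm_select` are hypothetical, row k (k = 1 to M) is assumed to hold the values 0 to k-1 repeated across 64 positions, row M+1 is taken as 0 to 63 as described for row No. 33, and the shift amount is applied directly, leaving out the role of the 1-adder circuit 2805.

```python
M = 32


def build_sls_mod_table(m=M):
    """Rows 1..m hold i mod k for 2m positions; row m+1 holds 0 .. 2m-1."""
    table = {k: [i % k for i in range(2 * m)] for k in range(1, m + 1)}
    table[m + 1] = list(range(2 * m))
    return table


def wm_select(table, sls, shift, m=M):
    """Return the m select values after shifting the chosen row left by `shift`.

    shift is OFFSET when sls > m and SLS_MOD otherwise (assumed 0 <= shift <= m).
    """
    row = table[sls if sls <= m else m + 1]  # role of selector circuit 2802
    return row[shift:shift + m]              # role of the shifter circuit
```

Because the row is read out in one step and shifted once, the select values do not have to ripple through 32 cascaded stages; for SLS = 12 and a shift of 8 the first eight values are 8, 9, 10, 11, 0, 1, 2, 3.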

図２３のＷＭ制御回路１３１２では、ＳＥＬ＿ＷＲＡＰ回路２３０３−１〜２３０３−３２が３２段に縦続接続されている。従って、ＳＬＳ＿ＭＯＤ＿ＮＥＸＴ信号が各段を伝搬していくのに時間かかり、データ供給回路３０１による選択動作を十分に高速に実行できない可能性がある。それに対し図２８に示すＷＭ制御回路１３１２では、シフタ回路２８０７による少数段の遅延が発生するのみであり、データ供給回路３０１による選択動作を十分に高速に実行することができる。 In the WM control circuit 1312 of FIG. 23, the SEL_WRAP circuits 2303-1 to 2303-32 are cascaded in 32 stages. Therefore, it takes time for the SLS_MOD_NEXT signal to propagate through each stage, and there is a possibility that the selection operation by the data supply circuit 301 cannot be performed sufficiently fast. On the other hand, in the WM control circuit 1312 shown in FIG. 28, only a few stages of delay are generated by the shifter circuit 2807, and the selection operation by the data supply circuit 301 can be executed sufficiently fast.

FIG. 30 is a diagram illustrating another example of the configuration of the data processing apparatus. In FIG. 30, components that are the same as or correspond to those in FIG. 3 are referred to by the same or corresponding numerals, and their description is omitted as appropriate.

The data processing apparatus of FIG. 30 includes a plurality of data supply circuits 301-1 to 301-n, an operation data path 302, and a data store circuit 303. The data supply circuits 301-1 to 301-n each read one of n pieces of source data (operands) stored in the data memory 214 and supply it to the operation data path 302. When two pieces of source data src0 and src1 are to be operated on, as in the example shown in FIG. 4, the data supply circuit 301-1 may read the source data src0 and the data supply circuit 301-2 may read the source data src1. The configuration and operation of each of the data supply circuits 301-1 to 301-n may be basically the same as the configuration and operation of the data memory 214 described above. The data processing apparatus of FIG. 30 can thus handle n pieces of source data (operands).

As described above, the data processing apparatus 100 outputs whichever of the input matrix and a converted matrix, in which the order of a plurality of elements of the input matrix is rearranged, is the operation target, and performs the operation on it. Matrix conversion and the matrix operation can therefore be performed with one instruction, which reduces the number of data loads. Consequently, the time required from loading the matrix data to completing the matrix operation can be shortened.
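The effect of folding the conversion into the operation instruction can be illustrated with a small software model. This is a minimal sketch under assumptions of its own: the names `convert` and `execute` and the `transpose_a` flag are hypothetical stand-ins for the conversion unit, the calculation unit, and the instruction field that designates the operation target, and only element-wise operations are modelled.

```python
import operator


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def convert(a, use_transposed):
    """Model of the conversion step: pass A through, or output its transpose."""
    return transpose(a) if use_transposed else [row[:] for row in a]


def execute(op, a, b, transpose_a=False):
    """Model of one instruction: convert operand `a`, then apply `op` element-wise."""
    x = convert(a, transpose_a)
    if len(x) != len(b) or any(len(r) != len(s) for r, s in zip(x, b)):
        raise ValueError("operand shapes do not match after conversion")
    return [[op(p, q) for p, q in zip(r, s)] for r, s in zip(x, b)]


A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
print(execute(operator.add, A, B, transpose_a=True))  # [[11, 23], [32, 44]]
```

In this model the caller loads `A` once and never stores a transposed copy, which corresponds to the reduction in data loads claimed in the text.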

In addition, the data processing apparatus 100 includes a selection unit that outputs one of the first matrix and the second matrix, and a selection control unit that, in accordance with the operation instruction, controls the selection unit to output one of the first matrix and the second matrix. The input matrix can thus be converted into the matrix to be operated on by simple control processing.

In addition, the data processing apparatus 100 designates the matrix to be operated on by means of an index table that stores, for each combination of the number of row elements and the number of column elements after conversion, a control signal that designates the order of the plurality of elements to be output by the selection unit. The input matrix can thus be converted into the matrix to be operated on by simple processing.
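An index table of this kind can be sketched as a dictionary keyed by the shape of the converted matrix. The sketch below is illustrative only: the names `transpose_select_signal`, `INDEX_TABLE`, and `select` are hypothetical, matrices are assumed to be stored row-major in a flat buffer, the stored conversion is assumed to be a transposition, and the 4×4 size limit is an arbitrary choice for the example.

```python
def transpose_select_signal(out_rows, out_cols):
    """Control signal for a transposition, listed in output (row-major) order.

    The input matrix has out_cols rows and out_rows columns, so output
    element (i, j) is input element (j, i), at flat index j * out_rows + i.
    """
    return [j * out_rows + i for i in range(out_rows) for j in range(out_cols)]


# Index table: (rows, cols) of the converted matrix -> control signal values.
INDEX_TABLE = {
    (r, c): transpose_select_signal(r, c)
    for r in range(1, 5)
    for c in range(1, 5)
}


def select(buffer, control_signal):
    """Model of the selection unit: output position k takes buffer[control_signal[k]]."""
    return [buffer[s] for s in control_signal]


# A 2x3 input matrix [[1, 2, 3], [4, 5, 6]] stored row-major.
buffer = [1, 2, 3, 4, 5, 6]
print(select(buffer, INDEX_TABLE[(3, 2)]))  # [1, 4, 2, 5, 3, 6], i.e. the 3x2 transpose
```

Looking the signal up by shape means the control logic does no index arithmetic at execution time; the rearrangement is fully determined by the stored values.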

In addition, in the data processing apparatus 100, the selection unit has a predetermined number of selection circuits, each of which can receive the predetermined number of elements stored in the buffer, and each selection circuit selects and outputs one of the plurality of elements of the input matrix stored in the buffer. The input matrix can thus be converted into the matrix to be operated on even when the number of its elements is arbitrary.

In addition, the data processing apparatus 100 generates a second control signal based on position information of the elements of the first matrix among the elements stored in the buffer and on a first control signal that designates the positions of the elements of the first matrix after rearrangement. The second control signal designates, among the elements stored in the buffer, the element for each position after rearrangement. This reduces the circuit scale of the multiplexer. When the area of the data processing apparatus, which is a semiconductor integrated circuit, is reduced, more semiconductor integrated circuits can be obtained from a single semiconductor wafer, which lowers the unit price of the semiconductor integrated circuit.
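The relationship between the two control signals can be expressed as a composition of index lists. The following is a minimal sketch, assuming that the position information is a list giving the buffer slot of each first-matrix element and that the first control signal lists, for each output position, the index of the first-matrix element to place there; the function name `second_control_signal` and the 8-entry buffer are hypothetical. It shows only the index arithmetic, not the hardware saving.

```python
def second_control_signal(positions, first_signal):
    """Compose the two inputs: output position k reads buffer slot positions[first_signal[k]]."""
    return [positions[k] for k in first_signal]


# An 8-entry buffer holding the 2x2 matrix [["a", "b"], ["c", "d"]].
# The elements wrap around the end of the buffer: a, b sit in slots 6, 7 and c, d in slots 0, 1.
buffer = ["c", "d", None, None, None, None, "a", "b"]
positions = [6, 7, 0, 1]      # buffer slot of matrix elements 0..3 (row-major)
first_signal = [0, 2, 1, 3]   # transposition of a 2x2 matrix

second_signal = second_control_signal(positions, first_signal)
print(second_signal)                        # [6, 0, 7, 1]
print([buffer[s] for s in second_signal])   # ['a', 'c', 'b', 'd']
```

With the second control signal, a single selection stage reads directly from the buffer, rather than first aligning the matrix elements and then rearranging them.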

The first matrix and the second matrix may each be a plurality of matrices. Converted matrices can then be obtained for the plurality of matrices at once, which shortens the time required from the data load to the arithmetic processing.
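Converting several matrices at once amounts to repeating one per-matrix control signal with an offset for each matrix. The sketch below assumes that the matrices have identical shape, are stored back to back in row-major order, and are all transposed; the function name `batched_transpose_signal` is hypothetical.

```python
def batched_transpose_signal(rows, cols, count):
    """Control signal that transposes `count` consecutive rows x cols matrices."""
    # Per-matrix signal: output element (i, j) is input element (j, i).
    per_matrix = [j * cols + i for i in range(cols) for j in range(rows)]
    size = rows * cols
    return [k * size + s for k in range(count) for s in per_matrix]


# Two 2x2 matrices, [[1, 2], [3, 4]] and [[5, 6], [7, 8]], in one buffer.
buffer = [1, 2, 3, 4, 5, 6, 7, 8]
signal = batched_transpose_signal(2, 2, 2)
print([buffer[s] for s in signal])  # [1, 3, 2, 4, 5, 7, 6, 8]
```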

The data processing method described in this embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a magnetic disk, an optical disk, or a USB (Universal Serial Bus) flash memory, and is executed by being read from the recording medium by the computer.

As described above, the data processing apparatus 100 described in this embodiment can also be realized by an ASIC, such as a standard cell or structured ASIC, or by a PLD such as an FPGA. Specifically, for example, the data processing apparatus 100 can be manufactured by defining its functions in an HDL description, logically synthesizing that HDL description, and providing the result to the ASIC or PLD.

The following supplementary notes are further disclosed with respect to the embodiment described above.

(Supplementary Note 1) A data processing apparatus comprising:
a conversion unit that receives a first matrix acquired from a memory by a target operation instruction and outputs, from among the received first matrix and a second matrix that includes the same elements as the plurality of elements included in the first matrix and in which the order of at least two of those elements is exchanged, the matrix to be operated on that is designated by the operation instruction; and
a calculation unit that performs the operation designated by the operation instruction on the matrix to be operated on output by the conversion unit.

(Supplementary Note 2) The data processing apparatus according to Supplementary Note 1, wherein the conversion unit includes:
a selection unit that outputs one of the first matrix and the second matrix; and
a selection control unit that, in accordance with the operation instruction, controls the selection unit to output one of the first matrix and the second matrix.

(Supplementary Note 3) The data processing apparatus according to Supplementary Note 2, wherein
the selection control unit reads a value of a control signal from a storage unit and outputs the value to the selection unit, the storage unit storing, in association with a number of row elements and a number of column elements, values of the control signal that designate the order in which the selection unit is to output the plurality of elements included in the first matrix in a specific order, and the value being read based on the number of row elements and the number of column elements of the matrix to be operated on that is designated by the operation instruction, and
the selection unit outputs the matrix to be operated on, which includes the plurality of elements included in the first matrix arranged in the order designated by the control signal output from the selection control unit.

(Supplementary Note 4) The data processing apparatus according to Supplementary Note 2 or 3, further comprising a buffer capable of storing a predetermined number of elements equal to or greater than the number of the plurality of elements included in the first matrix, the buffer storing the plurality of elements included in the first matrix acquired from the memory in a specific order, wherein
the selection unit includes the predetermined number of selection circuits, each capable of receiving the predetermined number of elements stored in the buffer, and each of the selection circuits selects and outputs, based on a control signal, one of the plurality of elements included in the first matrix stored in the buffer, and
the selection control unit outputs the control signal, which designates, among the plurality of elements included in the first matrix stored in the buffer, the element to be selected by each of the selection circuits.

(Supplementary Note 5) The data processing apparatus according to Supplementary Note 4, wherein the selection unit generates a second control signal based on position data indicating where, among the predetermined number of elements stored in the buffer, the plurality of elements included in the first matrix are stored and on the control signal (hereinafter referred to as the "first control signal"), the second control signal designating, among the predetermined number of elements stored in the buffer, the element to be selected by each of the selection circuits, and each of the selection circuits selects and outputs one of the predetermined number of elements stored in the buffer based on the generated second control signal.

(Supplementary Note 6) The data processing apparatus according to any one of Supplementary Notes 1 to 5, wherein
the first matrix is a plurality of first matrices having the same numbers of row and column elements as one another,
the second matrix is a plurality of second matrices having the same numbers of row and column elements as one another, and
each of the plurality of second matrices is a matrix obtained by exchanging the order of at least two of the plurality of elements included in the corresponding one of the plurality of first matrices.

(Supplementary Note 7) The data processing apparatus according to any one of Supplementary Notes 1 to 6, wherein, when each of the plurality of elements included in the matrix to be operated on has an imaginary part and a real part, the conversion unit includes a conjugate unit that inverts the sign of the imaginary part of each of the plurality of elements included in the matrix to be operated on and outputs the resulting matrix to be operated on.

(Supplementary Note 8) The data processing apparatus according to any one of Supplementary Notes 1 to 7, wherein the second matrix is the transposed matrix of the first matrix.

(Supplementary Note 9) A data processing method in which a computer:
receives a first matrix acquired from a memory by a target operation instruction and outputs, from among the received first matrix and a second matrix that includes the same elements as the plurality of elements included in the first matrix and in which the order of at least two of those elements is exchanged, the matrix to be operated on that is designated by the operation instruction; and
performs the operation designated by the operation instruction on the output matrix to be operated on.

Description of Symbols

100 Data processing apparatus
101 Memory
111 Conversion unit
112 Calculation unit
200 Baseband processing LSI
201 RF unit
202 Dedicated hardware
203 DSP
205 Antenna
211 Processor
212 Peripheral circuit
213 Instruction memory
214 Data memory
221 CPU
301 Data supply circuit
302 Operation data path
303 Data store circuit
304 Instruction decoder
401 Wraparound processing circuit
402 Matrix conversion circuit
501, 802, 1311 Multiplexer
601 Buffer
611 Selection unit
612 Selection control unit
621 Storage unit
801 Input buffer
803 Matrix conversion control circuit
804 Conjugate circuit
805 Output buffer
811 Index table
812 Conjugate control circuit
1301 Input FIFO
1312 WM control circuit
1501 Index selection multiplexer
1502 Data selection multiplexer
src0, src1 Source data
sel0 to sel15 Data selection control signal, TF control signal
SEL00 to SEL31 WM control signal

Claims

A data processing apparatus comprising:
a conversion unit that receives a first matrix acquired from a memory by a target operation instruction and outputs, from among the received first matrix and a second matrix that includes the same elements as the plurality of elements included in the first matrix and in which the order of at least two of those elements is exchanged, the matrix to be operated on that is designated by the operation instruction; and
a calculation unit that performs the operation designated by the operation instruction on the matrix to be operated on output by the conversion unit.

The data processing apparatus according to claim 1, wherein the conversion unit includes:
a selection unit that outputs one of the first matrix and the second matrix; and
a selection control unit that, in accordance with the operation instruction, controls the selection unit to output one of the first matrix and the second matrix.

The data processing apparatus according to claim 2, wherein
the selection control unit reads a value of a control signal from a storage unit and outputs the value to the selection unit, the storage unit storing, in association with a number of row elements and a number of column elements, values of the control signal that designate the order in which the selection unit is to output the plurality of elements included in the first matrix in a specific order, and the value being read based on the number of row elements and the number of column elements of the matrix to be operated on that is designated by the operation instruction, and
the selection unit outputs the matrix to be operated on, which includes the plurality of elements included in the first matrix arranged in the order designated by the control signal output from the selection control unit.

The data processing apparatus according to claim 2 or 3, further comprising a buffer capable of storing a predetermined number of elements equal to or greater than the number of the plurality of elements included in the first matrix, the buffer storing the plurality of elements included in the first matrix acquired from the memory in a specific order, wherein
the selection unit includes the predetermined number of selection circuits, each capable of receiving the predetermined number of elements stored in the buffer, and each of the selection circuits selects and outputs, based on a control signal, one of the plurality of elements included in the first matrix stored in the buffer, and
the selection control unit outputs the control signal, which designates, among the plurality of elements included in the first matrix stored in the buffer, the element to be selected by each of the selection circuits.

The data processing apparatus according to claim 4, wherein the selection unit generates a second control signal based on position data indicating where, among the predetermined number of elements stored in the buffer, the plurality of elements included in the first matrix are stored and on the control signal (hereinafter referred to as the "first control signal"), the second control signal designating, among the predetermined number of elements stored in the buffer, the element to be selected by each of the selection circuits, and each of the selection circuits selects and outputs one of the predetermined number of elements stored in the buffer based on the generated second control signal.

The data processing apparatus according to claim 1, wherein
the first matrix is a plurality of first matrices having the same numbers of row and column elements as one another,
the second matrix is a plurality of second matrices having the same numbers of row and column elements as one another, and
each of the plurality of second matrices is a matrix obtained by exchanging the order of at least two of the plurality of elements included in the corresponding one of the plurality of first matrices.

The data processing apparatus according to any one of the above claims, wherein, when each of the plurality of elements included in the matrix to be operated on has an imaginary part and a real part, the conversion unit includes a conjugate unit that inverts the sign of the imaginary part of each of the plurality of elements included in the matrix to be operated on and outputs the resulting matrix to be operated on.

A data processing method in which a computer:
receives a first matrix acquired from a memory by a target operation instruction and outputs, from among the received first matrix and a second matrix that includes the same elements as the plurality of elements included in the first matrix and in which the order of at least two of those elements is exchanged, the matrix to be operated on that is designated by the operation instruction; and
performs the operation designated by the operation instruction on the output matrix to be operated on.