JP2011048735A - SIMD microprocessor

Info

Publication number: JP2011048735A
Application number: JP2009198016A
Authority: JP
Inventors: Hidehito Kitamura; 秀仁北村
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2009-08-28
Filing date: 2009-08-28
Publication date: 2011-03-10
Anticipated expiration: 2029-08-28
Also published as: JP5463799B2

Abstract

PROBLEM TO BE SOLVED: To provide a SIMD microprocessor that, with a simple configuration, can execute processing requiring operations involving adjacent pixel data in fewer machine cycles than before.

SOLUTION: A logic circuit 13 is provided in a PE unit 2 so that, when a conditional instruction is executed, the Z1 register 12 of an adjacent PE can be referenced as the condition. This enables reference to the Z1 register 12 of the right or left neighbor.

COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a SIMD (Single Instruction-stream, Multiple Data-stream) microprocessor that processes a plurality of data items in parallel with a single arithmetic instruction.

In recent years, image processing apparatuses such as digital copiers and facsimile machines have improved image quality by increasing the number of pixels or adding color support. As image quality improves, the amount of data to be processed increases. Data processing in an image processing apparatus such as a copier often applies the same arithmetic operation to every pixel. For this reason, SIMD microprocessors, which perform the same arithmetic operation on a plurality of data items simultaneously with a single instruction, have come into use.

FIG. 15 shows a typical conventional SIMD microprocessor. The SIMD microprocessor 101 shown in FIG. 15 includes a processor element unit 102, a global processor 103, an external input/output 104, and an image memory 105.

The processor element (hereinafter, PE) unit 102 is composed of a plurality of PEs, and each PE includes a register file 106 and an arithmetic unit 107. The register file 106 holds the data processed by PE instructions. A PE instruction, which is a processing instruction for the PE unit 102, is a SIMD-type instruction that performs the same processing simultaneously on a plurality of data items held in the register files 106. Reading from and writing to the register file 106 are controlled by the global processor 103. Data that has been read is sent to the arithmetic unit 107 and, after processing there, is written back to the register file 106. The register file 106 can also be accessed from outside the processor, so a specific register can be read or written externally, independently of control by the global processor 103. The arithmetic unit 107 executes the arithmetic processing of PE instructions. All of this processing is controlled by the global processor 103.

The global processor (hereinafter, GP) 103 is a so-called SISD (Single Instruction-stream, Single Data-stream) processor. It contains a program RAM and a data RAM, decodes the program, and generates various control signals. These control signals are used to control the GP's internal blocks and are also supplied to the register file 106 and the arithmetic unit 107. When executing a GP instruction, which is an instruction directed at the arithmetic units and other resources inside the GP 103, it performs various arithmetic operations and program control using its built-in general-purpose registers, ALU (arithmetic logic unit), and so on.

The external input/output 104 is a device that reads the original image data to be processed from the image memory 105 and writes it to the register file 106 of the PE unit 102, or reads processed image data from the register file 106 and writes it to the image memory 105.

The image memory 105 is a storage device that stores the original image data to be processed and the processed image data.

In image processing on the SIMD microprocessor 101 configured as described above, when binarized image data is present, the boundaries between 0 and 1 are sometimes determined and the result stored for use in later processing. Part of a labeling process is one example. The conventional method by which the SIMD microprocessor 101 determines the boundaries between 0 and 1 in a row of binarized image data and stores the result is described below.

FIG. 16 shows an excerpt of the configuration inside the PE unit 102 of the SIMD microprocessor 101. The register file 106 has sixteen 16-bit registers, R0 to R15, and has a path to the arithmetic logic unit (ALU) 110. The PE shift 108 selects the data supplied from the register file 106 from among the data of the PE's own register file 106, the data of the register file 106 of an adjacent PE, and the data of the register file 106 of the PE two positions away. The data output by the PE shift 108 is stored in the pipeline register 109. The data held in the pipeline register 109 is then operated on by the ALU 110, and the result is stored in the result storage register (A register) 111, which is an accumulator. The Z1 register 112 and the Z2 register 113 are zero flag registers that store 1 when the result of an operation in the ALU 110 is zero. The T register 115 is a condition register that stores the result of a logical operation on the Z1 register 112 and the Z2 register 113. Although omitted from the figure, the A register 111 can write to the PE's own register file 106, the register file 106 of an adjacent PE, or the register file 106 of the PE two positions away.

Next, consider the SIMD microprocessor 101 configured as shown in FIG. 16 and image data binarized to 0 and 1, for example as shown in the upper part of FIG. 17. The operation of determining the boundaries between 0 and 1 and storing the result is as follows. The image data is stored in register R0 of each PE, and the determination result is stored in the T register 115, which is a condition register. This operation is carried out by the PE instructions shown in FIG. 18.

First, with instruction (1), each PE compares the value of its own register R0 with the immediate value 0; if the subtraction result in the ALU 110 is zero, 1 is stored in the Z1 register 112. Next, with instruction (2), each PE compares the value of register R0 of the PE to its right with the immediate value 0; if the subtraction result in the ALU 110 is zero, 1 is stored in the Z2 register 113. Finally, with instruction (3), the logic circuit 114 computes the exclusive OR of the Z1 register 112 and the Z2 register 113 obtained by instructions (1) and (2), and the result is stored in the T register 115. This yields the boundary determination result for 0 and 1 (lower part of FIG. 17).
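The three-instruction sequence described above can be sketched as a small Python simulation. This is only an illustration under stated assumptions: the function name, the list-per-register model, and the sample data are hypothetical (FIG. 17 and FIG. 18 are not reproduced here), and the text does not say what the rightmost PE reads when it has no right neighbor, so the sketch assumes it reads its own R0.

```python
def conventional_boundary(r0):
    """Simulate the conventional boundary detection, one list entry per PE.

    Assumption (not stated in the text): the rightmost PE, which has no
    right neighbor, reads its own R0 instead.
    """
    n = len(r0)
    # Instruction (1): compare own R0 with immediate 0; Z1 = 1 if the
    # subtraction result is zero.
    z1 = [1 if r0[i] - 0 == 0 else 0 for i in range(n)]
    # Instruction (2): compare the right neighbor's R0 with immediate 0;
    # Z2 = 1 if the subtraction result is zero.
    z2 = [1 if r0[min(i + 1, n - 1)] - 0 == 0 else 0 for i in range(n)]
    # Instruction (3): T = Z1 XOR Z2 (the role of logic circuit 114).
    return [a ^ b for a, b in zip(z1, z2)]


# Hypothetical binarized row; a 1 in T marks a PE whose right neighbor differs.
print(conventional_boundary([0, 0, 1, 1, 0]))
```

Each list comprehension stands for one PE instruction applied to all PEs at once, which is why the sequence costs three machine cycles regardless of the number of PEs.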

With the instructions shown in FIG. 18, the boundaries in the binarized image data are obtained in three machine cycles. In one machine cycle, the SIMD microprocessor 101 described above can perform an operation such as a comparison and determine the resulting flags. Performing a further logical operation on those flags to update a condition register, flag register, or the like requires a separate instruction.

In image processing, a pixel is also sometimes compared with several adjacent pixels (three to five pixels), and the maximum value among them is used as a feature value. The following describes, as an example, the case where the SIMD microprocessor 101 finds the maximum value among three pixels, namely a pixel and its adjacent pixels.

Like FIG. 16, FIG. 19 shows an excerpt of the configuration inside the PE unit 102 of the SIMD microprocessor 101. In FIG. 19, the Z1 register 112, Z2 register 113, logic circuit 114, and T register 115 are replaced by a C register 116, which stores a magnitude comparison flag indicating the result of a magnitude comparison in the ALU 110.

Next, consider the SIMD microprocessor 101 configured as shown in FIG. 19 and, for example, the image data shown in the upper part of FIG. 20. Take the case of finding the maximum value among three pixels: the image data of PE4, as the target, and the image data on either side of it. The image data is treated as unsigned values and is assumed to be stored in register R0 of each PE. The maximum value is then obtained by executing the instructions shown in FIG. 21. First, instruction (1) stores the image data of register R0 in the A register 111 of each PE. Next, instruction (2) compares the data in the A register 111 of each PE with the data in register R0 of the PE to its left (the one with the smaller PE number). If "data in A register 111 < data in register R0 of the left PE" holds, 1 is stored in the C register 116. This is equivalent to feeding the borrow flag of the ALU operation into the C register 116. If the condition does not hold, 0 is stored in the C register 116. Next, with instruction (3), if the C register 116 is 1, the A register 111 of the target PE is updated with the data in register R0 of the left PE; if the C register 116 is 0, the data in the A register 111 is left unchanged.

Next, instruction (4) compares, in the same way as for the left neighbor, the data in the A register 111 of each PE with the data in register R0 of the PE to its right (the one with the larger PE number). If "data in A register 111 < data in register R0 of the right PE" holds, 1 is stored in the C register 116; otherwise, 0 is stored. Then, with instruction (5), as for the left neighbor, if the C register 116 is 1, the A register 111 of the target PE is updated with the data in register R0 of the right PE; if the C register 116 is 0, the data in the A register 111 is left unchanged. With the instructions shown in FIG. 21, once the image data has been set in the A register 111, the maximum of the three data values is obtained in a total of four machine cycles, for instructions (2) through (5).
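The five instructions of the conventional maximum search can be traced with a short Python sketch. The names and the sample pixel values are hypothetical (FIG. 20 and FIG. 21 are not reproduced here), and the handling of the two end PEs is an assumption: a PE without a left or right neighbor is taken to read its own R0, which leaves its result unaffected by the missing side.

```python
def conventional_max3(r0):
    """Simulate the conventional 3-pixel maximum, one list entry per PE.

    Assumption (not stated in the text): an end PE with no neighbor on
    one side reads its own R0 for that side.
    """
    n = len(r0)
    left = [r0[max(i - 1, 0)] for i in range(n)]       # left neighbor's R0
    right = [r0[min(i + 1, n - 1)] for i in range(n)]  # right neighbor's R0

    # Instruction (1): A = R0.
    a = list(r0)
    # Instruction (2): C = 1 if A < left neighbor's R0 (borrow flag).
    c = [1 if a[i] < left[i] else 0 for i in range(n)]
    # Instruction (3): if C == 1, A = left neighbor's R0; otherwise keep A.
    a = [left[i] if c[i] else a[i] for i in range(n)]
    # Instruction (4): C = 1 if A < right neighbor's R0.
    c = [1 if a[i] < right[i] else 0 for i in range(n)]
    # Instruction (5): if C == 1, A = right neighbor's R0; otherwise keep A.
    a = [right[i] if c[i] else a[i] for i in range(n)]
    return a


# Hypothetical unsigned pixel values, one per PE.
print(conventional_max3([3, 7, 2, 9, 4]))
```

Each compare has to finish and set C before the dependent update can run, which is why the two neighbors cost four machine cycles after A is loaded.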

Besides the method described above, the maximum value of several adjacent pixels can also be obtained with, for example, the configuration proposed for the SIMD microprocessor described in Patent Document 1.

In image processing on a SIMD microprocessor, many kinds of processing, including those described above, require operations involving adjacent pixel data. To process large amounts of image data as fast as possible, such processing must therefore be executable with as few instructions as possible, that is, in as few machine cycles as possible.

The SIMD microprocessor described in Patent Document 1 can reduce the number of machine cycles, but it requires selection bits and auxiliary bits to be provided in each processor element and set in advance. When several such bit patterns are needed, additional cycles and storage are required to set them. Furthermore, changing the values of the selection bits or auxiliary bits by instruction requires control signals or the like to direct the change.

An object of the present invention is to solve these problems.

That is, an object of the present invention is to provide a SIMD microprocessor that, with a simple configuration, can execute processing requiring operations involving adjacent pixel data in fewer machine cycles than before.

The invention described in claim 1 is a SIMD microprocessor comprising: a processor element unit composed of a plurality of processor elements, each provided with data storage means, arithmetic means, operation result storage means, and an operation result flag; and a global processor that decodes a program and supplies control signals to the processor element unit, wherein each processor element is provided with reference means that, when the processor element executes a conditional instruction, refers to the operation result flag of an adjacent processor element as the condition.

The invention described in claim 2 is the invention described in claim 1, wherein the processor element is provided with selection means that, according to the result of the reference by the reference means, selects between the operation data stored in its own operation result storage means and the operation data stored in the operation result storage means of an adjacent processor element, and stores the selected data in its own operation result storage means.

The invention described in claim 3 is the invention described in claim 1 or 2, wherein the processor element has, as the operation result flags, an operation result flag set by the current instruction and an operation result flag set by the immediately preceding instruction, and the reference means causes the selection means to select operation data by referring to at least three of the following flags: the own processor element's operation result flag set by the current instruction, its operation result flag set by the preceding instruction, an adjacent processor element's operation result flag set by the current instruction, and that adjacent processor element's operation result flag set by the preceding instruction.

The invention described in claim 4 is the invention described in claim 3, wherein the reference means refers to either the own processor element's operation result flag set by the current instruction or its operation result flag set by the preceding instruction, together with the adjacent processor element's operation result flag set by the current instruction and its operation result flag set by the preceding instruction.

According to the invention described in claim 1, reference means is provided for referring to the operation result flag of an adjacent PE as the condition when a conditional instruction is executed in the PE unit. The operation result flag of the left or right neighbor can therefore be referenced, and in processing that uses an adjacent PE's operation result flag as a condition of some kind, the number of machine cycles for the processing as a whole can be reduced.

According to the invention described in claim 2, selection means is provided that, according to the result of the reference by the reference means, selects between the operation data stored in the PE's own operation result storage means and the operation data stored in the operation result storage means of an adjacent PE, and stores the selected data in the PE's own operation result storage means. By referring to the operation result flags of the left and right neighbors, which can be used as the condition of a conditional instruction, the PE can therefore select the value of its own operation result storage means or that of an adjacent PE and store it in its own operation result storage means.

According to the invention described in claim 3, the reference means controls the selection means by referring to at least three of the following flags: the own processor element's operation result flag set by the current instruction, its operation result flag set by the preceding instruction, an adjacent processor element's operation result flag set by the current instruction, and that element's operation result flag set by the preceding instruction. Operations that use the operation results of a plurality of PEs, such as finding the maximum value among the PE's own operation result storage means and those of adjacent or nearby PEs, can therefore be performed in fewer machine cycles than before.

According to the invention described in claim 4, the reference means refers to either the own processor element's operation result flag set by the current instruction or its operation result flag set by the preceding instruction, together with the adjacent processor element's operation result flag set by the current instruction and its operation result flag set by the preceding instruction. The operation of finding the maximum value among the PE's own operation result storage means and those of adjacent or nearby PEs can therefore be performed in fewer machine cycles than before.

FIG. 1 is a block diagram of a SIMD microprocessor according to a first embodiment of the present invention.
FIG. 2 is a configuration diagram showing an excerpt of the configuration inside the processor element unit of the SIMD microprocessor shown in FIG. 1.
FIG. 3 is an explanatory diagram of the operation of detecting the boundaries between image data 0 and 1.
FIG. 4 is a program for determining the boundaries between 0 and 1 in image data on the SIMD microprocessor shown in FIG. 1.
FIG. 5 is a configuration diagram showing an excerpt of the configuration inside the processor element unit of a SIMD microprocessor according to a second embodiment of the present invention.
FIG. 6 is an explanatory diagram showing the image data whose maximum value is to be found and the flags obtained by comparing that image data.
FIG. 7 is a program for finding, on the SIMD microprocessor shown in FIG. 5, the maximum value among three pixels: a PE's own image data and the image data on either side of it.
FIG. 8 is a truth table showing the A register selected for each combination of flag registers.
FIG. 9 is a circuit diagram and truth table of the logic circuit of the processor element unit shown in FIG. 5.
FIG. 10 is a configuration diagram showing an excerpt of the configuration inside the processor element unit of a SIMD microprocessor according to a third embodiment of the present invention.
FIG. 11 is an explanatory diagram showing the image data whose maximum value is to be found and the flags obtained by comparing that image data.
FIG. 12 is a program for finding, on the SIMD microprocessor shown in FIG. 10, the maximum value among three pixels: a PE's own image data and the image data on either side of it.
FIG. 13 is a truth table showing the A register selected for each combination of flag registers.
FIG. 14 is a circuit diagram and truth table of the logic circuit of the processor element unit shown in FIG. 10.
FIG. 15 is a block diagram of a conventional SIMD microprocessor.
FIG. 16 is a configuration diagram showing an excerpt of the configuration inside the processor element unit of the SIMD microprocessor shown in FIG. 15.
FIG. 17 is an explanatory diagram of the operation of detecting the boundaries between image data 0 and 1.
FIG. 18 is a program for determining the boundaries between 0 and 1 in image data on the SIMD microprocessor shown in FIG. 15.
FIG. 19 is a configuration diagram showing an excerpt of the configuration inside the processor element unit of the SIMD microprocessor shown in FIG. 15.
FIG. 20 is an explanatory diagram of the operation of finding the maximum value among a PE and its adjacent PEs.
FIG. 21 is a program for finding, on the SIMD microprocessor shown in FIG. 19, the maximum value among three pixels: a PE's own image data and the image data on either side of it.

[First Embodiment]
A first embodiment of the present invention is described below with reference to FIGS. 1 to 4. FIG. 1 is a block diagram of a SIMD microprocessor according to the first embodiment of the present invention. FIG. 2 is a configuration diagram showing an excerpt of the configuration inside the processor element unit of the SIMD microprocessor shown in FIG. 1. FIG. 3 is an explanatory diagram of the operation of detecting the boundaries between image data 0 and 1. FIG. 4 is a program for determining the boundaries between 0 and 1 in image data on the SIMD microprocessor shown in FIG. 1.

FIG. 1 shows a SIMD microprocessor 1 according to the first embodiment of the present invention. The SIMD microprocessor 1 shown in FIG. 1 includes a processor element (PE) unit 2, a global processor (GP) 3, an external input/output 4, and an image memory 5.

The PE unit 2 is composed of a plurality of PEs, and each PE includes a register file 6, serving as data storage means, and an arithmetic unit 7. The register file 6 holds the data processed by PE instructions. A PE instruction, which is a processing instruction for the PE unit 2, is a SIMD-type instruction that performs the same processing simultaneously on a plurality of data items held in the register files 6. Reading from and writing to the register file 6 are controlled by the GP 3. Data that has been read is sent to the arithmetic unit 7 and, after processing there, is written back to the register file 6. The register file 6 can also be accessed from outside the processor, so a specific register can be read or written externally, independently of control by the GP 3. The arithmetic unit 7 executes the arithmetic processing of PE instructions. All of this processing is controlled by the GP 3. The arithmetic units 7 of the individual PEs are arranged in an array.

The GP 3 is a so-called SISD (Single Instruction-stream, Single Data-stream) processor. It contains a program RAM and a data RAM, decodes the program, and generates various control signals. These control signals are used to control the GP's internal blocks and are also supplied to the register file 6 and the arithmetic unit 7. When executing a GP instruction, which is an instruction directed at the arithmetic units and other resources inside the GP 3, it performs various arithmetic operations and program control using its built-in general-purpose registers, ALU (arithmetic logic unit), and so on.

The external input/output 4 is a device that reads the original image data to be processed from the image memory 5 and writes it to the register file 6 of the PE unit 2, or reads processed image data from the register file 6 and writes it to the image memory 5.

The image memory 5 is a storage device that stores the original image data to be processed and the processed image data.

FIG. 2 shows an excerpt of the configuration inside the PE unit 2 of the SIMD microprocessor 1. Three PEs, PE3, PE4, and PE5, are shown. The numerals in PE3, PE4, and PE5 are PE numbers; in this embodiment, viewed from PE4, PE3 is its left neighbor and PE5 is its right neighbor.

The register file 6 has sixteen 16-bit registers, R0 to R15, and has a path to the arithmetic logic unit (ALU) 10 of the arithmetic unit 7, described later.

The arithmetic unit 7 includes a PE shift 8, a pipeline register 9, an ALU 10 as arithmetic means, a result storage register 11 as operation result storage means, a Z1 register 12 as an operation result flag, a logic circuit 13 as reference means, and a T register 14.

The PE shift 8 selects the data supplied from the register file 6 from among the data of the PE's own register file 6, the data of the register file 6 of an adjacent PE, and the data of the register file 6 of the PE two positions away. The data output by the PE shift 8 is stored in the pipeline register 9. The data held in the pipeline register 9 is then operated on by the ALU 10, and the result is stored in the result storage register (A register) 11, which is an accumulator. The Z1 register 12 is a zero flag register that stores 1 when the result of an operation in the ALU 10 is zero. The logic circuit 13 performs a logical operation on the value of the PE's own Z1 register 12 and the value of the Z1 register 12 of an adjacent PE (the right neighbor in this embodiment). The T register 14 is a condition register that stores the output of the logic circuit 13. Although omitted from the figure, the A register 11 can write to the PE's own register file 6, the register file 6 of an adjacent PE, and the register file 6 of the PE two positions away.

Next, the image processing given as an example in the description of the prior art is described for the configuration shown in FIGS. 1 and 2. The image data shown in the upper part of FIG. 3 is the same as the data in FIG. 17. The following describes the steps from determining the boundaries between 0 and 1 in this image data to storing the result. The image data is stored in register R0 of each PE, and the determination result is stored in the T register 14, which is a condition register.

This operation is carried out by the PE instructions shown in FIG. 4. First, in instruction (1), each PE compares the value of its own register R0 with the immediate value 0; if the subtraction result in the ALU 10 is zero, 1 is stored in the Z1 register 12. Then, in instruction (2), the logic circuit 13 computes the exclusive OR of the Z1 register 12 of the own PE and the Z1 register 12 of the right-adjacent PE, and the result is stored in the T register 14 of the own PE. This yields the determination result for the boundaries between 0 and 1 (lower part of FIG. 3). In this embodiment, instruction (2) corresponds to the conditional instruction, and it directly references the operation result flag of the adjacent PE when it executes. The embodiment can therefore determine the boundaries between 0 and 1 in the image data and store the result in two machine cycles.
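The two-instruction sequence can be sketched in Python as follows. This is only an illustrative model, not the instruction set of FIG. 4: the function name, the sample pixel row, and the handling of the rightmost PE (which has no right neighbor) are assumptions made for the example.

```python
def detect_boundaries(r0):
    """Model the two PE instructions for one row of pixels, one pixel per PE."""
    # Instruction (1): compare R0 with immediate 0.
    # Z1 becomes 1 when the subtraction result is zero.
    z1 = [1 if (value - 0) == 0 else 0 for value in r0]

    # Instruction (2): T = own Z1 XOR right neighbor's Z1.
    # Assumption: the rightmost PE has no neighbor, so it reuses its own
    # flag, which gives T = 0 at the edge.
    t = []
    for pe in range(len(z1)):
        right = z1[pe + 1] if pe + 1 < len(z1) else z1[pe]
        t.append(z1[pe] ^ right)
    return z1, t


pixels = [0, 0, 1, 1, 1, 0, 0, 1]  # hypothetical data, not the FIG. 3 values
z1, t = detect_boundaries(pixels)
print(z1)  # zero flags per PE
print(t)   # 1 marks a PE whose right neighbor holds a different pixel value
```

A 1 in `t` marks the PE on the left side of each 0/1 transition, because the flag is compared with the right neighbor.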

In this embodiment the Z1 register of the right-adjacent PE is referenced, as shown in FIG. 2, but the left-adjacent PE may be referenced instead. Alternatively, the reference direction may be made switchable between left and right.

According to this embodiment, the logic circuit 13 is provided so that, when the PE unit 2 executes a conditional instruction, the Z1 register 12 of an adjacent PE can be referenced as the condition. This makes it possible to reference the Z1 register 12 of the left or right neighbor, and the process of detecting boundaries between 0 and 1 using the Z1 register 12 of the adjacent PE is reduced by one machine cycle, from the conventional three machine cycles to two.

[Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to FIGS. 5 to 9. Parts identical to those of the first embodiment are given the same reference numerals, and their description is omitted. FIG. 5 is a configuration diagram showing part of the internal configuration of the processor element unit of the SIMD type microprocessor according to the second embodiment of the present invention. FIG. 6 is an explanatory diagram showing the image data whose maximum value is sought and the flags obtained by comparing that image data. FIG. 7 is a program with which the SIMD type microprocessor shown in FIG. 5 obtains the maximum value among three pixels: a PE's own image data and the image data of its two neighbors. FIG. 8 is a truth table showing which A register is selected for each combination of flag registers. FIG. 9 is a circuit diagram and truth table of the logic circuit of the processor element unit shown in FIG. 5.

Compared with the first embodiment, this embodiment removes the Z1 register 12, the logic circuit 13, and the T register 14, and adds a selector 15 as selection means, a C1 register 16 as the operation result flag of the current instruction, a C2 register 17 as the operation result flag of the previous instruction, and a logic circuit 18 as reference means. The Z1 register 12, the logic circuit 13, and the T register 14 may also be retained rather than removed.

In this embodiment, a selector 15 is placed in front of the A register 11. As its input, the selector can choose the operation result of the PE's own ALU 10 (the value of its own A register 11) or the value of the A register 11 of the left or right neighboring PE. In other words, it selects between the operation data stored in the operation result storage means of the own processor element and the operation data stored in the operation result storage means of the adjacent processor elements, and stores the selected data in the operation result storage means of the own processor element.

The C1 register 16 is a flag register that holds the result of a magnitude comparison in the ALU 10. The C2 register 17 is a flag register that holds the magnitude comparison result of the previous instruction. The logic circuit 18 performs a logical operation on the value of the C2 register 17 of its own PE and the values of the C1 register 16 and the C2 register 17 of an adjacent PE (the right-adjacent PE in this embodiment). In other words, it references the operation result flag of the previous instruction of the own processor element, together with the operation result flag of the current instruction and the operation result flag of the previous instruction of the adjacent processor element.

Next, the operation of the SIMD type microprocessor 1 configured as shown in FIG. 5 is described for the image data shown in the upper part of FIG. 6, taking PE4 as the target image data and finding the maximum value among three pixels: PE4 and the image data on both sides of it. The image data is treated as unsigned values and is assumed to be stored in register R0 of each PE. The maximum value is then obtained by executing the instructions shown in FIG. 7. First, instruction (1) stores the image data of register R0 into the A register 11 of each PE. Next, instruction (2) performs a comparison and updates the flag. This instruction compares the data in the A register 11 of each PE with the data in register R0 of its left neighbor (the PE with the smaller PE number). If "data in the A register 11 < data in register R0 of the left neighbor" holds, 1 is stored in the C1 register 16. This is equivalent to loading the borrow flag of the ALU operation into the C1 register 16. If the condition does not hold, 0 is stored in the C1 register 16.
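The equivalence between the "less than" result and the borrow of an unsigned subtraction can be illustrated with a short Python sketch. The function name and the 16-bit mask are assumptions chosen to match the 16-bit registers described earlier; the source does not specify how the ALU 10 forms the flag internally.

```python
MASK16 = 0xFFFF  # registers are 16 bits wide


def compare_unsigned16(a, b):
    """Return (difference, borrow) for the 16-bit unsigned subtraction a - b."""
    raw = (a & MASK16) - (b & MASK16)
    difference = raw & MASK16       # wrapped 16-bit result
    borrow = 1 if raw < 0 else 0    # borrow occurs exactly when a < b
    return difference, borrow


print(compare_unsigned16(3, 5))  # borrow set: 3 < 5
print(compare_unsigned16(5, 3))  # borrow clear
```

Under this model, the value written to C1 by a compare instruction is simply the `borrow` output of subtracting the neighbor's R0 from the PE's own A register.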

Next, instruction (3) also performs a comparison and updates the flag. This instruction compares the data in the A register 11 of each PE with the data in register R0 of the PE two positions to the left (toward the smaller PE numbers). As with instruction (2), if "data in the A register 11 < data in register R0 two positions to the left" holds, 1 is stored in the C1 register 16; if it does not hold, 0 is stored in the C1 register 16. At the same time, the C1 register 16 result of the previous instruction (operation) is saved to the C2 register 17. After instructions (2) and (3), the C1 register 16 holds the flag from the comparison with register R0 two positions to the left, and the C2 register 17 holds the flag from the comparison with register R0 of the left neighbor. Finally, instruction (4) updates the A register 11 of the target PE with the data of whichever A register 11 holds the maximum among the own PE and its left and right neighbors. The maximum is determined by the logic circuit 18, which references the C1 register 16 and C2 register 17 results of the adjacent PE. The truth table for this logic is shown in FIG. 8, and the circuit diagram of the logic circuit 18 is shown in FIG. 9. In this embodiment, instruction (4) corresponds to the conditional instruction, and it references the operation result flags when it executes.
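The flag-driven selection can be modeled in Python as follows. FIG. 8 is not reproduced in this text, so the rule in `select_source` is derived here from the flag definitions alone and may differ in form from the actual truth table; the function names, the sample data, and the treatment of edge PEs are likewise assumptions.

```python
def compare_flags(xs):
    """Flags held by each PE after instructions (2) and (3).

    c2[i] = 1 if xs[i] < xs[i-1]  (instruction (2), later shifted into C2)
    c1[i] = 1 if xs[i] < xs[i-2]  (instruction (3))
    Assumption: a missing left neighbor yields flag 0.
    """
    n = len(xs)
    c2 = [1 if i >= 1 and xs[i] < xs[i - 1] else 0 for i in range(n)]
    c1 = [1 if i >= 2 and xs[i] < xs[i - 2] else 0 for i in range(n)]
    return c1, c2


def select_source(own_c2, right_c2, right_c1):
    """Return which A register holds the 3-pixel maximum."""
    if own_c2:
        # self < left, so the maximum is the left or the right neighbor.
        return "left" if right_c1 else "right"
    # self >= left, so the maximum is self or the right neighbor.
    return "self" if right_c2 else "right"


def max_of_three(xs):
    """Instruction (4) for interior PEs; edge PEs keep their value (assumption)."""
    c1, c2 = compare_flags(xs)
    out = list(xs)
    for i in range(1, len(xs) - 1):
        src = select_source(c2[i], c2[i + 1], c1[i + 1])
        out[i] = {"left": xs[i - 1], "self": xs[i], "right": xs[i + 1]}[src]
    return out


data = [9, 2, 7, 7, 1, 8, 3]  # hypothetical pixel values
print(max_of_three(data))
```

Only three one-bit flags are consulted per PE, which is why no further comparison cycle is needed in instruction (4).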

FIG. 8 is the truth table for the case described above, in which PE4 is the target image data (the own PE). "A selection" indicates which of the A registers is selected. "No state" indicates that the combination cannot occur. For example, the combination C2 (PE4) = 0, C2 (PE5) = 0, C1 (PE5) = 1 would mean PE3 < PE4 < PE5 and at the same time PE3 > PE5, which is a contradiction, so this state cannot occur. In the circuit of FIG. 9, the output TX is the selection control signal for the selector 15.
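Which flag combinations can actually occur can be checked by brute force. The sketch below enumerates small pixel values for PE3, PE4, and PE5 and collects the resulting (C2 (PE4), C2 (PE5), C1 (PE5)) triples; the function name and the value range are assumptions. The text above names only the combination (0, 0, 1) as impossible. Under the same flag definitions the enumeration also finds (1, 1, 0) unreachable, although whether FIG. 8 lists it as "no state" cannot be confirmed from this text.

```python
from itertools import product


def reachable_flag_states(levels=3):
    """Collect every (C2(PE4), C2(PE5), C1(PE5)) triple that some data produces."""
    states = set()
    for pe3, pe4, pe5 in product(range(levels), repeat=3):
        c2_pe4 = int(pe4 < pe3)  # PE4 compared with its left neighbor
        c2_pe5 = int(pe5 < pe4)  # PE5 compared with its left neighbor
        c1_pe5 = int(pe5 < pe3)  # PE5 compared with the PE two to the left
        states.add((c2_pe4, c2_pe5, c1_pe5))
    return states


reachable = reachable_flag_states()
impossible = sorted(set(product((0, 1), repeat=3)) - reachable)
print(impossible)  # flag triples that no pixel data can produce
```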

When the image data of FIG. 6 is processed with the program of FIG. 7 in this embodiment, the data of PE3, the left neighbor, is found to be the maximum. Once the image data has been set in the A register 11, the maximum of the three data can be obtained in a total of three machine cycles: instruction (2), instruction (3), and instruction (4).

The C1 register 16 and the C2 register 17 need not be arranged as a pipeline in this way; any configuration may be used as long as the C2 register can hold the flag from the immediately preceding operation result.

According to this embodiment, the selector 15 is controlled by referencing the values of the C1 register 16 and the C2 register 17 of the right-adjacent PE and the value of the C2 register 17 of the own PE. The operation of finding the maximum among the A register 11 of the own PE and the A registers 11 of the adjacent PEs is thereby reduced by one machine cycle, from the conventional four machine cycles to three.

In the embodiment described above, only the C1 and C2 registers (operation result flags) of the right-adjacent PE are referenced, but the C1 and C2 registers of the left-adjacent PE may be referenced instead. The same result can also be achieved by referencing the C1 or C2 registers of both the left and right PEs. For example, instruction (2) in the embodiment above can be changed so that it compares with register R0 of the right-adjacent PE instead of register R0 of the left-adjacent PE. The operation result flags needed to find the maximum are then the C2 register of the left-adjacent PE, the C2 register of the own PE, and the C1 register of the right-adjacent PE. With this arrangement the maximum value can be obtained in the same way as in the embodiment described above.

[Third Embodiment]
Next, a third embodiment of the present invention will be described with reference to FIGS. 10 to 14. Parts identical to those of the first and second embodiments are given the same reference numerals, and their description is omitted. FIG. 10 is a configuration diagram showing part of the internal configuration of the processor element unit of the SIMD type microprocessor according to the third embodiment of the present invention. FIG. 11 is an explanatory diagram showing the image data whose maximum value is sought and the flags obtained by comparing that image data. FIG. 12 is a program with which the SIMD type microprocessor shown in FIG. 10 obtains the maximum value among three pixels: a PE's own image data and the image data of its two neighbors. FIG. 13 is a truth table showing which A register is selected for each combination of flag registers. FIG. 14 is a circuit diagram and truth table of the logic circuit of the processor element unit shown in FIG. 10.

This embodiment differs from the second embodiment in that the own-PE register input to the logic circuit 18', which serves as reference means, is changed from the C2 register 17 to the C1 register 16. In other words, the logic circuit references the operation result flag of the current instruction of the own processor element, together with the operation result flag of the current instruction and the operation result flag of the previous instruction of the adjacent processor element.

Next, the operation of the SIMD type microprocessor 1 configured as shown in FIG. 10 is described for the image data shown in the upper part of FIG. 11, taking PE4 as the target image data and finding the maximum value among three pixels: PE4 and the image data on both sides of it. The image data is treated as unsigned values and is assumed to be stored in register R0 of each PE. The maximum value is then obtained by executing the instructions shown in FIG. 12. The only difference between the instruction sequence of FIG. 12 and that of FIG. 7 is that the order of instructions (2) and (3) is swapped. First, instruction (1) stores the image data of register R0 into the A register 11 of each PE. Next, instruction (2) performs a comparison and updates the flag. This instruction compares the data in the A register 11 of each PE with the data in register R0 of the PE two positions to the left (toward the smaller PE numbers). If "data in the A register 11 < data in register R0 two positions to the left" holds, 1 is stored in the C1 register 16. This is equivalent to loading the borrow flag of the ALU operation into the C1 register 16. If the condition does not hold, 0 is stored in the C1 register 16.

Next, instruction (3) also performs a comparison and updates the flag. This instruction compares the data in the A register 11 of each PE with the data in register R0 of its left neighbor (the PE with the smaller PE number). As with instruction (2), if "data in the A register 11 < data in register R0 of the left neighbor" holds, 1 is stored in the C1 register 16; if it does not hold, 0 is stored in the C1 register 16. At the same time, the C1 register 16 result of the previous instruction (operation) is saved to the C2 register 17. After instructions (2) and (3), the C1 register holds the flag from the comparison with register R0 of the left neighbor, and the C2 register holds the flag from the comparison with register R0 two positions to the left. Finally, instruction (4) updates the A register 11 of the target PE with the data of whichever A register 11 holds the maximum among the own PE and its left and right neighbors. The maximum is determined by the logic circuit 18', which references the C1 and C2 register results of the adjacent PE. The truth table for this logic is shown in FIG. 13, and the circuit diagram of the logic circuit 18' is shown in FIG. 14.
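Swapping the two compare instructions only exchanges which register holds which comparison result. A small Python model of the C1/C2 shift makes this visible; the function name, the `offsets` parameter, the sample data, and the flag value for missing neighbors are assumptions made for the illustration.

```python
def run_compares(xs, offsets):
    """Execute one compare per offset, shifting C1 into C2 before each update.

    offsets lists how many PEs to the left each compare reads R0 from.
    Assumption: a missing left neighbor yields flag 0.
    """
    n = len(xs)
    c1 = [0] * n
    c2 = [0] * n
    for off in offsets:
        c2 = c1  # save the previous instruction's result
        c1 = [1 if i >= off and xs[i] < xs[i - off] else 0 for i in range(n)]
    return c1, c2


xs = [9, 2, 7, 7, 1, 8, 3]  # hypothetical pixel values
c1_2nd, c2_2nd = run_compares(xs, offsets=(1, 2))  # order used in FIG. 7
c1_3rd, c2_3rd = run_compares(xs, offsets=(2, 1))  # order used in FIG. 12

print(c1_3rd == c2_2nd, c2_3rd == c1_2nd)
```

In this model the inputs (own C2, right C2, right C1) of the second embodiment carry the same three comparison results as (own C1, right C1, right C2) in the third embodiment, so the same selection can be made with the roles of C1 and C2 exchanged.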

When the image data of FIG. 11 is processed with the program of FIG. 12 in this embodiment, the data of PE3, the left neighbor, is found to be the maximum. Once the image data has been set in the A register 11, the maximum of the three data can be obtained in a total of three machine cycles: instruction (2), instruction (3), and instruction (4).

According to this embodiment, the selector 15 is controlled by referencing the values of the C1 register 16 and the C2 register 17 of the right-adjacent PE and the value of the C1 register 16 of the own PE. The operation of finding the maximum among the A register 11 of the own PE and the A registers 11 of the adjacent PEs is thereby reduced by one machine cycle, from the conventional four machine cycles to three.

In the second and third embodiments described above, the maximum value among five data, those of the own PE and its neighboring PEs (two PEs on each side of the own PE), can also be obtained by executing the above operation twice, which takes six machine cycles. The minimum value can be obtained in the same way.
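The reason two passes suffice can be seen in a short Python sketch: the maximum over a 3-pixel window, applied to its own output, covers a 5-pixel window. The sketch uses Python's built-in `max` in place of the flag logic, and the function names, sample data, and truncation of the window at the edges are assumptions.

```python
def max3_pass(xs):
    """One pass of the 3-pixel maximum; out-of-range neighbors are ignored."""
    n = len(xs)
    return [max(xs[max(i - 1, 0):min(i + 2, n)]) for i in range(n)]


def max5(xs):
    """Two passes of the 3-pixel maximum give the 5-pixel maximum."""
    return max3_pass(max3_pass(xs))


data = [9, 2, 7, 7, 1, 8, 3]  # hypothetical pixel values
print(max5(data))
```

With each pass taking three machine cycles as described above, the two passes account for the six machine cycles stated in the text.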

The present invention is not limited to the above embodiments. Various modifications can be made without departing from the gist of the present invention.

1 SIMD type microprocessor
2 Processor element unit
3 Global processor
6 Register file (data storage means)
7 Arithmetic unit
10 ALU (arithmetic means)
11 Result storage register (operation result storage means)
12 Z1 register (operation result flag)
13 Logic circuit (reference means)
14 T register
15 Selector (selection means)
16 C1 register (operation result flag)
17 C2 register (operation result flag of the previous instruction)
18 Logic circuit (reference means)
18' Logic circuit (reference means)

Japanese Patent Application Laid-Open No. 2002-229962

Claims

1. A SIMD type microprocessor comprising: a processor element unit having a plurality of processor elements, each provided with data storage means, arithmetic means, operation result storage means, and an operation result flag; and a global processor that decodes a program and supplies control signals to the processor element unit,
wherein reference means is provided for referring to the operation result flag of an adjacent processor element as a condition when a processor element executes a conditional instruction.

2. The SIMD type microprocessor according to claim 1, wherein the processor element further comprises selection means that, based on the result of the reference by the reference means, selects between the operation data stored in its own operation result storage means and the operation data stored in the operation result storage means of the adjacent processor element, and stores the selected data in its own operation result storage means.

3. The SIMD type microprocessor according to claim 1, wherein the processor element includes, as the operation result flag, an operation result flag based on a current instruction and an operation result flag based on the previous instruction, and
the selection means selects the operation data with the reference means referring to at least three of the following operation result flags: the flag based on the current instruction and the flag based on the previous instruction of the own processor element, and the flag based on the current instruction and the flag based on the previous instruction of the adjacent processor element.

4. The SIMD type microprocessor according to claim 3, wherein the reference means refers to either the operation result flag based on the current instruction or the operation result flag based on the previous instruction of the own processor element, and to both the operation result flag based on the current instruction and the operation result flag based on the previous instruction of the adjacent processor element.