JP3804591B2 - Arithmetic processing unit

Info

Publication number: JP3804591B2
Application number: JP2002206940A
Authority: JP
Inventors: 龍一祖田; 喜孝柏木
Original assignee: Yaskawa Electric Corp
Current assignee: Yaskawa Electric Corp
Priority date: 2002-07-16
Filing date: 2002-07-16
Publication date: 2006-08-02
Anticipated expiration: 2022-07-16
Also published as: JP2004054335A

Description

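[Brief description of the drawings]
FIG. 1 is a block diagram of an arithmetic processing unit showing a first embodiment of the present invention.
FIG. 2 is a timing chart of the present invention.
FIG. 3 is a block diagram of an arithmetic processing unit showing a second embodiment of the present invention.
FIG. 4 is a configuration diagram of the adder built into the arithmetic operation unit, showing a third embodiment of the present invention.
FIG. 5 is a block diagram of an arithmetic processing unit showing a fourth embodiment of the present invention.
FIG. 6 is a block diagram of an arithmetic processing unit showing a conventional example.
[Explanation of reference numerals]
1: Arithmetic operation unit
2: Input memory
3: Output memory
4: Operation code setting memory
5: Barrel shifter circuit
6: Circuit connection network
7: Multiplexer
8: Arithmetic function code setting memory
9: Shift amount setting memory
10: Comparator
11: Upper/lower limit setting memory
12: Comparator function enable flag memory
13: Adder
14: Adder function enable flag memory
15: Arithmetic processing unit
16: Clock
17: Start signal
18: Arithmetic function code data
19: Operation code data
20: Shift data
21: Comparator input data LIN
[0001]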
[Technical field of the invention]
The present invention relates to an arithmetic processing unit that performs numerical arithmetic processing using a microprocessor or a digital signal processor (DSP).
[0002]
[Prior art]
A conventional arithmetic processing unit that performs numerical arithmetic processing using a microprocessor or digital signal processor (DSP) is configured as shown in FIG. 6. FIG. 6 is a block diagram of an arithmetic processing unit showing a conventional example.
In FIG. 6, 1 is an arithmetic operation unit that performs addition, subtraction, multiplication and division, logical operations, numerical comparison and the like, and 2 is an input memory, composed of two memories A and B, that holds the values input to the arithmetic operation unit 1. Reference numeral 3 denotes an output memory R that holds the operation result, and 4 denotes an operation code setting memory for switching the function processed by the arithmetic operation unit 1.
Consider an addition. First, the operation code for executing addition is set. If the value 10 is set in memory A and the value 20 in memory B as input values, the output value obtained from the arithmetic operation unit 1 is 30 (= 10 + 20).
This example is a simple operation, but a system that performs general arithmetic processing handles many complicated operations. The result calculated by the arithmetic operation unit 1 is fed back to the input memory 2, and the arithmetic processing of the entire system is performed by using the single arithmetic operation unit many times.
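The conventional arrangement described above can be sketched in a few lines of Python. This is only an illustrative model: the opcode names and the `alu` and `compound` helpers are assumptions made for the example, since the text does not define any opcode encoding.

```python
import operator

# Hypothetical opcode names; the patent does not specify an encoding.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def alu(opcode, a, b):
    """Model of the single arithmetic operation unit 1: R = A <op> B."""
    return OPS[opcode](a, b)


# The example from the text: opcode = add, memory A = 10, memory B = 20.
r = alu("add", 10, 20)


def compound(in1, in2, in3, in4):
    """A compound expression needs several passes through the one unit.

    Each intermediate result is fed back to the input memory before
    the next opcode is set.
    """
    t1 = alu("add", in1, in2)   # pass 1
    t2 = alu("sub", in3, in4)   # pass 2
    return alu("mul", t1, t2)   # pass 3, using the fed-back results
```

With one unit, the expression (IN1 + IN2) × (IN3 - IN4) takes three sequential passes, which is the bottleneck the later sections address.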
Recently, in order to perform complicated operations such as image processing, an image processing processor in which a plurality of functional modules are arranged within the arithmetic operation unit has been proposed (for example, JP-T-2000-503427).
[0003]
[Problems to be solved by the invention]
However, when arithmetic processing is performed using an arithmetic operation unit as in the prior art, faster computation is generally achieved by raising the operating frequency of the microprocessor or DSP. This causes chip heating and radiated noise, and is a factor that reduces product reliability.
Another way to increase speed is to arrange arithmetic operation units in parallel. The image processing processor of JP-T-2000-503427 can ultimately output one arithmetic result through parallel operation, but because it cannot output a plurality of arithmetic results, it does not realize processing that takes advantage of parallelism.
Furthermore, when a conventional microprocessor or DSP is used in a real-time processing system consisting of a plurality of tasks, a single arithmetic operation unit is switched among a plurality of different processes, so it is difficult to output arithmetic results at fixed intervals (the processing time is not constant). In other words, no arithmetic unit dedicated to real-time processing currently exists.
[0004]
The present invention has been made to solve the above problems. Its object is to provide an arithmetic processing unit, optimal for real-time control arithmetic processing, that can improve arithmetic performance even at a low operating frequency, without raising the operating frequency.
[0005]
[Means for Solving the Problems]
To solve the above problems, the invention according to claim 1 is an arithmetic processing unit that performs numerical arithmetic processing and comprises an input memory (2) for holding a plurality of input data, a plurality of arithmetic operation units (1) for performing addition, subtraction, multiplication and division, logical operations and numerical comparison, a plurality of operation code setting memories (4) for determining the function executed by each arithmetic operation unit, and an output memory (3) for holding the operation results from the arithmetic operation units, the arithmetic processing unit further comprising: a plurality of barrel shifter circuits (5) that perform shift processing on the operation results produced by the arithmetic operation units (1); a setting memory (9) that determines the shift amount applied to each operation result; a multiplexer (7) that switches among and outputs the plurality of operation results obtained by the barrel shifter circuits (5); and an arithmetic function code setting memory (8) that selects the output value from the multiplexer.
According to the arithmetic processing unit of claim 1, by using the operation code setting of each built-in arithmetic operation unit together with the arithmetic function code that selects the multiplexer output value, this arithmetic unit can realize a large number of arithmetic functions and can carry out complicated arithmetic processing in a single pass. In particular, by configuring this arithmetic unit around operation patterns that are used frequently in applications such as motor control systems, advanced arithmetic processing that was difficult to realize in the past becomes possible. Arranging the internal configuration of the arithmetic unit so that operations proceed in parallel further improves processing performance. Moreover, because the number of clocks required for arithmetic processing is the same every time, the processing time can be predicted, which makes the unit well suited to real-time control, where arithmetic processing must be completed within a fixed time. That is, improved arithmetic performance and real-time capability are achieved without raising the operating frequency of the arithmetic unit, which helps suppress chip heating and prevent radiated noise.
[0006]
The invention according to claim 2 is the arithmetic processing unit according to claim 1, further comprising a comparator (10) for performing limit processing, a memory (11) for storing upper and lower limit setting values, and a flag memory (12) for enabling the comparator function.
According to the arithmetic processing unit of claim 2, when limit processing is to be applied to the final operation result, the upper and lower limits can be applied at once, which speeds up processing. When limit processing is not performed, the enable flag is left disabled so that the input data passes through the comparator unchanged. As a result, processing completes in a fixed number of clocks regardless of whether limit processing is executed, and real-time capability is maintained.
[0007]
The invention according to claim 3 is the arithmetic processing unit according to claim 1 or 2, wherein the arithmetic operation unit (1) incorporates an adder (13) that outputs 1 when a carry occurs in the addition result and outputs -1 when a borrow occurs, and a flag memory (14) for enabling this function of the adder.
According to the arithmetic processing unit of claim 3, arithmetic processing that uses carry and borrow, which conventionally required a conditional judgment (processing by an if statement or the like), can be executed simply by setting the flag that enables the carry and borrow processing circuit of this adder. That is, conditional judgment processing that required several clocks can be realized in one clock, improving arithmetic performance.
[0008]
The invention according to claim 4 is the arithmetic processing unit according to any one of claims 1 to 3, wherein a plurality of the arithmetic processing units are arranged in parallel.
According to the arithmetic processing unit of claim 4, operations handled by the arithmetic unit of claims 1 to 3 are processed in parallel by a plurality of arithmetic units, which speeds up computation further. The effect of arranging a plurality of arithmetic units in parallel is especially pronounced when the same arithmetic processing is executed concurrently, as in a multi-axis motor control system. That is, the object of this patent is achieved: improved arithmetic performance and real-time capability without raising the operating frequency of the arithmetic unit, which helps suppress chip heating and prevent radiated noise.
[0009]
[Embodiments of the invention]
Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a block diagram of an arithmetic processing unit showing a first embodiment of the present invention.
1 is an arithmetic operation unit, 2 is an input memory, 3 is an output memory, 4 is an operation code setting memory, 5 is a barrel shifter circuit, 6 is a circuit connection network, 7 is a multiplexer, 8 is an arithmetic function code setting memory, and 9 is a shift amount setting memory. Of these components, the arithmetic operation unit 1, the input memory 2, the output memory 3 and the operation code setting memory 4 are the same as in the prior art, except that a plurality of each are provided in this unit, so their description is omitted.
[0010]
The features of the present invention are as follows.
That is, the arithmetic processing unit comprises a plurality of barrel shifter circuits 5 that perform shift processing on the operation results produced by the arithmetic operation units 1, a setting memory 9 that determines the shift amount applied to each operation result, a multiplexer 7 that switches among and outputs the plurality of operation results obtained by the barrel shifter circuits 5, and an arithmetic function code setting memory 8 that selects the output value from the multiplexer 7.
[0011]
Next, the operation will be described.
In the arithmetic processing unit, IN1, IN2, ... INm are arranged as the input memory 2 for holding a plurality of input data, and are connected to a plurality of arithmetic operation units 1 (ALU1, ALU2, ... ALUn) that perform addition, subtraction, multiplication and division, logical operations and numerical comparison. The data of IN1 and IN2 are input to ALU1, and the data of IN3 and IN4 are input to ALU2. One operation code setting memory 4, which determines the function executed by an arithmetic operation unit, is provided for each arithmetic operation unit; operation code 1 and operation code 2 correspond to ALU1 and ALU2. The operation result of each arithmetic operation unit 1 is connected to a barrel shifter circuit 5, which shifts it left or right by the value stored in the setting memory 9 that determines the shift amount. In the figure, the operation result of ALU1 is connected to BRL1 and that of ALU2 to BRL2, and the shift amounts are determined by the memory values Shift1 and Shift2. The output values of BRL1 and BRL2 are input to ALU3, which executes the operation corresponding to operation code 3, and the result is input to BRL3. Meanwhile, the output values of BRL1, BRL2 and BRL3, together with the input data of INm, are connected to the circuit connection network 6, which contains a plurality of arithmetic operation units 1 and barrel shifter circuits 5 and makes other arithmetic processing possible. An output value of the circuit connection network 6 may be connected directly to the multiplexer 7, which switches among and outputs a plurality of operation results, or it may be input to a further arithmetic operation unit ALUn, which executes the operation corresponding to operation code n; that result is input to BRLn, shifted, and finally input to the multiplexer.
The multiplexer receives data from the arithmetic function code setting memory 8, which selects the output value, and the selected data is stored in the output memory 3, which holds the arithmetic output value.
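The data path just described can be modeled roughly as follows. This is a behavioral sketch under several assumptions that are not in the patent: the circuit connection network 6 is reduced to a plain pass-through of BRL3 into ALUn, a positive shift amount means a right shift and a negative one a left shift, and the function and opcode names are invented for the example.

```python
import operator

# Hypothetical opcode names (assumption, not from the patent).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def barrel_shift(value, shift):
    """Shift right for positive amounts, left for negative (assumed convention)."""
    return value >> shift if shift >= 0 else value << -shift


def stage(opcode, shift, a, b):
    """One arithmetic operation unit followed by its barrel shifter."""
    return barrel_shift(OPS[opcode](a, b), shift)


def processing_unit(inputs, opcodes, shifts, mux_code):
    """Evaluate ALU1..ALU3 and ALUn, then let the multiplexer pick one result."""
    in1, in2, in3, in4, inm = inputs
    brl1 = stage(opcodes[0], shifts[0], in1, in2)    # ALU1 -> BRL1
    brl2 = stage(opcodes[1], shifts[1], in3, in4)    # ALU2 -> BRL2
    brl3 = stage(opcodes[2], shifts[2], brl1, brl2)  # ALU3 -> BRL3
    # Connection network modeled as a pass-through of BRL3 (assumption).
    brln = stage(opcodes[3], shifts[3], brl3, inm)   # ALUn -> BRLn
    candidates = [brl1, brl2, brl3, brln]            # multiplexer inputs
    return candidates[mux_code]                      # arithmetic function code


config = (("add", "sub", "mul", "add"), (0, 0, 1, 0))
out_n = processing_unit((1, 2, 7, 4, 5), *config, mux_code=3)
out_3 = processing_unit((1, 2, 7, 4, 5), *config, mux_code=2)
```

Changing only `mux_code` selects a different intermediate result from the same pass, which is how one hardware configuration can serve several arithmetic functions.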
[0012]
Examples of operations that each arithmetic operation unit can execute are listed below, where A and B denote the input data.
・ A + B (addition)
・ A - B (subtraction)
・ A × B (multiplication)
・ A ÷ B (division)
・ |A + B| (absolute value of the sum)
・ A & B (logical AND)
・ A | B (logical OR)
・ A ^ B (logical exclusive OR)
・ A < B, A > B (magnitude comparison)
・ A = B, A != B (equality, inequality)
These and similar operations can be executed in each arithmetic operation unit.
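The operation list above maps naturally onto an opcode table. The sketch below is one possible software model; the opcode mnemonics are hypothetical, division is assumed to be integer division, and comparisons are assumed to yield 1 or 0, none of which the patent specifies.

```python
import operator

# Hypothetical mnemonics for the operations listed in the text.
ALU_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.floordiv,              # integer division assumed
    "ABS_ADD": lambda a, b: abs(a + b),    # |A + B|
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
    "LT": lambda a, b: int(a < b),         # comparisons return 1 or 0 (assumed)
    "GT": lambda a, b: int(a > b),
    "EQ": lambda a, b: int(a == b),
    "NE": lambda a, b: int(a != b),
}


def alu(opcode, a, b):
    """Apply the operation selected by the operation code setting memory."""
    return ALU_OPS[opcode](a, b)
```

Because the opcode is data rather than control flow, selecting a different operation does not change how the unit is sequenced.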
[0013]
Therefore, the arithmetic processing unit described above can execute the following arithmetic processing in a single pass. To simplify the explanation, this example assumes that the operation result of BRL3 passes through the circuit connection network 6 unprocessed and is connected to ALUn, and that shift processing is performed only when multiplication is executed.
[0014]
R1 = (IN1 + IN2)
R2 = (IN3 - IN4)
R3 = ((IN1 + IN2) × (IN3 - IN4)) >> Shift3
・
・
Rn = (((IN1 + IN2) × (IN3 - IN4)) >> Shift3) + INm
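As a worked check of these expressions, the short sketch below evaluates R1, R2, R3 and Rn for one set of sample values. The input values and the function name `results` are chosen purely for illustration. A right shift after a multiplication is a common way to rescale a fixed-point product, although the text itself only says that the shift is applied when multiplication is executed.

```python
def results(in1, in2, in3, in4, inm, shift3):
    """Evaluate R1, R2, R3 and Rn exactly as written in the expressions above."""
    r1 = in1 + in2
    r2 = in3 - in4
    r3 = (r1 * r2) >> shift3   # product followed by the barrel shifter
    rn = r3 + inm
    return r1, r2, r3, rn


# Sample values (illustrative only):
# R1 = 3 + 5 = 8, R2 = 10 - 4 = 6, R3 = 48 >> 2 = 12, Rn = 12 + 7 = 19
sample = results(in1=3, in2=5, in3=10, in4=4, inm=7, shift3=2)
```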
[0015]
Because the first embodiment of the present invention is configured as described above, using the operation code setting of each built-in arithmetic operation unit and the arithmetic function code that selects the multiplexer output value allows this arithmetic processing unit to realize a large number of arithmetic functions and to carry out complicated arithmetic processing in a single pass, improving arithmetic performance.
[0016]
FIG. 2 is a timing chart for the arithmetic processing unit of the present invention, and is described below together with FIG. 1.
In FIG. 2, 16 is a clock, 17 is a Start signal, 18 is arithmetic function code data, 19 is operation code data, and 20 is shift data.
As shown in FIG. 2, the arithmetic processing unit of the present invention operates in synchronization with the clock 16, and the arithmetic unit performs its processing while the Start signal 17 is active. Following the flow of data processing: before the arithmetic processing unit starts an operation, the arithmetic function code data is set in the arithmetic function code setting memory 8, the operation code data in the operation code setting memory 4, and the shift data in the shift amount setting memory 9.
Next, activating the Start signal 17 causes the arithmetic function code data 18 (MUXCode), the operation code data 19 (OP1 to OPn) and the shift data 20 (Shift1 to Shiftn) set in these memories to be input to the arithmetic unit.
This starts processing inside the arithmetic processing unit. In the first clock cycle, the data to be operated on is set in the input memory 2 (IN1 to INm). In the next clock cycle, the operations of ALU1 and BRL1 and of ALU2 and BRL2 are executed in parallel. In the following clock cycle, the operations of ALU3 and BRL3 are executed, and in the clock cycles after that, the operations in the circuit connection network 6, which contains a plurality of arithmetic operation units 1 and barrel shifter circuits 5, and the operations of ALUn and BRLn are executed. The number of clocks required for arithmetic processing therefore depends on the number of arithmetic operation units 1 and barrel shifter circuits 5 built into the circuit connection network 6.
After the arithmetic processing by the arithmetic operation units 1 and barrel shifter circuits 5 is complete, in the last clock cycle the required operation result data is set in the output memory from the multiplexer 7 according to the set value of the arithmetic function code data. When the data has been set in the output memory, the Start signal becomes inactive and processing in the arithmetic processing unit is complete. When arithmetic processing by this arithmetic processing unit is to be executed multiple times, the operation shown in FIG. 2 is repeated.
[0017]
Therefore, as shown in FIG. 2, the number of clocks required for one pass through the arithmetic processing unit is constant, so the processing time can be predicted. The unit is thus well suited to real-time processing systems in which arithmetic processing must be completed within a fixed time.
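The cycle sequence of FIG. 2 can be written out as a simple schedule to show why the latency is predictable. The sketch assumes one clock per stage, treats the depth of the circuit connection network as a parameter (`network_stages`, a name invented here), and uses a 50 MHz clock purely as an example figure; none of these numbers come from the patent.

```python
def cycle_schedule(network_stages):
    """List the work done in each clock cycle of one pass (one clock per stage assumed)."""
    schedule = [
        "set IN1..INm in input memory",
        "ALU1/BRL1 and ALU2/BRL2 in parallel",
        "ALU3/BRL3",
    ]
    schedule += [f"connection network stage {i + 1}" for i in range(network_stages)]
    schedule += ["ALUn/BRLn", "multiplexer output set in output memory"]
    return schedule


def pass_latency_s(network_stages, clock_hz):
    """Latency of one pass: it depends on the hardware depth, not on data or opcodes."""
    return len(cycle_schedule(network_stages)) / clock_hz


cycles_direct = len(cycle_schedule(0))
cycles_deep = len(cycle_schedule(2))
latency = pass_latency_s(0, 50e6)   # hypothetical 50 MHz clock
```

For a given hardware configuration the cycle count is fixed before any data arrives, so a control loop can budget its deadline from the clock frequency alone.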
[0018]
Next, a second embodiment of the present invention will be described.
FIG. 3 is a block diagram of an arithmetic processing unit showing a second embodiment of the present invention.
In FIG. 3, 10 is a comparator, 11 is an upper/lower limit setting memory, 12 is a comparator function enable flag memory, and 21 is comparator input data LIN.
The second embodiment differs from the first in that it includes a comparator 10 for performing limit processing, a memory 11 for storing upper and lower limit setting values, and a flag memory 12 for enabling the comparator function.
To simplify the description of the operation, only the differences from FIG. 1 are described below.
Data output from the circuit connection network provided in the arithmetic processing unit becomes the input data 21 (LIN) to the comparator 10 (CMP). LIN is limited according to the data set in advance in the upper/lower limit setting memory 11. When LIN lies within the range between the lower limit setting value and the upper limit setting value, the comparator 10 outputs LIN unchanged. When LIN is greater than or equal to the upper limit setting value, the comparator outputs the upper limit setting value, and when LIN is less than or equal to the lower limit setting value, it outputs the lower limit setting value. The comparator 10 performs this limit processing only when the flag memory 12 that enables the comparator function is set to enabled. When the comparator function is disabled, the input data LIN passes through the comparator 10 and is output unchanged.
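The limiter behavior described above amounts to a clamp with a bypass. The following sketch models it in Python; the function name `limiter` and its parameter names are assumptions for the example, and the software model does not capture the fact that the hardware performs both comparisons in the same clock.

```python
def limiter(lin, upper, lower, enabled):
    """Model of comparator 10: clamp LIN to [lower, upper] when enabled."""
    if not enabled:
        return lin          # flag memory 12 disabled: LIN passes through
    if lin >= upper:
        return upper        # at or above the upper limit setting value
    if lin <= lower:
        return lower        # at or below the lower limit setting value
    return lin              # within range: output LIN unchanged
```

For example, with limits of ±100, an input of 120 yields 100 when the flag is enabled and 120 when it is disabled.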
[0019]
Because the second embodiment of the present invention is configured as described above, this arithmetic processing unit, unlike the case where limit processing is executed with the magnitude comparisons (<, > and so on) of an arithmetic operation unit, can apply the upper and lower limits together in one clock, which speeds up processing. In addition, processing completes in a fixed number of clocks regardless of whether limit processing is executed, so real-time capability is maintained.
[0020]
Next, a third embodiment of the present invention will be described.
FIG. 4 is a block diagram of an adder built in the arithmetic operation unit according to the third embodiment of the present invention.
In FIG. 4, 13 is an adder and 14 is an adder function enable flag memory.
The third embodiment differs from the first and second embodiments in that the arithmetic operation unit 1 includes an adder 13 that outputs 1 when a carry occurs in the addition result and outputs -1 when a borrow occurs, and a flag memory 14 for enabling this function of the adder. When the enable flag is disabled, the adder functions as an ordinary adder.
[0021]
Next, the operation will be described.
With a conventional adder, when a carry or borrow occurred in the addition result, software used the flag indicating each state to make a conditional judgment with an equality or inequality operation (=, != and so on), then added 1 to the upper word in the case of a carry or subtracted 1 in the case of a borrow. Using the arithmetic operation unit 1 that incorporates the adder 13 eliminates this processing. That is, with the adder function enable flag memory 14 enabled, it suffices to add the addition result to the upper word in both the carry case and the borrow case.
Because the third embodiment of the present invention is configured as described above, conditional judgment processing that required several clocks can be realized in one clock, improving arithmetic performance.
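One way to picture the benefit is a double-word update in which the upper word is corrected without any branch on carry or borrow flags. The sketch below is an interpretation rather than the patented circuit: it assumes a 16-bit word, and it assumes that the 1 / 0 / -1 value is available as a separate output next to the wrapped sum, a detail the text leaves open. All names are invented for the example.

```python
WORD_BITS = 16                       # assumed word width
WORD_MASK = (1 << WORD_BITS) - 1


def adder(a_low, delta, enabled=True):
    """Add a signed delta to an unsigned low word.

    Returns (wrapped_sum, cb), where cb is 1 on carry, -1 on borrow and
    0 otherwise. With the flag disabled it behaves as an ordinary adder.
    """
    raw = a_low + delta
    low = raw & WORD_MASK
    if not enabled:
        return low, 0
    cb = int(raw > WORD_MASK) - int(raw < 0)
    return low, cb


def add_to_double_word(high, low, delta):
    """Update a (high, low) word pair with no conditional on carry or borrow."""
    new_low, cb = adder(low, delta)
    return high + cb, new_low        # unconditional add to the upper word
```

The upper-word correction is the same instruction in all three cases (carry, borrow, neither), which is what removes the conditional judgment from the software path.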
[0022]
Next, a fourth embodiment of the present invention will be described.
FIG. 5 is a block diagram of an arithmetic processing unit showing a fourth embodiment of the present invention.
The fourth embodiment differs from the first and second embodiments in that a plurality of the arithmetic processing units shown in the first and second embodiments are arranged in parallel. The processing inside each arithmetic unit is as already described in the embodiments. By arranging these arithmetic processing units in parallel, however, processing that had to be divided into several passes when a single arithmetic unit of the present invention was used can be computed in parallel.
[0023]
Because the fourth embodiment of the present invention is configured as described above, a system that executes arithmetic processing in parallel can greatly improve its arithmetic performance by making use of the parallel operation capability of these arithmetic units.
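The multi-axis case can be illustrated with a small model in which each axis is handled by its own unit. The per-unit expression reuses the Rn formula from the first embodiment; the figure of 5 clocks per pass, the axis data and all function names are assumptions for the example. Python evaluates the list sequentially, so the parallelism is only represented in the cycle estimate.

```python
def unit(in1, in2, in3, in4, inm, shift3):
    """One arithmetic processing unit configured for the Rn expression."""
    return (((in1 + in2) * (in3 - in4)) >> shift3) + inm


def run_axes(axis_inputs, shift3):
    """Apply the same configuration to each axis; one list entry per hardware unit."""
    return [unit(*inputs, shift3) for inputs in axis_inputs]


def cycles_needed(n_axes, n_units, cycles_per_pass):
    """Clock cycles to serve all axes when n_units run side by side."""
    passes = -(-n_axes // n_units)   # ceiling division
    return passes * cycles_per_pass


outputs = run_axes([(3, 5, 10, 4, 7), (1, 1, 4, 2, 0)], shift3=2)
sequential = cycles_needed(n_axes=4, n_units=1, cycles_per_pass=5)
parallel = cycles_needed(n_axes=4, n_units=4, cycles_per_pass=5)
```

Under these assumptions, four axes take 20 clocks on one unit and 5 clocks on four units, at the same clock frequency.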
[0024]
[Effects of the invention]
According to the arithmetic processing unit of claim 1, by using the operation code setting of each built-in arithmetic operation unit together with the arithmetic function code that selects the multiplexer output value, this arithmetic unit can realize a large number of arithmetic functions and can carry out complicated arithmetic processing in a single pass. In particular, by configuring this arithmetic unit around operation patterns that are used frequently in applications such as motor control systems, advanced arithmetic processing that was difficult to realize in the past becomes possible. Arranging the internal configuration of the arithmetic unit so that operations proceed in parallel further improves processing performance. Moreover, because the number of clocks required for arithmetic processing is the same every time, the processing time can be predicted, which makes the unit well suited to real-time control, where arithmetic processing must be completed within a fixed time. That is, improved arithmetic performance and real-time capability are achieved without raising the operating frequency of the arithmetic unit, which helps suppress chip heating and prevent radiated noise.
[0025]
According to the arithmetic processing device of claim 2, when limit processing is applied to the final calculation result, the upper-limit and lower-limit processing can be executed at once, which speeds up the processing. When limit processing is not applied, clearing the valid flag puts the comparator into a pass-through state for its input data. As a result, processing completes in a constant number of clocks regardless of whether limit processing is executed, and real-time performance is maintained.
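The limit behavior can be sketched as follows. The function name `limit` and its parameters are hypothetical, and Python does not model clock cycles; the sketch only shows that the same clamp is computed on every call and that the valid flag merely selects between the clamped and the unmodified value.

```python
def limit(value: int, lower: int, upper: int, enabled: bool) -> int:
    """Clamp value to [lower, upper] when enabled, otherwise pass it through.

    lower/upper stand in for the upper/lower limit setting memory (11) and
    enabled for the comparator function valid flag memory (12).
    """
    # The clamp is always evaluated, so both settings of the flag follow the
    # same sequence of operations; the flag only picks the output.
    clamped = min(max(value, lower), upper)
    return clamped if enabled else value


limited = limit(150, -100, 100, enabled=True)       # upper limit applied
passed_through = limit(150, -100, 100, enabled=False)  # flag cleared
```

Here `limited` is 100 and `passed_through` is 150. Applying both bounds in one expression corresponds to executing the upper-limit and lower-limit processing at once.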
[0026]
According to the arithmetic processing device of claim 3, arithmetic processing that uses carry and borrow, which conventionally required conditional judgment (processing by an if statement or the like), can be executed simply by enabling the flag of the carry and borrow processing circuit in the adder of the present invention. That is, conditional processing that required several clocks can be realized in one clock, and arithmetic processing performance can be improved.
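A minimal sketch of an adder that reports carry as 1 and borrow as -1 is given below. The function name `add_with_carry_signal`, the 16-bit default width, and the assumption that `a` is an unsigned word while `b` is a signed increment smaller in magnitude than 2^width are all illustrative choices, not details taken from the source.

```python
from typing import Tuple


def add_with_carry_signal(
    a: int, b: int, width: int = 16, enabled: bool = True
) -> Tuple[int, int]:
    """Return (wrapped_sum, signal) for an unsigned word a and signed delta b.

    signal is 1 on carry, -1 on borrow, 0 otherwise. When enabled is False
    (adder function valid flag cleared) the signal is forced to 0.
    Assumes 0 <= a < 2**width and abs(b) < 2**width.
    """
    modulus = 1 << width
    total = a + b
    # Floor division yields 1 above the word range, -1 below it, else 0,
    # so no if statement is needed to classify the result.
    signal = total // modulus
    return total % modulus, (signal if enabled else 0)


# Usage: a two-word counter whose high word is updated without branching.
high, low = 0, 0xFFFF
low, signal = add_with_carry_signal(low, 1)
high += signal
```

After the usage lines, `low` is 0 and `high` is 1. The caller adds the signal to the high word unconditionally, which is the software analogue of replacing the conventional conditional judgment with a single add.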
[0027]
According to the arithmetic processing device of claim 4, arithmetic processing can be sped up further by having another arithmetic processing device share the processing handled by the arithmetic processing device of claims 1 to 3. The effect of the parallel arrangement is especially pronounced when a plurality of similar arithmetic processes are executed in parallel, as in multi-axis motor control. That is, arithmetic processing performance can be improved and real-time performance achieved without raising the operating frequency of the arithmetic processing device of the present invention, so that chip heat generation can be suppressed and radiation noise can be prevented.
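The benefit of the parallel arrangement can be made concrete with a small step count. This is a sketch under stated assumptions: the tasks are independent, each occupies one device for one fixed-length step, and the function name `steps_needed` is hypothetical.

```python
import math


def steps_needed(num_tasks: int, num_devices: int) -> int:
    """Number of fixed-length steps to finish num_tasks independent tasks
    when num_devices arithmetic processing devices run side by side."""
    return math.ceil(num_tasks / num_devices)


# Four motor axes, each needing one pass through a device per control cycle.
single_device = steps_needed(4, 1)
four_devices = steps_needed(4, 4)
```

Under these assumptions `single_device` is 4 and `four_devices` is 1, so the same control cycle completes in a quarter of the steps at an unchanged clock frequency.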
[Brief description of the drawings]
FIG. 1 is a block diagram of an arithmetic processing unit showing a first embodiment of the present invention.
FIG. 2 is a timing chart of the present invention.
FIG. 3 is a block diagram of an arithmetic processing unit showing a second embodiment of the present invention.
FIG. 4 is a configuration diagram of an adder built in an arithmetic operation unit according to a third embodiment of the present invention.
FIG. 5 is a block diagram of an arithmetic processing unit showing a fourth embodiment of the present invention.
FIG. 6 is a block diagram of an arithmetic processing unit showing a conventional example.
[Explanation of symbols]
1: Arithmetic operation unit
2: Input memory
3: Output memory
4: Opcode setting memory
5: Barrel shifter circuit
6: Circuit connection network
7: Multiplexer
8: Arithmetic function code setting memory
9: Shift amount setting memory
10: Comparator
11: Upper limit / lower limit setting memory
12: Comparator function valid flag memory
13: Adder
14: Adder function valid flag memory
15: Arithmetic processor
16: Clock
17: Start signal
18: Arithmetic function code data
19: Opcode data
20: Shift data
21: Comparator input data LIN

Claims

In an arithmetic processing device for performing numerical arithmetic processing, comprising an input memory (2) for holding a plurality of input data, a plurality of arithmetic operation units (1) for performing addition, subtraction, multiplication, division, logical operations and numerical comparison, a plurality of opcode setting memories (4) for determining the function to be executed by the arithmetic operation units, and an output memory (3) for holding arithmetic results from the arithmetic operation units, the arithmetic processing device further comprises:
a plurality of barrel shifter circuits (5) for performing shift processing on the arithmetic results produced by the arithmetic operation units (1);
a setting memory (9) for determining the shift amount of the arithmetic results;
a multiplexer (7) for switching among and outputting the plurality of arithmetic results obtained by the barrel shifter circuits (5); and
an arithmetic function code setting memory (8) for selecting the output value from the multiplexer.

The arithmetic processing device according to claim 1, further comprising a comparator (10) for performing limit processing, a memory (11) for storing upper and lower limit set values, and a flag memory (12) for enabling the comparator function.

The arithmetic processing device according to claim 1 or 2, wherein the arithmetic operation unit (1) incorporates an adder (13) that outputs 1 when a carry occurs in the addition result and outputs -1 when a borrow occurs, and a flag memory (14) for enabling the function of this adder.

The arithmetic processing device according to claim 1, wherein a plurality of the arithmetic processing devices are arranged in parallel.