JP3708072B2

JP3708072B2 - Semiconductor computing device

Info

Publication number: JP3708072B2
Application number: JP2002280775A
Authority: JP
Inventors: 実藤島; 康文鈴木; 康祐斉藤; 真一大内; 紘一郎鳳
Original assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency; National Institute of Japan Science and Technology Agency
Priority date: 2002-09-26
Filing date: 2002-09-26
Publication date: 2005-10-19
Anticipated expiration: 2022-09-26
Also published as: JP2004118512A

Description

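[0001]
[Technical Field of the Invention]
The present invention relates to a semiconductor arithmetic device. It is particularly suited to a semiconductor arithmetic device that executes operations in parallel on all 2^N logical values expressible with N bits (N is a natural number).
[0002]
[Prior Art]
A semiconductor arithmetic device in a conventional computer, such as a CPU, performs arithmetic processing, conditional branching, and similar steps sequentially, one data value (logical value) at a time. Unlike this conventional method, quantum computers have been proposed that compute using a new method based on the principles of quantum mechanics.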
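[0003]
A quantum computer uses a superposition of states, built from units called qubits (quantum bits), and executes operations on all superposed states in parallel by applying quantum-mechanical operations to that superposition. In theory, therefore, a quantum computer can obtain the results for all possible states (all data values that can exist) simultaneously and instantaneously with a single operation.
[0004]
Because a quantum computer relies on the superposition of quantum states, it has been realized in physical systems in which quantum phenomena can actually be observed, using nuclear magnetic resonance, microwaves, lasers, and the like. Recently, a physical system capable of computing with up to 7 qubits, in which 2^7 states exist simultaneously, has been proposed.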
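[0005]
[Problems to Be Solved by the Invention]
However, because conventional quantum computers use actual physical systems in which quantum phenomena can be observed, the apparatus itself is very large and impractical. Furthermore, a quantum computer must carry out operations in which the states change while remaining correlated with one another, and as the number of qubits grows it becomes difficult to make the exponentially growing number of phase states exist, and to preserve them, in a single apparatus built from an actual physical system.
In addition, because a quantum computer exploits quantum physical phenomena, it is difficult to implement one directly as an integrated circuit, which relies on phenomena described by classical electromagnetism.
[0006]
The present inventors therefore proposed, in Japanese Patent Application No. 2001-279286, a parallel processor that realizes quantum-computer technology with an integrated circuit. It provides a plurality of processor elements (arithmetic circuits), each corresponding to one of the states (logical values) superposed according to quantum mechanics, and connects them so that they can communicate with one another. This parallel processor achieves an arithmetic capability similar to that of a quantum computer by operating the processor elements in parallel, each simultaneously computing the state probability of its corresponding state and storing the result.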
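[0007]
FIG. 12 is a block diagram outlining the configuration of a processor element in the parallel processor proposed by the present inventors. As an example, FIG. 12 shows processor elements 121 and 126, corresponding respectively to a first state (target qubit value "0") and a second state (target qubit value "1"). The two states differ only in the value of the target qubit, i.e. the qubit being operated on; all other qubits have equal values.
[0008]
In quantum computation, the state probability that constitutes the data is a complex number carrying phase information, and the operations performed are unitary transformations. Quantum computation thus applies matrix operations with a unitary matrix to complex-valued state probabilities, so, as shown in FIG. 12, each processor element has two multipliers and one adder and performs complex multiply-accumulate operations.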
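[0009]
Let the coefficients of the unitary matrix elements be U1, U2, U3, and U4. In processor element 121 of FIG. 12, multiplier 122 multiplies the state probability of the first state (|0>) by coefficient U1, and multiplier 123 multiplies the state probability of the second state (|1>) by coefficient U2. Adder 124 sums the outputs of multipliers 122 and 123, and the result is stored in register 125 as the post-operation state probability of the first state (|0>').
[0010]
Similarly, in processor element 126, multipliers 127 and 128 multiply the state probability of the first state (|0>) by coefficient U3 and that of the second state (|1>) by coefficient U4; adder 129 sums their outputs, and the result is stored in register 130 as the post-operation state probability of the second state (|1>').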
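Together, processor elements 121 and 126 compute the 2x2 matrix-vector product |0>' = U1·|0> + U2·|1> and |1>' = U3·|0> + U4·|1>. The sketch below models that multiply-accumulate for one state pair in Python. It uses floating-point complex numbers instead of the fixed-point multi-bit values the hardware would use, and the function name `pe_pair_update` is hypothetical.

```python
import cmath
import math

def pe_pair_update(a0: complex, a1: complex,
                   u: tuple[complex, complex, complex, complex]) -> tuple[complex, complex]:
    """Model of processor elements 121 and 126 acting on one state pair.

    a0, a1: amplitudes of |0> and |1> (states differing only in the target qubit).
    u: unitary-matrix coefficients (U1, U2, U3, U4), row-major.
    """
    u1, u2, u3, u4 = u
    new0 = u1 * a0 + u2 * a1  # multipliers 122/123, adder 124 -> register 125
    new1 = u3 * a0 + u4 * a1  # multipliers 127/128, adder 129 -> register 130
    return new0, new1

s = 1 / math.sqrt(2)
hadamard = (s, s, s, -s)                                # W-H coefficients
phase_90 = (1, 0, 0, cmath.exp(1j * math.pi / 2))      # phase rotation on |1>

print(pe_pair_update(1, 0, hadamard))
print(pe_pair_update(s, s, phase_90))
```

Each output costs two complex multiplications and one complex addition, which is the multi-bit arithmetic the following paragraphs identify as slow and area-hungry.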
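[0011]
However, in a parallel processor built from processor elements 121 and 126 of FIG. 12, the real and imaginary parts of each complex state probability are each represented with multiple bits (for example, 8 bits each). The parallel processor therefore spends a great deal of time on complex multiply-accumulate operations over these multi-bit state probabilities and cannot execute operations at high speed.
[0012]
Furthermore, each processor element of that parallel processor needs two multi-bit multipliers and one adder, so the circuit area (circuit scale) of a single processor element is large. Because the number of processor elements grows exponentially with the number of qubits, it is not easy to scale up the computation.
[0013]
The present invention was made in view of these problems. Its object is to execute operations at high speed while providing an arithmetic capability equivalent to that of a quantum computer. A second object is to make it easy to scale up the computation while still executing operations at high speed.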
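[0014]
[Means for Solving the Problems]
The semiconductor arithmetic device of the present invention performs operations in parallel on the states of all logical values expressible with N bits (N is a natural number) and holds the result of each operation. It comprises a plurality of arithmetic circuits, each of which, when operating on the state of a given logical value, performs a logical operation using supplied flags that indicate the states of logical values and holds the result. The arithmetic circuits perform operations in parallel on the states of mutually different logical values.
[0015]
In another aspect of the semiconductor arithmetic device of the present invention, the flag indicating the state of a logical value depends on the probability amplitude of that state.
In yet another aspect, the flag indicating the state of a logical value indicates with a single bit whether the probability amplitude of that state is nonzero.
[0016]
In yet another aspect, the semiconductor arithmetic device further comprises a plurality of storage circuits, one for each arithmetic circuit, each storing the states of a plurality of logical values. The sets of logical-value states held by the storage circuits differ from one storage circuit to another, and each arithmetic circuit can operate on the states of the logical values stored in its corresponding storage circuit.
In yet another aspect, each storage circuit stores the flags indicating the states of the logical values.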
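[0017]
In yet another aspect, the arithmetic circuits are connected via a network so that they can communicate with one another.
In yet another aspect, the network connects at least those arithmetic circuits that operate on the states of logical values at Hamming distance 1 from each other, so that they can communicate.
In yet another aspect, the network connects the arithmetic circuits that operate on the states of logical values at Hamming distance 1 from each other in a hypercube topology, so that they can communicate.
[0018]
In yet another aspect, the semiconductor arithmetic device further comprises a register that stores a solution obtained by an observation instruction operation based on the results of the operations on the logical-value states.
[0019]
In yet another aspect, when operating on the states of a plurality of logical values, each arithmetic circuit executes the different processing steps of the operation sequentially and in parallel, i.e. in pipelined fashion.
In yet another aspect, each arithmetic circuit performs a logical operation using the flag indicating the state of a given logical value and the flag indicating the state of the logical value that differs from it only in the bit being operated on.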
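[0020]
[Detailed Description of the Invention]
An embodiment of the present invention is described below with reference to the drawings.
A parallel processor to which the semiconductor arithmetic device of this embodiment is applied realizes quantum-computer technology with an integrated circuit and executes operations in the same manner as known quantum algorithms for quantum computation.
[0021]
First, quantum algorithms are described. FIG. 1 illustrates the flow of a quantum algorithm. Quantum algorithms, including Shor's algorithm, generally consist of four stages, as shown in FIG. 1. FIG. 1 shows an 8-qubit example; the horizontal solid lines correspond to qubits Q1 to Q8.
Each stage is described below.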
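[0022]
In the first stage (Stage 1), Walsh-Hadamard transformations (hereinafter "W-H transformations") S1-1, S1-2, S1-3, S1-4, ... distribute probability to predetermined states. The first stage thus assigns equal probability to the predetermined states and generates the initial superposition of states.
[0023]
In the second stage (Stage 2), NOT transformations (controlled-NOT operations) S2-1, S2-2, S2-3, ... exchange the probabilities assigned in the first stage between states. This probability exchange corresponds to the addition and multiplication operations of a conventional CPU-based computer.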
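[0024]
In FIG. 1, for example, NOT transformation S2-1 has Q3 as its control qubit and Q8 as its target qubit. If control qubit Q3 is "1", probabilities are exchanged within each pair of states that differ only in the value of target qubit Q8 (all other qubits being equal). If Q3 is "0", no exchange takes place and the values are held. NOT transformations S2-2, S2-3, and so on work the same way.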
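The controlled exchange just described can be sketched as a function on an array indexed by basis state. This is a minimal illustration, assuming (as paragraph [0061] does later) that Q1 is the least significant bit, so qubit Qk is bit k-1 of the index; the function name `controlled_not` is hypothetical. The same code works whether the entries are amplitudes or 1-bit flags.

```python
def controlled_not(values: list, control: int, target: int) -> list:
    """Return a copy of `values` with a controlled-NOT applied.

    values[i] is the probability (amplitude or flag) of basis state i.
    Qubits are numbered from 1; Qk corresponds to bit k-1 of i.
    Within every pair of states that differ only in the target qubit and
    whose control qubit is 1, the two entries are exchanged.
    """
    if control == target:
        raise ValueError("control and target must be different qubits")
    c_mask = 1 << (control - 1)
    t_mask = 1 << (target - 1)
    out = list(values)
    for i in range(len(values)):
        # Visit each pair once, starting from the member whose target bit is 0.
        if i & c_mask and not i & t_mask:
            j = i | t_mask
            out[i], out[j] = values[j], values[i]
    return out

# S2-1 of FIG. 1: control Q3, target Q8, on 8 qubits (256 states).
v = [0] * 256
v[0b00000100] = 1                      # Q3 = 1, Q8 = 0
w = controlled_not(v, control=3, target=8)
print([format(i, "08b") for i, x in enumerate(w) if x])
```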
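[0025]
In the third stage (Stage 3), a quantum Fourier transform (hereinafter "QFT") built from W-H transformations S3-1, S3-3, ... and phase-shift transformations S3-2, S3-4, ..., or a similar transformation, uses interference to make the solution converge to a single point. In the fourth stage (Stage 4), the solution is obtained by observation S4.
[0026]
As with NOT transformation S2-1, phase-shift transformation S3-2, for example, has Q8 as its control qubit and Q7 as its target qubit. If control qubit Q8 is "1", S3-2 applies a phase rotation with respect to target qubit Q7; if Q8 is "0", the value is held. Phase-shift transformations S3-4 and so on work the same way.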
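[0027]
The present inventors found that, in a quantum algorithm such as that of FIG. 1, until the end of the first stage (which distributes probability to predetermined states by W-H transformations) and the second stage (which exchanges probabilities between states according to the control-qubit values and the like), the state probability (probability amplitude) of every state takes only one of two values: 0 or a value p (0 < p ≤ 1). Consequently, in the first and second stages it is enough to record whether each state has a value (whether it is 0 or p) instead of the multi-bit state probability itself, and an equivalent operation can be performed without any loss of information.
[0028]
Accordingly, the parallel processor of this embodiment represents the states of all logical values expressible with N bits (N is a natural number) by probability flags, each indicating with one bit whether the state probability of a state is nonzero, and executes operations corresponding to unitary transformations as logic operations. The parallel processor thereby achieves an arithmetic capability equivalent to that of a quantum computer with a much simpler processor element. Because the circuit area per processor element shrinks, the degree of parallelism and the amount of computation per unit gate can both be increased.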
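The claim that amplitudes stay at 0 or p through Stages 1 and 2, and that 1-bit flags therefore lose nothing, can be checked numerically. The sketch below is a hedged illustration, not the inventors' method: it assumes Stage 1 consists only of W-H transformations applied to qubits starting from |0...0>, and Stage 2 only of controlled-NOT exchanges. It runs random circuits of that shape twice, once on real amplitudes and once on flags using the OR/XOR rule described later for FIG. 6, and compares the results. It says nothing about later stages, where interference produces negative amplitudes. All function names are hypothetical.

```python
import math
import random

S = 1 / math.sqrt(2)

def apply_pair_op(vec, target, op):
    """Apply op(v0, v1) -> (w0, w1) to every pair of states that differ only
    in qubit `target` (1-based; Q1 is the least significant bit)."""
    mask = 1 << (target - 1)
    out = list(vec)
    for i in range(len(vec)):
        if not i & mask:
            out[i], out[i | mask] = op(vec[i], vec[i | mask])
    return out

def hadamard_amp(a0, a1):
    """Exact W-H transformation on real amplitudes."""
    return S * (a0 + a1), S * (a0 - a1)

def hadamard_flag(f0, f1):
    """Flag version: OR gives |0>', XOR gives |1>'."""
    return f0 | f1, f0 ^ f1

def cnot(vec, control, target):
    """Exchange each pair differing only in `target` when `control` is 1."""
    c, t = 1 << (control - 1), 1 << (target - 1)
    out = list(vec)
    for i in range(len(vec)):
        if i & c and not i & t:
            out[i], out[i | t] = vec[i | t], vec[i]
    return out

def run_trial(n_qubits, n_wh, n_cnot, rng):
    """Random Stage 1 (W-H only, from |0...0>), then random Stage 2 (CNOT only)."""
    amps = [0.0] * (1 << n_qubits)
    flags = [0] * (1 << n_qubits)
    amps[0], flags[0] = 1.0, 1
    for _ in range(n_wh):
        q = rng.randint(1, n_qubits)
        amps = apply_pair_op(amps, q, hadamard_amp)
        flags = apply_pair_op(flags, q, hadamard_flag)
    for _ in range(n_cnot):
        c, t = rng.sample(range(1, n_qubits + 1), 2)
        amps, flags = cnot(amps, c, t), cnot(flags, c, t)
    return amps, flags

def flags_agree(amps, flags, tol=1e-9):
    """True if the flags mark exactly the nonzero amplitudes and all of
    those amplitudes share one value p."""
    nonzero = [abs(a) > tol for a in amps]
    if nonzero != [bool(f) for f in flags]:
        return False
    values = [a for a, nz in zip(amps, nonzero) if nz]
    return max(values) - min(values) < tol

rng = random.Random(1)
print(all(flags_agree(*run_trial(5, 8, 8, rng)) for _ in range(100)))  # True
```

Under these assumptions every qubit after Stage 1 is in either |0> or an equal superposition, so the nonzero amplitudes are all equal and the OR/XOR rule tracks them exactly. Stage 2 only permutes entries, which preserves that property.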
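[0029]
FIG. 2 is a block diagram showing a configuration example of the parallel processor 1 of this embodiment.
In FIG. 2, a control unit 2 has an instruction management unit 3 and a pipeline generation unit 4, and controls each functional unit of parallel processor 1, such as the processor elements 8-n (n is a subscript; n = 1, 2, 3, ...). An interface 7 exchanges instructions, data, and the like between parallel processor 1 and external devices (external circuits) connected to it.
[0030]
Instruction management unit 3 passes instructions supplied from an external device via interface 7 to pipeline generation unit 4, and outputs solutions obtained by the observation instruction operation described later to the external device via interface 7. It has an instruction cache 5 that temporarily stores instructions supplied from the external device and an answer register 6 that stores solutions obtained by the observation instruction operation.
[0031]
Based on the instructions supplied from instruction management unit 3, pipeline generation unit 4 outputs control directives to each processor element 8-n so that the processor elements 8-n perform pipelined operation in parallel.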
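[0032]
Each processor element 8-n operates on given states of the quantum-mechanical superposition in accordance with the control directives from pipeline generation unit 4. Each processor element 8-n has a local memory 9-n (n is a subscript; n = 1, 2, 3, ...) that stores the state probabilities of a plurality of states. In other words, each processor element 8-n of this embodiment is associated with several of the states in the superposition.
[0033]
The states associated with the processor elements 8-n do not overlap between processor elements, and every state (logical value) within the number of qubits that parallel processor 1 can handle is associated with some processor element 8-n. Because each state probability is represented by a 1-bit probability flag in this embodiment, each local memory 9-n needs only as many bits of storage as the number of states associated with its processor element 8-n.
The processor elements 8-n are connected via a network 10 so that they can communicate with one another.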
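[0034]
Next, the processor element (PE) of this embodiment is described in detail.
FIG. 3 shows the essential features of a processor element in this embodiment. As an example, FIG. 3 shows a processor element 31 corresponding to a first state whose target qubit is "0" and a processor element 34 corresponding to a second state whose target qubit is "1". In the first and second states, all qubits other than the target qubit have equal values.
[0035]
In FIG. 3, processor element 31 has a logic unit 32 that performs logic operations and a register 33 that stores the result of logic unit 32. Logic unit 32 receives probability flag PA0 of the first state and probability flag PA1 of the second state, performs a predetermined logic operation on them, and outputs the result to register 33 as probability flag PB0 of the first state after the operation. Register 33 stores PB0.
[0036]
Similarly, processor element 34 has a logic unit 35 that performs logic operations and a register 36 that stores the result. Logic unit 35 receives probability flags PA0 and PA1 of the first and second states, performs a predetermined logic operation, and outputs the result to register 36 as probability flag PB1 of the second state after the operation. Register 36 stores PB1.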
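[0037]
FIG. 4 is a block diagram showing a concrete configuration example of the processor element of this embodiment. To speed up processing and reduce circuit area, the processor element of FIG. 4 has a six-stage pipeline structure.
In FIG. 4, 8 denotes a processor element and 9 a local memory, corresponding respectively to processor element 8-n and local memory 9-n of FIG. 2.
[0038]
Processor element 8 consists of six registers (41 to 45 and 47) and a logic unit 46.
Registers 41, 42, and 43 are connected in series, in that order, between the output terminal of local memory 9 and the first input terminal of logic unit 46. Each temporarily stores its input probability flag and passes it to the next stage (register 42, register 43, and logic unit 46, respectively). A probability flag read from a given storage area of local memory 9 thus passes through registers 41, 42, and 43 in turn and enters logic unit 46. Register 41 also outputs its stored flag to the register in another processor element that corresponds to register 44 of FIG. 4.
[0039]
Registers 44 and 45 are connected in series to the second input terminal of logic unit 46. Each temporarily stores its input probability flag and passes it to the next stage (register 45 and logic unit 46, respectively). A probability flag supplied by the register corresponding to register 41 of FIG. 4 in another processor element thus passes through registers 44 and 45 in turn and enters logic unit 46.
[0040]
Logic unit 46 has a register 47 and, in accordance with the control directives from pipeline generation unit 4, performs a logic operation on the probability flags arriving from registers 43 and 45 through its first and second input terminals. Logic unit 46 temporarily stores the result in register 47 and then writes it to the same storage area of local memory 9. Register 47 may instead be placed outside logic unit 46.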
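[0041]
Local memory 9 stores the probability flags of the states associated with processor element 8. Providing local memory 9 and associating several states with one processor element 8 uses the processor element more efficiently than assigning it a single state, and reduces the processor-element circuit area needed per state.
[0042]
Because the probability flag of one state is one bit, local memory 9 needs at least as many bits of storage as there are associated states. For example, with 64 states, corresponding to 6 qubits, local memory 9 needs at least 64 bits. Likewise, registers 41 to 45 and 47 each need to store only one bit.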
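The text requires only that every state belong to exactly one processor element; it does not say which bits of a state select the PE and which select the local-memory address. The sketch below assumes one possible split, purely for illustration: the low 6 bits address the 64-bit local memory and the high 10 bits select one of the 1024 PEs quoted in paragraph [0075], giving 16 qubits. The constants and function names are hypothetical.

```python
LOCAL_BITS = 6                 # 64 one-bit flags per local memory ([0042])
PE_BITS = 10                   # 1024 processor elements ([0075])
N_QUBITS = LOCAL_BITS + PE_BITS

def locate(state: int) -> tuple[int, int]:
    """Map a basis state to (PE index, local-memory address).

    Assumed split: high PE_BITS bits select the PE, low LOCAL_BITS bits
    select the address inside that PE's local memory.
    """
    if not 0 <= state < 1 << N_QUBITS:
        raise ValueError("state out of range")
    return state >> LOCAL_BITS, state & ((1 << LOCAL_BITS) - 1)

def state_of(pe: int, addr: int) -> int:
    """Inverse of locate()."""
    return (pe << LOCAL_BITS) | addr

print(N_QUBITS, locate(0b1010101010_110011))
```

Under this assumed split, the partner of a state for a target qubit among the top 10 bits lives in a PE whose index differs in exactly one bit. The source does not say how partners held in the same local memory are handled.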
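[0043]
In processor element 8 of FIG. 4, registers 41 to 45 and 47 operate in synchronization with a predetermined timing signal such as a clock, performing six-stage pipeline processing as described below.
FIG. 5 shows an example of pipeline control in the parallel processor of this embodiment. In FIG. 5, CLK is the timing signal (such as a clock) used for pipeline processing, and time advances from top to bottom.
[0044]
Pipeline control is driven by the control directives that pipeline generation unit 4 supplies to each processor element.
First, pipeline generation unit 4 supplies the address in local memory 9 at which the probability flag of a given state is stored and directs processor element 8 to read that flag (read address). The flag read from local memory 9 is stored in register 41 of FIG. 4.
[0045]
Pipeline generation unit 4 then configures network 10 so that the probability flag that register 41 sends out reaches the intended processor element (send switch). As a result, the flag read from the processor element's own local memory 9 is stored in register 42, and the flag supplied by another processor element is stored in register 44.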
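[0046]
Next, pipeline generation unit 4 tells processor element 8 which target qubit to operate on (target). By this point the probability flags to be used have moved to registers 43 and 45. Pipeline generation unit 4 then directs processor element 8 to perform the logic operation (operation control); logic unit 46 computes it, and the resulting new probability flag is stored in register 47.
[0047]
Next, pipeline generation unit 4 supplies the address in local memory 9 at which the result is to be stored, together with a write enable, and directs processor element 8 to write the probability flag to local memory 9 (write address, write enable).
[0048]
In this way, each processor element 8 reads the probability flag of a given state from local memory 9, operates on it, and writes the resulting flag back to local memory 9. The steps above were described separately to trace the processing of a single state; as in ordinary pipeline control, they actually run concurrently in overlapping fashion, each step working on a different state.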
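The overlap described above can be pictured as a schedule table. This is a hedged sketch that assumes one new state enters the six-stage pipeline every clock cycle with no stalls; it does not try to assign the steps named in FIG. 5 to particular stages, and the function name is hypothetical.

```python
from typing import Optional

PIPELINE_DEPTH = 6  # six-stage pipeline of FIG. 4 ([0037])

def schedule(n_states: int, depth: int = PIPELINE_DEPTH) -> list[list[Optional[int]]]:
    """Return table[cycle][stage]: index of the state occupying that stage
    in that cycle, or None if the stage is idle.

    Assumes one state enters per cycle and the pipeline never stalls.
    """
    table = []
    for cycle in range(n_states + depth - 1):
        row = []
        for stage in range(depth):
            s = cycle - stage
            row.append(s if 0 <= s < n_states else None)
        table.append(row)
    return table

for cycle, row in enumerate(schedule(3)):
    print(cycle, ["-" if s is None else s for s in row])
print(len(schedule(64)))  # 69
```

Under the no-stall assumption, one pass over the 64 flags of a local memory takes 64 + 6 - 1 = 69 cycles rather than 64 × 6 = 384.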
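[0049]
Next, the logic operations executed by processor element 8 are described. As stated above, logic unit 46 performs logic operations corresponding to the first and second stages of the quantum algorithm of FIG. 1, that is, to W-H transformations and NOT transformations.
[0050]
FIG. 6 shows the truth table of the logic operations executed by logic unit 46.
In FIG. 6, a probability flag value of "0" means that the probability amplitude of the state is 0, and a value of "1" means that it is p.
[0051]
In FIG. 6, for a pair of states whose qubits other than the target qubit are equal before the transformation (logic operation), the state whose target qubit is "0" is written "|0>" and the state whose target qubit is "1" is written "|1>". The corresponding states after the transformation are written "|0>'" and "|1>'".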
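[0052]
As FIG. 6 shows, for the NOT transformation the flag values of the post-transformation states |0>' and |1>' equal the flag values of the pre-transformation states |1> and |0>, respectively. The logic operation in logic unit 46 corresponding to the NOT transformation therefore simply exchanges the flag values of states |0> and |1>.
[0053]
This operation can be implemented either by selecting, when a NOT transformation is directed, the probability flag of the state whose target qubit differs, or simply by wiring the signal lines that carry the flags of states |0> and |1> to the signal lines that output the flags of states |1>' and |0>', respectively.
[0054]
For the W-H transformation of FIG. 6, the flag of post-transformation state |0>' is "1" when the pre-transformation flags are (flag of |0>, flag of |1>) = (0, 1), (1, 0), or (1, 1). The logic operation in logic unit 46 corresponding to the W-H transformation for state |0>' is therefore the logical OR of the flags of states |0> and |1>. As shown in FIG. 7(A), it is realized by an OR circuit 71 in logic unit 46 that receives the flags of states |0> and |1> and outputs the result as the flag of state |0>'.
[0055]
Similarly, for the W-H transformation, the flag of post-transformation state |1>' is "1" when (flag of |0>, flag of |1>) = (0, 1) or (1, 0). The logic operation in logic unit 46 corresponding to the W-H transformation for state |1>' is therefore the exclusive OR (EX-OR) of the flags of states |0> and |1>. As shown in FIG. 7(B), it is realized by an EXOR circuit 72 in logic unit 46 that receives the flags of states |0> and |1> and outputs the result as the flag of state |1>'.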
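The rules in paragraphs [0052] to [0055] fit in a few lines. The sketch below models logic unit 46 for one state pair and prints the resulting truth table. The operation names "NOT" and "WH" and the function name are hypothetical labels, not the instruction encoding of the patent.

```python
def logic_unit(op: str, f0: int, f1: int) -> tuple[int, int]:
    """Return (flag of |0>', flag of |1>') for one state pair.

    f0, f1: probability flags of |0> and |1> (1 = amplitude p, 0 = amplitude 0).
    "NOT": the two flags are exchanged.
    "WH":  |0>' = f0 OR f1 (circuit 71), |1>' = f0 XOR f1 (circuit 72).
    """
    if op == "NOT":
        return f1, f0
    if op == "WH":
        return f0 | f1, f0 ^ f1
    raise ValueError(f"unknown operation: {op}")

for op in ("NOT", "WH"):
    for f0 in (0, 1):
        for f1 in (0, 1):
            print(op, (f0, f1), "->", logic_unit(op, f0, f1))
```

A two-input OR gate and a two-input XOR gate replace the two multi-bit complex multipliers and the adder of FIG. 12, which is where the area saving comes from.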
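[0056]
Next, network 10 is described.
Network 10 of this embodiment need only connect the processor elements 8-n so that each can supply the others with the probability flags of the states they use in their logic operations. For example, network 10 can be built appropriately as a hypercube network, shown conceptually in FIG. 8.
[0057]
FIG. 8 is a conceptual diagram of a hypercube network applicable to network 10; for clarity it shows a 3-qubit example. As shown in FIG. 8, vertices 80 to 87 of a cube correspond to states "000" to "111" (binary logical values), respectively.
[0058]
As described above, an operation by processor element 8-n uses the probability flag of the state that differs only in the value of the target qubit. In other words, a logic operation by processor element 8-n uses the probability flag of a state whose logical value is at Hamming distance 1 from that of the state being operated on.
[0059]
For example, an operation on state "000" uses the flag of one of states "001", "010", and "100", so in FIG. 8 the edges joining vertex 80 to vertices 81, 82, and 84 can be regarded as communication lines. That is, network 10 connects the processor element for state "000" with the processor elements for states "001", "010", and "100" via communication lines NW1, NW2, and NW3.
[0060]
By connecting, in the same way, the processor elements of every other state to the processor elements of the states at Hamming distance 1 from it, network 10 can be built as a hypercube network.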
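The neighbour rule of paragraphs [0058] to [0060] amounts to flipping one bit at a time. The sketch below lists a state's hypercube neighbours and enumerates all links, assuming one processor element per state as in the FIG. 8 example; the function names are hypothetical.

```python
def hypercube_neighbours(state: int, n_qubits: int) -> list[int]:
    """States at Hamming distance 1 from `state`: flip each bit in turn."""
    return [state ^ (1 << k) for k in range(n_qubits)]

def hypercube_links(n_qubits: int) -> list[tuple[int, int]]:
    """All undirected links of the n-dimensional hypercube, each listed once
    (from the endpoint whose flipped bit is 0)."""
    return [(s, s ^ (1 << k))
            for s in range(1 << n_qubits)
            for k in range(n_qubits)
            if not s & (1 << k)]

print([format(s, "03b") for s in hypercube_neighbours(0b000, 3)])  # ['001', '010', '100']
print(len(hypercube_links(3)))  # 12, the edges of the cube in FIG. 8
```

Each node has n links and the network has n·2^(n-1) links in total, so the wiring grows only by a factor of n on top of the 2^n nodes.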
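[0061]
Next, the operation of parallel processor 1 is described in correspondence with the quantum algorithm of FIG. 1.
For convenience, the description below assumes 8 qubits (Q1 to Q8), with Q1 the least significant and Q8 the most significant qubit. In the initial state of parallel processor 1, only the probability flag of state "00000000" (all qubits Q1 to Q8 equal to "0") is "1"; the flags of all other states are "0".
[0062]
First, when parallel processor 1 receives from an external device, via interface 7, an instruction corresponding to a W-H transformation of the first stage in FIG. 1, control unit 2 (instruction management unit 3 and pipeline generation unit 4) directs each processor element 8-n to perform the logic operation corresponding to the W-H transformation, distributing probability to the predetermined states.
[0063]
For example, on receiving a W-H instruction whose target qubit is Q1, the logic unit 46 of each processor element 8-n performs the W-H logic operation, and the flags of states "00000000" and "00000001" become "1". On then receiving a W-H instruction whose target qubit is Q2, the flags of states "00000000", "00000010", "00000001", and "00000011" likewise become "1".
In this way, parallel processor 1 executes logic operations corresponding to W-H transformations and distributes probability to the predetermined states.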
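The example of paragraph [0063] can be replayed on a flag array. This minimal sketch applies the OR/XOR rule to every state pair at once, with Q1 as the least significant bit as stated in [0061]. It models only the result, not the distribution of states over processor elements, and the helper names are hypothetical.

```python
N_QUBITS = 8

def wh_flags(flags: list[int], target: int) -> list[int]:
    """W-H transformation on probability flags: for each pair differing only
    in qubit `target` (Q1 = LSB), |0>' = OR and |1>' = XOR."""
    mask = 1 << (target - 1)
    out = list(flags)
    for i in range(len(flags)):
        if not i & mask:
            f0, f1 = flags[i], flags[i | mask]
            out[i], out[i | mask] = f0 | f1, f0 ^ f1
    return out

def set_states(flags: list[int]) -> list[str]:
    """Binary labels of the states whose flag is 1, in ascending order."""
    return [format(i, f"0{N_QUBITS}b") for i, f in enumerate(flags) if f]

flags = [0] * (1 << N_QUBITS)
flags[0] = 1                    # initial state "00000000"
flags = wh_flags(flags, 1)      # W-H with target Q1
print(set_states(flags))        # ['00000000', '00000001']
flags = wh_flags(flags, 2)      # W-H with target Q2
print(set_states(flags))        # ['00000000', '00000001', '00000010', '00000011']
```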
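[0064]
Next, when an instruction corresponding to a NOT transformation of the second stage arrives from an external device via interface 7, control unit 2 directs each processor element 8-n to perform the logic operation corresponding to the NOT transformation. The logic unit 46 of each processor element 8-n performs that operation, and the probability flags are exchanged between the predetermined states.
In this way, parallel processor 1 performs the operations of the first and second stages of the quantum algorithm of FIG. 1.
[0065]
In the quantum algorithm of FIG. 1, the third stage makes the solution converge by applying a QFT or similar transformation to the results of the first two stages, and the fourth stage obtains the solution by observation. However, parallel processor 1 of this embodiment operates on probability flags, which record only whether a state probability is nonzero, rather than on complex-valued state probabilities. It therefore cannot make the solution converge by a QFT or similar transformation as the quantum algorithm of FIG. 1 does.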
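[0066]
Parallel processor 1 therefore obtains the solution by an observation instruction operation using inquiry instructions. An inquiry instruction is issued specifying a target qubit and an unmask value. On receiving it, parallel processor 1 checks whether any state matching the specified unmask value has a probability flag of "1".
[0067]
Depending on the result, parallel processor 1 writes "0" or "1" into the field of answer register 6 in control unit 2 that corresponds to the specified target qubit.
By repeating this operation, parallel processor 1 accumulates the solution in answer register 6 and outputs it via interface 7 on request.
[0068]
FIG. 9 shows a concrete example of the observation instruction operation using inquiry instructions. In FIG. 9, the probability flags of states "010", "100", "110", and "111" are "1", and the goal is to find the smallest state whose flag is "1" ("010").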
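[0069]
First, an inquiry instruction specifies the most significant qubit as the target qubit and the unmask value BM as "0**" (* means don't care). Parallel processor 1 checks whether any of the states matching BM ("000", "001", "010", and "011") has a probability flag of "1". Because one does, it writes "0" to the most significant bit of answer register 6.
[0070]
Next, an inquiry instruction specifies the middle of the three qubits as the target qubit and, reflecting the previous result, sets BM to "00*". Because neither matching state ("000" or "001") has a flag of "1", parallel processor 1 writes "1" to the middle bit of answer register 6.
[0071]
Finally, an inquiry instruction specifies the least significant qubit as the target qubit and sets BM to "010". Because the flag of the matching state "010" is "1", parallel processor 1 writes "0" to the least significant bit of answer register 6.
In this way, the observation instruction operation with inquiry instructions lets parallel processor 1 obtain a desired solution, such as the minimum value.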
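The minimum search of paragraphs [0068] to [0071] is a bit-by-bit binary search, from the most significant bit down. The sketch below models it in Python. `inquiry` is a hypothetical software stand-in for the inquiry instruction; the loop tests whether some flagged state extends the known prefix with "0", writes "0" if so and "1" otherwise. The sketch assumes at least one flag is set.

```python
def inquiry(flags: list[int], pattern: str) -> bool:
    """Hypothetical model of the inquiry instruction: is any state matching
    `pattern` (MSB first, '*' = don't care) flagged?"""
    n = len(pattern)
    return any(
        flags[i] and all(p == "*" or p == b for p, b in zip(pattern, format(i, f"0{n}b")))
        for i in range(len(flags))
    )

def observe_minimum(flags: list[int], n_qubits: int) -> str:
    """Fill the answer register MSB first with the smallest flagged state."""
    if not any(flags):
        raise ValueError("no state has a nonzero flag")
    answer = ""  # plays the role of answer register 6
    for k in range(n_qubits):
        pattern = answer + "0" + "*" * (n_qubits - k - 1)  # e.g. "0**", "00*", "010"
        answer += "0" if inquiry(flags, pattern) else "1"
    return answer

flags = [0] * 8
for s in ("010", "100", "110", "111"):
    flags[int(s, 2)] = 1
print(observe_minimum(flags, 3))  # 010
```

The three patterns this produces, "0**", "00*", and "010", are the unmask values BM used in FIG. 9, and n inquiries suffice for n qubits.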
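[0072]
Next, the instruction format used in parallel processor 1 is described.
FIG. 10 shows an example of the instruction format. In FIG. 10, 101 is an instruction field indicating the instruction, and 102 is a target field indicating the target qubit.
[0073]
Fields 103 and 104 are control fields that specify whether the operation depends on the control qubit. When (control_0, control_1) = (0, 0), the value of answer register 6 is returned. When (control_0, control_1) = (0, 1) or (1, 0), the operation is performed on the states whose control qubit is "1" or "0", respectively. When (control_0, control_1) = (1, 1), the operation is performed regardless of the control qubit.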
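The control-field rules of paragraph [0073] can be written as a small decision function. This is a sketch under assumptions: it models only the meaning of the two control bits for one state, not the bit layout of FIG. 10, and the `Action` names and `decode` function are hypothetical.

```python
from enum import Enum

class Action(Enum):
    RETURN_ANSWER = "return the value of answer register 6"
    OPERATE = "perform the operation on this state"
    HOLD = "keep this state's value"

def decode(control_0: int, control_1: int, control_value: int) -> Action:
    """Decide what happens to one state, given control fields 103/104 and
    the value of the control qubit in that state."""
    if control_0 not in (0, 1) or control_1 not in (0, 1):
        raise ValueError("control fields are single bits")
    if (control_0, control_1) == (0, 0):
        return Action.RETURN_ANSWER
    if (control_0, control_1) == (1, 1):
        return Action.OPERATE                      # unconditional
    wanted = 1 if (control_0, control_1) == (0, 1) else 0
    return Action.OPERATE if control_value == wanted else Action.HOLD

# A controlled-NOT such as S2-1 would use (0, 1): act only where Q3 is 1.
print(decode(0, 1, 1), decode(0, 1, 0))
```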
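[0074]
FIG. 11 compares the computational performance of the parallel processor of this embodiment, built with programmable logic devices (FPGA: Field Programmable Gate Array, CPLD: Complex Programmable Logic Device, etc.), with that of a software simulation.
[0075]
As shown in FIG. 11, the parallel processor uses a programmable logic device with 1.5M gates to implement 1024 processor elements, each with a 64-bit local memory, and runs the 1024 processor elements in parallel on a 60 MHz clock. The parallel processor can therefore perform operations equivalent to 16 qubits. Under these conditions it executes 1.6M operations per second (1.6M operations/sec).
[0076]
By contrast, a software simulation on a CPU with a 600 MHz clock and a 512 KB cache executes 0.8K operations per second (0.8K operations/sec).
The parallel processor of this embodiment is therefore roughly 2000 times faster than the software simulation.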
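The figures in paragraphs [0075] and [0076] follow from simple arithmetic: 1024 PEs × 64 one-bit flags = 2^16 states, i.e. 16 qubits, and 1.6M / 0.8K = 2000. The short check below uses only the numbers quoted in the text; the variable names are illustrative.

```python
import math

PES = 1024              # processor elements ([0075])
FLAGS_PER_PE = 64       # bits of local memory per PE ([0075])
states = PES * FLAGS_PER_PE
qubits = int(math.log2(states))

hw_ops_per_s = 1.6e6    # FPGA/CPLD prototype at 60 MHz ([0075])
sw_ops_per_s = 0.8e3    # software simulation on a 600 MHz CPU ([0076])
speedup = hw_ops_per_s / sw_ops_per_s

print(states, qubits, speedup)  # 65536 16 2000.0
```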
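[0077]
As described in detail above, in this embodiment, when operating on the states of logical values expressible with N bits (N is a natural number), the logic units 46 of the processor elements 8 perform instruction-directed logic operations in parallel on the states of mutually different logical values. They use probability flags that indicate with one bit whether the state probability of each logical value is nonzero.
[0078]
Operations equivalent to complex multiply-accumulate operations on multi-bit complex state probabilities can thus be carried out with simple logic operations on 1-bit probability flags. Operations therefore run at high speed using logic operations alone, without loss of arithmetic capability.
[0079]
In addition, representing each logical-value state by a 1-bit probability flag instead of multiple bits makes the circuits of the processor elements and other units very simple. The circuit area needed to operate on one logical-value state shrinks greatly, so the parallel processor can achieve a higher degree of parallelism and can easily be scaled up.
[0080]
Parallel processor 1 of this embodiment is not limited to Shor's quantum algorithm; it also applies to other quantum algorithms, for example Grover's quantum algorithm for database search.
[0081]
The embodiment described above is merely one concrete example of carrying out the present invention, and the technical scope of the invention must not be construed as limited by it. The present invention can be implemented in various forms without departing from its technical idea or principal features.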
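[0082]
[Effects of the Invention]
As described above, according to the present invention, when operating on the states of logical values expressible with N bits, a plurality of arithmetic circuits perform logic operations in parallel on the states of mutually different logical values, using supplied flags that indicate those states.
Operations on logical-value states that previously required complex multiply-accumulate arithmetic can therefore be performed as logic operations, so they execute at high speed while the device retains an arithmetic capability equivalent to that of a quantum computer. Furthermore, representing each logical-value state by a flag instead of multiple bits reduces the circuit area needed per state, so the computation can easily be scaled up without lowering operation speed.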
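[Brief Description of the Drawings]
[FIG. 1] A diagram explaining a quantum algorithm.
[FIG. 2] A block diagram showing a configuration example of a parallel processor to which the semiconductor arithmetic device according to the embodiment of the present invention is applied.
[FIG. 3] A diagram showing the essential features of a processor element.
[FIG. 4] A diagram showing a concrete configuration example of a processor element.
[FIG. 5] A diagram showing an example of pipeline control in the parallel processor of the embodiment.
[FIG. 6] A diagram showing the truth table of the logic operations.
[FIG. 7] A diagram showing configuration examples of the logic unit.
[FIG. 8] A conceptual diagram explaining a hypercube network.
[FIG. 9] A diagram explaining the observation instruction operation.
[FIG. 10] A diagram showing an example of the instruction format.
[FIG. 11] A diagram showing the computational performance of the parallel processor of the embodiment and of a software simulation.
[FIG. 12] A block diagram showing the configuration of a processor element that operates on state probability amplitudes represented by complex numbers.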
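[Explanation of Reference Signs]
1 Parallel processor
2 Control unit
3 Instruction management unit
4 Pipeline generation unit
5 Instruction cache
6 Answer register
7 Interface
8-1, 8-2, 8-3, ... Processor elements (PE)
9-1, 9-2, 9-3, ... Local memories
10 Network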
BACKGROUND OF THE INVENTION
The present invention relates to a semiconductor arithmetic device, and in particular, 2 that can be expressed by N bits (N is a natural number). ^N It is suitable for use in a semiconductor arithmetic device that executes operations for all logical values in parallel.
[0002]
[Prior art]
A semiconductor arithmetic device (for example, CPU) provided in a conventional computer or the like performs an operation using a conventional operation method in which an arithmetic process, a conditional branch process, and the like are sequentially executed using one data value (logical value). Unlike the conventionally used calculation methods, quantum computers have been proposed that perform calculations using a new calculation method based on the principle of quantum mechanics.
[0003]
The quantum computer uses a superposition of states based on quantum mechanics configured in units of qubits (quantum bits), and performs operations on the respective states in parallel by performing quantum mechanical operations on the states. Therefore, in theory, the quantum computer can obtain operation results of all possible states simultaneously and instantaneously by performing only one operation for all possible states (data values that can exist as values). it can.
[0004]
Since the quantum computer uses superposition of states based on quantum mechanics, it has been realized in a physical system that can observe phenomena based on quantum mechanics using nuclear magnetic resonance, microwaves, lasers, and the like. Recently, in quantum computers, 2 ⁷ There has been proposed an actual physical system capable of computing up to 7 qubits in which a single state exists simultaneously.
[0005]
[Problems to be solved by the invention]
However, the conventional quantum computer uses an actual physical system capable of observing a phenomenon based on quantum mechanics, so that the apparatus itself is very large and is not practical. Furthermore, the quantum computer needs to perform an operation so that the states change while being correlated with each other, and a phase state that exponentially increases with an increase in the number of qubits is obtained by using one physical system. It becomes difficult to store in the device.
In addition, quantum computers use quantum physical phenomena, and it is difficult to realize a quantum computer as it is using an integrated circuit using physical phenomena based on classical classical electromagnetics.
[0006]
Therefore, the present inventors provide a plurality of processor elements (arithmetic circuits) corresponding to respective states (logical values) superimposed based on quantum mechanics, and connect them so that they can communicate with each other. A parallel processing processor described in Japanese Patent Application No. 2001-279286 that realizes the technology of a quantum computer using an integrated circuit has been proposed. This parallel processing processor performs the calculation simultaneously on the state probabilities of the states corresponding to the plurality of processor elements, and operates the plurality of processor elements in parallel so as to store the obtained calculation results, thereby similar to the quantum computer. Realize arithmetic functions.
[0007]
FIG. 12 is a block diagram showing an outline of a configuration of a processor element included in the parallel processor proposed by the present inventors. In FIG. 12, the first state (target qubit value “0”) and the second state (target qubit) are different only in the value of the target qubit that is focused on as a calculation target and the other qubit values are equal. The processor elements 121 and 126 respectively corresponding to the value “1”) are shown as an example.
[0008]
Here, in the quantum calculation by the quantum computer, the state probability corresponding to the data is expressed using complex numbers including phase information, and the operation performed in the quantum calculation is unitary transformation. That is, in the quantum calculation, a matrix operation using a unitary matrix is performed on the state probability represented by a complex number, so that one processor element has two multipliers and one adder as shown in FIG. Then, multiply and accumulate complex numbers.
[0009]
If the coefficient values of each component of the unitary matrix are U1, U2, U3, and U4, the processor element 121 shown in FIG. 12 multiplies the state probability (| 0>) of the first state and the coefficient value (U1). , And the state probability (| 1>) of the second state and the coefficient value (U2) are multiplied by multipliers 122 and 123, respectively. Further, the outputs of the multipliers 122 and 123 are added by the adder 124, and the calculation result is stored in the register 125 as the state probability (| 0> ′) of the first state after the calculation.
[0010]
Similarly, the processor element 126 multiplies the first state state probability (| 0>) and the coefficient value (U3), and the second state state probability (| 1>) and the coefficient value (U4). Are multiplied by the multipliers 127 and 128, and the outputs of the multipliers 127 and 128 are added by the adder 129, and then stored in the register 130 as the state probability (| 1>') of the second state after the calculation. Is done.
[0011]
However, in the parallel processor configured using the processor elements 121 and 126 shown in FIG. 12, the state probability represented by a complex number has a plurality of bits (for example, 8 bits each) in the real part and the imaginary part. Shown. Therefore, the parallel processing processor requires a great deal of time for the arithmetic processing to perform the complex product-sum operation on the state probability expressed using a plurality of bits, and cannot perform the operation at high speed. There was a problem.
[0012]
Further, the parallel processing processor needs to include two multipliers and one adder corresponding to a plurality of bit operations in each processor element, and a circuit area (circuit scale) for constituting one processor element is required. ) Is large, and the number of processor elements increases exponentially with the increase in the number of qubits. Therefore, there is a problem that it is not easy to increase the operation scale.
[0013]
The present invention has been made in view of such problems, and an object of the present invention is to enable computation to be executed at high speed while having the same computation function as that of a quantum computer. The second object of the present invention is to make it possible to easily increase the scale of the operation and to execute the operation at high speed.
[0014]
[Means for Solving the Problems]
The semiconductor arithmetic device of the present invention is a semiconductor arithmetic device that performs operations in parallel for all logical value states that can be expressed by N bits (N is a natural number) and holds the results of each operation. When performing an operation on a value state, a logic operation is performed using a flag indicating the state of the supplied logical value, and a plurality of operation circuits for holding operation results are provided, and the plurality of operation circuits have different logic values. It is characterized by performing operations in parallel on the state.
[0015]
Another feature of the semiconductor processing device of the present invention is that the flag indicating the logical value state is a flag corresponding to the probability amplitude indicating the logical value state.
Another feature of the semiconductor processing device of the present invention is that the flag indicating the logical value state indicates whether or not the probability amplitude indicating the logical value state is a value different from zero. Features.
[0016]
Another feature of the semiconductor arithmetic device according to the present invention is that the semiconductor arithmetic device further includes a plurality of storage circuits that are provided corresponding to the arithmetic circuits and store the states of a plurality of logical values, and the plurality of storage circuits store the plurality of storage circuits. The states of the plurality of logical values are different from one another for each storage circuit, and the arithmetic circuit can calculate the states of the plurality of logical values stored in the corresponding storage circuit.
Another feature of the semiconductor arithmetic device according to the present invention is that the storage circuit stores a flag indicating the state of the logical value.
[0017]
Another feature of the semiconductor arithmetic device of the present invention is that the plurality of arithmetic circuits are connected so as to communicate with each other via a network.
Another feature of the semiconductor arithmetic device according to the present invention is that the network connects arithmetic circuits that perform an arithmetic operation on a logical value state in which the Hamming distance of the logical value is 1, so that at least communication is possible. And
Another feature of the semiconductor arithmetic device according to the present invention is that the network connects arithmetic circuits that perform an arithmetic operation on a logical value state in which the Hamming distance of the logical value is 1 so that they can communicate with each other in a hypercube form. It is characterized by that.
[0018]
Another feature of the semiconductor arithmetic device according to the present invention is that the semiconductor arithmetic device further includes a register that stores a solution obtained by an observation instruction operation based on an arithmetic result regarding a logical value state.
[0019]
Another feature of the semiconductor arithmetic device according to the present invention is that the arithmetic circuit sequentially performs different processes in parallel when performing arithmetic operations on a plurality of logical value states.
Another feature of the semiconductor arithmetic device according to the present invention is that the arithmetic circuit indicates a flag indicating a state of a predetermined logical value and a state of a logical value that is different from the predetermined logical value only in the value of an operation target bit. A logical operation using a flag is performed.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
A parallel processor to which a semiconductor arithmetic device according to an embodiment of the present invention is applied realizes the technology of a quantum computer using an integrated circuit, and performs an operation in the same manner as a known quantum algorithm for performing quantum computation. Is done.
[0021]
First, the quantum algorithm will be described. FIG. 1 is a diagram for explaining the flow of a quantum algorithm, and a quantum algorithm including a Shore algorithm is generally composed of four stages as shown in FIG. FIG. 1 shows an example of 8 qubits, and a solid line extending in the horizontal direction corresponds to each qubit Q1 to Q8.
Hereinafter, each stage will be described.
[0022]
In the first stage (Stage 1), Walsh-Hadamard Transformation (hereinafter referred to as “WH transformation”) S1. _-1 , S1 _-2 , S1 _-3 , S1 _-Four ,... Are used to distribute the probability to a predetermined state. Thereby, in the first stage, a probability equal to a predetermined state is assigned, and an initial state of superposition of states based on quantum mechanics is generated.
[0023]
In the second stage (Stage 2), knot (NOT) conversion (control NOT operation) S2 _-1 , S2 _-2 , S2 _-3 ,... Are used to exchange the probabilities assigned in the first stage between the states. This probability exchange operation in the second stage corresponds to an addition operation or a multiplication operation in a conventional computer equipped with a CPU or the like.
[0024]
In FIG. 1, for example, the knot transformation S2 _-1 Indicates that the control queue bit is Q3 and the target queue bit is Q8. At this time, if the value of the control qubit Q3 is “1”, only the value of the target qubit Q8 is different, and the probability is exchanged between the states of the other qubits having the same value. On the other hand, if the value of the control qubit Q3 is “0”, the value is held without performing the probability exchange. Knot conversion S2 _-2 , S2 _-3 The same applies to.
[0025]
In the third stage (Stage 3), WH conversion S3 _-1 , S3 _-3 , ... and phase shift conversion S3 _-2 , S3 _-Four The solution is converged to one point by interference using quantum Fourier transformation (hereinafter referred to as “QFT”). Further, in the fourth stage (Stage 4), a solution is obtained by performing observation S4.
[0026]
The knot transformation S2 described above _-1 Similarly to, for example, phase shift conversion S3 _-2 Indicates that the control queue bit is Q8 and the target queue bit is Q7. Phase shift conversion S3 _-2 If the value of the control qubit Q8 is “1”, phase rotation is performed with respect to the target qubit Q7, and if the value of the control qubit Q8 is “0”, the value is held. Phase shift conversion S3 _-Four The same applies to.
[0027]
In the quantum algorithm as shown in FIG. 1 above, the present inventors have a first stage for distributing a probability to a predetermined state by WH conversion, and between states according to the value of the control qubit, etc. Until the end of the second stage for exchanging probabilities, it was found that the state probability (probability amplitude) of each state was only one of 0 or the value p (0 <p ≦ 1). . That is, in the first and second stages of the quantum algorithm, whether or not the value has a value (the value is 0 or p) without using the state probability value itself indicated by a plurality of bits. It has been found that an equivalent calculation can be performed without losing the amount of information (without changing the amount of information).
[0028]
Therefore, the parallel processing processor according to the present embodiment uses a probability flag that indicates whether or not the state probability of each state has a value for all logical value states that can be expressed by N (N is a natural number) bits. Further, an operation corresponding to unitary conversion is executed by a logic operation. As a result, the parallel processing processor according to the present embodiment realizes an arithmetic function equivalent to that of the quantum computer with a simple configuration while simplifying the configuration of the processor element, and reduces the circuit area required for the processor element. It is possible to reduce the parallelism of operations and the amount of operations per unit gate.
[0029]
FIG. 2 is a block diagram illustrating a configuration example of the parallel processor 1 in the present embodiment.
In FIG. 2, the control unit 2 includes an instruction management unit 3 and a pipeline generation unit 4, and controls each functional unit in the parallel processor 1, such as the plurality of processor elements 8-n (n is a subscript, n = 1, 2, 3, ...). The interface 7 is used to exchange instructions and data between the parallel processor 1 and external devices (external circuits) connected to it.
[0030]
The instruction management unit 3 outputs instructions supplied from an external device via the interface 7 to the pipeline generation unit 4, and outputs the solution obtained by the observation instruction operation, described later, to the external device via the interface 7. The instruction management unit 3 includes an instruction cache 5 that temporarily stores instructions supplied from the external device, and an answer register 6 that stores solutions obtained by observation instruction operations.
[0031]
Based on the instructions supplied from the instruction management unit 3, the pipeline generation unit 4 outputs control instructions to each processor element 8-n, controlling the plurality of processor elements 8-n so that they perform pipeline operations in parallel.
[0032]
In accordance with control instructions supplied from the pipeline generation unit 4, each processor element 8-n performs calculations on predetermined states among the states superposed according to quantum mechanics. Each processor element 8-n has a local memory 9-n (n is a subscript, n = 1, 2, 3, ...) that stores the state probabilities of a plurality of states. That is, each processor element 8-n of the present embodiment is associated with a plurality of the states superposed according to quantum mechanics.
[0033]
The states associated with the individual processor elements 8-n differ from one another, and every state (logical value) for the number of qubits that the parallel processor 1 can handle is associated with one of the processor elements 8-n. In the present embodiment, the state probability of each state is represented by a 1-bit probability flag, so each local memory 9-n needs only a storage capacity, in bits, equal to the number of states associated with its processor element 8-n.
The processor elements 8-n are connected to one another via the network 10 so that they can communicate.
[0034]
Next, the processor element (PE) in this embodiment will be described in detail.
FIG. 3 is a configuration diagram showing the essential features of the processor element in the present embodiment. As an example, FIG. 3 shows a processor element 31 corresponding to a first state in which the value of the target qubit is “0”, and a processor element 34 corresponding to a second state in which the value of the target qubit is “1”. In the first and second states, the values of all qubits other than the target qubit are equal.
[0035]
In FIG. 3, the processor element 31 includes a logic unit 32 that performs a logical operation and a register 33 that stores an operation result by the logic unit 32. The logic unit 32 receives the probability flag PA0 related to the first state and the probability flag PA1 related to the second state, and performs a predetermined logical operation using the probability flags PA0 and PA1. Further, the logic unit 32 outputs the calculation result to the register 33 as the probability flag PB0 related to the first state after the calculation. The register 33 stores a probability flag PB0 input from the logic unit 32.
[0036]
Similarly, the processor element 34 includes a logic unit 35 that performs a logical operation and a register 36 that stores the operation result. The logic unit 35 receives the probability flags PA0 and PA1 related to the first and second states, performs a predetermined logical operation, and outputs the result to the register 36 as the probability flag PB1 related to the second state after the operation. The register 36 stores the probability flag PB1 input from the logic unit 35.
[0037]
FIG. 4 is a block diagram illustrating a specific configuration example of the processor element in the present embodiment. The processor element shown in FIG. 4 has a six-stage pipeline structure in order to increase the speed of arithmetic processing and reduce the circuit area.
In FIG. 4, 8 is a processor element and 9 is a local memory, corresponding respectively to the processor element 8-n and the local memory 9-n shown in FIG. 2.
[0038]
The processor element 8 includes six registers 41 to 45 and 47 and a logic unit 46.
Registers 41, 42, and 43 are connected in series, in that order, between the output terminal of the local memory 9 and the first input terminal of the logic unit 46. Each temporarily stores the probability flag it receives and outputs it to the next stage: register 41 to register 42, register 42 to register 43, and register 43 to the logic unit 46. That is, a probability flag read from a predetermined storage area of the local memory 9 is passed through registers 41, 42, and 43 in turn and input to the logic unit 46. Register 41 also outputs its stored probability flag to the register corresponding to register 44 in another processor element.
[0039]
Registers 44 and 45 are connected in series to the second input terminal of the logic unit 46. Each temporarily stores the probability flag it receives and outputs it to the next stage: register 44 to register 45, and register 45 to the logic unit 46. That is, probability flags supplied from the registers corresponding to register 41 in other processor elements are passed through registers 44 and 45 in turn and input to the logic unit 46.
[0040]
The logic unit 46 has a register 47 and, in accordance with control instructions supplied from the pipeline generation unit 4, performs a logical operation using the probability flags input from registers 43 and 45 via its first and second input terminals, respectively. The logic unit 46 temporarily stores the result of the logical operation in register 47 and then writes it to a predetermined storage area of the local memory 9. Register 47 may instead be provided outside the logic unit 46.
[0041]
The local memory 9 stores the probability flags of the plurality of states associated with the processor element 8. Providing the local memory 9 in this way and associating several states with one processor element 8 uses the processor element 8 more effectively than assigning one state to each processor element, and reduces the circuit area of processor elements required for processing.
[0042]
Here, since the probability flag of one state is 1 bit, the local memory 9 needs a storage area of at least as many bits as the number of associated states. For example, when the number of states is 64, corresponding to 6 qubits, the local memory 9 needs a storage area of at least 64 bits. Similarly, each of the registers 41 to 45 and 47 needs to store only 1 bit of information.
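The text does not say how states are distributed across processor elements. A minimal sketch, assuming a hypothetical contiguous mapping in which the high-order bits of a state select the processor element and the low-order bits select the bit address in its local memory, shows how a 64-bit local memory covers 6 qubits and how the 1024 processor elements reported for FIG. 11 cover 16 qubits:

```python
def qubits_supported(num_pes, states_per_pe):
    """Qubits covered when every state gets one flag bit in some local memory."""
    total = num_pes * states_per_pe
    if total <= 0 or total & (total - 1):
        raise ValueError("total number of states must be a power of two")
    return total.bit_length() - 1


def locate(state, states_per_pe=64):
    """Hypothetical contiguous mapping: (processor element index, local-memory bit address)."""
    return divmod(state, states_per_pe)


print(qubits_supported(1, 64))     # 6 qubits fit in one 64-bit local memory
print(qubits_supported(1024, 64))  # 16 qubits with 1024 processor elements
print(locate(130))                 # state 130 -> element 2, bit 2
```

Under this assumed mapping, a transformation whose target is one of the six low-order qubits pairs states held in the same local memory, while a higher target qubit pairs states held in different processor elements.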
[0043]
In the processor element 8 shown in FIG. 4, the registers 41 to 45 and 47 operate in synchronization with a predetermined timing signal such as a clock signal, and 6-stage pipeline processing is performed as will be described later.
FIG. 5 is a diagram illustrating an example of pipeline control in the parallel processor according to the present embodiment. In FIG. 5, CLK is a timing signal, such as a clock signal, used for pipeline processing, and time advances from top to bottom.
[0044]
Pipeline control is performed based on a control instruction supplied from the pipeline generation unit 4 to each processor element.
First, the pipeline generation unit 4 supplies the address of the local memory 9 at which the probability flag of a predetermined state is stored and instructs the processor element 8 to read that probability flag from the local memory 9 (read address). The probability flag read from the local memory 9 is stored in register 41 shown in FIG. 4.
[0045]
The pipeline generation unit 4 controls the network 10 so that the probability flag output from register 41 toward other processor elements is supplied to the desired processor element (send switch). As a result, the probability flag read from the processor element's own local memory 9 is stored in register 42, and the probability flag supplied from another processor element is stored in register 44.
[0046]
Next, the pipeline generation unit 4 notifies the processor element 8 of the target qubit to be operated on (target). At this time, the probability flags used for the calculation are transferred to registers 43 and 45, respectively. The pipeline generation unit 4 then instructs the processor element 8 to perform a logical operation on the probability flags (operation control); the logic unit 46 performs the operation, and the new probability flag obtained as the result is stored in register 47.
[0047]
Next, the pipeline generation unit 4 supplies the address of the local memory 9 at which the resulting probability flag is to be stored, together with write permission, and instructs the processor element 8 to write the probability flag to the local memory 9 (write address, write enable).
[0048]
As described above, each processor element 8 reads the probability flag of a predetermined state from the local memory 9, performs an operation on it, and writes the resulting probability flag back to the local memory 9. Although the above description follows the processing of a single state step by step, the steps are executed one after another as in ordinary pipeline control, so that different steps are carried out on different states at the same time.
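The overlap described above can be pictured as a standard six-stage pipeline schedule. This is an illustrative model, not the timing of FIG. 5: it assumes one new state enters stage 0 per clock cycle and every stage takes exactly one cycle, and the stage indices are generic rather than the named steps of the text.

```python
def pipeline_table(num_states, num_stages=6):
    """rows[cycle][stage] = index of the state occupying that stage, or None.

    Assumes one state enters stage 0 per cycle and each stage takes one cycle.
    """
    rows = []
    for cycle in range(num_states + num_stages - 1):
        row = []
        for stage in range(num_stages):
            s = cycle - stage
            row.append(s if 0 <= s < num_states else None)
        rows.append(row)
    return rows


for cycle, row in enumerate(pipeline_table(4)):
    cells = " ".join("." if s is None else str(s) for s in row)
    print(f"cycle {cycle}: {cells}")
```

In this model, n states finish after n + 5 cycles, so once the pipeline is full each processor element completes roughly one state per clock cycle.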
[0049]
Next, the logical operations executed by the processor element 8 in this embodiment will be described. As described above, the logic unit 46 performs the logical operations corresponding to the first and second stages of the quantum algorithm shown in FIG. 1, that is, the WH transformation and the NOT transformation, respectively.
[0050]
FIG. 6 is a diagram showing a truth table of logic operations executed by the logic unit 46.
In FIG. 6, a probability flag value of “0” indicates that the probability amplitude of the state is 0, and a flag value of “1” indicates that it is p.
[0051]
In FIG. 6, before conversion (before the logical operation), “|0>” denotes the state in which the target qubit is “0” and “|1>” denotes the state in which the target qubit is “1”, the values of all other qubits being the same in both. The states |0> and |1> after conversion are denoted “|0>'” and “|1>'”, respectively.
[0052]
As shown in FIG. 6, in the NOT transformation, the flag values of the converted states |0>' and |1>' obtained as the result are the flag values of states |1> and |0> before conversion, respectively. The logical operation corresponding to the NOT transformation in the logic unit 46 is therefore an operation that exchanges the flag values of states |0> and |1>.
[0053]
This operation may be implemented by selecting, when the NOT transformation is instructed, the probability flag of the state whose target-qubit value differs, or simply by connecting the signal lines carrying the flag values of states |0> and |1> to the signal lines that output the flag values of states |1>' and |0>', respectively.
[0054]
In the WH transformation shown in FIG. 6, the flag value of state |0>' after conversion is “1” when the flag values of states |0> and |1> before conversion are (flag of |0>, flag of |1>) = (0, 1), (1, 0), or (1, 1). The logical operation corresponding to the WH transformation for state |0>' in the logic unit 46 is therefore the logical sum (OR) of the flag values of states |0> and |1>. As shown in FIG. 7A, this is realized by providing in the logic unit 46 an OR circuit 71 that receives the flag values of states |0> and |1> and outputs the result as the flag value of state |0>'.
[0055]
Similarly, in the WH transformation, the flag value of state |1>' after conversion is “1” when the flag values of states |0> and |1> before conversion are (flag of |0>, flag of |1>) = (0, 1) or (1, 0). The logical operation corresponding to the WH transformation for state |1>' in the logic unit 46 is therefore the exclusive OR (EXOR) of the flag values of states |0> and |1>. As shown in FIG. 7B, this is realized by providing in the logic unit 46 an EXOR circuit 72 that receives the flag values of states |0> and |1> and outputs the result as the flag value of state |1>'.
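The truth table of FIG. 6 can be cross-checked against the amplitude-level Hadamard transformation. This sketch assumes, as stated for the first two stages, that both input amplitudes are either 0 or the same positive value p (p = 0.5 here is an arbitrary choice). Under that assumption, OR and XOR reproduce exactly which outputs are nonzero. The function names are hypothetical.

```python
import math
from itertools import product


def not_flags(f0, f1):
    """NOT transformation (FIG. 6): the flags of |0> and |1> are exchanged."""
    return f1, f0


def wh_flags(f0, f1):
    """WH transformation (FIG. 7): |0>' = f0 OR f1 (circuit 71), |1>' = f0 XOR f1 (circuit 72)."""
    return f0 | f1, f0 ^ f1


def wh_from_amplitudes(f0, f1, p=0.5):
    """Reference model: real Hadamard on amplitudes (f0*p, f1*p); report which outputs are nonzero."""
    s = 1 / math.sqrt(2)
    a0, a1 = f0 * p, f1 * p
    return int(abs(s * (a0 + a1)) > 1e-12), int(abs(s * (a0 - a1)) > 1e-12)


for f0, f1 in product((0, 1), repeat=2):
    print((f0, f1), "NOT ->", not_flags(f0, f1), "WH ->", wh_flags(f0, f1),
          "match" if wh_flags(f0, f1) == wh_from_amplitudes(f0, f1) else "MISMATCH")
```

The (1, 1) row is where interference matters: the equal amplitudes cancel in |1>', which is why |1>' uses XOR rather than OR.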
[0056]
Next, the network 10 will be described.
The network 10 in this embodiment may connect the processor elements 8-n in any way that allows each processor element 8-n to receive, from other processor elements 8-n, the probability flags of the states it uses in its logical operations. For example, the network 10 can be suitably constructed by applying the hypercube network shown conceptually in FIG. 8.
[0057]
FIG. 8 is a conceptual diagram for explaining a hypercube network applicable to the network 10; for ease of understanding, it shows the 3-qubit case as an example. As shown in FIG. 8, the vertices 80 to 87 of a cube (hexahedron) correspond to states “000” to “111” (binary logical values), respectively.
[0058]
As described above, each processor element 8-n performs its calculation using the probability flags of states that differ only in the value of the target qubit. In other words, the logical operation in a processor element 8-n uses the probability flag of a logical-value state whose Hamming distance from the logical value of the state being calculated is “1”.
[0059]
For example, the calculation for state “000” uses the probability flag of one of the states “001”, “010”, and “100”, so in FIG. 8 vertex 80 is connected to each of vertices 81, 82, and 84 by an edge serving as a communication line. That is, the network 10 connects the processor element corresponding to state “000” to the processor elements corresponding to states “001”, “010”, and “100” via communication lines NW1, NW2, and NW3 so that they can communicate.
[0060]
Likewise, each processor element corresponding to another state is connected so that it can communicate with the processor elements corresponding to the states at Hamming distance “1” from its own state. Connecting all processor elements in this way constructs a network 10 based on the hypercube network.
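The neighbor relation of FIG. 8 reduces to flipping one bit of a state. A minimal sketch, assuming states are numbered by their binary logical value (so vertex 80 + k in FIG. 8 corresponds to state k); the function names are hypothetical:

```python
def hypercube_neighbors(state, num_qubits):
    """States at Hamming distance 1 from `state`: flip one bit at a time (bit 0 first)."""
    return [state ^ (1 << k) for k in range(num_qubits)]


def hypercube_edges(num_qubits):
    """Every communication line of the hypercube once, as (lower, higher) state pairs."""
    edges = set()
    for s in range(1 << num_qubits):
        for t in hypercube_neighbors(s, num_qubits):
            edges.add((min(s, t), max(s, t)))
    return sorted(edges)


print([format(t, "03b") for t in hypercube_neighbors(0b000, 3)])  # ['001', '010', '100']
print(len(hypercube_edges(3)))  # 12 edges of the cube in FIG. 8
```

In this model, the link a processor element uses for a given target qubit is simply the cube dimension that corresponds to that bit.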
[0061]
Next, the operation of the parallel processor 1 in this embodiment will be described with reference to the quantum algorithm shown in FIG. 1.
In the following description, for convenience, the number of qubits is 8 (Q1 to Q8), with Q1 the least significant qubit and Q8 the most significant. In the initial state of the parallel processor 1, only the probability flag of state “00000000”, in which all qubits Q1 to Q8 are “0”, is “1”; the probability flags of all other states are “0”.
[0062]
First, when the parallel processor 1 receives from the external device, via the interface 7, an instruction corresponding to the first-stage WH transformation shown in FIG. 1, the control unit 2 (the instruction management unit 3 and the pipeline generation unit 4) outputs control instructions to each processor element 8-n so that it performs the logical operation corresponding to the WH transformation and distributes probability to predetermined states.
[0063]
For example, when an instruction corresponding to the WH transformation with qubit Q1 as the target qubit is received, the logic unit 46 of each processor element 8-n performs the logical operation corresponding to the WH transformation, and the flag values of states “00000000” and “00000001” become “1”. When an instruction corresponding to the WH transformation with qubit Q2 as the target qubit is then received, the flag values of states “00000000”, “00000010”, “00000001”, and “00000011” become “1”.
In this way, the parallel processor 1 executes logical operations corresponding to the WH transformation and distributes probability to predetermined states.
[0064]
Next, when an instruction corresponding to the second-stage NOT transformation is received from the external device via the interface 7, the control unit 2 in the parallel processor 1 outputs control instructions to each processor element 8-n so that it performs the logical operation corresponding to the NOT transformation. The logic unit 46 of each processor element 8-n then performs that logical operation and exchanges probability flag values between predetermined states.
In this way, the operations corresponding to the first and second stages of the quantum algorithm shown in FIG. 1 are performed.
[0065]
In the quantum algorithm shown in FIG. 1, the third stage applies a quantum Fourier transform or the like to the result of the first two stages so that the solution converges, and the fourth stage obtains the solution by observation. However, because the parallel processor 1 in this embodiment operates on probability flags indicating whether each state probability has a value, rather than on state probabilities represented as complex numbers, it cannot make the solution converge by a quantum Fourier transform or the like in the way the quantum algorithm shown in FIG. 1 does.
[0066]
Therefore, the parallel processor 1 in the present embodiment obtains the solution through an observation instruction operation using inquiry instructions. In this operation, an inquiry instruction is first issued specifying a target qubit and an unmask value. On receiving the inquiry instruction, the parallel processor 1 checks whether any of the states matching the specified unmask value has probability flag “1”.
[0067]
In accordance with the result, the parallel processor 1 writes a value (“0” or “1”) in the field corresponding to the designated target qubit in the answer register 6 of the control unit 2.
By repeatedly performing the above operation, the parallel processor 1 in the present embodiment stores the solution in the answer register 6 and outputs it via the interface 7 in response to a request from the outside.
[0068]
FIG. 9 is a diagram showing a specific example of the observation instruction operation using inquiry instructions. As an example, FIG. 9 shows the case where the probability flags of states “010”, “100”, “110”, and “111” are “1” and the solution sought is the smallest value among the states whose probability flag is “1” (here, “010”).
[0069]
First, an inquiry instruction designates the most significant qubit as the target qubit and “0**” as the unmask value BM (* denotes “don't care”). On receiving this inquiry instruction, the parallel processor 1 checks whether any of the states “000”, “001”, “010”, and “011” matching BM has probability flag “1”. Since such a state exists, the parallel processor 1 writes “0” to the most significant bit of the answer register 6.
[0070]
Next, an inquiry instruction designates the middle of the three qubits as the target qubit and, reflecting the previous result, “00*” as the unmask value BM. Since neither of the states “000” and “001” matching BM has probability flag “1”, the parallel processor 1 writes “1” to the middle bit of the answer register 6.
[0071]
Subsequently, the target qubit is designated as the least significant qubit by the inquiry instruction, and the unmask value BM is designated as “010”. The parallel processor 1 that has received the inquiry command writes “0” to the least significant bit in the answer register 6 because the probability flag of the state “010” corresponding to the unmask value BM is “1”.
In this way, the observation instruction operation using inquiry instructions allows the parallel processor 1 to obtain a desired solution, such as the minimum value.
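The inquiry sequence of FIG. 9 amounts to a most-significant-bit-first search for the smallest flagged state. A minimal sketch, assuming a flat list of probability flags indexed by logical value and at least one flag set (the text does not say what happens when no flag is “1”); `inquiry` and `observe_minimum` are hypothetical names, and the unmask value is represented by its decided prefix with the “*” positions dropped:

```python
def inquiry(flags, prefix, num_bits):
    """One inquiry: is any flag "1" among states whose leading bits equal `prefix`?

    `prefix` plays the role of the unmask value BM with the "*" positions dropped.
    """
    free = num_bits - len(prefix)
    base = int(prefix, 2) << free if prefix else 0
    return any(flags[base + low] for low in range(1 << free))


def observe_minimum(flags, num_bits):
    """Fill the answer register MSB first, as in FIG. 9."""
    answer = ""
    for _ in range(num_bits):
        # Unmask value = bits decided so far + "0" + don't-cares.
        answer += "0" if inquiry(flags, answer + "0", num_bits) else "1"
    return answer


flags = [0] * 8
for s in ("010", "100", "110", "111"):  # flagged states of FIG. 9
    flags[int(s, 2)] = 1
print(observe_minimum(flags, 3))  # 010
```

Each iteration corresponds to one inquiry instruction, so an N-bit answer needs N inquiries regardless of how many states are flagged.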
[0072]
Next, an instruction format used in the parallel processor 1 in this embodiment will be described.
FIG. 10 is a diagram illustrating an example of an instruction format. In FIG. 10, 101 is an instruction field indicating the instruction, and 102 is a target field indicating the target qubit.
[0073]
Reference numerals 103 and 104 denote control fields that specify whether an operation is performed according to the control qubit. When (control_0, control_1) = (0, 0), the value of the answer register 6 is returned. When (control_0, control_1) = (0, 1) or (1, 0), the operation is performed on states in which the control qubit is “1” or “0”, respectively. When (control_0, control_1) = (1, 1), the operation is performed regardless of the value of the control qubit.
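A sketch of how control fields 103 and 104 could gate a flag-level NOT transformation. It assumes bit 0 is the least significant qubit and that the control and target qubits differ; `applies` and `gated_not` are hypothetical helpers, and the (0, 0) case, which returns the answer register in the text, is modeled here simply as applying no operation to the flags.

```python
def applies(control_0, control_1, control_value):
    """Does an operation apply to a state whose control qubit equals control_value?"""
    fields = (control_0, control_1)
    if fields == (0, 1):
        return control_value == 1
    if fields == (1, 0):
        return control_value == 0
    if fields == (1, 1):
        return True
    return False  # (0, 0): answer-register read; no operation on flags in this model


def gated_not(flags, target_bit, control_bit, control_0, control_1):
    """Flag-level NOT on target_bit, gated by the control fields (bit 0 = least significant)."""
    if target_bit == control_bit:
        raise ValueError("control and target qubits must differ")
    t = 1 << target_bit
    out = list(flags)
    for i in range(len(flags)):
        # Visit each |0>/|1> pair once; both members share the same control-bit value.
        if not i & t and applies(control_0, control_1, (i >> control_bit) & 1):
            out[i], out[i | t] = flags[i | t], flags[i]
    return out


flags = [1, 1, 0, 0]  # states "00" and "01" flagged
for fields in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(fields, gated_not(flags, 1, 0, *fields))
```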
[0074]
FIG. 11 is a diagram showing the computing performance of the parallel processor built in this embodiment using programmable logic devices (FPGA: Field Programmable Gate Array, CPLD: Complex Programmable Logic Device, etc.), together with that of a software simulation.
[0075]
As shown in FIG. 11, the parallel processor is implemented with programmable logic devices providing 1.5M gates, on which 1024 processor elements, each with a local memory of 64-bit storage capacity, are arranged; the 1024 processor elements operate in parallel on a 60 MHz clock signal. The parallel processor can therefore perform operations equivalent to 16 qubits. Under these conditions, it executes 1.6M operations per second.
[0076]
On the other hand, a software simulation on a CPU with a 600 MHz operating clock and 512 KB of cache memory executes 0.8K operations per second.
The computing performance of the parallel processor in the present embodiment is therefore about 2000 times that of the software simulation.
[0077]
As described in detail above, according to the present embodiment, when operations are performed on logical-value states that can be expressed with N bits (N is a natural number), the logic units 46 provided in the plurality of processor elements 8 perform, in parallel for different logical-value states, the logical operations specified by instructions, using probability flags that indicate with 1 bit whether the state probability of each logical value is nonzero.
[0078]
As a result, an operation equivalent to a complex product-sum operation on state probabilities represented as multi-bit complex numbers can be performed as a simple logical operation on 1-bit probability flags without impairing the arithmetic function, and because the operation consists only of logical operation processing, it can be executed at high speed.
[0079]
In addition, representing each logical-value state, which would otherwise require multiple bits, by a 1-bit probability flag makes the circuit configuration of the processor elements very simple and greatly reduces the circuit area required to compute each logical-value state. This raises the degree of parallelism within the parallel processor and makes it easy to increase the scale of computation.
[0080]
Note that the parallel processor 1 of the present embodiment described above is not limited to Shor's quantum algorithm and can be applied to other quantum algorithms; for example, it can also be applied to Grover's quantum algorithm for database search.
[0081]
In addition, each of the above-described embodiments is merely an example of implementation in carrying out the present invention, and the technical scope of the present invention should not be construed as being limited thereto. That is, the present invention can be implemented in various forms without departing from the technical idea or the main features thereof.
[0082]
【Effect of the Invention】
As described above, according to the present invention, when operations are performed on logical-value states that can be expressed with N bits, a plurality of arithmetic circuits perform logical operations in parallel on different predetermined logical-value states, using supplied flags that indicate the logical-value states.
As a result, operations on logical-value states that previously required complex product-sum operations can be performed as logical operations and can therefore be executed at high speed. Furthermore, representing each logical-value state, which would otherwise require multiple bits, by a flag reduces the circuit area required to compute one logical-value state, so the scale of computation can be increased easily without reducing computation speed.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining a quantum algorithm.
FIG. 2 is a block diagram showing a configuration example of a parallel processing processor to which the semiconductor arithmetic device according to the embodiment of the present invention is applied.
FIG. 3 is a block diagram showing elemental features of a processor element.
FIG. 4 is a diagram illustrating a specific configuration example of a processor element.
FIG. 5 is a diagram illustrating an example of pipeline control in the parallel processing processor according to the present embodiment.
FIG. 6 is a diagram showing a truth table in a logical operation.
FIG. 7 is a diagram illustrating a configuration example of a logic unit.
FIG. 8 is a conceptual diagram for explaining a hypercube network.
FIG. 9 is a diagram for explaining an observation command operation.
FIG. 10 is a diagram illustrating an example of an instruction format.
FIG. 11 is a diagram showing the calculation performance of each of the parallel processing processor and software simulation in the present embodiment.
FIG. 12 is a block diagram illustrating a configuration of a processor element that performs an operation using a probability amplitude in a state represented by a complex number.
[Explanation of symbols]
1 Parallel processor
2 Control unit
3 Instruction management unit
4 Pipeline generation unit
5 Instruction cache
6 Answer register
7 Interface
8-1, 8-2, 8-3, ... Processor element (PE)
9-1, 9-2, 9-3, ... Local memory
10 Network

Claims

1. A semiconductor arithmetic device that performs operations in parallel on all logical-value states that can be expressed with N bits (N is a natural number) and holds the result of each operation, comprising:
a plurality of arithmetic circuits, each of which, when performing an operation on a predetermined logical-value state, performs a logical operation using supplied flags indicating logical-value states and holds the operation result,
wherein the plurality of arithmetic circuits perform operations in parallel on different logical-value states.

2. The semiconductor arithmetic device according to claim 1, wherein the flag indicating a logical-value state corresponds to the probability amplitude of that logical-value state.

3. The semiconductor arithmetic device according to claim 1, wherein the flag indicating a logical-value state indicates with 1 bit whether the probability amplitude of that logical-value state is nonzero.

4. The semiconductor arithmetic device according to claim 1, further comprising a plurality of storage circuits, provided corresponding to the respective arithmetic circuits, each storing a plurality of logical-value states,
wherein the pluralities of logical-value states stored in the plurality of storage circuits differ from one storage circuit to another,
and each arithmetic circuit is capable of operating on the plurality of logical-value states stored in its corresponding storage circuit.

5. The semiconductor arithmetic device according to claim 4, wherein the storage circuit stores a flag indicating the state of the logical value.

6. The semiconductor arithmetic device according to claim 1, wherein the plurality of arithmetic circuits are connected so as to communicate with each other via a network.

7. The semiconductor arithmetic device according to claim 6, wherein the network connects arithmetic circuits that operate on logical-value states at a Hamming distance of 1 from each other so that they can communicate.

8. The semiconductor arithmetic device according to claim 6, wherein the network connects arithmetic circuits that operate on logical-value states at a Hamming distance of 1 from each other so that they communicate in a hypercube topology.

9. The semiconductor arithmetic device according to claim 1, further comprising a register that stores a solution obtained by an observation instruction operation based on the operation results for the logical-value states.

10. The semiconductor arithmetic device according to claim 1, wherein, when operating on a plurality of logical-value states, the arithmetic circuit performs the different processes of the operation sequentially and in parallel.

11. The semiconductor arithmetic device according to claim 1, wherein the arithmetic circuit performs a logical operation using the flag indicating the predetermined logical-value state and the flag indicating a logical-value state that differs from the predetermined logical value only in the value of the operation-target bit.