JP2005078407A

JP2005078407A - Data search method, data search device, data search program and recording medium having the program recorded thereon

Info

Publication number: JP2005078407A
Application number: JP2003308533A
Authority: JP
Inventors: Mayumi Ooto; 真由美大音; Tomonori Izumitani; 知範泉谷; Yasuhito Kono; 泰人河野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-09-01
Filing date: 2003-09-01
Publication date: 2005-03-24

Abstract

PROBLEM TO BE SOLVED: To search a database at high speed for the sequence having the highest similarity to a given genome sequence (a base sequence or an amino acid sequence). SOLUTION: An input sequence is defined as sequence 1, and a sequence fetched from the database as sequence 2. A data search method executes: superposition processing (Step 5) that puts sequence 2 into a superposition state by using a quantum algorithm; processing (Step 6) that uses a quantum query to apply quantum arithmetic operations corresponding to dynamic programming to sequence 1 and the superposed sequence 2 and derives the pair-wise alignment score between sequence 1 and sequence 2; processing (Steps 7 to 10) that repeats the Grover iteration, consisting of application of a quantum oracle and a diffusion transformation, on sequence 2; and processing (Step 11) that obtains a solution candidate y' by observing the qubits corresponding to sequence 2. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a method and an apparatus that use a quantum computer, or apply a quantum algorithm, to search a database for the sequence having the highest similarity to a given genome sequence (a base sequence or an amino acid sequence).

In bioinformatics, the field in which information science is applied to biology, the decoding of base sequences in the DNA and RNA of organisms and of amino acid sequences in proteins and peptides is progressing rapidly. In this specification, base sequences and amino acid sequences are collectively called genome sequences. Decoded sequences are accumulated in databases. Discovering highly similar sequences among this large number of sequences makes it possible to elucidate new biological functions and evolutionary lineages, and to obtain knowledge useful for drug discovery and gene therapy.

However, computing the similarity with high accuracy requires a large amount of computation time and memory and is therefore costly, which makes it impractical. For this reason, methods that compute the similarity between sequences with algorithms based on approximate calculation have become mainstream.
Non-Patent Document 1: S. B. Needleman and C. D. Wunsch, Journal of Molecular Biology, Vol. 48, pp. 443-453 (1970)
Non-Patent Document 2: L. Grover, 28th Annual ACM Symposium on the Theory of Computing, pp. 212-219 (1996)
Non-Patent Document 3: M. Boyer, G. Brassard, P. Hoyer and A. Tapp, 4th Workshop on Physics and Computation, pp. 36-43 (1996)
Non-Patent Document 4: C. Durr and P. Hoyer, "A quantum algorithm for finding the minimum," [online], January 7, 1999, [retrieved August 27, 2003], Internet <URL: http://arXiv.org/abs/quant-ph/9607014>

At present, the decoding of genome sequences is advancing rapidly, and the amount of data accumulated in databases has become enormous. When the number of sequences stored in a database is n, the computational cost of the similarity calculation algorithms currently in use is on the order of O(n). Because the amount of data stored in the database, that is, n, is growing at an accelerating pace, the amount of computation needed to calculate sequence similarity keeps expanding. Developing algorithms that search databases efficiently has therefore become an urgent task, so that the features and functions of sequences can be elucidated more quickly on the basis of the similarity between sequences.

The similarity between two sequences is generally calculated through a procedure called pair-wise alignment, and the higher the similarity, the higher the resulting score. Pair-wise alignment is generally computed with DP (dynamic programming) (see Non-Patent Document 1). However, a reliable DP calculation requires a large amount of computation. In practice, approximate calculation is therefore often used to improve speed, at the cost of lower accuracy.

Accordingly, an object of the present invention is to provide a data search method and apparatus for searching a database at high speed for the sequence having the highest similarity to a given genome sequence.

A type of computer called a "quantum computer" has been proposed that can, in principle, solve problems far faster than the computers in wide use today. A quantum computer exploits the quantum superposition state, a phenomenon specific to physical systems described by quantum mechanics. A quantum computer is realized with elements called qubits, which carry out operations while maintaining a quantum superposition state. Because quantum computers can solve problems such as factorization far faster than current computers (classical computers), many research institutions around the world are working on their development.

The present invention uses algorithms based on the computational principles of a quantum computer (see Non-Patent Documents 2 to 4) to obtain the optimal pair-wise alignment score computed by DP more efficiently than a conventional computer. It can thereby find, in a database of genome sequences, the genome sequence most similar to the sequence given as input. Here, the similarity is the pair-wise alignment score.

That is, the data search method of the present invention is a data search method for searching a database storing sequences for the sequence most similar to a genome sequence given as an input sequence, the method comprising: a step of retrieving the input sequence from an input sequence storage unit and making it a first qubit string; a superposition step of retrieving sequences from a database storage unit that stores the database, making them a second qubit string, and putting the second qubit string into a superposition state; and a quantum query step of performing, on the first qubit string and the second qubit string in the superposition state, a quantum operation corresponding to dynamic programming to obtain the pair-wise alignment score between the input sequence and the sequence retrieved from the database storage unit.

The data search apparatus of the present invention is a data search apparatus for searching a database storing sequences for the sequence most similar to a genome sequence given as an input sequence, the apparatus comprising: an input sequence storage unit that stores the input sequence; an output sequence storage unit that stores an output sequence to be output; a database storage unit that stores a database of genome sequences; an input temporary storage unit that temporarily stores intermediate results of the computation; and an arithmetic unit that writes information to and reads information from the input sequence storage unit, the output sequence storage unit, and the input temporary storage unit and performs operations according to a quantum algorithm. The arithmetic unit executes: processing of retrieving the input sequence from the input sequence storage unit and making it a first qubit string; superposition processing of retrieving sequences from the database storage unit, making them a second qubit string, and putting the second qubit string into a superposition state; and quantum query processing of performing, on the first qubit string and the second qubit string in the superposition state, a quantum operation corresponding to dynamic programming to obtain the pair-wise alignment score between the input sequence and the sequence retrieved from the database storage unit.

By using a quantum algorithm, the present invention can find, in a database storing an enormous number of sequences, the sequence most similar to the input genome sequence more efficiently than a conventional computer, that is, a classical computer. When n denotes the factor by which the number of sequences in the database grows, the computation speed of the present invention is √n times that of a conventional computer, and the cost in computation space is compressed to log(n). For example, when the number of sequences in the database grows from 100 to 10,000, that is, when n = 100, a conventional computer needs 100 times the computation time and memory space that it needs for 100 sequences. According to the present invention, by contrast, only 10 times the computation time and log(100), that is, 2 log₂(10), times the memory space need be provided. Consequently, the larger the number of sequences in the database, the greater the gain in execution efficiency over the conventional approach.
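The figures in this example can be reproduced with a few lines of arithmetic. The following Python sketch assumes that the logarithm is taken to base 2, as the expression 2 log₂(10) suggests; the variable names are illustrative and do not appear in the specification.

```python
import math

# n is the growth factor of the database (100 -> 10,000 sequences), as in the text.
growth = 10_000 // 100

classical_factor = growth                  # classical time and memory grow linearly in n
quantum_time_factor = math.isqrt(growth)   # sqrt(n) growth stated for the invention
quantum_space_factor = math.log2(growth)   # log(n) growth, base 2 assumed

print(classical_factor, quantum_time_factor, quantum_space_factor)
```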

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a data search apparatus according to an embodiment of the present invention. This data search apparatus 100 is configured as a quantum computer that calculates the similarity of genome sequences, such as base sequences and amino acid sequences, according to a quantum algorithm, and includes an arithmetic unit 110 that performs calculations using qubits and a storage unit 120. The storage unit 120 contains an input sequence storage unit 121 that stores the input sequence, an output sequence storage unit 122 that stores the output sequence to be output, an algorithm storage unit 123 that stores the quantum algorithm for searching for the most similar sequence, a database storage unit 124 that stores the database of genome sequences, and an input temporary storage unit 125 that temporarily stores intermediate results of the calculation. Here, the input sequence is the sequence given as input when the quantum algorithm is executed, and the output sequence is the sequence obtained when execution of the quantum algorithm finishes. The arithmetic unit 110 writes information to and reads information from the input sequence storage unit 121, the output sequence storage unit 122, the database storage unit 124, and the input temporary storage unit 125, and performs quantum operations according to the quantum algorithm stored in the algorithm storage unit 123.

In this data search apparatus 100, when an input sequence is entered, it is stored in the input sequence storage unit 121. The database of genome sequences is stored in advance in the database storage unit 124, and the algorithm for searching for the most similar sequence is stored in the algorithm storage unit 123. Next, the output sequence storage unit 122 and the input temporary storage unit 125 are initialized. The arithmetic unit 110 then repeats quantum operations according to the quantum algorithm. During this time, the contents of the input temporary storage unit 125 are updated at each step of the operation. Finally, the contents of the output sequence storage unit 122 are output as the calculation result. As a result, the data search apparatus 100 retrieves from the database, and outputs, the sequence most similar to the single input sequence.

The quantum algorithm used in this embodiment is described in detail below.

First, a quantum algorithm is defined.

Quantum algorithm:
A computer whose mathematical model is given by a Turing machine, typically a von Neumann computer, is called a classical computer. The concept in a quantum computer that corresponds to a bit in a classical computer is called a qubit. Each qubit holds the "1" or "0" of a classical computer, or bit information that extends these (corresponding to a superposition of "1" and "0"). A quantum computer consists of a finite number of qubits. "Performing a calculation on a quantum computer" means giving each qubit an initial value, applying operations to the qubits to manipulate the bit information they hold, and finally performing an observation to extract that bit information. The series of operations applied to the bit information on the qubits is called a quantum algorithm. Once the basic units of operation on qubits are fixed, a quantum algorithm is described as a sequence of these basic operations.

Next, the database used in this embodiment is defined.

Let α denote the input sequence (sequence 1) and α_i a sequence in the database (sequence 2). The database is assumed to store n sequences α_i, namely α_1, α_2, ..., α_n. Sequences 1 and 2 each consist of an index and a table (the sequence itself) labeled by that index. The index is identified by a variable i (1 ≤ i ≤ n) denoting the sequence number in the database, and the table holds the encoded sequence α_i. Encoding is defined as numbering the kinds of bases or amino acids that make up a sequence and expressing those numbers as strings of the digits "0" and "1".

Next, the DP (dynamic programming) algorithm used in this embodiment to obtain the similarity is described.

Suppose that one sequence 1 is given as the input sequence and stored in the input sequence storage unit 121, and that the database stored in the database storage unit 124 contains a sequence 2. The calculation of the similarity between sequence 1 and sequence 2 is described below. Sequences 1 and 2 are taken here to be base sequences, but exactly the same procedure applies to amino acid sequences. When sequence 1 consists of p bases and sequence 2 consists of q bases, sequence 1 is written x_1, ..., x_a, ..., x_p, and sequence 2 is written y_i1, ..., y_ib, ..., y_iq. Here a is a variable designating the a-th base, and ib is a variable designating the b-th base of the i-th sequence. The position of a base is generally counted from the 5' end of the sequence. In the case of an amino acid sequence, the position of an amino acid is generally counted from the N-terminal side.

First, consider a matrix whose rows are indexed by the bases of sequence 1 and whose columns are indexed by the bases of sequence 2. Each element of this matrix is called a cell, and each cell (x_a, y_b) holds one alignment score f_i(a, b).

f_i(a, b) is expressed as
$$f_i(a, b) = \max\bigl( f_i(a-1, b-1) + s(a, bi),\; f_i(a-1, b) - d,\; f_i(a, b-1) - d \bigr) \qquad (1)$$
Here, max(A, B, C) denotes the largest of A, B, and C. s(a, bi) is the substitution score, taken from a substitution matrix, between x_a of sequence 1 and y_ib of sequence 2, and d is the gap score applied when a deletion or insertion occurs in a sequence. A substitution matrix tabulates the likelihood of each pairing of two bases or amino acids, and several such matrices are known. The specific example described later uses BLOSUM50 as the substitution matrix.

The cells range from (0, 0) to (p, q). For cells whose row number or column number is 0, the following boundary conditions are set:
$$f_i(0, 0) = 0, \qquad f_i(0, b) = -bd, \qquad f_i(a, 0) = -ad$$
Applying equation (1) under these boundary conditions gives the score of every cell from (1, 1) to (p, q), and finally f_i(p, q). This score f_i(p, q) is the pair-wise alignment score S(α, α_i), which indicates the similarity between sequence 1 and sequence 2.
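The recurrence (1) and its boundary conditions can be sketched classically as follows. This is a minimal Python illustration, not the quantum query of the invention: `alignment_score` and `toy_score` are hypothetical names, `toy_score` is a simple match/mismatch stand-in for a real substitution matrix such as BLOSUM50 (whose entries are not reproduced here), and d = 8 is assumed to be the magnitude of a gap score of -8.

```python
def alignment_score(x, y, s, d):
    """Return f(p, q), the global pair-wise alignment score of x and y.

    s(u, v) is the substitution score of residues u and v; d is the gap penalty.
    """
    p, q = len(x), len(y)
    # f[a][b] is the score of cell (a, b); row 0 and column 0 are boundary cells.
    f = [[0] * (q + 1) for _ in range(p + 1)]
    for a in range(1, p + 1):
        f[a][0] = -a * d
    for b in range(1, q + 1):
        f[0][b] = -b * d
    for a in range(1, p + 1):
        for b in range(1, q + 1):
            f[a][b] = max(
                f[a - 1][b - 1] + s(x[a - 1], y[b - 1]),  # align x_a with y_b
                f[a - 1][b] - d,                          # gap in y
                f[a][b - 1] - d,                          # gap in x
            )
    return f[p][q]


def toy_score(u, v):
    # Hypothetical substitution score: +1 for identical residues, -1 otherwise.
    return 1 if u == v else -1


print(alignment_score("ARN", "APN", toy_score, d=8))
```

The table costs O(pq) time and memory per pair of sequences, which is the per-comparison cost that a search over n database sequences multiplies.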

The DP algorithm described above does not in itself presuppose a quantum algorithm; it is an algorithm ordinarily executed on a classical computer. Next, the procedure of the quantum algorithm for executing this DP algorithm is described. FIG. 2 is a flowchart of the quantum algorithm used here.

Step 1. Setting the index of the threshold y and λ:
Sequence 1 is input as the input sequence and stored in the input sequence storage unit 121. First, in Step 1, the index of the initial threshold is set to an integer y chosen at random from the integers from 1 to n, where n is the number of sequences in the database. In addition, λ, the constant that determines the growth of m, is set to 6/5. The threshold is a candidate for the solution (the sequence 2 that is most similar to sequence 1). As the following description makes clear, in this embodiment, whenever a score higher than the pair-wise alignment score of the threshold is obtained, the threshold is updated by updating its index y, and the final solution is obtained by repeating this process. In the following description, the threshold whose index is y is referred to as the threshold y.

Step 2. Setting the table of the threshold and m:
After Step 1, in Step 2, the table is set so as to correspond to the index of the threshold y chosen in Step 1. The value m, which determines the number of times the algorithm is repeated, is initialized to m = 1.

Step 3. Calculation of the pair-wise alignment score S(α, α_y):
After Step 2, in Step 3, the pair-wise alignment score S(α, α_y) corresponding to the table of the threshold y is calculated using a classical computer. The score S(α, α_y) of the threshold y is needed to evaluate the solution candidates obtained by the quantum algorithm, so it is calculated in advance on a classical computer here, although it may of course be calculated with a quantum algorithm instead. The DP algorithm described above can be used to obtain S(α, α_y).

Step 4. Random selection of a non-negative integer j smaller than m:
After Step 3, in Step 4, a non-negative integer j smaller than m is chosen at random. j is the variable that controls the repetition of Steps 7 and 8 described later.

Step 5. Application of the Walsh-Hadamard transformation H to the index of sequence 2:
After Step 4, in Step 5, the arithmetic unit 110 applies the Walsh-Hadamard transformation H to sequence 2 in the database storage unit 124. At this point, sequence 2 is represented as a qubit string. The Walsh-Hadamard transformation H is the transformation whose matrix for a single qubit is
$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
For input 0 it outputs a superposition of 0 and 1 with equal amplitudes, and for input 1 it outputs a superposition of 0 and 1 with amplitudes of equal magnitude that differ only in sign. An amplitude is a quantity, in general a complex number, whose squared magnitude gives the probability of obtaining 0 or 1 when the qubit is observed.
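The behavior described here can be checked numerically. The sketch below is a classical simulation in plain Python, not code for a quantum device; `apply_h` and `walsh_hadamard` are hypothetical helper names, and the matrix used is the standard single-qubit Hadamard matrix with entries 1/√2, 1/√2, 1/√2, -1/√2.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)


def apply_h(state):
    """Apply H to a single-qubit state given as [amplitude of 0, amplitude of 1]."""
    a0, a1 = state
    return [INV_SQRT2 * (a0 + a1), INV_SQRT2 * (a0 - a1)]


def walsh_hadamard(state):
    """Apply H to every qubit of a state vector whose length is a power of two."""
    out = list(state)
    half = 1
    while half < len(out):
        for start in range(0, len(out), 2 * half):
            for k in range(start, start + half):
                a, b = out[k], out[k + half]
                out[k] = INV_SQRT2 * (a + b)
                out[k + half] = INV_SQRT2 * (a - b)
        half *= 2
    return out


print(apply_h([1, 0]))               # input 0: equal amplitudes
print(apply_h([0, 1]))               # input 1: amplitudes differ only in sign
print(walsh_hadamard([1, 0, 0, 0]))  # two index qubits: four equally weighted indexes
```

Applied to an index register that starts in the all-zero state, the transformation yields an equal superposition of all indexes, which is the starting point of the search.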

Step 6. Calculation of S(α, α_i) using a quantum query:
After Step 5, in Step 6, the similarity score S(α, α_i) between sequence 1 and sequence 2 is obtained using a quantum query. Here, a quantum query is defined as a quantum operation that performs the DP calculation on two sequences, one or both of which are in a superposition state, and obtains the pair-wise alignment score.

FIG. 3 shows the quantum query for obtaining the score S(α, α_i). Here, the quantum algorithm representing the quantum query consists of sets of qubits with roles such as quantum registers, and of functions acting on them. The quantum operations are applied to the qubits in time order from left to right. In the figure, each horizontal line represents a plurality of qubits, and each rectangle is a function acting on the horizontal line it overlaps (the target qubits). A black circle joined to a function by a vertical line designates the control qubits needed to execute that function. The quantum operations shown here are performed by the arithmetic unit 110. The index and table of sequence 1 (sequence α), that is, x_1, ..., x_a, ..., x_p, are supplied as a qubit string from the input sequence storage unit 121 to the arithmetic unit 110, and the index and table of sequence 2 (sequence α_i), that is, y_i1, ..., y_ib, ..., y_iq, are supplied as a qubit string from the database storage unit 124 to the arithmetic unit 110. The substitution matrix is supplied as qubits from the algorithm storage unit 123 to the arithmetic unit 110 as a parameter of the quantum operations. The intermediate results f_i(0, 0) through f_i(p, q) are stored in the input temporary storage unit 125.

Step 7. Application of the quantum oracle to the index of sequence 2:
After Step 6, in Step 7, the arithmetic unit 110 applies a quantum oracle to the index of sequence 2. A quantum oracle is a function that, for an input expressed as a qubit string of 0s and 1s, outputs 0, 1, or a superposition of 0 and 1. This function inverts the sign of only those indexes, among the superposed indexes of sequence 2, that satisfy the set condition.
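A minimal classical sketch of this sign inversion follows. It assumes that the "set condition" is that the score of an index exceeds the score of the current threshold, which matches the comparison made in Step 13 but is not stated explicitly at this step. `apply_oracle` is a hypothetical name, and the amplitudes are held in a plain Python list rather than in qubits.

```python
def apply_oracle(amplitudes, scores, threshold_score):
    """Flip the sign of every amplitude whose score exceeds the threshold score."""
    return [-amp if score > threshold_score else amp
            for amp, score in zip(amplitudes, scores)]


# Four indexes in a uniform superposition; only the last score exceeds the threshold.
state = apply_oracle([0.5, 0.5, 0.5, 0.5], scores=[3, 1, 2, 9], threshold_score=3)
print(state)
```

The sign flip alone does not change any observation probability, since probabilities depend on squared magnitudes; it only prepares the state for the diffusion transformation of Step 8.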

Step 8. Application of the diffusion transformation to the index of sequence 2:
After Step 7, in Step 8, the arithmetic unit 110 applies the diffusion transformation to the index of sequence 2. The diffusion transformation D is a square matrix of order n whose diagonal elements are all −1 + (2/n) and whose other elements are all 2/n.
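The matrix D can be written out directly from this definition. The following Python sketch builds D for n = 4 and applies it to a state in which the amplitude of the last index has had its sign inverted; `diffusion_matrix` and `apply_matrix` are hypothetical helper names, and the calculation is an ordinary classical matrix-vector product.

```python
def diffusion_matrix(n):
    """Return D as a list of rows: -1 + 2/n on the diagonal, 2/n elsewhere."""
    return [[(-1 + 2 / n) if row == col else 2 / n for col in range(n)]
            for row in range(n)]


def apply_matrix(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(entry * v for entry, v in zip(row, vector)) for row in matrix]


D = diffusion_matrix(4)
# State in which the amplitude of the last index has had its sign inverted.
after = apply_matrix(D, [0.5, 0.5, 0.5, -0.5])
print(after)
```

Each output entry equals twice the mean of the input amplitudes minus the corresponding input entry, which is why this transformation is often described as inversion about the mean.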

Step 9. Decrement of j:
After Step 8, in Step 9, 1 is subtracted from j. Steps 7 and 8 together constitute one Grover iteration; subtracting 1 from j here ensures that the iteration is repeated the number of times j determined in Step 4. As a result of this iteration, the difference in amplitude between indexes with higher scores and the other indexes grows.

Step 10. Conditional branch:
After Step 9, in Step 10, the value of j is evaluated. If j is positive, the process returns to Step 7; otherwise, it proceeds to Step 11.

Step 11. Observation of sequence 2:
In Step 11, the index of sequence 2 is observed to obtain the solution candidate y', together with α_y' and S(α, α_y').

Step 12. Checking the number of execution steps of the algorithm:
After Step 11, in Step 12, the total number of steps T executed in Steps 7 and 8 is obtained. If T is below O(√n), the process proceeds to Step 13. Otherwise, the current y is output as the solution and the algorithm terminates. In counting T, one operation on one qubit counts as one step. O(√n) denotes a quantity that grows no faster than a constant multiple of √n. According to the present invention, using a quantum algorithm reduces the amount of computation needed to find the most similar sequence in a database of n sequences to the order of O(√n). Which concrete value is used as the termination condition depends on the implementation of the quantum computer used.

Step 13. Conditional branch:
In Step 13, the value of S(α, α_y') − S(α, α_y) is evaluated. If this value is positive, the process proceeds to Step 14; otherwise, it proceeds to Step 15. This evaluation determines whether the score S(α, α_y') of the solution candidate y' obtained in Step 11 exceeds the score S(α, α_y) of the current threshold y. When the solution candidate y' has the larger score, that is, the higher similarity, the process proceeds to Step 14 to replace the threshold y with y'.

Step 14. Threshold update:
In step 14, the threshold y is rewritten to y', and the process returns to step 2.

Step 15. Update m:
In step 15, m is updated by assigning to it the smaller of λm and √n, and the process returns to step 4.
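
The branching of steps 12 to 15 can be sketched classically as a small decision function. This is only an illustration under stated assumptions: the function name `after_observation`, the parameter `budget` (a concrete cutoff standing in for the implementation-dependent O(√n) limit), and the returned step labels are hypothetical and do not appear in the source.

```python
import math


def after_observation(T, budget, score_candidate, score_threshold,
                      y, y_candidate, m, lam, n):
    """Decide what follows the observation of step 11 (steps 12 to 15).

    Returns (next_step, y, m), where next_step is "done", "step 2" or "step 4".
    `budget` is a hypothetical concrete value for the O(sqrt(n)) limit.
    """
    # Step 12: stop once the step count T reaches the budget.
    if T >= budget:
        return "done", y, m
    # Step 13: compare the candidate's score with the threshold's score.
    if score_candidate - score_threshold > 0:
        # Step 14: the candidate becomes the new threshold; return to step 2.
        return "step 2", y_candidate, m
    # Step 15: m <- min(lam * m, sqrt(n)); return to step 4.
    return "step 4", y, min(lam * m, math.sqrt(n))
```

For instance, with a candidate score of 9 against a threshold score of 4, the function selects the branch that updates the threshold; with equal scores it instead enlarges m, capped at √n.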

When the above quantum algorithm finishes, the index and table of the threshold that is the candidate for the maximum value are output by storing them in the output array storage unit 122.

FIG. 4 shows, in time series, what processing is performed on each sequence, each variable, and so on in the quantum algorithm described above. Here too, the quantum algorithm consists of sets of qubits that have a role, such as quantum registers, and functions applied to them. In the figure, one horizontal line represents a plurality of qubits, and a rectangle is a function applied to the horizontal line (target qubits) that it overlaps. A black circle connected to a function by a vertical line designates the control qubits required to execute the function. The “workspace” in the figure denotes auxiliary qubits used temporarily during the calculation.

《Specific example of calculation》
Hereinafter, a specific calculation example based on this embodiment will be described. Here, the genome sequences are amino acid sequences, and the sequence with the highest similarity is to be found. The similarity is calculated with the substitution matrix BLOSUM50. One sequence is given as input, the number of sequences n in the database is 4, and each amino acid sequence contains 3 amino acids. Two qubits suffice to represent n, and because there are 20 kinds of amino acids, each amino acid takes 5 qubits, so 15 qubits are required to represent one amino acid sequence. The gap score is −8. The input sequence α is “ARN”, and the sequences in the database are (index, table) = {(i, α_i) | (1, ALK), (2, ANQ), (3, APN), (4, TWY)}. Each of the 20 amino acids is represented by a one-letter uppercase abbreviation; among those appearing here, A is alanine, R is arginine, N is asparagine, L is leucine, K is lysine, Q is glutamine, P is proline, T is threonine, W is tryptophan, and Y is tyrosine. As qubits, these amino acids are replaced by strings of 0s and 1s, such as A = “00000”, R = “00001”, N = “00010”, ..., V = “10011”. Of course, the input is not limited to this example.
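
The qubit counts and bit strings quoted above can be reproduced with a short sketch. It assumes the residue order `ARNDCQEGHILKMFPSTWYV` (the conventional row order of BLOSUM matrices), which is consistent with the four codes given in the text but is not stated in the source; the names `AMINO_ACIDS`, `bits_needed`, and `encode` are hypothetical.

```python
# Assumed residue order: conventional BLOSUM row order. It matches the codes
# quoted in the text (A=00000, R=00001, N=00010, V=10011).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def bits_needed(count):
    """Smallest number of bits that can label `count` distinct values."""
    return max(1, (count - 1).bit_length())


def encode(sequence):
    """Concatenate the fixed-width binary code of each residue."""
    width = bits_needed(len(AMINO_ACIDS))  # 5 bits for 20 amino acids
    return "".join(format(AMINO_ACIDS.index(aa), f"0{width}b") for aa in sequence)


index_bits = bits_needed(4)   # bits for the index of n = 4 database sequences
input_bits = encode("ARN")    # 3 residues x 5 bits each
```

Under these assumptions `index_bits` is 2 and `input_bits` has 15 characters, matching the 2 and 15 qubits stated in the example.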

The following shows the calculation process for outputting the sequence with the highest similarity in this example, step by step in correspondence with the flowchart shown in FIG. 2.

Step 1. The threshold is set to y = 2 and λ = 5/6.
Step 2. Set α_y = ANQ and m = 1.
Step 3. The alignment score is S(α, α_y) = 4.
Step 4. Let j = 1.
Step 5. The Walsh-Hadamard transformation is applied to the index of sequence 2. The quantum state at this time is 1/2 {(1, ALK, 0) + (2, ANQ, 0) + (3, APN, 0) + (4, TWY, 0)}. In this notation, each basis state (index, table, score) inside { } is one term of the superposition.
Step 6. The quantum state after S(α, α_i) has been computed is 1/2 {(1, ALK, 2) + (2, ANQ, 4) + (3, APN, 9) + (4, TWY, −5)}.
Step 7. Using the quantum oracle, the sign of the coefficient is inverted only for the indices satisfying S(α, α_i) > S(α, α_y). The quantum state at this time is 1/2 {(1, ALK, 2) + (2, ANQ, 4) − (3, APN, 9) + (4, TWY, −5)}.
Step 8. The diffusion transformation is applied to the index of sequence 2. The quantum state at this time is {(3, APN, 9)}.
Step 9. j becomes 0.
Step 10. Since j ≦ 0, go to step 11.
Step 11. Observe the index of sequence 2 to obtain (3, APN, 9).
Step 12. If the total number of steps in steps 7 and 8 has not reached O(√n), the process proceeds to step 13.
Step 13. Since S(α, α_y') − S(α, α_y) = 9 − 4 > 0, the process proceeds to step 14.
Step 14. Update to y = 3.
Step 15. Update to m = 6/5 and return to step 4.
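
Steps 5 to 8 of this example can be checked on a classical computer by tracking the four amplitudes directly. The following is a minimal sketch, not the patented implementation: the scores are copied from step 6 rather than recomputed by dynamic programming, and the names `scores`, `grover_iteration`, and `probabilities` are hypothetical.

```python
import math

# Scores S(alpha, alpha_i) for indices 1..4, copied from step 6 of the example.
scores = {1: 2, 2: 4, 3: 9, 4: -5}
threshold_score = scores[2]  # step 3: S(alpha, alpha_y) = 4 for y = 2


def grover_iteration(amplitudes, marked):
    """One oracle call (sign flip on marked indices) followed by diffusion."""
    flipped = {i: (-a if i in marked else a) for i, a in amplitudes.items()}
    mean = sum(flipped.values()) / len(flipped)
    # Diffusion: reflect every amplitude about the mean amplitude.
    return {i: 2 * mean - a for i, a in flipped.items()}


n = len(scores)
state = {i: 1 / math.sqrt(n) for i in scores}                   # step 5
marked = {i for i, s in scores.items() if s > threshold_score}  # step 7 condition
state = grover_iteration(state, marked)                         # steps 7 and 8
probabilities = {i: a * a for i, a in state.items()}            # step 11 statistics
```

With n = 4 and a single marked index, one iteration moves all of the probability onto index 3, which agrees with the state {(3, APN, 9)} given in step 8. For larger n or several marked indices the amplitudes are generally not concentrated this cleanly, which is why the algorithm repeats the iteration and re-checks the candidate in step 13.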

After the steps have been executed as described above, y = 3 and α_y = APN are output, and the algorithm terminates.

A specific example in which the genome sequence is an amino acid sequence has been described here, but the same procedure applies when the genome sequence is a base sequence. In that case, however, there are only four kinds of bases, so two qubits suffice to describe one base.

The preferred embodiments of the present invention have been described above. The data search device described above may be implemented by a quantum computer as dedicated hardware, or a program for realizing its functions may be recorded on a computer-readable recording medium and then read into and executed by a computer system. Even a classical computer, although it requires more computation time, can verify and simulate a quantum algorithm, so a computer system that is a classical computer may execute such a program to perform the data search described above.

The computer-readable recording medium refers to a recording medium such as a flexible disk (FD), a magneto-optical disk (MO), or a CD-ROM, or to a storage device such as a hard disk device built into a computer system. The computer-readable recording medium further includes a medium that holds the program dynamically for a short time (a transmission medium or transmission wave), as when the program is transmitted via a network such as the Internet, and a medium that holds the program for a certain period of time, such as volatile memory inside the computer system serving as the server in that case.

FIG. 1 is a block diagram of the data search device of one embodiment of the present invention.
FIG. 2 is a flowchart showing the algorithm of the data search method based on the present invention.
FIG. 3 is a diagram explaining the quantum query expressed as a quantum algorithm.
FIG. 4 is a diagram expressing, in time series, the quantum operations that constitute the quantum algorithm.

Explanation of symbols

1 to 15 Steps
100 Data search device
110 Arithmetic unit
120 Storage unit
121 Input array storage unit
122 Output array storage unit
123 Algorithm storage unit
124 Database storage unit
125 Input temporary storage unit

Claims

A data search method for searching a database that stores sequences for the sequence most similar to a genome sequence given as an input sequence, the method comprising:
A step of extracting the input sequence from an input array storage unit to form a first qubit string;
A superposition step of extracting a sequence from a database storage unit that stores the database to form a second qubit string, and placing the second qubit string in a superposition state; and
A quantum query step of performing a quantum operation corresponding to dynamic programming on the first qubit string and the second qubit string in the superposition state to determine a pairwise alignment score between the input sequence and the sequence extracted from the database storage unit.

The data search method according to claim 1, which uses a sequence that is a candidate for the most similar sequence as a threshold and further comprises:
A step of determining an initial value of the threshold and determining a pairwise alignment score between the sequence of the initial value and the input sequence;
An iterative operation step of applying a quantum oracle and a diffusion transformation to the second qubit string corresponding to a sequence whose score obtained in the quantum query step is larger than the pairwise alignment score between the sequence corresponding to the threshold and the input sequence; and
An observation step of observing the second qubit string that has undergone the iterative operation step to obtain a solution candidate.

The data search method according to claim 2, wherein, where n is the number of sequences stored in the database storage unit, the threshold is updated with the solution candidate if the pairwise alignment score corresponding to the solution candidate is greater than the pairwise alignment score corresponding to the threshold; otherwise, the superposition step, the quantum query step, the iterative operation step, and the observation step are repeated for a number of operations that grows on the order of O(√n) with respect to n; and thereafter the final solution candidate is stored in an output array storage unit as an output sequence.

A data search device for searching a database that stores sequences for the sequence most similar to a genome sequence given as an input sequence, the device comprising:
An input array storage unit for storing the input sequence;
An output array storage unit for storing an output sequence to be output;
A database storage unit for storing a genome sequence database;
An input temporary storage unit for temporarily storing the progress of the calculation; and
An arithmetic unit that writes information to and reads information from the input array storage unit, the output array storage unit, the database storage unit, and the input temporary storage unit, and performs calculation by a quantum algorithm,
Wherein the arithmetic unit executes a superposition process of extracting the input sequence from the input array storage unit to form a first qubit string, extracting a sequence from the database storage unit to form a second qubit string, and placing the second qubit string in a superposition state, and a quantum query process of performing a quantum operation corresponding to dynamic programming on the first qubit string and the second qubit string in the superposition state to determine a pairwise alignment score between the input sequence and the sequence extracted from the database storage unit.

The data search device according to claim 4, wherein the arithmetic unit, using a sequence that is a candidate for the most similar sequence as a threshold, further executes: a process of determining an initial value of the threshold and determining a pairwise alignment score between the sequence of the initial value and the input sequence; after the quantum query process, an iterative operation process of applying a quantum oracle and a diffusion transformation to the second qubit string corresponding to a sequence whose score obtained in the quantum query process is larger than the pairwise alignment score between the sequence corresponding to the threshold and the input sequence; and an observation process of observing the second qubit string that has undergone the iterative operation process to obtain a solution candidate.

The data search device according to claim 5, wherein, where n is the number of sequences stored in the database storage unit, the arithmetic unit updates the threshold with the solution candidate if the pairwise alignment score corresponding to the solution candidate is greater than the pairwise alignment score corresponding to the threshold; otherwise repeats the superposition process, the quantum query process, the iterative operation process, and the observation process for a number of operations that grows on the order of O(√n) with respect to n; and thereafter stores the final solution candidate in the output array storage unit as an output sequence.

A data search program that causes a computer to execute:
A process of extracting an input sequence from an input array storage unit to form a first qubit string;
A superposition process of extracting a sequence from a database storage unit, which stores a database of sequences, to form a second qubit string, and placing the second qubit string in a superposition state; and
A quantum query process of performing a quantum operation corresponding to dynamic programming on the first qubit string and the second qubit string in the superposition state to determine a pairwise alignment score between the input sequence and the sequence extracted from the database storage unit.

The data search program according to claim 7, which uses a sequence that is a candidate for the most similar sequence as a threshold and further causes the computer to execute:
A process of determining an initial value of the threshold and determining a pairwise alignment score between the sequence of the initial value and the input sequence;
An iterative operation process of applying a quantum oracle and a diffusion transformation to the second qubit string corresponding to a sequence whose score obtained in the quantum query process is larger than the pairwise alignment score between the sequence corresponding to the threshold and the input sequence; and
An observation process of observing the second qubit string that has undergone the iterative operation process to obtain a solution candidate.

The data search program according to claim 8, further causing the computer to execute a process in which, where n is the number of sequences stored in the database storage unit, the threshold is updated with the solution candidate if the pairwise alignment score corresponding to the solution candidate is greater than the pairwise alignment score corresponding to the threshold; otherwise, the superposition process, the quantum query process, the iterative operation process, and the observation process are repeated for a number of operations that grows on the order of O(√n) with respect to n; and thereafter the final solution candidate is stored in an output array storage unit as an output sequence.

A computer-readable recording medium on which the data search program according to any one of claims 7 to 9 is recorded.