JP2011062085A

JP2011062085A - Apparatus for searching primer set, method and program for searching primer set

Info

Publication number: JP2011062085A
Application number: JP2009212703A
Authority: JP
Inventors: Ko Fujifuchi; 航藤渕; Keiwa Chiba; 啓和千葉
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2009-09-15
Filing date: 2009-09-15
Publication date: 2011-03-31

Abstract

PROBLEM TO BE SOLVED: To provide a method for searching for a primer set that is useful for sequencing many target genes with little cross-hybridization to non-target genes.

SOLUTION: The method comprises: a step of constructing a database recording whether each candidate primer for the primer set hybridizes to each of the target genes and each of a plurality of non-target genes; a step of generating an initial primer set at random; and a search step in which the following process is repeated: evaluation values are calculated for the plurality of primer sets formed by exchanging one primer of the primer set with each of a plurality of other candidate primers, based on the rates Ct and Cn at which each primer set hybridizes to the target genes and the non-target genes, respectively; one primer set is selected stochastically based on the calculated evaluation values; and the selected primer set and its evaluation value are stored in a memory device.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an apparatus and method for searching for primers, and more particularly to searching for a combination of a plurality of primers (hereinafter referred to as a "primer set") used for measuring gene expression levels by large-scale sequencing.

In recent years, the field of DNA sequencing has scaled up rapidly, and large-scale computer-aided design has become necessary for the primer sequences used to identify target genes.

By identifying a target gene group using primers and determining the sequences, gene expression levels can be quantified absolutely. Experiments that measure the expression levels of many genes at once are biologically important, but to measure them efficiently, primers that can each identify multiple genes must be used in appropriate combinations.

Attempts to identify as many genes as possible with a small number of primers have previously been made in the field of PCR (for example, Non-Patent Document 1). However, the number of target genes assumed in these studies was on the order of one hundred.

Non-Patent Document 1: Pesole G. et al., "GeneUp: a program to select short PCR primer pairs that occur in multiple members of sequence lists," Biotechniques, 1998, vol. 25, no. 1, pp. 112-123

When many target genes and non-target genes are considered, the number of candidate primer sequences becomes enormous, and examining every combination would take an unrealistic amount of time. On the other hand, a search using a heuristic algorithm such as that used in Non-Patent Document 1 may become trapped in a local optimum. An algorithm is needed that computes an optimal or near-optimal combination within a realistic time.

In addition, when cross-hybridization occurs in sequencing, non-target genes are amplified. Therefore, to sequence many target genes with a limited variety of primers, it is important to minimize cross-hybridization.

The program described in Non-Patent Document 1 has an optional function for specifying a short list of gene sequences to be excluded. Using this function, primers that hybridize to unintended genes can be excluded from the search, thereby preventing cross-hybridization. With this method, however, when many genes are to be excluded, the primers available to the search are greatly restricted.

In view of the above background, an object of the present invention is to provide a primer search apparatus capable of searching, on a large scale, for a primer set that can sequence many target gene groups with little cross-hybridization to non-target gene groups.

The primer set search method of the present invention is a method for searching for a primer set that hybridizes to target genes, comprising: a step of inputting sequences of a plurality of target genes and a plurality of non-target genes; a step of inputting search conditions for the primers to be included in the primer set; a step of determining whether each primer candidate that meets the search conditions hybridizes to each of the plurality of target genes and the plurality of non-target genes, and generating a database storing the results; a step of generating an initial primer set by randomly selecting, from the primer candidates, as many primers as are to be included in the primer set; a search step of repeating the following processes (1) to (5): (1) randomly selecting one primer from the primers included in the primer set, (2) for the plurality of primer sets generated when the selected primer is exchanged with each of a plurality of primer candidates, obtaining the ratio Ct of target genes and the ratio Cn of non-target genes to which each primer set hybridizes, based on the data stored in the database, (3) calculating an evaluation value for each of the plurality of primer sets based on the ratio Ct of target genes and the ratio Cn of non-target genes, (4) selecting one primer set from the plurality of primer sets based on their evaluation values and on a probability distribution that gives a probability value according to the evaluation value, and (5) storing the selected primer set and its evaluation value in a storage unit; and a step of outputting, as the search result, the primer set with the highest evaluation value among the plurality of primer sets stored in the storage unit.

According to the present invention, the primer set is searched for using an evaluation value calculated from the ratio Ct of target genes and the ratio Cn of non-target genes to which it hybridizes, so that a combination of primers can be obtained that hybridizes to many target genes with little cross-hybridization to non-target genes.

FIG. 1 is a diagram showing the configuration of the primer set search apparatus of the embodiment.
FIG. 2 is a diagram showing an example of the data stored in the sequence specificity database.
FIG. 3 is a diagram illustrating the dedicated circuit for calculating sequence specificity.
FIG. 4 is a diagram showing the time required to calculate sequence specificity.
FIG. 5 is a diagram showing the hardware configuration of the primer set search apparatus.
FIG. 6 is a diagram showing the operation of the primer set search apparatus of the embodiment.
FIG. 7 is a diagram showing the operation of the primer combination search.
FIG. 8 is a diagram for explaining the evaluation value.
FIG. 9 is a diagram showing the probability distribution for selecting a primer set.

Hereinafter, a primer set search apparatus and a primer set search method according to an embodiment of the present invention will be described with reference to the drawings.

(First embodiment)
FIG. 1 is a diagram showing the configuration of the primer set search apparatus 10 according to the first embodiment. The primer set search apparatus 10 is connected to a cDNA library 30. The cDNA library 30 stores cDNA sequences in association with the characteristics of the proteins expressed from those cDNAs (human transcription gene, housekeeping gene, etc.). The cDNA library 30 stores, for example, data for tens of thousands of cDNAs.

The primer set search apparatus 10 has a target gene input unit 11 for inputting target genes from the cDNA library 30 and a non-target gene input unit 12 for inputting non-target genes. When the operator enters the characteristics of the cDNAs to be acquired as target genes, the target gene input unit 11 reads the cDNAs associated with those characteristics from the cDNA library 30. Similarly, when the operator enters the characteristics of the cDNAs to be treated as non-target genes, the non-target gene input unit 12 reads the cDNAs associated with those characteristics from the cDNA library 30.

The primer set search apparatus 10 has a search condition input unit 13. The search condition input unit 13 accepts input of the number of primers to be included in the primer set and the base length of the primers. An example of search conditions is a base length of 10 nt and 20 primers. Under these search conditions, the apparatus searches for a combination (primer set) of, for example, 20 primers of 10 nt each that covers as many target genes as possible and as few non-target genes as possible.

The primer set search apparatus 10 includes a sequence specificity calculation unit 14. The sequence specificity calculation unit 14 has a function of determining, for every primer candidate of the base length entered as a search condition, whether it hybridizes to the target genes and the non-target genes. For example, for 10 nt primers there are 4^10 (1,048,576) primer candidates. Whether each of these primer candidates hybridizes to each target gene and non-target gene is calculated exhaustively, and the results are stored in a sequence specificity database (hereinafter referred to as the "sequence specificity DB") 15.

Whether a primer candidate hybridizes to a gene (target gene or non-target gene) is determined by whether the gene contains a sequence complementary to the sequence of the primer candidate. If the gene contains such a complementary sequence, the primer candidate is judged to hybridize to that gene. Even when no sequence is complementary to all bases of the primer candidate, the primer candidate may be judged to hybridize to the gene if the gene contains a sequence in which at least a predetermined proportion (for example, 80%) of the bases are complementary. The threshold matching rate above which hybridization is judged to occur can be set appropriately by those skilled in the art.
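The hybridization test described above can be sketched in Python as a gapless sliding-window comparison. This is only an illustrative sketch: the names `reverse_complement`, `hybridizes`, and `min_match_rate` are hypothetical, and reading "complementary" as the reverse complement is an assumption that the text does not state.

```python
import math

# Watson-Crick base pairing used to build the complementary sequence.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (assumed orientation)."""
    return seq.translate(_COMPLEMENT)[::-1]


def hybridizes(primer: str, gene: str, min_match_rate: float = 1.0) -> bool:
    """Judge hybridization by gapless comparison against every window of the gene.

    The primer is judged to hybridize if some window of the gene agrees with
    the primer's complementary sequence at min_match_rate or more of its bases.
    """
    query = reverse_complement(primer)
    b = len(query)
    # Smallest number of matching bases that satisfies the rate
    # (the small epsilon guards against floating-point round-up).
    needed = math.ceil(min_match_rate * b - 1e-9)
    for start in range(len(gene) - b + 1):
        window = gene[start:start + b]
        matches = sum(q == g for q, g in zip(query, window))
        if matches >= needed:
            return True
    return False
```

For example, `hybridizes("AACG", "GGCGTTAA")` is true because the gene contains `CGTT`, the reverse complement of the primer. Lowering `min_match_rate` to 0.8 corresponds to the relaxed 80% criterion mentioned above.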

FIG. 2 shows an example of the data stored in the sequence specificity DB 15. For simplicity, it shows whether primer candidates P1 to P6 hybridize to target genes T1 to T6 and non-target genes N1 to N6. A combination marked with "○" in the figure hybridizes. For example, primer candidate P1 hybridizes to target genes T1, T3, and T4 and to non-target gene N1, and does not hybridize to the other genes. Primer candidates that do not hybridize to any of the target genes T1 to T6 need not be stored in the sequence specificity DB 15. That is, primers that do not hybridize to any of the target genes T1 to T6 are excluded from the primer candidates.

Because the sequence specificity calculation unit 14 must perform sequence specificity calculations for an enormous number of combinations of primer candidates and genes, it may be implemented as a dedicated circuit on an FPGA (Field Programmable Gate Array) or the like so that the calculations run at high speed. The dedicated circuit is described later.

The primer set search apparatus 10 includes a thermodynamic characteristic calculation unit 16 that calculates the Tm value of each primer candidate. The Tm value is the temperature at which half of the primers are hybridized to the target gene. The thermodynamic characteristic calculation unit 16 first calculates the Gibbs free energy change ΔG of the hybridization reaction from the primer sequence and the target sequence using the nearest-neighbor method. In doing so, the thermodynamic characteristic calculation unit 16 may take into account the concentrations of ions such as Mg²⁺ and Na⁺.

Next, the fraction f of hybridization at temperature T is obtained using the following equation (1).

Here, the primer concentration Cp and the gas constant R are set according to the conditions under which sequencing is actually performed with the primers. For example, the primer concentration Cp can be set to 0.5 (μmol/l) and the gas constant R to 1.9872 (cal/(mol·K)). The thermodynamic characteristic calculation unit 16 finds the T at which f = 0.5 and takes it as the Tm value.
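Finding the T at which f = 0.5 can be sketched numerically. Equation (1) itself does not appear in this text, so the sketch below substitutes an assumed two-state model with the primer in excess: f = Cp·K / (1 + Cp·K), with K = exp(−ΔG / (R·T)) and ΔG = ΔH − T·ΔS. This form, the function names, and the ΔH and ΔS inputs are illustrative assumptions and not the patent's equation.

```python
import math

R = 1.9872  # gas constant in cal/(mol*K), the value quoted in the text


def hybridized_fraction(delta_h: float, delta_s: float, temp_k: float, cp: float) -> float:
    """Assumed two-state model: f = Cp*K / (1 + Cp*K), K = exp(-dG / (R*T)).

    delta_h is in cal/mol, delta_s in cal/(mol*K), cp in mol/l.
    """
    delta_g = delta_h - temp_k * delta_s
    x = cp * math.exp(-delta_g / (R * temp_k))
    return x / (1.0 + x)


def melting_temperature(delta_h: float, delta_s: float, cp: float,
                        lo: float = 273.15, hi: float = 373.15) -> float:
    """Bisect for the temperature (in kelvin) at which f = 0.5.

    Assumes f decreases with temperature on [lo, hi] (exothermic duplex
    formation) and that the solution lies inside that interval.
    """
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if hybridized_fraction(delta_h, delta_s, mid, cp) > 0.5:
            lo = mid  # still mostly hybridized: Tm is higher
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under this assumed model the condition f = 0.5 also has the closed form T = ΔH / (ΔS + R·ln Cp), which gives a convenient cross-check of the bisection. With Cp = 0.5 μmol/l, the call would be `melting_temperature(delta_h, delta_s, 0.5e-6)`.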

The primer set search apparatus 10 has a temperature condition input unit 17 for entering a temperature Tc as the temperature condition under which the primer set is used. The temperature condition input unit 17 accepts an upper limit and a lower limit for the temperature condition Tc. The primer set search apparatus 10 also includes a database update unit (hereinafter referred to as the "DB update unit") 18 that updates the sequence specificity DB 15 by excluding primer candidates whose Tm values do not satisfy the temperature condition Tc.

The primer set search apparatus 10 has a primer set search unit 19 that uses the data stored in the sequence specificity DB 15 to search for a primer set that hybridizes to as many target genes as possible with as little cross-hybridization to non-target genes as possible. The primer set search algorithm is described later.

The primer set search apparatus 10 has an output unit 20 that outputs the primer set found by the primer set search unit 19. The output unit 20 may display the primer set that was found, or may transmit the search result to another apparatus so that it is output from that apparatus.

The dedicated circuit constituting the sequence specificity calculation unit 14 will be described with reference to FIG. 3. FIG. 3 shows an example in which a primer candidate of base length b is compared with the target genes and non-target genes. In FIG. 3, the complementary sequence of the primer candidate is labeled the "query sequence", and the target genes and non-target genes are labeled the "database sequence".

The dedicated circuit has a sequence comparison unit 40 that determines whether two sequences of base length b (nt) match. The bases in the sequences being compared are compared in parallel: in FIG. 3, two sequences of base length b (nt) are input, and the b bases in the sequences are compared simultaneously.

A threshold is input to the sequence comparison unit 40. The sequence comparison unit 40 judges a hit when the number of matching bases is equal to or greater than the threshold. For example, when "10" is input as the threshold, a hit is judged if 10 or more of the bases in the sequence match. The threshold input to the sequence comparison unit 40 is calculated from the base length and the matching rate entered as search conditions. The sequence comparison unit 40 outputs "hit" information based on the comparison result. The value of the "hit" information is, for example, "1" when the number of matching bases is equal to or greater than the threshold and "0" when it is less than the threshold.

In the dedicated circuit, 32 of the sequence comparison units 40 described above are grouped to form a 32-parallel sequence comparison circuit 41. The 32-parallel sequence comparison circuit 41 compares a sequence of base length b with a sequence of base length (b + 31) and outputs 32 bits of "hit" information. The dedicated circuit further duplicates the 32-parallel sequence comparison circuit 41 so that two of them operate in parallel. With this configuration, the dedicated circuit can perform sequence specificity calculations with high throughput.
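The behavior of one comparison unit and of the 32-parallel comparison can be emulated in software as follows. This is a sketch of the logic only, not of the FPGA implementation; the function names and the bit order of the 32-bit result (bit i corresponds to window offset i) are assumptions made for illustration.

```python
import math


def threshold_from_rate(base_length: int, match_rate: float) -> int:
    """Convert a matching rate into a minimum number of matching bases."""
    return math.ceil(base_length * match_rate - 1e-9)


def compare_unit(query: str, window: str, threshold: int) -> int:
    """Emulate one sequence comparison unit: 1 on a hit, 0 otherwise."""
    matches = sum(q == w for q, w in zip(query, window))
    return 1 if matches >= threshold else 0


def parallel32_compare(query: str, segment: str, threshold: int) -> int:
    """Compare a b-base query with a (b + 31)-base segment.

    The segment contains exactly 32 windows of length b. Each window is
    checked by one comparison unit, and the 32 hit bits are packed into
    a single integer.
    """
    b = len(query)
    if len(segment) != b + 31:
        raise ValueError("segment must be exactly b + 31 bases long")
    hits = 0
    for offset in range(32):
        hits |= compare_unit(query, segment[offset:offset + b], threshold) << offset
    return hits
```

In hardware the 32 units run simultaneously, whereas the loop here evaluates them one after another; only the resulting hit pattern is the same.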

To further increase the throughput of the sequence specificity calculation, the target gene sequence may be stored in a shift register that shifts by two bases per clock, with only the lower bits of the register, which are the part being calculated, connected to the comparison circuit. Normally, a primer candidate sequence held in a register is used as the input to the comparison circuit, but the contents of that register must be replaced in order to perform the calculation for many primer candidates. With the structure described above, the input to the comparison circuit can be switched at high speed with few logic resources, and the throughput of the 32-parallel sequence comparison circuit 41 connected downstream can be fully exploited.

FIG. 4 is a diagram showing the calculation time for sequence specificity (alignment). In the experiment, 488 primer candidate sequences (25 nt) from the 3' end of the human ERBB gene were aligned without gaps against a human mRNA database with a hit threshold of 80%, and the time required for the calculation was measured (CPU: Intel Xeon 3.8 GHz, OS: Linux). A naive implementation in C took more than 10,000 seconds, and even the high-speed alignment software LAST took nearly 1,000 seconds. The system in which the dedicated circuit was configured on an FPGA completed the alignment in just over 10 seconds. This concludes the description of the dedicated circuit constituting the sequence specificity calculation unit 14.

FIG. 5 is a diagram showing the hardware configuration of the primer set search apparatus 10 described above. The primer set search apparatus 10 is a computer in which a CPU 50, a RAM 51, a ROM 52, a communication interface 54, a hard disk 55, an operation unit 56, and a display 57 are connected by a data bus 59. An FPGA board 58 carrying the dedicated circuit for calculating sequence specificity described above is mounted in the computer. The functions of the primer set search apparatus 10 described above are realized by the CPU 50 executing processing in accordance with a program 53 written in the ROM 52. Such a program 53 is included in the scope of the present invention. The hard disk 55 stores the data of the sequence specificity DB 15.

FIG. 6 is a diagram showing the operation of the primer set search apparatus 10 of the present embodiment. First, the primer set search apparatus 10 accepts input of the target gene sequences and non-target gene sequences (S10). Specifically, when the operator specifies the characteristics of the target genes and of the non-target genes, the primer set search apparatus 10 reads the target gene sequences and non-target gene sequences that match those characteristics from the cDNA library 30.

Next, the primer set search apparatus 10 accepts, as search conditions, the base length of the primers to be included in the primer set, the number of primers to be included in the primer set, and the temperature condition Tc (S12). The primer set search apparatus 10 treats every primer of the entered base length as a candidate for inclusion in the primer set.

Subsequently, the primer set search apparatus 10 calculates whether each primer candidate hybridizes to the target genes and non-target genes and, based on the results, generates a sequence specificity DB 15 such as that shown in FIG. 2 (S14). Here, a primer is judged to hybridize when the target gene sequence contains a complementary sequence that matches the primer sequence at a proportion equal to or greater than a predetermined threshold. This threshold can be set appropriately according to the system design. When generating the sequence specificity DB 15, primers that do not hybridize to any target gene may be excluded from the primer candidates and omitted from the sequence specificity DB 15.

The primer set search apparatus 10 obtains the Tm values of the primer candidates stored in the sequence specificity DB 15 using the nearest-neighbor method. It then updates the data in the sequence specificity DB 15 based on the obtained Tm values and the entered temperature condition Tc (S16). Specifically, primer candidates whose Tm values do not satisfy the temperature condition Tc are deleted from the sequence specificity DB 15. Excluding such primer candidates from the sequence specificity DB 15 in advance reduces the computational load of the primer set search.
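The update in step S16 amounts to a filter over the database. The sketch below assumes that the sequence specificity DB is held as a dictionary from primer to the set of genes it hybridizes to, and that "satisfying the temperature condition" means lying between the lower and upper limits inclusive. Both the data layout and the names are hypothetical.

```python
def filter_by_tm(specificity_db: dict, tm_values: dict,
                 tc_lower: float, tc_upper: float) -> dict:
    """Return a copy of the DB without primers whose Tm is outside [tc_lower, tc_upper].

    specificity_db maps primer -> set of gene identifiers it hybridizes to.
    tm_values maps primer -> Tm, in the same unit as the two limits.
    """
    return {
        primer: genes
        for primer, genes in specificity_db.items()
        if tc_lower <= tm_values[primer] <= tc_upper
    }
```

Because the filter runs once before the search, every later evaluation only touches primers that are usable at the chosen temperature.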

Next, based on the data stored in the sequence specificity DB 15, the primer set search apparatus 10 searches the primer candidates for the combination of primers to adopt as the primer set (S18).

FIG. 7 is a diagram showing the operation of the primer combination search. The primer set search unit 19 randomly selects as many primer candidates as the number of primers entered as a search condition, and generates an initial primer set (S30). When selecting primer candidates here, the same primer candidate may be allowed to be selected more than once.

Next, the evaluation value S_0 of the initial primer set is obtained (S32). Hereinafter, an evaluation value referred to without specifying a primer set is called the "evaluation value S". The evaluation value S takes a higher value as the ratio Ct of target genes that hybridize increases, and a lower value as the ratio Cn of non-target genes that cross-hybridize increases.

FIG. 8 is a diagram for explaining the evaluation value S. FIG. 8 shows an example in which the evaluation value S is obtained for a primer set consisting of primers P1, P3, and P4 out of the primer candidates P1 to P6.

The primer set shown in FIG. 8 covers T1 to T5 of the target genes T1 to T6. That is, primer P1 hybridizes to target genes T1, T3, and T4, primer P3 hybridizes to target genes T2 and T4, and primer P4 hybridizes to target genes T1 and T5. The ratio Ct of target genes covered by the primer set is therefore 5/6. The primer set shown in FIG. 8 also cross-hybridizes to non-target genes N1, N3, and N4, so the ratio Cn of non-target genes that cross-hybridize is 3/6. In this way, the ratio Ct of target genes and the ratio Cn of non-target genes to which the primer set hybridizes are obtained from the data stored in the sequence specificity DB 15.
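The calculation of Ct and Cn for the FIG. 8 example can be sketched as a set union over the database rows. The dictionary layout is an assumption. The text only states that P1 hits N1 and that the set as a whole hits N1, N3, and N4, so assigning N3 to P3 and N4 to P4 below is a hypothetical choice made so that the totals agree with the text.

```python
from fractions import Fraction

# Primer -> genes it hybridizes to. The target-gene entries follow the text;
# the N3 and N4 assignments are hypothetical (see the note above).
SPECIFICITY_DB = {
    "P1": {"T1", "T3", "T4", "N1"},
    "P3": {"T2", "T4", "N3"},
    "P4": {"T1", "T5", "N4"},
}
TARGETS = {"T1", "T2", "T3", "T4", "T5", "T6"}
NON_TARGETS = {"N1", "N2", "N3", "N4", "N5", "N6"}


def coverage(primer_set, db, targets, non_targets):
    """Return (Ct, Cn): fractions of target and non-target genes hit by the set."""
    covered = set().union(*(db[p] for p in primer_set))
    ct = Fraction(len(covered & targets), len(targets))
    cn = Fraction(len(covered & non_targets), len(non_targets))
    return ct, cn


ct, cn = coverage(["P1", "P3", "P4"], SPECIFICITY_DB, TARGETS, NON_TARGETS)
```

A gene hit by several primers (T1 and T4 here) is counted once, which is why the union is taken before counting.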

The evaluation value S is obtained from the ratio Ct of target genes and the ratio Cn of non-target genes by, for example, the following formula (2) or formula (3).
Note that the above formulas are examples, and the evaluation value S can also be determined using a different formula.

Next, the primer set search unit 19 randomly selects one primer from the primer set (S34). For convenience of explanation, the selected primer is called primer A. The primer set search unit 19 then selects, from the primer candidates, the primers that can be exchanged for primer A (S36).

Specifically, the primer set search unit 19 selects, as exchangeable primer candidates, the primers that do not form dimers with the primers in the primer set other than primer A (for example, if the number of primers is c, the (c - 1) primers other than primer A). For convenience of explanation, it is assumed here that m primer candidates are available for exchange.

Let S_t be the evaluation value of the primer set before primer A is exchanged, where t is the number of times the iterative process of exchanging a primer A in the primer set for another primer candidate has been performed. For the initial primer set, t = 0. The primer set search apparatus 10 calculates the evaluation values S_{t+1,1}, S_{t+1,2}, ..., S_{t+1,m} of the m primer sets obtained by replacing primer A with each of the m primer candidates (S38).

Next, the primer set search unit 19 determines the primer candidate to be exchanged with primer A, and thereby the primer set, based on the evaluation values S_{t+1,1}, S_{t+1,2}, ..., S_{t+1,m}. Specifically, it first determines the probability value of selecting each primer set from a probability distribution that gives a larger probability value for a larger evaluation value S and from the calculated evaluation values S_{t+1,1}, S_{t+1,2}, ..., S_{t+1,m} of the primer sets (S40).

FIG. 9 is a diagram showing the probability distribution. The probability distribution shown in FIG. 9 is a graph of formula (4). In formula (4), min[a, b] is a function that selects the smaller of a and b. Formula (4) uses the min function to compare 1 with exp{k (S_{t+1,i} − S_t)} and adopts the smaller value, so the value is 1 for every primer set whose evaluation value is higher than the current one, and such primer sets are selected with equal probability. The probability distribution may also be obtained without the min function, for example by formula (5).
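The min-based rule can be illustrated with a short sketch. It assumes that the value min[1, exp{k (S_{t+1,i} − S_t)}] is used as an unnormalized weight for each of the m swapped primer sets and that one set is then drawn in proportion to these weights; the normalization and the function names are assumptions for illustration.

```python
import math
import random
from typing import List, Optional


def acceptance_weights(s_current: float, s_candidates: List[float], k: float) -> List[float]:
    """Weight of each candidate: min(1, exp(k * (S_candidate - S_current)))."""
    weights = []
    for s in s_candidates:
        d = k * (s - s_current)
        # Branching on the sign avoids overflow in exp() for large positive d.
        weights.append(1.0 if d >= 0 else math.exp(d))
    return weights


def choose_index(weights: List[float], rng: Optional[random.Random] = None) -> int:
    """Draw one candidate index with probability proportional to its weight."""
    rng = rng or random.Random()
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

A candidate that lowers the evaluation value keeps a small but non-zero weight, which is what allows the search to move away from the current primer set.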

The primer set search unit 19 does not always select the primer set with the largest evaluation value S; instead, it selects a primer set using the probability distribution. In the example shown in FIG. 9, the primer set with evaluation value S_{t+1,1} is selected with probability P1, the primer set with evaluation value S_{t+1,2} with probability P2, and the primer set with evaluation value S_{t+1,3} with probability P3. The primer set search unit 19 determines the primer set according to the probability values given by the evaluation values S, and stores the evaluation value S of that primer set in the storage unit (S42).

The primer set search unit 19 determines whether the termination condition of the primer set search is satisfied (S44). The termination condition can be, for example, that the number of primer set searches, that is, the number of primer exchanges, has reached a predetermined threshold n. The threshold n can be set, for example, to a value on the order of 10,000. In this way, a primer set can be obtained that is close to the ideal solution, which maximizes the number of target genes hybridized and minimizes cross-hybridization under the given search conditions.

When the termination condition is satisfied, the primer set search unit 19 determines the primer set with the largest evaluation value S stored in the storage unit as the search result (S46).
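Steps S34 to S46 can be put together in a minimal sketch. It assumes that the evaluation value is supplied as a `score` callable, that the dimer check is a caller-supplied predicate `forms_dimer`, that candidates already in the set are not offered for exchange, and that the weights min[1, exp{k (S_{t+1,i} − S_t)}] are normalized by their sum. All names and default values are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def search_primer_set(
    candidates: Sequence[str],
    set_size: int,
    score: Callable[[Sequence[str]], float],
    n_iterations: int = 10000,
    k: float = 10.0,
    forms_dimer: Callable[[str, str], bool] = lambda a, b: False,
    seed: int = 0,
) -> Tuple[List[str], float]:
    rng = random.Random(seed)
    current = rng.sample(list(candidates), set_size)  # random initial primer set
    s_current = score(current)
    best, s_best = list(current), s_current           # best set stored so far
    for _ in range(n_iterations):                     # S44: fixed iteration count
        idx = rng.randrange(set_size)                 # S34: pick primer A at random
        others = current[:idx] + current[idx + 1:]
        pool = [c for c in candidates                 # S36: exchangeable candidates
                if c not in current
                and not any(forms_dimer(c, o) for o in others)]
        if not pool:
            continue
        trial_sets = [others[:idx] + [c] + others[idx:] for c in pool]
        scores = [score(t) for t in trial_sets]       # S38: evaluate the m sets
        # S40: exp(min(0, d)) equals min(1, exp(d)) and cannot overflow.
        weights = [math.exp(min(0.0, k * (s - s_current))) for s in scores]
        if sum(weights) <= 0.0:                       # every weight underflowed
            continue
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        current, s_current = trial_sets[pick], scores[pick]  # S42
        if s_current > s_best:
            best, s_best = list(current), s_current
    return best, s_best                               # S46: highest stored value
```

Keeping only the best set seen so far, rather than every visited set, is a simplification of the storage unit; the returned result is the same.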

Returning to FIG. 6, the primer set search device 10 outputs the primer set found by the primer set search unit 19 as the search result (S20). This concludes the description of the configuration and operation of the primer set search device 10 of the first embodiment.

The primer set search device 10 and the primer set search method of the present embodiment search for a primer set using the evaluation value S of the primer set, which is obtained from the ratio Ct of primers hybridizing to target genes and the ratio Cn of primers cross-hybridizing to non-target genes. A primer set can therefore be obtained that hybridizes to as many target genes as possible with as little cross-hybridization as possible.

In addition, the primer set search device 10 and the primer set search method select the primer candidate to be exchanged based on the evaluation values of the primer sets and on a probability distribution that gives a higher probability value for a higher evaluation value. An appropriate primer set can therefore be found without the search becoming trapped at a local maximum.

(Second Embodiment)
Next, the primer set search device 10 and the primer set search method according to the second embodiment of the present invention will be described. The basic configuration of the primer set search device 10 of the second embodiment is the same as that of the first embodiment (see FIG. 1). The second embodiment differs in that the probability distribution used to select a primer set is changed during the primer set search.

In the second embodiment, the evaluation value S is given by
[formula for S]
and the probability Pi obtained when primer candidate A is replaced with another primer candidate is given by
[formula for Pi]

In the second embodiment, at each iteration of the primer exchange, the evaluation value S_{t+1} of the primer set obtained after t+1 iterations is compared with the evaluation value S_t of the primer set obtained after t iterations, and if S_{t+1} < S_t, k_i is multiplied by 1.05 to change the probability distribution. By changing the probability distribution in this way, when the evaluation value S_{i+1} becomes larger than the previous evaluation value S_i, k_i is increased, which raises the probability that a primer set with a high evaluation value S is selected. As a result, a primer set close to the ideal solution, which maximizes the number of target genes hybridized and minimizes cross-hybridization under the given search conditions, can be reached quickly.
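The update of k can be sketched in a few lines. The sketch assumes the rule exactly as first stated, that is, k is multiplied by 1.05 when S_{t+1} < S_t; the comparison sits on a single line so that it can be reversed if the opposite reading is intended. The function name and signature are illustrative.

```python
def update_k(k: float, s_prev: float, s_next: float, factor: float = 1.05) -> float:
    """Return the coefficient k to use in the next iteration."""
    if s_next < s_prev:  # assumed trigger condition; flip the comparison if needed
        return k * factor
    return k
```

Because a larger k makes exp{k (S_{t+1,i} − S_t)} fall off faster for lower evaluation values, each update makes the selection more strongly biased toward primer sets with high evaluation values.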

In the present embodiment, the probability distribution is changed based on the magnitude relationship between the evaluation values S_t and S_{t+1} of the primer set, but the probability distribution may be changed according to a different criterion. For example, it may be changed based on the number of times the iterative process of exchanging primer A has been performed.

Next, examples of the present invention will be described.
First, the experimental conditions are described. 1000 target gene sequences were selected at random, and 1000 non-target gene sequences were selected at random from housekeeping genes. The search conditions are as shown in Table 1 below.

Under these search conditions, 20,000 search iterations were performed. For comparison, a primer set was also searched for using a Greedy algorithm. Table 2 below shows the number of target genes covered by the primer set found and the number of cross-hybridizations to non-target genes. In the table, the value in parentheses is the number of target genes minus the number of non-target genes. The larger this value, the fewer the cross-hybridizations and the more target genes the primer set hybridizes to.

According to the results shown in Table 2, when each experimental result is evaluated by subtracting the number of cross-hybridizations to non-target genes from the number of target genes covered, the present invention obtained more appropriate search results than the Greedy algorithm under every search condition.

The primer set search device and the primer set search method of the present invention have been described in detail above with reference to the drawings, but the present invention is not limited to the embodiments described above.

In the embodiments described above, the termination condition of the primer set search is that the number of searches has reached a preset threshold, but the termination condition is not limited to this. For example, the termination condition may be that the fluctuation of the evaluation value from one iteration to the next has converged within a certain range, or that the evaluation value has exceeded a predetermined threshold.
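The three termination conditions mentioned here can be combined in one check, sketched below. The sketch assumes that the evaluation value of each iteration is appended to a list `history` and that convergence is measured as the spread (maximum minus minimum) over the most recent `window` iterations; the window length and the parameter names are assumptions for illustration.

```python
from typing import List, Optional


def should_stop(
    history: List[float],
    max_iterations: int,
    window: int = 100,
    tolerance: Optional[float] = None,
    target: Optional[float] = None,
) -> bool:
    """Return True if any enabled termination condition is met."""
    # Condition 1: the number of searches has reached the threshold.
    if len(history) >= max_iterations:
        return True
    # Condition 2: the latest evaluation value exceeds a target threshold.
    if target is not None and history and history[-1] > target:
        return True
    # Condition 3: the evaluation value has stayed within a narrow band.
    if tolerance is not None and len(history) >= window:
        recent = history[-window:]
        if max(recent) - min(recent) <= tolerance:
            return True
    return False
```

Leaving `tolerance` and `target` at `None` reduces the check to the iteration-count condition of the embodiments described above.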

In the embodiments described above, the thermodynamic characteristic calculation unit 16 calculates the Tm values of the primer candidates, but the Tm values of the primer candidates may instead be stored in advance. Alternatively, Tm values that have been calculated once may be accumulated for use in subsequent searches. This makes it possible to omit or shorten the time needed to calculate the Tm values of the primer candidates, so that an appropriate primer set can be found in a shorter time.
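Accumulating Tm values for reuse amounts to memoizing the Tm calculation, as in the sketch below. The Wallace rule, Tm = 2 × (A + T) + 4 × (G + C), is used here only as a simple stand-in; the calculation actually performed by the thermodynamic characteristic calculation unit 16 is not assumed to be this rule, and the function name is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def tm_wallace(seq: str) -> float:
    """Approximate Tm in degrees Celsius by the Wallace rule (stand-in only)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc
```

The first call for a given sequence computes the value; later calls with the same sequence return the cached result, and the same pattern applies to a more expensive thermodynamic model.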

In the embodiments described above, the primer candidates are narrowed down by calculating the thermodynamic characteristics after calculating the sequence specificity, but the thermodynamic characteristics may be calculated first and the primer candidates narrowed down based on the temperature condition Tc. In particular, when the thermodynamic characteristics of the primer candidates have been obtained in advance, it is effective to narrow down the candidates by the temperature condition Tc first.

INDUSTRIAL APPLICABILITY The present invention can search for a primer set that hybridizes to many target genes with little cross-hybridization to non-target genes, and is useful as, for example, a primer search device for finding primer sets used in sequencing.

DESCRIPTION OF SYMBOLS
10 Primer set search device
11 Target gene input unit
12 Non-target gene input unit
13 Search condition input unit
14 Sequence specificity calculation unit
15 Sequence specificity DB
16 Thermodynamic characteristic calculation unit
17 Temperature condition input unit
18 DB update unit
19 Primer set search unit
20 Output unit
30 cDNA library
40 Sequence comparison unit
41 32-parallel sequence comparison circuit
50 CPU
51 RAM
52 ROM
53 Program
54 Communication interface
55 Hard disk
56 Operation unit
57 Display
58 FPGA board
59 Data bus

Claims

A primer set search method for searching for a primer set that hybridizes to target genes, comprising:
a step of inputting sequences of a plurality of target genes and a plurality of non-target genes;
a step of inputting search conditions for primers to be included in the primer set;
a step of determining whether or not primer candidates that match the search conditions hybridize to each of the plurality of target genes and the plurality of non-target genes, and generating a database storing the results;
a step of randomly selecting, from the primer candidates, the number of primers to be included in the primer set to generate an initial primer set;
a search step of repeatedly performing the following processes (1) to (5): (1) randomly selecting one primer from the primers included in the primer set; (2) obtaining, based on the data stored in the database, the ratio Ct of target genes and the ratio Cn of non-target genes hybridized by each of a plurality of primer sets generated by replacing the selected primer with each of a plurality of primer candidates; (3) calculating an evaluation value of each of the plurality of primer sets based on the ratio Ct of target genes and the ratio Cn of non-target genes; (4) selecting one primer set from the plurality of primer sets based on a probability distribution that gives a probability value corresponding to the evaluation value and on the evaluation values of the plurality of primer sets; and (5) storing the selected primer set and its evaluation value in a storage unit; and
a step of outputting, as a search result, the primer set having the highest evaluation value among the plurality of primer sets stored in the storage unit.

The primer set search method according to claim 1, wherein the step of generating the database excludes, from the primers that match the search conditions, primers that do not hybridize to any of the target genes, and does not include them in the database.

The primer set search method according to claim 1 or 2, wherein the evaluation value of the primer set is higher as the ratio Ct of the target gene is higher and lower as the ratio Cn of the non-target gene is higher.

The primer set search method according to claim 3, wherein the evaluation value S of the primer set is calculated by:
S = Ct − α × Cn (where α > 0)

The primer set search method according to claim 3, wherein the evaluation value S of the primer set is calculated by:
S = (Ct + α) / (Cn + β) (where β ≠ 0)

The primer set search method according to any one of claims 1 to 5, wherein the probability distribution is changed based on a change in an evaluation value stored in the storage unit.

The primer set search method according to any one of claims 1 to 5, wherein the probability distribution is changed based on the number of repetitions of the processes (1) to (5).

The primer set search method according to any one of claims 1 to 7, further comprising:
a step of calculating a Tm value for each primer candidate and storing the Tm value in the database;
a step of inputting a temperature condition; and
a step of deleting from the database any primer candidate whose Tm value does not match the input temperature condition.

The primer set search method according to claim 1, wherein the search step selects, as a primer candidate for replacing the one primer, a primer candidate that does not form a dimer with the primers contained in the primer set.

The primer set search method according to any one of claims 1 to 9, wherein the step of generating the database determines, as a gene to which a primer candidate hybridizes, a target gene or non-target gene having a sequence portion that matches the complementary base sequence of the primer candidate at a ratio equal to or higher than a predetermined threshold.

The primer set search method according to any one of claims 1 to 10, wherein the step of generating the database calculates whether a primer candidate hybridizes to the target genes or the non-target genes by using a dedicated circuit implemented on an FPGA (Field Programmable Gate Array).

The primer set search method according to claim 1, wherein the search step is ended when the number of repetitions of the processes (1) to (5) reaches a predetermined threshold value.

The primer set search method according to claim 1, wherein the search step ends when an evaluation value of the primer set stored in the storage unit reaches a predetermined threshold value.

A program for searching for a primer set that hybridizes to target genes, the program executing:
a step of inputting sequences of a plurality of target genes and a plurality of non-target genes;
a step of inputting search conditions for primers to be included in the primer set;
a step of determining whether or not primer candidates that match the search conditions hybridize to each of the plurality of target genes and the plurality of non-target genes, and generating a database storing the results;
a step of randomly selecting, from the primer candidates, the number of primers to be included in the primer set to generate an initial primer set;
a step of repeatedly performing the following processes (1) to (5): (1) randomly selecting one primer from the primers included in the primer set; (2) obtaining, based on the data stored in the database, the ratio Ct of target genes and the ratio Cn of non-target genes hybridized by each of a plurality of primer sets generated by replacing the selected primer with each of a plurality of primer candidates; (3) calculating an evaluation value of each of the plurality of primer sets based on the ratio Ct of target genes and the ratio Cn of non-target genes; (4) selecting one primer set from the plurality of primer sets based on a probability distribution that gives a probability value corresponding to the evaluation value and on the evaluation values of the plurality of primer sets; and (5) storing the selected primer set and its evaluation value in a storage unit; and
a step of outputting, as a search result, the primer set having the highest evaluation value among the plurality of primer sets stored in the storage unit.

A primer set search device for searching for a primer set that hybridizes to target genes, comprising:
means for inputting sequences of a plurality of target genes and a plurality of non-target genes;
means for inputting search conditions for primers to be included in the primer set;
means for determining whether or not primer candidates that match the search conditions hybridize to each of the plurality of target genes and the plurality of non-target genes, and generating a database storing the results;
means for randomly selecting, from the primer candidates, the number of primers to be included in the primer set to generate an initial primer set;
means for repeatedly performing the following processes (1) to (5): (1) randomly selecting one primer from the primers included in the primer set; (2) obtaining, based on the data stored in the database, the ratio Ct of target genes and the ratio Cn of non-target genes hybridized by each of a plurality of primer sets generated by replacing the selected primer with each of a plurality of primer candidates; (3) calculating an evaluation value of each of the plurality of primer sets based on the ratio Ct of target genes and the ratio Cn of non-target genes; (4) selecting one primer set from the plurality of primer sets based on a probability distribution that gives a probability value corresponding to the evaluation value and on the evaluation values of the plurality of primer sets; and (5) storing the selected primer set and its evaluation value in a storage unit; and
means for outputting, as a search result, the primer set having the highest evaluation value among the plurality of primer sets stored in the storage unit.