JP2009219366A

JP2009219366A - Method for judging haplotype

Info

Publication number: JP2009219366A
Application number: JP2008064140A
Authority: JP
Inventors: Keiko Yonezawa; 恵子米沢; Hisafumi Fukui; 寿文福井
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2008-03-13
Filing date: 2008-03-13
Publication date: 2009-10-01

Abstract

PROBLEM TO BE SOLVED: To provide a method for efficiently improving the diplotype determination rate by selecting SNP pairs for haplotyping and combining them with SNP typing.

SOLUTION: The haplotype determination method performs haplotyping on a part of the plurality of SNPs constituting the haplotype of the target gene, performs SNP typing on a part or all of the remainder, and determines the haplotype of the target gene from the results of both typings.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to the field of detecting genetic polymorphisms and using them to search for associated genes. It is also used in the clinical prediction of phenotypes such as side effects, based on polymorphic markers whose association has been demonstrated.

Attempts to screen for polymorphisms that serve as markers of disease susceptibility or side effects, by associating genetic polymorphisms with phenotypes, have become widespread with recent advances in polymorphism detection technology. In particular, SNPs (single nucleotide polymorphisms), which are abundant in the human genome (3 to 10 million) and relatively easy to type, have become essential polymorphisms for genome-wide association analysis.

In recent years the sequences of multiple human genomes have become available, and their polymorphisms have been examined in detail. In particular, HapMap, published in 2005, provides a linkage disequilibrium map of the genome and has had a major impact on SNP-based association analysis.

Because the human genome is diploid, SNP typing yields two alleles. For example, suppose the wild-type allele of a certain SNP is A and the variant allele is G (hereinafter written A>G). Here the wild type is the allele with the higher frequency in the population and the variant is the allele with the lower frequency; a site is usually called a polymorphism when the variant frequency is 1% or more. The possible SNP typing results are then AA, AG, and GG; AA and GG are called homozygous and AG heterozygous. The result obtained by SNP typing in this way is called the genotype.

Now consider multiple SNPs. For example, let two SNPs, SNP1 and SNP2, be A>G and C>T, respectively, and suppose that SNP typing of both (the genotype) shows SNP1 to be AG heterozygous and SNP2 to be CT heterozygous. If the two SNPs are on the same chromosome, two cases are possible, A-C/G-T and A-T/G-C, depending on which alleles of SNP1 and SNP2 are physically linked. A combination of SNP alleles physically linked on a single chromosome is called a haplotype. In the A-C/G-T case the individual has the haplotypes A-C and G-T, and in the A-T/G-C case the haplotypes A-T and G-C. The pair of two haplotypes is called a diplotype; here the diplotype is either A-C/G-T or A-T/G-C.

A diplotype is complete information: if the diplotype is known the genotype is known, but the diplotype cannot always be recovered from the genotype. The case above (SNP1 AG heterozygous, SNP2 CT heterozygous) is an example: it cannot be determined whether the diplotype is A-C/G-T or A-T/G-C. The many SNP typing methods in common use yield genotypes, so information on the diplotype, or on the haplotypes that compose it, may not be obtained.
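The loss of phase information described above can be illustrated with a minimal sketch. The function name `genotype` and the string encoding of haplotypes are assumptions made for this illustration only; they do not appear in the original text.

```python
def genotype(diplotype):
    """Return the unphased genotype of a diplotype given as two haplotype strings."""
    hap1, hap2 = diplotype
    # At each SNP the genotype is the unordered pair of alleles, so sort them.
    return tuple("".join(sorted(alleles)) for alleles in zip(hap1, hap2))


d_ac_gt = ("AC", "GT")  # diplotype A-C / G-T
d_at_gc = ("AT", "GC")  # diplotype A-T / G-C

print(genotype(d_ac_gt))                       # ('AG', 'CT')
print(genotype(d_ac_gt) == genotype(d_at_gc))  # True
```

Both diplotypes map to the same genotype (AG at SNP1, CT at SNP2), which is why SNP typing alone cannot tell them apart.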

In association analysis that relates genetic information to phenotype, it is desirable to know the diplotype, which is complete information. As shown above, however, the diplotype may not be determinable from SNP typing results. In that case a haplotype estimation algorithm is generally used, which statistically estimates the haplotype frequencies in a population from the genotype results of multiple individuals (Non-Patent Document 1).

Usually, the analysis proceeds by assigning each individual the diplotype with the highest estimated probability. This can increase the type I error (false negative) of the analysis, however, and is not a very appropriate method.

To address this problem, a method was proposed that, instead of selecting the single most probable diplotype as above, sums over all possible diplotypes weighted by their probabilities and performs the association analysis simultaneously with frequency estimation (Non-Patent Document 2, Patent Document 1). Specifically, an algorithm is provided that estimates the penetrance based on the diplotype simultaneously with the haplotype frequencies and diplotype configurations. With this method, even when the diplotype of each individual is not determined, the haplotype frequencies of the population, the diplotype distribution of each individual, and the penetrance can be estimated by maximum likelihood, given the genotypes and phenotypes of the population.

With the above method, association analysis within a population can be performed even when the diplotype of each individual cannot be determined, and polymorphic markers (haplotypes) suspected of being related to disease or drug response can be searched for. When the resulting polymorphic markers are applied in clinical practice, however, it must be possible to determine the diplotype of each individual.

For example, suppose a certain haplotype is known to be associated with a side effect. If the genotyping result is compatible both with a diplotype that contains this haplotype and with one that does not, administering the drug while ignoring the risk is dangerous, even if the probability of the diplotype containing the side-effect haplotype is low. Conversely, withholding the drug because of the risk deprives people who would have no side effect, and for whom the drug is expected to be effective, of the opportunity for treatment.

However, when different diplotypes give the same genotype, statistical analysis based on commonly used SNP detection results yields only a posterior probability distribution over the feasible diplotypes and cannot fix the diplotype to one.

Several methods are known for determining the diplotype when different diplotypes give the same genotype. The most common uses pedigree information: the child's diplotype can sometimes be determined from the parents' SNP typing results. However, recombination may occur between the two SNPs, and depending on the parents' genotypes the child's diplotype may not be uniquely determined. In addition, the parents' genomes are not always available.

In contrast, so-called haplotyping, which attempts to detect the diplotype directly from the genome, is under development. Haplotyping methods fall broadly into two types. The first attempts to obtain information from a haploid by serially diluting the genome (Non-Patent Document 3). The second detects haplotypes by exploiting the fact that the two alleles forming a haplotype are physically linked in a PCR amplification product (Patent Document 2). The former is still at the development stage, however, and the latter requires many more steps than SNP typing. Moreover, to determine the haplotype of multiple SNPs by the latter method alone, haplotyping must be performed for all SNP pairs, which for n SNPs requires the following number of detections:

$$\binom{n}{2} = \frac{n(n-1)}{2}$$
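As a quick check of how fast the number of pairwise haplotyping assays grows, the following sketch enumerates every unordered SNP pair. The helper name `all_snp_pairs` is hypothetical, and the values 4 and 10 are chosen only because those SNP counts appear later in the description.

```python
from itertools import combinations


def all_snp_pairs(n_snps):
    """List every unordered pair of SNP indices 1..n_snps."""
    return list(combinations(range(1, n_snps + 1), 2))


for n in (4, 10):
    pairs = all_snp_pairs(n)
    # The count always equals n * (n - 1) / 2.
    print(n, len(pairs), n * (n - 1) // 2)
```

For 10 SNPs this gives 45 pairs, compared with 10 assays for plain SNP typing.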

Non-Patent Document 1: Excoffier L, Slatkin M: "Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population", Molecular Biology and Evolution, Vol. 12, 921-927, 1995
Non-Patent Document 2: Shibata K, Ito T, Kitamura Y, Iwasaki N, Tanaka H, Kamatani N: "Simultaneous estimation of haplotype frequencies and quantitative trait parameters: applications to the test of association between phenotype and diplotype configuration", Genetics, Vol. 168, 525-539, 2004
Non-Patent Document 3: Ding C, Cantor C: "Direct molecular haplotyping of long-range genomic DNA with M1-PCR", PNAS, Vol. 100, 7449-7453, 2003
Patent Document 1: JP 2004-354373 A
Patent Document 2: JP 2002-272482 A

When multiple SNPs in a target region are considered, the genotype obtained by SNP typing may be identical for several different diplotypes. In such cases the diplotype cannot be uniquely determined merely by statistically analyzing the SNP typing results.

Various so-called haplotyping methods, which detect haplotypes directly, are currently under development, but they generally require many more steps than SNP typing. Moreover, to determine a haplotype of multiple SNPs by haplotyping alone, haplotyping must be performed between all SNP pairs, which requires a large number of detections.

An object of the present invention is therefore to provide a technique that efficiently improves the diplotype determination rate by selecting the SNP pairs to be haplotyped and combining haplotyping with SNP typing.

The haplotype determination method of the present invention is characterized by performing haplotyping on a part of the plurality of SNPs constituting the haplotype of a target gene, performing SNP typing on a part or all of the remainder, and determining the haplotype of the target gene from the results of both typings.

According to the present invention, when multiple different diplotypes give the same genotype and SNP typing is combined with haplotyping to improve the diplotype determination rate, the determination rate can be improved with fewer haplotyping assays.

In the haplotype determination method of the present invention, haplotyping is performed on a part of the plurality of SNPs constituting the haplotype of the target gene, SNP typing is performed on a part or all of the remainder, and the haplotype of the target gene is determined from the results of both typings.

The SNPs to be haplotyped are preferably selected, based on frequency information, from among the SNP pairs contained in multiple diplotypes having the same genotype. The SNPs to be SNP-typed are preferably selected from the SNPs other than those to be haplotyped. Furthermore, the SNPs to be haplotyped are preferably selected based on information on linkage disequilibrium between SNPs.

Embodiments of the present invention are described in detail below with reference to FIGS. 1 to 5. FIG. 1 shows the algorithm for selecting the SNPs for SNP typing and the SNP pairs for haplotyping. FIG. 2 is a decision flowchart for determining the diplotype. FIG. 3 illustrates, for an example of 10 SNPs, how SNPs with Δ² = 1 are collected into groups and how the set consisting only of the representative SNP of each group is called a simplified haplotype. FIG. 4 illustrates calculating the linkage disequilibrium coefficient D' for all pairs of SNPs constituting the simplified haplotype and selecting only the pairs with D' ≠ 1. FIG. 5 is a schematic diagram showing, among the 10 SNPs, the SNPs selected by this algorithm for haplotyping and those selected for SNP typing.

As described above, when different diplotypes give the same genotype, the diplotype cannot be uniquely determined from SNP detection alone. Combining SNP detection with the haplotyping described above is therefore considered. Many simple methods exist for SNP typing, whereas haplotyping methods are laborious and no simple method has yet been established. In addition, with methods that exploit the fact that the SNPs lie on the same chromosome and are physically linked, haplotyping is difficult when the SNPs are too far apart, and is generally limited to a length that can be PCR-amplified with one set of primers. When haplotyping is combined with SNP typing, it is therefore desirable to determine the diplotype as efficiently as possible, with a small number of haplotyping assays. In other words, the SNP pairs for which haplotyping is effective must be selected.

As the selection criterion, it is preferable to select the SNP pairs for which haplotyping is effective based on information on linkage disequilibrium between SNPs.

Furthermore, it is preferable to calculate, as described later, the improvement in the diplotype determination rate for each SNP pair from the frequency information of the diplotypes composed of the SNPs and the linkage disequilibrium information between SNPs, and to select the combinations for which haplotyping improves the determination. Known information, for example, can be used as the frequency information of the diplotypes composed of the SNPs.

A method of combining SNP typing and haplotyping that is effective for uniquely determining the diplotype when different diplotypes give the same genotype is described below with a specific example.

Assume that the target region contains 10 SNPs and that frequency information is given for the haplotypes composed of these SNPs. Linkage between SNPs is considered for every pair of these 10 SNPs.

When multiple SNPs lie close together on the same chromosome, linkage disequilibrium exists between them. Linkage disequilibrium is an exception to Mendel's law of independent assortment: when two SNPs are on the same chromosome, the rule that "alleles at different loci are distributed to offspring independently of each other" no longer holds. That is, for two SNPs on the same chromosome, even though they are at different loci, when an allele at one is transmitted to a child, the physically linked allele at the other has a higher probability of being transmitted at the same time.

Various measures of linkage disequilibrium have been proposed; D' and Δ² are used particularly often. To give their specific definitions, consider two SNPs, SNP1 and SNP2. SNP1 has alleles a and b, and SNP2 has alleles c and d. Let the allele frequencies be Pa, Pb (= 1 - Pa), Pc, and Pd (= 1 - Pc). The haplotypes formed by combining the two SNPs are of four kinds, a-c, a-d, b-c, and b-d, with frequencies Pac, Pad, Pbc, and Pbd. These variables satisfy the following relationships:

$$P_a = P_{ac} + P_{ad},\quad P_b = P_{bc} + P_{bd},\quad P_c = P_{ac} + P_{bc},\quad P_d = P_{ad} + P_{bd}$$

Table 1 shows the relationships among these variables.

The linkage disequilibrium coefficient D is defined as follows:

$$D = P_{ac} - P_a P_c$$

When there is no linkage disequilibrium, each haplotype frequency is given by the product of the allele frequencies of the two SNPs, so D = 0. Using this D, D' and Δ² are defined as follows:

$$D' = \frac{|D|}{D_{\max}}, \qquad D_{\max} = \begin{cases} \min(P_a P_d,\ P_b P_c) & (D > 0) \\ \min(P_a P_c,\ P_b P_d) & (D < 0) \end{cases}$$

$$\Delta^2 = \frac{D^2}{P_a P_b P_c P_d}$$

D' is D normalized to take values from 0 to 1, and Δ² is related to the χ² statistic for independence in the 2 × 2 contingency table by

$$\chi^2 = n \Delta^2$$

where n is the total number of chromosomes.
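The measures above can be computed directly from the four haplotype frequencies. The following is a minimal sketch under the standard textbook definitions of D, D' and Δ² (the original equation images are not available here, so these formulas are an assumption); the function name `ld_measures` and the example frequencies are hypothetical.

```python
def ld_measures(p_ac, p_ad, p_bc, p_bd):
    """Return (D, D', delta_sq) for two biallelic SNPs.

    Assumes both SNPs are polymorphic, i.e. no allele frequency is 0.
    """
    p_a, p_b = p_ac + p_ad, p_bc + p_bd  # allele frequencies at SNP1
    p_c, p_d = p_ac + p_bc, p_ad + p_bd  # allele frequencies at SNP2
    d = p_ac - p_a * p_c
    # D is normalised by the largest magnitude it could reach given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * p_d, p_b * p_c)
    else:
        d_max = min(p_a * p_c, p_b * p_d)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    delta_sq = d * d / (p_a * p_b * p_c * p_d)
    return d, d_prime, delta_sq


print(ld_measures(0.25, 0.25, 0.25, 0.25))  # independent SNPs
print(ld_measures(0.50, 0.00, 0.00, 0.50))  # perfect LD
print(ld_measures(0.50, 0.25, 0.00, 0.25))  # one haplotype absent
```

The third call shows the situation discussed later in the description: one of the four haplotypes is absent, so D' is 1 although Δ² is only about 0.33.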

The following describes an SNP selection method that uses the linkage disequilibrium coefficients above (D', Δ²). Since there are various measures of linkage disequilibrium, the selection may be based on any such information.

Next, consider the 10 SNPs shown in FIG. 3. An algorithm that uses the linkage disequilibrium coefficients above to select, from these 10 SNPs, those to be SNP-typed and those to be haplotyped is described below. If no group of SNPs has the relationship Δ² = 1, the procedure proceeds to the next step of selecting diplotypes with equal genotypes from all the SNPs.

First, consider the linkage disequilibrium coefficient Δ². Δ² is 0 when the two SNPs are independent and 1 when they are in perfect linkage disequilibrium (perfect LD). Perfect LD means that once the allele of SNP1 is known the allele of SNP2 is determined, and vice versa. In this case the typing results of SNP1 and SNP2 carry the same information, so there is no need to type both. SNPs related by Δ² = 1 can therefore be collected into one group, and only one of them needs to be typed as the representative. FIG. 3 shows this schematically.

The representative SNP may be chosen arbitrarily as long as Δ² = 1, but criteria such as the following are worth considering.
1. SNPs suspected of having a direct relationship with the phenotype (for example, those causing an amino acid change in an exon region).
2. SNPs favorable for the SNP typing method used (for example, a large ΔTm, a GC content of 40% to 60%, and no other polymorphism within about 20 bp of the SNP position).

In FIG. 3 the groups are limited to adjacent SNPs, but in practice SNPs at distant positions may also have the relationship Δ² = 1, so Δ² must be calculated between all SNPs in the region under consideration before grouping. From each group formed in this way only one representative SNP is considered, which yields a simplified haplotype. FIG. 3 shows the 10 SNPs reduced to 4. This process corresponds to step S101 in FIG. 1; hereinafter, the three-digit numbers prefixed with S denote the steps in FIG. 1. If no pair has Δ² = 1, the procedure moves directly to the following steps.
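The grouping of step S101 could be sketched as follows. This is an illustrative implementation using a small union-find structure, not the procedure of FIG. 1 itself; the function name `group_perfect_ld`, the tolerance, and the pairwise Δ² values are all hypothetical (the actual grouping in FIG. 3 is not reproduced in the text).

```python
def group_perfect_ld(snps, delta_sq, tol=1e-9):
    """Group SNPs connected by pairs with delta_sq == 1 (within tol)."""
    parent = {s: s for s in snps}

    def find(s):
        # Follow parent links to the group root, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (i, j), value in delta_sq.items():
        if abs(value - 1.0) <= tol:
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for s in snps:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


# Hypothetical pairwise values for 10 SNPs; unlisted pairs are treated as < 1.
delta_sq = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 0.4, (4, 5): 1.0,
            (6, 7): 1.0, (7, 8): 1.0, (8, 9): 0.2, (9, 10): 1.0}
groups = group_perfect_ld(range(1, 11), delta_sq)
representatives = [g[0] for g in groups]  # any member may serve as representative
print(groups)
print(representatives)
```

Because the merge works on arbitrary pairs, distant SNPs with Δ² = 1 end up in the same group just as adjacent ones do. Taking the first member as representative is a placeholder; the criteria listed in the description would replace it in practice.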

Next, using the simplified haplotype, all diplotypes formed by combining the haplotypes are obtained (S102). In principle 2⁴ = 16 haplotypes are possible from four SNPs, but the number of haplotypes to be examined below is determined by the given haplotype frequencies. Haplotypes above a certain frequency are examined; the threshold is judged from the accuracy ultimately desired, the accuracy of the frequency distribution itself, and similar considerations.

Among the diplotypes obtained in this way, the combinations that give the same genotype are extracted. The haplotypes of the four SNPs can be listed, for example, as in Table 2(A) below, and the diplotypes obtained from them can be listed, for example, as in Table 2(B) below.

Six haplotypes generate 21 diplotypes, among which there is one combination with equal genotypes: the diplotypes d1 = (ATGG, GCGG) and d2 = (ACGG, GTGG) give the same genotype (A/G, T/C, GG, GG).
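The extraction of genotype-equal diplotypes can be sketched as below. Table 2(A) is not reproduced in the text, so only the four haplotypes ATGG, GCGG, ACGG and GTGG come from the description; the remaining two (ATAG, GCGA) are hypothetical placeholders chosen so that the list has six entries. The function names are likewise assumptions.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

# First four haplotypes appear in the description; the last two are hypothetical.
HAPLOTYPES = ["ATGG", "GCGG", "ACGG", "GTGG", "ATAG", "GCGA"]


def genotype(diplotype):
    """Unphased genotype: the sorted allele pair at each SNP."""
    return tuple("".join(sorted(a)) for a in zip(*diplotype))


def ambiguous_sets(haplotypes):
    """Map each genotype produced by more than one diplotype to those diplotypes."""
    by_genotype = defaultdict(list)
    for dip in combinations_with_replacement(haplotypes, 2):
        by_genotype[genotype(dip)].append(dip)
    return {g: dips for g, dips in by_genotype.items() if len(dips) > 1}


n_diplotypes = len(list(combinations_with_replacement(HAPLOTYPES, 2)))
print(n_diplotypes)
for g, dips in ambiguous_sets(HAPLOTYPES).items():
    print(g, dips)
```

With this haplotype list the sketch reports 21 diplotypes and a single ambiguous genotype shared by d1 and d2, in line with the count given in the text.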

For the cases in which genotype and diplotype correspond one-to-one, the total frequency I₀ is calculated.

Here A is the set of diplotypes for which a single genotype is given by multiple diplotypes (d1, d2 ∈ A). I₀ represents the proportion of cases in which the diplotype can be determined from the genotype alone. α denotes a coefficient that is 1 when i = j and 2 when i ≠ j, where i and j index the two haplotypes forming a diplotype.
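One way to evaluate I₀ is sketched below. It assumes that the frequency of a diplotype is α times the product of its two haplotype frequencies (random pairing of haplotypes), which is an assumption consistent with the role of α but not confirmed by the text. The haplotype frequencies, the two extra haplotypes, and the function name `determinable_fraction` are hypothetical.

```python
from itertools import combinations_with_replacement

# Hypothetical haplotype frequencies (sum to 1); not taken from Table 2.
FREQ = {"ATGG": 0.40, "GCGG": 0.30, "ACGG": 0.10,
        "GTGG": 0.10, "ATAG": 0.05, "GCGA": 0.05}

# The set A from the description: d1 and d2 share one genotype.
AMBIGUOUS = {frozenset(("ATGG", "GCGG")), frozenset(("ACGG", "GTGG"))}


def determinable_fraction(freq, ambiguous):
    """Sum alpha * P_i * P_j over all diplotypes that are not in the set A."""
    total = 0.0
    for h_i, h_j in combinations_with_replacement(sorted(freq), 2):
        if frozenset((h_i, h_j)) in ambiguous:
            continue  # genotype does not determine this diplotype
        alpha = 1 if h_i == h_j else 2
        total += alpha * freq[h_i] * freq[h_j]
    return total


i0 = determinable_fraction(FREQ, AMBIGUOUS)
print(round(i0, 4))
```

With these illustrative frequencies the ambiguous diplotypes account for 2(0.40)(0.30) + 2(0.10)(0.10) = 0.26 of individuals, so I₀ comes out at about 0.74.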

Next, consider the linkage disequilibrium coefficient D'. As already stated, D' is a normalized form of D, and like Δ² it is 0 when the two SNPs are independent. When Δ² is 1, D' is also 1, but D' can be 1 even when Δ² is not 1. This is the case when any one of the four haplotype frequencies Pac, Pad, Pbc, and Pbd is 0, which occurs when no recurrent mutation has taken place at either of the two SNP sites and no recombination has occurred between the sites.

When D' = 1, the haplotype can be determined from the genotyping result alone, so haplotyping yields no additional information. D' is therefore calculated for all pairs of SNPs constituting the simplified haplotype, and only the pairs with D' ≠ 1 are selected (S103). Here there are four SNPs; if D' = 1 for pairs 2-3 and 3-4, the four SNP pairs with D' ≠ 1, namely 1-2, 1-3, 1-4, and 2-4, are selected (FIG. 4).
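Step S103 amounts to a simple filter over all SNP pairs. In the sketch below only the facts that D' = 1 for pairs 2-3 and 3-4 come from the description; the other D' values, the tolerance, and the function name are hypothetical.

```python
from itertools import combinations

# D' per pair of the simplified haplotype. Only the two entries equal to 1.0
# follow the description; the remaining values are hypothetical.
D_PRIME = {(1, 2): 0.6, (1, 3): 0.8, (1, 4): 0.5,
           (2, 3): 1.0, (2, 4): 0.7, (3, 4): 1.0}


def pairs_worth_haplotyping(n_snps, d_prime, tol=1e-9):
    """Keep the SNP pairs whose D' differs from 1 (the set B)."""
    return [pair for pair in combinations(range(1, n_snps + 1), 2)
            if abs(d_prime[pair] - 1.0) > tol]


B = pairs_worth_haplotyping(4, D_PRIME)
print(B)  # [(1, 2), (1, 3), (1, 4), (2, 4)]
```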

The process so far has selected
(1) the diplotypes with equal genotypes (d1, d2 ∈ A), and
(2) the SNP pairs with linkage disequilibrium coefficient D' ≠ 1 (1-2, 1-3, 1-4, 2-4 ∈ B).
For these two sets, it is determined whether haplotyping each SNP pair in (2) can discriminate the diplotypes in (1). If it cannot, × is entered; if it can, the frequency of that diplotype is entered (S105).

Whether the diplotypes can be discriminated (S104) is now explained in detail. As shown in the table, d1 and d2 have the same genotype at SNP pair 1-2 (A/G, T/C), but haplotyping SNP pair 1-2 reveals (A-T, G-C) for d1 and (A-C, G-T) for d2. Haplotyping SNP pair 1-2 therefore makes it possible to determine whether the diplotype is d1 or d2. In contrast, haplotyping SNP pair 1-3 gives (A-G, G-G) for d1 and (A-G, G-G) for d2, which are identical. Because haplotyping SNP pair 1-3 gives the same result for both, d1 and d2 cannot be discriminated. At this pair the genotype of d1 and d2 is A/G, GG, which is homozygous at SNP3; when one SNP of a pair is homozygous, haplotyping provides no information beyond the genotype. Even when both SNPs are heterozygous, discrimination by haplotyping is not always possible, so the corresponding portions of diplotypes d1 and d2 must be extracted as above ((A-T, G-C) and (A-C, G-T) for SNP pair 1-2) and checked to see whether discrimination is possible.
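The check of step S104 can be written as a comparison of the two-SNP haplotypes that each diplotype would show. This is a minimal sketch; the function names `phased_pair` and `can_discriminate` are hypothetical, while d1, d2 and the four SNP pairs are those of the description.

```python
def phased_pair(diplotype, i, j):
    """Haplotyping result for SNPs i and j (1-based): the two 2-SNP haplotypes."""
    return tuple(sorted(h[i - 1] + h[j - 1] for h in diplotype))


def can_discriminate(dip_a, dip_b, i, j):
    """True if haplotyping SNP pair (i, j) gives different results for the two."""
    return phased_pair(dip_a, i, j) != phased_pair(dip_b, i, j)


d1 = ("ATGG", "GCGG")
d2 = ("ACGG", "GTGG")

for pair in [(1, 2), (1, 3), (1, 4), (2, 4)]:
    print(pair, phased_pair(d1, *pair), phased_pair(d2, *pair),
          can_discriminate(d1, d2, *pair))
```

Only pair 1-2 separates d1 from d2; for the other three pairs one SNP of the pair is homozygous and both diplotypes yield the same two-SNP haplotypes.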

When discrimination is possible, the proportion determined by it (the frequency of the diplotype) is obtained as ΔI_d1(1,2). If discrimination is impossible for all combinations, the SNP pairs under consideration cannot benefit from combining haplotyping with SNP typing, so the selected SNP group is changed. After all determinations are complete, ΔI is calculated for each SNP pair (S106).

ΔI(i,j) indicates how much the diplotype determination rate increases when SNP pair (i,j) is haplotyped. In this example d1 and d2 can be discriminated only by SNP pair 1-2, so ΔI(1,3) = ΔI(1,4) = ΔI(2,4) = 0, which shows that only haplotyping of SNP pair 1-2 is effective.
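One plausible reading of step S106 is that ΔI(i,j) is the column sum of the table built in S105, with × counted as zero. The sketch below follows that reading, which is an assumption because the original equation is not available; the diplotype frequencies 0.24 and 0.02 and the names `TABLE` and `delta_i` are hypothetical.

```python
# Rows: ambiguous diplotypes in A. Columns: SNP pairs in B.
# None stands for the "x" mark (pair cannot discriminate the diplotype);
# a number is the diplotype frequency resolved by that pair (hypothetical values).
TABLE = {
    "d1": {(1, 2): 0.24, (1, 3): None, (1, 4): None, (2, 4): None},
    "d2": {(1, 2): 0.02, (1, 3): None, (1, 4): None, (2, 4): None},
}


def delta_i(table):
    """Sum, per SNP pair, the diplotype frequencies that the pair resolves."""
    totals = {}
    for row in table.values():
        for pair, freq in row.items():
            totals[pair] = totals.get(pair, 0.0) + (freq or 0.0)
    return totals


gains = delta_i(TABLE)
for pair, gain in gains.items():
    print(pair, round(gain, 4))
```

Under these illustrative numbers ΔI(1,2) is about 0.26 and the other three pairs contribute nothing, matching the conclusion that only pair 1-2 is worth haplotyping.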

Next, the efficiency of haplotyping SNP pair 1-2, found effective above, is assessed. There are several different haplotyping methods, but many exploit the fact that physical linkage is preserved through PCR. In that case the distance between the SNPs in the genome or mRNA must be one that can be amplified in a single PCR. The criterion for haplotyping efficiency therefore depends on how efficiently the processing method keeps SNP1 and SNP2, given their physical distance, on the same strand of genomic DNA or mRNA after sample processing.

As shown in FIG. 3, the SNPs constituting SNP pair 1-2 are the representatives of groups of three SNPs and two SNPs, respectively, formed by Δ² = 1. It was stated earlier that any SNP in a group may be chosen as long as Δ² = 1; in addition to that, the efficiency of haplotyping is also considered. In general, a shorter physical distance is more favorable for haplotyping; when the physical distance reaches several hundred kbp, it is difficult to obtain an amplification product carrying both SNPs on the same strand. Here the threshold for whether haplotyping is possible is set at 5 kbp as a preferred value, but note that this is a parameter that can be changed according to the amplification method used.

For the SNPs that can be selected within the groups, haplotyping is possible whichever representative SNP pair is chosen, as long as their physical distance on the gene is about 500 bp. In that case the efficiency of haplotyping is governed by the typing performance of each SNP. In general, when a hybridization method is used to discriminate the wild type from the mutant type, the ΔTm between a wild-type probe bound to a wild-type target and a mutant probe bound to the same wild-type target is known to be important. Therefore, when several SNP pairs have a physical distance of 500 bp or less, ΔTm is calculated for each, assuming a probe extending 10 bp on either side of the SNP site (21 bp in total), and the pair with the largest ΔTm is selected.

SNPs for haplotyping can be selected by evaluating them against the criteria above. As a concrete procedure, as shown in FIG. 1, the feasibility of haplotyping is first evaluated for the pair with the largest ΔI(i',j') (S107). If the physical distance of the selected SNP pair exceeds the threshold (here 5 kbp), no SNP pair that can be haplotyped is available for the corresponding (i,j).
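The 21 bp probe window described above can be sketched as follows. This is only an illustration: the function names `allele_probes` and `pick_best_pair` are hypothetical, and the ΔTm model itself is not specified in this section, so it is passed in as a caller-supplied function rather than implemented.

```python
def allele_probes(sequence, snp_index, ref, alt, flank=10):
    """Return (wild_type_probe, mutant_probe), each 2 * flank + 1 bases long,
    centered on the SNP at the 0-based position snp_index."""
    if sequence[snp_index] != ref:
        raise ValueError("reference allele does not match the sequence")
    start, end = snp_index - flank, snp_index + flank + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence for the probe")
    wild = sequence[start:end]
    # Replace only the central base to obtain the mutant probe.
    mutant = wild[:flank] + alt + wild[flank + 1:]
    return wild, mutant


def pick_best_pair(candidate_pairs, delta_tm):
    """Select the candidate with the largest delta-Tm.
    delta_tm is an assumed, caller-supplied scoring function."""
    return max(candidate_pairs, key=delta_tm)
```

With `flank=10` the probes are 21 bases long, matching the probe length assumed in the text.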

この場合には、(i,j)をB（例えば表３の横列）の要素から除き（Ｓ１０８）、ΔI(i',j')が次に大きなペア(i',j')を選んで同じプロセスを繰り返せばよい。 In this case, (i, j) is removed from the elements of B (for example, the rows in Table 3) (S108), and the pair (i ', j') with the next largest ΔI (i ', j') is selected. You can repeat the same process.

図５に、選択結果の模式図を示した。代表SNP1-2を構成するグループからは、対応するSNP間の距離が最も近くなる2箇所のSNP(a,b)を選択してハプロタイピングを行う。代表SNP3(c)はΔ²＝１によってグループ化されたSNPが存在しないためにそのままSNPタイピングを行い、代表SNP4(d)からはグループ内から1つ選んでSNPタイピングを行う。 FIG. 5 shows a schematic diagram of the selection result. From the group constituting the representative SNP 1-2, haplotyping is performed by selecting two SNPs (a, b) at which the distance between the corresponding SNPs is closest. Since there is no SNP grouped by Δ ² = 1 in the representative SNP3 (c), SNP typing is performed as it is, and one SNP4 (d) is selected from the group and SNP typing is performed.

In this example the algorithm ends here, because d1 and d2 are the only set of diplotypes that share a genotype. When there are many dk, however, d1 and d2 are removed from A and (i,j) is removed from B (S109, S110), and ΔI is calculated again. The same process is repeated until no elements remain in A or in B (S106 to S109). Once all elements of A (for example, the columns of Table 3) are gone, the selected SNPs are fixed as the haplotyping targets (S111) and every diplotype can be determined. If B is exhausted while elements of A remain, some diplotypes have a phase that cannot be determined. This happens when, for some (i,j), every selectable SNP pair has a physical distance at or above the threshold, so that (i,j) cannot be haplotyped. To repeat, this threshold is a parameter that depends on the sample-processing method used; 5 kbp is used below.
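The loop over A and B described above can be sketched as a greedy selection. This is a minimal sketch under stated assumptions, not the patented procedure itself: the definition of ΔI lies outside this section, so the number of still-ambiguous diplotypes a pair resolves is used as a stand-in, ties are broken by shorter distance, and the data in `AMBIGUOUS`, `RESOLVES` and `DISTANCE_BP` are hypothetical.

```python
def select_haplotyping_pairs(ambiguous, resolves, distance_bp, threshold_bp=5000):
    """Greedy choice of SNP pairs to haplotype.

    ambiguous   -- diplotypes whose phase is undetermined (set A)
    resolves    -- dict: SNP pair -> set of diplotypes it can phase (pairs form set B)
    distance_bp -- dict: SNP pair -> physical distance in bp
    Returns (chosen pairs in order, diplotypes left unresolved).
    """
    remaining = set(ambiguous)
    candidates = set(resolves)
    chosen = []
    while remaining and candidates:
        # Stand-in for the largest delta-I: most diplotypes resolved, then shortest distance.
        best = max(sorted(candidates),
                   key=lambda p: (len(resolves[p] & remaining), -distance_bp[p]))
        candidates.discard(best)          # the pair leaves B whether or not it is usable
        gain = resolves[best] & remaining
        if not gain or distance_bp[best] > threshold_bp:
            continue                      # too far apart (or useless): try the next pair
        chosen.append(best)
        remaining -= gain                 # resolved diplotypes leave A
    return chosen, remaining


# Hypothetical toy input, for illustration only.
AMBIGUOUS = {"d1", "d2", "d3", "d6", "d10", "d11"}
RESOLVES = {(2, 4): {"d1", "d2", "d6"}, (2, 5): {"d3"},
            (1, 3): {"d10", "d11"}, (3, 4): {"d10", "d11"}}
DISTANCE_BP = {(2, 4): 3008, (2, 5): 3023, (1, 3): 59, (3, 4): 2997}
```

A non-empty second return value corresponds to the case in the text where B is exhausted while elements of A remain.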

次に、上記アルゴリズムによって選択されたSNPに対して、SNPタイピングとハプロタイピングを同時に行う構成を示す。 Next, a configuration is shown in which SNP typing and haplotyping are simultaneously performed on the SNP selected by the above algorithm.

To perform SNP typing with a DNA chip, a chip is prepared on which two probes containing the SNP site, one fully matching the wild type and one fully matching the mutant type, are immobilized on a substrate. Primers covering the SNP site of the sample are designed, and amplification is carried out by PCR. At the same time, the amplification product is labeled with a fluorescent label (for example, Cy3). By hybridizing the amplification product generated in this way with the DNA chip, the difference in hybridization strength between full match and mismatch can be read as a difference in the signal intensity of the label (FIG. 6).

For haplotyping, a method combining allele-specific PCR with a DNA chip can be used, for example. In this method, for one SNP of the pair (a, b) to be haplotyped, allele-specific primers for the wild type and the mutant type are labeled with different dyes (Cy3, Cy5), and PCR is performed so that the other SNP is contained in the amplification product. A DNA chip is prepared on which two probes, wild type and mutant type, corresponding to the other SNP are immobilized on a substrate, and it is hybridized with the products of the allele-specific PCR. After fluorescence detection, the haplotype is identified from the dye and the position at which hybridization occurs (FIG. 7). Alternatively, a method that detects haplotypes using mixed probes (US Pat. No. 6,306,643 B1) can be used.
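The readout of the allele-specific PCR method can be sketched as a lookup from (dye, probe spot) to haplotype. The text only states that the two allele-specific primers carry different dyes; the assignment of Cy3 to the wild-type primer and Cy5 to the mutant primer, the intensity threshold, and the function name `decode_haplotypes` are assumptions made for illustration.

```python
# Assumed dye assignment for the allele-specific primers at SNP a.
DYE_TO_ALLELE_A = {"Cy3": "wild", "Cy5": "mutant"}


def decode_haplotypes(signals, threshold):
    """signals maps (dye, probe allele at SNP b) -> measured intensity.
    Returns the sorted list of (allele at a, allele at b) haplotypes
    whose spot intensity reaches the threshold."""
    return sorted(
        (DYE_TO_ALLELE_A[dye], probe_allele)
        for (dye, probe_allele), intensity in signals.items()
        if intensity >= threshold
    )
```

For a sample heterozygous at both SNPs, two of the four dye and spot combinations light up, and which two reveals the phase.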

上記SNPタイピングとハプロタイピングの手法は、双方共にDNAチップを用いて行うことができる。よって、上記SNPタイピング用に設計されたプローブと、ハプロタイピング用に設計されたプローブを同一基板上に固定し、同時にDNAチップ上でハイブリダイゼーションを行い、蛍光検出することができる(図８)。 Both the SNP typing and haplotyping methods can be performed using a DNA chip. Therefore, the probe designed for SNP typing and the probe designed for haplotyping can be immobilized on the same substrate and simultaneously hybridized on a DNA chip to detect fluorescence (FIG. 8).

このように多型を選択してSNPタイピング、ハプロタイピングを組み合わせることにより、従来のSNPタイピングのみを用いる手法では判別できなかった、等しいジェノタイプを与える２つのディプロタイプd1,d2を判別することが可能になる。 In this way, by selecting polymorphism and combining SNP typing and haplotyping, it is possible to discriminate two diplotypes d1 and d2 that give equal genotypes, which could not be discriminated by the conventional method using only SNP typing. It becomes possible.

ここでd1,d2の判別は、SNPタイピングのみを行う手法では、タイピング数を増やしても(4箇所から10箇所にしても)達成しえないことが重要である。 Here, it is important that the determination of d1 and d2 cannot be achieved even if the number of typing is increased (even if the number is 4 to 10) in the method of performing only SNP typing.

次に、図９を参照して、本発明により等しいジェノタイプを与える複数のディプロタイプの判別を可能とするために、最も効率のよい多型を選択するアルゴリズムを実現するコンピューターシステムについて説明する。 Next, with reference to FIG. 9, a computer system that implements an algorithm for selecting the most efficient polymorphism to enable discrimination of a plurality of diplotypes that give equal genotypes according to the present invention will be described.

図９は、本実施形態によるハプロタイプ推定が適用される情報処理装置の構成を示すブロック図である。本実施形態のハプロタイプ推定方法は、中央処理装置（ＣＰＵ）91，記憶装置92、ＲＡＭ93、入出力装置94がバス95により接続された装置に実装される。すなわち、一般的なパーソナルコンピュータ、ワークステーション等に実装可能である。 FIG. 9 is a block diagram illustrating a configuration of an information processing apparatus to which haplotype estimation according to the present embodiment is applied. The haplotype estimation method of this embodiment is implemented in a device in which a central processing unit (CPU) 91, a storage device 92, a RAM 93, and an input / output device 94 are connected by a bus 95. That is, it can be mounted on a general personal computer, a workstation or the like.

In FIG. 9, the central processing unit (CPU) 91 temporarily stores, in the RAM 93, the program of this embodiment held in the storage device 92 together with the data needed to run it, and executes the program. The input/output device 94 includes a display, a keyboard, a pointing device, a printer, a network interface and the like, and handles interaction with the user while the program runs. In most cases the user triggers execution of the program through the input/output device 94. The user also views the execution results and controls run-time parameters through the input/output device 94.

FIG. 10 is a flowchart for explaining the program that performs haplotype estimation according to this embodiment. For each step, the program stored in the storage device 92 shown in FIG. 9 is loaded into the RAM 93 and executed by the central processing unit (CPU) 91. Data input and output are performed through the input/output device 94 as appropriate.

Step 101 inputs haplotypes from the input/output device 94. Step 102 constructs simplified haplotypes using an index such as the linkage disequilibrium coefficient. Step 103 determines the sites to be haplotyped from the simplified haplotypes constructed in step 102 and outputs the result.

Next, examples of the present invention will be described.
(Example 1)
The usefulness of the proposed method is demonstrated using the haplotype data for the SAA gene used in Patent Document 1 above. That document treats the SAA haplotype as spanning two genes, SAA1 and SAA2, whereas here only the haplotypes formed by five SNPs in the SAA1 gene are used.

The five SNPs in SAA1 are (1) -61C>G, (2) -13T>C, (3) -2G>A, (4) 2995C>T and (5) 3010C>T, and the haplotype frequencies are given as follows. Among these five SNPs, no pair had Δ² = 1.

次に、上記10種類のハプロタイプ(累積ハプロタイプ頻度99.8％)より生成されるすべてのディプロタイプを求める。ディプロタイプは55種類あるが、その中で、等しいジェノタイプを与えるディプロタイプを以下に示す。 Next, all diplotypes generated from the 10 types of haplotypes (cumulative haplotype frequency 99.8%) are obtained. There are 55 diplotypes, of which diplotypes that give equal genotypes are listed below.
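The enumeration above can be sketched in a few lines: unordered pairs of n haplotypes, including homozygous pairs, number n(n+1)/2, which gives 55 for n = 10. The helper name `ambiguous_genotypes` and the two-SNP toy haplotypes in the usage note are hypothetical; the actual SAA1 haplotype table is not reproduced in this text.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def ambiguous_genotypes(haplotypes):
    """Group all diplotypes by the unphased genotype they produce and
    return only the genotypes shared by more than one diplotype."""
    groups = defaultdict(list)
    for h1, h2 in combinations_with_replacement(haplotypes, 2):
        # The genotype at each SNP is the unordered pair of alleles.
        genotype = tuple("".join(sorted(alleles)) for alleles in zip(h1, h2))
        groups[genotype].append((h1, h2))
    return {g: d for g, d in groups.items() if len(d) > 1}
```

For the toy haplotypes `["CT", "CC", "GT", "GC"]`, the diplotypes CT/GC and CC/GT are both heterozygous at each SNP and so cannot be told apart by SNP typing alone.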

SAA1では、5種類のジェノタイプに対して、複数のディプロタイプが対応するために、11種類のディプロタイプが判別できない(d1〜d11)。例えばd1とd2は双方共に(CC,T/C,GG,C/T,CC)というジェノタイプを与えるし、d3、d4、d5はすべて(C/G,T/C,GG,CC,C/T)というジェノタイプを与える。 In SAA1, since a plurality of diplotypes correspond to 5 types of genotypes, 11 types of diplotypes cannot be distinguished (d1 to d11). For example, d1 and d2 both give a genotype of (CC, T / C, GG, C / T, CC), and d3, d4, d5 are all (C / G, T / C, GG, CC, C / T) gives the genotype.

このように、1つのジェノタイプに複数のディプロタイプが対応して判定ができない場合には、一般に頻度の高い方のディプロタイプで代表する。そうするとd1、d3、d6、d8、d10が採用されることになり、d2、d4、d5、d7、d9、d11の値は反映されない。この場合に、実際に存在しても認識されずに切り捨てられてしまうディプロタイプの割合は0.954％となり、100人に一人となる。 As described above, when a plurality of diplotypes cannot be determined in correspondence with one genotype, the diplotype having the higher frequency is generally represented. Then, d1, d3, d6, d8, and d10 are adopted, and the values of d2, d4, d5, d7, d9, and d11 are not reflected. In this case, the percentage of diplotypes that actually exist but are not recognized and discarded is 0.954%, which is one in 100 people.

次に、SAA1の5箇所のSNP間すべてのペアに対して、連鎖不平衡係数D'を計算する。ペアの総数は10種(1-2、1-3、1-4、1-5、2-3、2-4、2-5、3-4、3-5、4-5)であるが、計算の結果D'=1となるのはSNPペア1-4、2-3、4-5であることが分る。よってD'=1とならないペアは(1-2、1-3、1-5、2-4、2-5、3-4、3-5)の7種である。 Next, a linkage disequilibrium coefficient D ′ is calculated for all pairs between the five SNPs of SAA1. The total number of pairs is 10 (1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5) It can be seen that the calculation result D ′ = 1 is the SNP pairs 1-4, 2-3, and 4-5. Therefore, there are seven pairs (1-2, 1-3, 1-5, 2-4, 2-5, 3-4, 3-5) that do not satisfy D ′ = 1.
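The pairwise calculation can be sketched with the standard textbook definition of D' (D normalized by its maximum attainable magnitude). Whether the patent uses exactly this normalization is an assumption, as are the function names and the toy frequencies in the usage note.

```python
from itertools import combinations


def all_pairs(n_snps):
    """All unordered SNP pairs, numbered from 1 as in the text."""
    return list(combinations(range(1, n_snps + 1), 2))


def d_prime(hap_freqs, i, j):
    """|D'| between biallelic SNPs at 0-based string positions i and j.
    hap_freqs maps a haplotype string to its frequency."""
    total = sum(hap_freqs.values())
    first = next(iter(hap_freqs))
    a, b = first[i], first[j]          # arbitrary reference alleles
    p_a = sum(f for h, f in hap_freqs.items() if h[i] == a) / total
    p_b = sum(f for h, f in hap_freqs.items() if h[j] == b) / total
    p_ab = sum(f for h, f in hap_freqs.items() if h[i] == a and h[j] == b) / total
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0
```

`all_pairs(5)` yields the 10 pairs listed above. |D'| equals 1 whenever at most three of the four possible two-SNP haplotypes occur, for example with frequencies `{"CT": 0.4, "CC": 0.3, "GC": 0.3}`.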

11種類の等しいジェノタイプに対応するディプロタイプと、7種のD'≠1となるSNPペアに対して、各SNPペアの間でハプロタイピングを行った場合に、ディプロタイプの判定が可能か否かをまとめた表を以下に示す。 Whether diplotype can be determined when haplotyping is performed between each SNP pair for 11 diplotypes corresponding to the same genotype and 7 SNP pairs with D '≠ 1 A table summarizing these is shown below.

表より、2-4のハプロタイピングが最も有効であることが示される。2-4のハプロタイピングを行うことによりd1、d2、d6、d7、d8、d9の相が確定する。 The table shows that 2-4 haplotyping is most effective. Phases d1, d2, d6, d7, d8, and d9 are determined by performing haplotyping of 2-4.

次に、Aから上記確定された6つのディプロタイプ(d1、d2、d6、d7、d8、d9)を除き、Bからハプロタイピングを行うSNPペア2-4を除いて作成した表を以下に示す。 Next, a table created by excluding the above 6 confirmed diplotypes (d1, d2, d6, d7, d8, d9) from A and excluding SNP pair 2-4 performing haplotyping from B is shown below. .

表より、次に有効なのは2-5のハプロタイピングであることが示される。これによりd3の相が確定する。 The table shows that 2-5 haplotyping is the next most effective. This establishes the d3 phase.

In the same way, the table obtained by removing d3 from A and 2-5 from B is shown below.

表より、SNPペア1-3、3-4、3-5のハプロタイピングは等しいΔIを与えるために同等の効果があることがわかる。ここで1-3はSNPペア間の距離が＜500bpとなって他のペアよりも高い効率が得られるために、1-3のハプロタイピングを行い、d10、d11の相を確定する。同様に、SNPペア1-2、1-5でタイピングを行ってもd4、d5の相を確定することができるが、SNPペア間の距離が＜500bpとなる1-2を選択する。 From the table, it can be seen that haplotyping of SNP pairs 1-3, 3-4, and 3-5 has the same effect to give equal ΔI. Here, since the distance between SNP pairs is <500 bp and higher efficiency is obtained than other pairs, 1-3 haplotyping is performed and the phases d10 and d11 are determined. Similarly, even if typing is performed with SNP pairs 1-2 and 1-5, the phases d4 and d5 can be determined, but 1-2 is selected such that the distance between the SNP pairs is <500 bp.

From the above, as a configuration for determining the SAA1 diplotypes, all phases can be determined by performing
"haplotyping of 2-4 (-13T>C : 2995C>T), 2-5 (-13T>C : 3010C>T), 1-2 (-61C>G : -13T>C) and 1-3 (-61C>G : -2G>A)".
For the five SAA1 SNPs examined here, no SNP required SNP typing alone.

Next, a method for actually detecting SAA1 haplotypes using a DNA microarray is described. The procedure follows the method of US Pat. No. 6,306,643, but is not limited to it. First, the entire SAA1 region (including the five SNPs) is amplified. The primers used are shown below.

ここで、Tmの計算時の条件を以下に示す。 Here, conditions for calculating Tm are shown below.

また、それぞれのSNPに対応したプローブを以下のように設計した。 Moreover, the probe corresponding to each SNP was designed as follows.

Next, haplotyping probes are prepared from the probes above. Haplotyping by the method of US Pat. No. 6,306,643 is described concretely below for SNP pair 2-4.

In this example, unlike in the above specification, the probes are not synthesized on the substrate. Instead, four 5'-thiol-labeled oligo probes (wild type and mutant type for -13T>C, and wild type and mutant type for 2995C>T), synthesized in the liquid phase and then purified, are mixed in equal amounts in the following combinations.
(1) -13T>C wild type + 2995C>T wild type
(2) -13T>C wild type + 2995C>T mutant type
(3) -13T>C mutant type + 2995C>T wild type
(4) -13T>C mutant type + 2995C>T mutant type
Each mixed solution is ejected onto the substrate and immobilized by the method disclosed in JP-A-11-187900.

In the other method, the same combinations of sequences are used, but instead of mixing probes, a single probe carrying both sequences in tandem is synthesized. For combination (1) above, for example, a 37 bp probe (5'-thiol-labeled) having the -13T>C wild-type sequence followed directly by the 2995C>T wild-type sequence is synthesized, then ejected onto the substrate and immobilized by the method disclosed in JP-A-11-187900.

As samples, extracted DNA derived from the PSC (Pharma SNP Consortium) was purchased from the Human Science Research Resource Bank (HSRRB). For use in this example, the extracted genomic DNA was amplified with the SAA1 Forward and Reverse primers above, and the products were sequenced on a sequencer (ABI Prism 3100 Genetic Analyzer). The sequencer-derived genotypes at the five SNPs for 10 samples, and the haplotypes determined from them, are shown below. As the table shows, the haplotypes of 5 of the 10 samples could not be determined, leaving two or three possible diplotypes of the kinds described above.

本実施例では、これらの検体に対するハプロタイピングを行う。 In this embodiment, haplotyping is performed on these specimens.

以下に、DNAチップの作製から検出までの一連の流れをより詳細に示す。ここではプローブ核酸をインクジェット方式（特開平11-187900号公報）で基板担体上に固定化したDNAマイクロアレイを用いた実施例について述べるが、この方法に限定されるものではない。 Hereinafter, a series of flow from preparation of a DNA chip to detection will be described in more detail. Here, an example using a DNA microarray in which a probe nucleic acid is immobilized on a substrate carrier by an ink jet method (Japanese Patent Laid-Open No. 11-187900) will be described, but the present invention is not limited to this method.

(Configuration of microarray)
FIG. 6 shows probes immobilized on the microarray. As detailed in JP-A-11-187900, the probes are immobilized by ejecting oligo DNA thiolated at the 5' end onto a surface-treated substrate by inkjet. The probe DNA is about 25 bases long and was purchased from Bex Co., Ltd.

(Preparing the target)
An example of the amplification reaction (PCR) of sample-derived nucleic acid and of the reaction solution composition is shown below. The Forward/Reverse Primer sequences are those shown above, but PCR was performed with two combinations: a 5'-Cy3-labeled F Primer with a 5'-phosphorylated R Primer, and a 5'-Cy3-labeled R Primer with a 5'-phosphorylated F Primer. In the subsequent single-stranding treatment the phosphorylated strand is degraded and only the Cy3-labeled strand remains, so the hybridization reaction is carried out with a single-stranded target.

--PCR solution composition--
Takara LA Taq: 0.25 μl
Genome DNA (50 ng/μl): 1 μl
Forward/Reverse Primer (1 μM): 3 μl
dNTP (2.5 mM): 4.5 μl
buffer I: 12.5 μl
H₂O: 0.75 μl
Total: 25 μl
The reaction solution with the above composition was amplified in a thermal cycler according to the temperature-cycle protocol shown in FIG. 15. After the reaction, the amplification product is quantified by electrophoresis (BioAnalyzer, Agilent).
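As a quick consistency check on the composition above, the listed volumes add up to the stated 25 μl total only if the 3 μl primer entry is read as 3 μl of each primer. That reading is an assumption made here, not something the text states, and the helper names are hypothetical.

```python
# Volumes in microlitres, taken from the PCR solution composition.
PCR_COMPONENTS_UL = {
    "Takara LA Taq": 0.25,
    "Genome DNA (50 ng/ul)": 1.0,
    "Forward Primer (1 uM)": 3.0,   # assumption: 3 ul of each primer
    "Reverse Primer (1 uM)": 3.0,
    "dNTP (2.5 mM)": 4.5,
    "buffer I": 12.5,
    "H2O": 0.75,
}


def total_volume(components):
    """Sum of all component volumes for one reaction."""
    return sum(components.values())


def scale_for_reactions(components, n_reactions):
    """Component volumes needed for n_reactions identical reactions."""
    return {name: volume * n_reactions for name, volume in components.items()}
```

Under this reading `total_volume(PCR_COMPONENTS_UL)` is 25.0; counting the primer entry only once would give 22 μl.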

(Single-stranding treatment)
The amplified PCR product described above is subjected to single-stranding treatment to produce a single-stranded target. Based on the quantification above, the reaction is adjusted so that the solution contains 50 ng of amplification product. As a control, the reaction is also run with a 100-fold dilution of Strandase λ Exonuclease in place of the undiluted enzyme.

--Single-stranding reaction solution composition--
PCR product 50 ng + H₂O: 8 μl
10x Strandase Buffer: 1 μl
Strandase λ Exonuclease: 1 μl
Total: 10 μl
The reaction solution with the above composition is held at 37°C for 20 minutes, after which primers and other components are removed with a purification column (QIAquick PCR Purification Kit, QIAGEN). After purification, the product is quantified by electrophoresis (BioAnalyzer, Agilent). No signal is observed when the single-stranding reaction has succeeded. The hybridization reaction below is therefore performed on the assumption that the single-stranded product is present in an amount equal to the product amount (molar concentration) of the control reaction, in which the same amount of PCR product was treated with the 100-fold diluted enzyme instead of the undiluted enzyme.

(Hybridization)
The drained DNA microarray is set in a hybridization apparatus (Genomic Solutions Inc. Hybridization Station), and the hybridization reaction is carried out with the hybridization solution and conditions given below. Instead of using a hybridization apparatus, the reaction may be performed manually with a glass slide and a hybridization chamber.

(Hybridization solution)
An example of the composition of the hybridization solution is shown below.
"6×SSPE / 10% Formamide / Target (nucleic acid derived from an unknown sample; single-stranded product after PCR, 0.5 nM) / 0.05% SDS"
The single-stranded amplification product, equivalent to 0.5 nM, is dissolved in buffer (SSPE), and formamide is added to a final concentration of 10%. SDS solution is then added to a final concentration of 0.05% to give the hybridization solution. The buffer (SSPE) concentration is calculated in advance so that the final solution is 6×SSPE.
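The advance calculation of the buffer concentration is an ordinary dilution (C1·V1 = C2·V2). The sketch below shows it under assumed stock concentrations (20× SSPE, 100% formamide, 10% SDS) and an assumed final volume; none of these values are given in the text, and the function names are hypothetical.

```python
# Final concentrations from the hybridization solution composition.
FINAL_CONCENTRATIONS = {"SSPE (x)": 6.0, "Formamide (%)": 10.0, "SDS (%)": 0.05}

# Assumed stock concentrations; not specified in the text.
ASSUMED_STOCKS = {"SSPE (x)": 20.0, "Formamide (%)": 100.0, "SDS (%)": 10.0}


def stock_volume(final_conc, stock_conc, final_volume):
    """Volume of stock needed so that it reaches final_conc in final_volume."""
    return final_conc * final_volume / stock_conc


def hybridization_mix(final_volume_ul, stocks=ASSUMED_STOCKS):
    """Stock volumes for the hybridization solution; the remainder is
    available for the target solution and water."""
    volumes = {name: stock_volume(conc, stocks[name], final_volume_ul)
               for name, conc in FINAL_CONCENTRATIONS.items()}
    volumes["target + water"] = final_volume_ul - sum(volumes.values())
    return volumes
```

For a 100 μl final volume with these assumed stocks, this gives 30 μl of SSPE stock, 10 μl of formamide and 0.5 μl of SDS stock, leaving 59.5 μl for the target and water.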

The hybridization solution was heated to 92°C and held for 2 minutes, then held at 60°C for 4 hours. The array was then washed at 50°C with 2×SSC and 0.1% SDS, and washed again at 20°C with 2×SSC. Where necessary, it was rinsed with pure water according to the standard manual and dried in a spin dryer.

(Fluorescence measurement)
Fluorescence of the DNA microarray was measured with a fluorescence detector for DNA microarrays (GenePix 4000B, Axon) under the following conditions: the Cy3 and Cy5 measurement wavelengths were used, and the excitation intensity was adjusted so that the measured fluorescence values were 30000 or less.

(Spot analysis)
The fluorescence image was analyzed with the microarray data analysis software ArrayPro (Media Cybernetics) to obtain the brightness value of each spot.

(Results)
The results for the 10 samples, with phases determined by the haplotyping above, are shown below. Haplotyping of SNP pair 2-4 resolved the diplotypes of #348, #493 and #484, haplotyping of SNP pair 2-5 resolved #418, and haplotyping of SNP pair 1-3 resolved #317.

(Example 2)
Example 1 used a haplotyping method that can handle SNPs up to about 5 kbp apart, but such a method is not always available. In that case, the most effective combination of haplotyping and SNP typing within the feasible range must be selected. Example 2 shows how the optimal SNPs are selected when haplotyping is effective only between SNPs within 500 bp of each other. The haplotyping method used is allele-specific PCR combined with hybridization on a substrate.

As in Example 1, the target is the five SNPs of SAA1. The procedure is the same as in Example 1 up to the point of finding the diplotypes that give equal genotypes from the given haplotype frequencies and constructing Table 6 (reproduced below as Table 14) for the pairs without D' = 1 (1-2, 1-3, 1-5, 2-4, 2-5, 3-4, 3-5) and the diplotypes that cannot be determined (d1 to d11).

しかし本実施例で用いられるハプロタイピングの手法では、正確なタイピングが可能であるSNPペアの距離が500bp以下であるとする。その場合には、2-4や2-5のハプロタイピングはできないために、1-2，1-3のみでハプロタイピングを行う。このときd5〜d11の判定が可能となる。 However, in the haplotyping method used in this embodiment, it is assumed that the distance between SNP pairs that can be accurately typed is 500 bp or less. In that case, since haplotyping of 2-4 and 2-5 is not possible, haplotyping is performed only with 1-2 and 1-3. At this time, determination of d5 to d11 is possible.
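The effect of the 500 bp limit can be checked with a short sketch. It treats the numbers in the SNP names (-61, -13, -2, 2995, 3010) as coordinates on a common axis, which is an assumption; if the numbering convention skips position 0, distances across the origin shift by one base, which does not change the outcome here. The function name `feasible_pairs` is hypothetical.

```python
# Positions read off the SNP names; treated as coordinates (assumption).
SAA1_SNP_POS = {1: -61, 2: -13, 3: -2, 4: 2995, 5: 3010}

# Pairs for which D' = 1 does not hold, as listed in the text.
NOT_FULL_LD = {(1, 2), (1, 3), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)}


def feasible_pairs(positions, candidates, max_distance_bp):
    """Candidate SNP pairs whose physical distance is within the limit."""
    return sorted(
        pair for pair in candidates
        if abs(positions[pair[0]] - positions[pair[1]]) <= max_distance_bp
    )
```

With a 500 bp limit only pairs 1-2 and 1-3 remain, in agreement with the text, while the 5 kbp limit of Example 1 keeps all seven candidate pairs.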

From the above, the SAA1 diplotypes can be determined with the following configuration:
(1) SNP typing of 2995C>T and 3010C>T
(2) Haplotyping of -61C>G, -13T>C and -2G>A
This makes it possible to determine diplotypes d5 to d11. Because d1 to d4 cannot be determined, each ambiguous genotype is represented by its more probable diplotype, and the values of d2 and d4 are not reflected. The proportion of diplotypes discarded in this case is 0.037%, or about one person in 2500. What was one person in 100 with SNP typing alone is thus reduced to one in 2500 by adding a single haplotyping run.
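The "one in N" figures follow directly from the discarded percentages. The small sketch below, with a hypothetical helper name, shows the conversion; the values in the text are rounded.

```python
def one_in_n(percent_discarded):
    """Convert a percentage of discarded diplotypes into 'one person in N'."""
    return 100.0 / percent_discarded


def fold_reduction(percent_before, percent_after):
    """How many times smaller the discarded fraction becomes."""
    return percent_before / percent_after
```

`one_in_n(0.954)` is about 105 and `one_in_n(0.037)` is about 2700, which the text rounds to one in 100 and one in 2500; the discarded fraction falls roughly 26-fold.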

しかしいくら頻度が低くても、d2もしくはd4が疾患や副作用と相関するアレルである場合には、d2やd4を検出する必要がでてくる。そのような場合には、ハプロタイピング手法でもっと距離の長いSNPペアに対応するものを選択するか、もしくはｍRNAから増幅するなどの工夫が必要になる。本実施例では、ハプロタイプのどれかにフェノタイプと相関をもつことを想定していないために、判定できるディプロタイプ頻度の割合に着目している。しかし本発明によるアルゴリズムは、フェノタイプ相関ハプロタイプが特定されている場合にも応用することが可能である。 However, no matter how low the frequency, if d2 or d4 is an allele that correlates with a disease or a side effect, it is necessary to detect d2 or d4. In such a case, it is necessary to select a haplotyping method corresponding to a longer distance SNP pair or to amplify from mRNA. In the present embodiment, since it is not assumed that any of the haplotypes has a correlation with the phenotype, the ratio of the diplotype frequency that can be determined is focused. However, the algorithm according to the present invention can also be applied when a phenotype correlation haplotype is specified.

次に、本実施例におけるハプロタイピングとSNPタイピングを同時に行う手法を具体的に示す。SNPタイピング用のプローブ、プライマーを以下のように設定する。4(2995C>T)と5(3010C>T)は距離が近いので、双方を含むようにプライマーを設計した(ＰＣＲ産物長516bp)。 Next, a method for simultaneously performing haplotyping and SNP typing in the present embodiment will be specifically described. Set the probes and primers for SNP typing as follows. Since 4 (2995C> T) and 5 (3010C> T) are close to each other, primers were designed to include both (PCR product length 516 bp).

但しここで、Tm計算時の条件は以下の値を用いた。 However, here, the following values were used as conditions for Tm calculation.

ハプロタイピングに関しては、1-2と1-3を同時にハプロタイピングするように設計するために、1(-61C>G)においてアレル特異的プライマーを設定し、2(-13T>C)と3(-2G>A)を増幅産物に含むようにReverse Primerを設定した(PCR産物長486bp)。また2(-13T>C)と3(-2G>A)に対しては、基板上に固定するプローブを設計した。 For haplotyping, to design to haplotype 1-2 and 1-3 simultaneously, set an allele-specific primer at 1 (-61C> G) and 2 (-13T> C) and 3 ( Reverse Primer was set so that -2G> A) was included in the amplification product (PCR product length 486 bp). For 2 (-13T> C) and 3 (-2G> A), we designed a probe to be fixed on the substrate.

SNP箇所と設計されたプライマー、プローブの位置関係を図１１に示す。 FIG. 11 shows the positional relationship between the SNP site and the designed primers and probes.

また以下の実施例では、SNP用もしくはハプロタイピング用の増幅を行う前に、SAA1領域全体の増幅を行ってテンプレートを作成している。その際に用いたプライマーを以下に示す。 In the following examples, the template is created by performing amplification of the entire SAA1 region before performing amplification for SNP or haplotyping. The primer used in that case is shown below.

As in Example 1, extracted genomes of 10 specimens derived from the PSC strain were used. The genotype data for the five SNPs of these 10 specimens are as shown in Example 1.

(Configuration of microarray)
FIG. 6 shows how the probes are immobilized on the microarray. As described in detail in Japanese Patent Application Laid-Open No. 11-187900, the probes are immobilized by ejecting oligo DNA whose 3' end has been thiolated onto a surface-treated substrate by inkjet. The DNA used as probes is about 25 bases long and was purchased from Bex Corporation.

(Preparation of the target)
An example of the amplification reaction (PCR) of nucleic acid derived from a specimen is shown below, starting with an example of the composition of the amplification reaction solution.

--PCR solution composition--
Takara LA Taq: 0.25 μl
Genome DNA (50 ng/μl): 1 μl
Forward/Reverse Primer (1 μM): 3 μl
dNTP (2.5 mM): 4.5 μl
buffer I: 12.5 μl
H₂O: 0.75 μl
Total: 25 μl
The reaction solution of the above composition was subjected to an amplification reaction in a thermal cycler according to the temperature cycle protocol shown in FIG. 15.
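As a quick arithmetic check of the composition list above, the following sketch sums the listed volumes and derives the final concentrations with the usual dilution rule C1·V1 = C2·V2. The listed volumes add up to 22 μl if the primer entry is counted once and to the stated 25 μl if it is counted once for each primer; the sketch assumes the second reading, which the source does not state explicitly. The variable and function names are illustrative only.

```python
# Volumes in microlitres, copied from the composition list above.
# ASSUMPTION: the 3 ul primer entry applies to the forward and the reverse
# primer separately; the source lists a single "Forward/Reverse" entry.
volumes_ul = {
    "Takara LA Taq": 0.25,
    "Genome DNA (50 ng/ul)": 1.0,
    "Forward primer (1 uM)": 3.0,
    "Reverse primer (1 uM)": 3.0,
    "dNTP (2.5 mM)": 4.5,
    "buffer I": 12.5,
    "H2O": 0.75,
}

total_ul = sum(volumes_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution rule C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock * volume_ul / total_volume_ul


primer_um = final_concentration(1.0, 3.0, total_ul)  # uM, per primer
dntp_mm = final_concentration(2.5, 4.5, total_ul)    # mM
template_ng = 50.0 * 1.0                             # ng of genomic DNA per reaction

print(f"total = {total_ul} ul")
print(f"primer = {primer_um:.2f} uM each, dNTP = {dntp_mm:.2f} mM, template = {template_ng} ng")
```

Under this reading each primer ends up at 0.12 μM and the reaction contains 50 ng of genomic DNA.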

After completion of the reaction, the primers are removed using a purification column (QIAquick PCR Purification Kit, manufactured by QIAGEN), and the amplification product is then quantified by electrophoresis (BioAnalyzer, manufactured by Agilent).

(Sample processing for SNP typing)
Using the PCR product amplified as described above as the template, PCR is performed on the regions containing the SNP sites. Cy3-labeled primers are used for this amplification. The protocol is shown below.
--PCR solution composition--
AmpliTaq Gold (Applied Biosystems): 0.2 μl
Template Genome DNA: 4 ng
Forward/Reverse Primer: 1 μM each
dNTP mix: 0.2 mM each
10× buffer: 2.5 μl
Total: 25 μl
The reaction solution of the above composition was subjected to an amplification reaction in a thermal cycler according to the temperature cycle protocol shown in FIG. 16.

(Sample processing for haplotyping: allele-specific PCR)
Using the PCR product amplified as described above as the template, allele-specific PCR is performed. The amplification uses Cy3-labeled and Cy5-labeled forward primers. The protocol is shown below.

--PCR solution composition--
AmpliTaq Gold (Applied Biosystems): 0.2 μl
Template Genome DNA: 4 ng
Forward/Reverse Primer: 0.06 μM each
dNTP mix: 0.2 mM each
10× buffer: 2.5 μl
Total: 25 μl
The reaction solution of the above composition was subjected to an amplification reaction in a thermal cycler according to the temperature cycle protocol shown in FIG. 17.
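The way two differently labeled allele-specific forward primers can reveal phase on the array may be easier to see in a small sketch. It assumes that the Cy3-labeled primer matches the C allele and the Cy5-labeled primer the G allele of SNP 1, that the products are hybridized to probes for the two alleles of SNP 2, and that a fixed intensity threshold separates signal from background. The dye assignment, the threshold and the intensity values are all hypothetical and are not taken from the source.

```python
def phase_from_signals(signals, dye_to_allele, min_signal=1000.0):
    """Return the (SNP 1 allele, probed-SNP allele) haplotypes seen on the array.

    signals: {probe_allele: {dye: intensity}} for one probed SNP.
    dye_to_allele: SNP 1 allele amplified by the primer carrying each dye.
    min_signal: hypothetical cut-off between background and a real signal.
    """
    haplotypes = set()
    for probe_allele, by_dye in signals.items():
        for dye, intensity in by_dye.items():
            if intensity >= min_signal:
                # A labeled product bound to this probe: the primer's allele
                # and the probe's allele lie on the same chromosome.
                haplotypes.add((dye_to_allele[dye], probe_allele))
    return haplotypes


# ASSUMPTION: illustrative dye assignment and made-up intensities.
dye_to_allele = {"Cy3": "C", "Cy5": "G"}
signals_snp2 = {
    "T": {"Cy3": 5200.0, "Cy5": 150.0},
    "C": {"Cy3": 180.0, "Cy5": 4700.0},
}

phase = phase_from_signals(signals_snp2, dye_to_allele)
print(sorted(phase))  # [('C', 'T'), ('G', 'C')]
```

For a specimen heterozygous at both SNPs, each dye lights up one probe, so the two haplotypes of the pair are read off directly.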

(Hybridization solution)
An example of the composition of the hybridization solution is shown below.
"6× SSPE / 10% formamide / target (nucleic acid derived from an unknown specimen; 100 ng of PCR product) / 0.05% SDS"
An amount of the amplified nucleic acid derived from the unknown specimen equivalent to 100 ng is dissolved in buffer (SSPE), and formamide is added to a final concentration of 10%. SDS solution is then added to a final concentration of 0.05% to give the hybridization solution. The concentration of the buffer (SSPE) is calculated in advance so that the final solution is 6× SSPE.
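The advance calculation mentioned above is a set of simple dilutions. The sketch below works it out for a hypothetical final volume, assuming stock solutions of 20× SSPE, 100% formamide and 10% SDS; neither the final volume nor the stock concentrations are given in the source.

```python
def hybridization_mix(final_volume_ul, target_volume_ul,
                      sspe_stock_x=20.0, formamide_stock_pct=100.0,
                      sds_stock_pct=10.0):
    """Volumes (ul) giving 6x SSPE, 10% formamide and 0.05% SDS at the end.

    ASSUMPTION: the default stock concentrations are illustrative choices.
    """
    sspe = 6.0 * final_volume_ul / sspe_stock_x
    formamide = 10.0 * final_volume_ul / formamide_stock_pct
    sds = 0.05 * final_volume_ul / sds_stock_pct
    water = final_volume_ul - (sspe + formamide + sds + target_volume_ul)
    if water < 0:
        raise ValueError("components exceed the requested final volume")
    return {
        "SSPE stock": sspe,
        "formamide": formamide,
        "SDS stock": sds,
        "target": target_volume_ul,
        "water": water,
    }


# Hypothetical example: 100 ul final volume, target supplied in 10 ul.
mix = hybridization_mix(final_volume_ul=100.0, target_volume_ul=10.0)
for component, volume in mix.items():
    print(f"{component}: {volume:.1f} ul")
```

With these assumed stocks, a 100 μl solution takes 30 μl of SSPE stock, 10 μl of formamide, 0.5 μl of SDS stock and 49.5 μl of water in addition to the target.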

The hybridization solution was heated to 92 °C and held for 2 minutes, then held at 50 °C for 4 hours. Washing was then performed at 40 °C using 2× SSC and 0.1% SDS, followed by washing at 20 °C using 2× SSC. Where necessary, the array was rinsed with pure water according to a standard manual and dried in a spin dryer.

(Results)
The results for the 10 specimens whose phases were determined by the haplotyping of this example are shown below. Haplotyping of SNP pair 1-3 determined the phase of #317. Haplotyping of SNP pairs 1-2 and 1-3 showed that #418 is neither d4 nor d5, so its phase could be determined to be d3.
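The elimination reasoning used for the results can be sketched as a filter over candidate diplotypes: every candidate consistent with the genotype is kept only if, at each haplotyped SNP pair, it predicts exactly the two-SNP haplotypes that were observed. The candidate labels and sequences below are hypothetical stand-ins over SNPs 1 to 3 and are not the d3, d4 and d5 of Example 1.

```python
def consistent_diplotypes(candidates, observations):
    """Keep the candidates that agree with every phased SNP-pair observation.

    candidates: {name: (haplotype_a, haplotype_b)}, haplotypes as strings.
    observations: {(i, j): set of (allele_i, allele_j)} from haplotyping,
        where i and j are 0-based SNP positions.
    """
    kept = []
    for name, (hap_a, hap_b) in candidates.items():
        if all({(hap_a[i], hap_a[j]), (hap_b[i], hap_b[j])} == observed
               for (i, j), observed in observations.items()):
            kept.append(name)
    return kept


# ASSUMPTION: four hypothetical diplotypes sharing the genotype C/G, T/C, G/A.
candidates = {
    "cand_a": ("CTG", "GCA"),
    "cand_b": ("CTA", "GCG"),
    "cand_c": ("CCG", "GTA"),
    "cand_d": ("CCA", "GTG"),
}
observations = {
    (0, 1): {("C", "T"), ("G", "C")},  # phase observed for SNP pair 1-2
    (0, 2): {("C", "G"), ("G", "A")},  # phase observed for SNP pair 1-3
}

remaining = consistent_diplotypes(candidates, observations)
print(remaining)  # ['cand_a']
```

In this toy case the pair 1-2 observation alone leaves two candidates, and adding the pair 1-3 observation leaves one.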

(Example 3)
This example demonstrates the usefulness of the method proposed by the present invention using haplotype data for the human ALDH2 gene.
[1. Acquisition of haplotype data]
Nine SNPs of ALDH2 published by the HapMap project (http://www.hapmap.org/) were obtained. The nine SNPs are shown in Table 21 below and in FIG. 12. The "rsSNPid" column of Table 21 gives the ID of each SNP. The "alleles" column gives the alleles at the SNP position; for an entry such as "A/G", "A" is the wild type and "G" is the mutant type. The "MAF" column gives the frequency of the minor allele (that is, the mutant allele).

[2. Calculation of linkage disequilibrium coefficients]
The linkage disequilibrium coefficients between the SNPs were calculated using HaploView (http://www.broad.mit.edu/mpg/haploview/). The results are shown in Table 22.
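For readers who want to see what such pairwise coefficients measure, the following sketch computes two common ones from the four haplotype frequencies of a SNP pair. It assumes the standard textbook definitions, with D = Pac·Pbd − Pad·Pbc, the squared correlation D²/(Pa·Pb·Pc·Pd) standing in for the Δ² of the claims, and |D'| = |D|/Dmax. The formula images of the patent are not reproduced in this text, so treating them as identical to these definitions is an assumption, and the function name is illustrative.

```python
def linkage_disequilibrium(p_ac, p_ad, p_bc, p_bd):
    """Return (delta_squared, abs_d_prime) for one SNP pair.

    Inputs are the frequencies of haplotypes a-c, a-d, b-c and b-d, where
    the first SNP has alleles a/b and the second has alleles c/d.
    ASSUMPTION: standard definitions of r^2 and |D'| are used.
    """
    p_a, p_b = p_ac + p_ad, p_bc + p_bd
    p_c, p_d = p_ac + p_bc, p_ad + p_bd
    if min(p_a, p_b, p_c, p_d) <= 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ac * p_bd - p_ad * p_bc
    delta_squared = d * d / (p_a * p_b * p_c * p_d)
    # The largest |D| attainable for these allele frequencies depends on the sign of D.
    d_max = min(p_a * p_d, p_b * p_c) if d >= 0 else min(p_a * p_c, p_b * p_d)
    return delta_squared, abs(d) / d_max


# Two haplotypes only: both coefficients equal 1.
print(linkage_disequilibrium(0.6, 0.0, 0.0, 0.4))
# Three haplotypes: |D'| is 1 but the squared correlation is below 1.
print(linkage_disequilibrium(0.5, 0.0, 0.25, 0.25))
# All four haplotypes present: both coefficients are below 1.
print(linkage_disequilibrium(0.4, 0.1, 0.1, 0.4))
```

Under these definitions |D'| falls below 1 only when all four haplotypes of the pair occur, which is the situation in which a doubly heterozygous genotype leaves the phase undetermined.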

[3. Creation of the simplified haplotype]
Table 22 shows that there are SNPs whose linkage disequilibrium coefficient is 1. Table 23 below lists the combinations of SNPs for which the coefficient is 1.

FIG. 13 is a diagram in which the SNP positions whose linkage disequilibrium coefficient is 1 are connected by lines and grouped.

Based on the above, the haplotype is simplified as shown in Table 24 and FIG. 14.
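The grouping behind the simplified haplotype can be sketched as a union-find pass: SNPs connected by a coefficient of 1 are merged into one group and one representative is kept per group. The SNP labels and linked pairs below are hypothetical and do not reproduce Table 23, and taking the first SNP of each group as its representative is an illustrative choice that the source does not prescribe.

```python
def group_snps(snps, linked_pairs):
    """Merge SNPs joined by linked_pairs; return groups in input order."""
    parent = {snp: snp for snp in snps}

    def find(snp):
        # Follow parent links to the root, shortening the path on the way.
        while parent[snp] != snp:
            parent[snp] = parent[parent[snp]]
            snp = parent[snp]
        return snp

    for first, second in linked_pairs:
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first

    groups = {}
    for snp in snps:
        groups.setdefault(find(snp), []).append(snp)
    return list(groups.values())


# ASSUMPTION: nine hypothetical SNPs and hypothetical pairs with coefficient 1.
snps = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9"]
linked_pairs = [("s1", "s2"), ("s2", "s5"), ("s3", "s4"), ("s6", "s7"), ("s8", "s9")]

groups = group_snps(snps, linked_pairs)
representatives = [group[0] for group in groups]
print(groups)
print(representatives)  # ['s1', 's3', 's6', 's8']
```

Only the representatives then need to be carried into the later steps, since every other SNP in a group is fully determined by its representative.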

From the combination of the four SNPs shown in Table 24 above, the possible haplotype candidates are assembled as follows.

[4. Creation of haplotype candidates]
The haplotypes shown in Table 25 below and their frequencies were obtained using HaploView.

Expressing the combinations of Table 25 in terms of the simplified haplotype created earlier gives Table 26.

[5. Creation of genotype candidates]
Creating all genotypes that can arise from the haplotypes shown in Table 26 gives Table 27 below.
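Enumerating the genotypes that a set of haplotypes can produce is a matter of pairing every haplotype with every haplotype, itself included, and recording the unordered allele pair at each position. The sketch below does this for four hypothetical haplotypes (not those of Table 26) and reports the genotypes shared by more than one diplotype, which are the cases where genotyping alone leaves the phase open.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def genotype(hap_a, hap_b):
    """Unordered allele pair at each position, e.g. ('AG', 'CT', ...)."""
    return tuple("".join(sorted(pair)) for pair in zip(hap_a, hap_b))


def diplotypes_by_genotype(haplotypes):
    """Map each genotype to the list of diplotypes that produce it."""
    table = defaultdict(list)
    pairs = combinations_with_replacement(haplotypes.items(), 2)
    for (name_a, hap_a), (name_b, hap_b) in pairs:
        table[genotype(hap_a, hap_b)].append((name_a, name_b))
    return dict(table)


# ASSUMPTION: hypothetical haplotypes over four representative SNPs.
haplotypes = {"h1": "ACAT", "h2": "GCAT", "h3": "ATAT", "h4": "GTAT"}

table = diplotypes_by_genotype(haplotypes)
ambiguous = {g: d for g, d in table.items() if len(d) > 1}
print(len(table))  # number of distinct genotypes
print(ambiguous)   # genotypes shared by more than one diplotype
```

Here ten diplotypes yield nine genotypes, and the single shared genotype is the doubly heterozygous one produced by both h1/h4 and h2/h3.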

[6. Deciding the strategy for haplotype determination]
From the results in Table 27 above, it was found that, to determine the haplotype consisting of the nine SNPs targeted in this example, it is sufficient to perform genotyping at the four SNP positions shown in Table 28.

FIG. 1 is a diagram showing the algorithm of the present invention.
FIG. 2 is a determination flow diagram.
FIG. 3 is a diagram explaining the case in which, among 10 SNPs, the SNPs with Δ² = 1 are combined into one group and the collection of only the representative SNP of each group is called the simplified haplotype.
FIG. 4 is a diagram explaining the case in which the linkage disequilibrium coefficient D' is calculated for all pairs of SNPs constituting the simplified haplotype and only the pairs with D' ≠ 1 are selected.
FIG. 5 is a schematic diagram of the SNPs to be haplotyped and the SNPs to be SNP-typed, as selected by the algorithm from among the 10 SNPs.
FIG. 6 is a schematic diagram of SNP typing using a DNA chip.
FIG. 7 is a schematic diagram of haplotyping using a DNA chip.
FIG. 8 is a diagram showing a configuration in which SNP typing and haplotyping are performed simultaneously on a DNA chip. Haplotyping probes are immobilized for SNPs a and b, and SNP typing probes for SNPs c and d. Wild-type (W) and mutant (M) probes are immobilized for each of the four SNPs.
FIG. 9 is a block diagram showing the configuration of an information processing apparatus to which the program for determining haplotyping locations according to the embodiment can be applied.
FIG. 10 is a flowchart of the computer system according to the embodiment.
FIG. 11 is a diagram showing the positions of the primers and probes in Example 2.
FIG. 12 is a diagram showing the SNP positions of ALDH2.
FIG. 13 is a diagram showing the positions of the ALDH2 SNPs related through the linkage disequilibrium coefficient.
FIG. 14 is a diagram of the simplified haplotype of ALDH2.
FIG. 15 is a diagram showing the temperature cycle of a PCR reaction.
FIG. 16 is a diagram showing the temperature cycle of a PCR reaction.
FIG. 17 is a diagram showing the temperature cycle of a PCR reaction.

Explanation of symbols

91 Central processing unit
92 Storage device
93 RAM
94 Input/output device
95 Bus
101 Step of inputting haplotypes
102 Step of constructing the simplified haplotype
103 Step of outputting the locations to be haplotyped

Claims

1. A haplotype determination method characterized by performing haplotyping on some of the plurality of SNPs constituting the haplotype of a target gene, performing SNP typing on some or all of the remaining SNPs, and determining the haplotype of the target gene from the results of both typings.

2. The haplotype determination method according to claim 1, wherein the SNPs to be haplotyped are selected, based on frequency information, from SNP pairs contained in a plurality of diplotypes having the same genotype.

3. The haplotype determination method according to claim 2, wherein the SNPs to be subjected to the SNP typing are selected from SNPs other than those subjected to the haplotyping.

4. The haplotype determination method according to claim 1 or 2, wherein the SNPs to be haplotyped are selected based on information on linkage disequilibrium between SNPs.

5. The haplotype determination method according to claim 1, wherein the SNPs to be haplotyped are selected by the following steps:
1) calculating the linkage disequilibrium coefficient Δ² between SNPs by the following formula (1), grouping the SNPs for which Δ² = 1, and selecting a representative SNP from each group
(where one SNP has alleles a and b and the other SNP has alleles c and d, the allele frequencies are Pa, Pb (= 1 − Pa), Pc and Pd (= 1 − Pc), and the haplotype frequencies are Pac, Pad, Pbc and Pbd);
2) selecting diplotypes having the same genotype from all combinations of the representative SNPs;
3) calculating the linkage disequilibrium coefficient D' by the following formula (2) for all pairs of the representative SNPs, and selecting the pairs for which D' ≠ 1; and
4) narrowing down the representative SNP pairs that can be haplotyped, based on the frequency of the representative SNP pairs in the selected diplotypes.

6. The haplotype determination method according to claim 5, wherein the representative SNP pairs that can be haplotyped in step 4) are pairs whose positions on the gene are within 500 bp of each other.

7. The haplotype determination method according to claim 1, wherein the SNP typing and the haplotyping are performed simultaneously using a DNA chip.