JP2020174614A

JP2020174614A - Method for determining risk of attention deficit hyperactivity syndrome

Info

Publication number: JP2020174614A
Application number: JP2019080847A
Authority: JP
Inventors: Yosvani Utagawa; He Huang; Baran Iri Sato
Original assignee: Genesis Healthcare Corp
Current assignee: Genesis Healthcare Corp
Priority date: 2019-04-22
Filing date: 2019-04-22
Publication date: 2020-10-29
Anticipated expiration: 2039-04-22
Also published as: JP7138073B2

Abstract

To provide a method for determining the risk of attention deficit hyperactivity syndrome.
SOLUTION: The present invention provides a method for determining the risk of attention deficit hyperactivity syndrome on the basis of genotype information of a single nucleotide polymorphism set including at least rs12942547, rs889472, rs10883437, rs3132613, rs16998073, rs889140, rs4722404, rs2307121, and rs371998411.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a method for determining the risk of attention deficit hyperactivity syndrome.

Associations between single nucleotide polymorphisms (hereinafter also referred to as "SNPs") and diseases are being identified so that SNPs can be used to determine disease risk. The NCBI SNP Database is a database of human SNPs, each of which is managed under an assigned rs number. The rs numbers in this specification likewise denote registration numbers in the NCBI SNP Database.

The relationships between the SNPs identified by rs number in this specification and the diseases, pathological conditions, or traits associated with them in the non-patent literature are as follows.
rs12942547: SNP associated with inflammatory bowel disease (Non-Patent Document 1)
rs889472: SNP associated with chronic kidney disease (Non-Patent Document 2)
rs889472: SNP associated with blood uric acid concentration (Non-Patent Document 3)
rs10883437: SNP associated with liver enzyme level (ALT) (Non-Patent Document 4)
rs3132613: SNP associated with Graves' disease (Non-Patent Document 5)
rs16998073: SNP associated with systolic blood pressure (Non-Patent Document 6)
rs16998073: SNP associated with diastolic blood pressure (Non-Patent Document 7)
rs889140: SNP associated with blood adiponectin concentration (Non-Patent Document 8)
rs4722404: SNP associated with atopic dermatitis (Non-Patent Document 9)
rs2307121: SNP associated with corneal thickness (Non-Patent Document 10)
rs371998411: SNP associated with platelet count (Non-Patent Document 11)
rs371998411: SNP associated with mean corpuscular hemoglobin (Non-Patent Document 12)
rs371998411: SNP associated with hematocrit (Non-Patent Document 13)
rs371998411: SNP associated with mean corpuscular volume (Non-Patent Document 14)
rs371998411: SNP associated with red blood cell count (Non-Patent Document 15)
rs371998411: SNP associated with mean corpuscular hemoglobin concentration (Non-Patent Document 16)

Non-Patent Document 1: Fuyuno Y, Yamazaki K, Takahashi A, Esaki M, Kawaguchi T, Takazoe M, et al. Genetic characteristics of inflammatory bowel disease in a Japanese population. J. Gastroenterol. 2016;51:672-81.
Non-Patent Document 2: Okada Y, Sim X, Go MJ, Wu JY, Gu D, Takeuchi F, et al. Meta-analysis identifies multiple loci associated with kidney function-related traits in east Asian populations. Nat. Genet. 2012;44:904-9.
Non-Patent Document 3: Okada Y, Sim X, Go MJ, Wu JY, Gu D, Takeuchi F, et al. Meta-analysis identifies multiple loci associated with kidney function-related traits in east Asian populations. Nat. Genet. 2012;44:904-9.
Non-Patent Document 4: Chambers JC, Zhang W, Sehmi J, Li X, Wass MN, Van der Harst P, et al. Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma. Nat. Genet. 2011;43:1131-8.
Non-Patent Document 5: Nakabayashi K, Tajima A, Yamamoto K, Takahashi A, Hata K, Takashima Y, et al. Identification of independent risk loci for Graves' disease within the MHC in the Japanese population. J. Hum. Genet. 2011;56:772-8.
Non-Patent Document 6: Kato N, Takeuchi F, Tabara Y, Kelly TN, Go MJ, Sim X, et al. Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in east Asians. Nat. Genet. 2011;43:531-8.
Non-Patent Document 7: Kato N, Takeuchi F, Tabara Y, Kelly TN, Go MJ, Sim X, et al. Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in east Asians. Nat. Genet. 2011;43:531-8.
Non-Patent Document 8: Wu Y, Gao H, Li H, Tabara Y, Nakatochi M, Chiu YF, et al. A meta-analysis of genome-wide association studies for adiponectin levels in East Asians identifies a novel locus near WDR11-FGFR2. Hum. Mol. Genet. 2014;23:1108-19.
Non-Patent Document 9: Hirota T, Takahashi A, Kubo M, Tsunoda T, Tomita K, Sakashita M, et al. Genome-wide association study identifies eight new susceptibility loci for atopic dermatitis in the Japanese population. Nat. Genet. 2012;44:1222-6.
Non-Patent Document 10: Lu Y, Vitart V, Burdon KP, Khor CC, Bykhovskaya Y, Mirshahi A, et al. Genome-wide association analyses identify multiple loci associated with central corneal thickness and keratoconus. Nat. Genet. 2013;45:155-63.
Non-Patent Document 11: Kamatani Y, Matsuda K, Okada Y, Kubo M, Hosono N, Daigo Y, et al. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet. 2010;42:210-5.
Non-Patent Document 12: Kamatani Y, Matsuda K, Okada Y, Kubo M, Hosono N, Daigo Y, et al. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet. 2010;42:210-5.
Non-Patent Document 13: Kamatani Y, Matsuda K, Okada Y, Kubo M, Hosono N, Daigo Y, et al. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet. 2010;42:210-5.
Non-Patent Document 14: Kamatani Y, Matsuda K, Okada Y, Kubo M, Hosono N, Daigo Y, et al. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet. 2010;42:210-5.
Non-Patent Document 15: Kamatani Y, Matsuda K, Okada Y, Kubo M, Hosono N, Daigo Y, et al. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet. 2010;42:210-5.
Non-Patent Document 16: Kamatani Y, Matsuda K, Okada Y, Kubo M, Hosono N, Daigo Y, et al. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet. 2010;42:210-5.

An object of the present invention is to provide a method for determining the risk of attention deficit hyperactivity syndrome (hereinafter also referred to as "this disease").

The present inventors conducted intensive studies to solve the above problem. As a result, they found that individual single nucleotide polymorphisms, each of which at first glance appears unrelated to this disease, are associated with this disease when considered together as a single set. By making use of this association, they completed the present invention for determining the risk of this disease.

That is, in the method of the present invention, the risk of this disease is determined based on the genotype information of a single nucleotide polymorphism set that has been found to be associated with this disease and that includes at least rs12942547, rs889472, rs10883437, rs3132613, rs16998073, rs889140, rs4722404, rs2307121, and rs371998411 (hereinafter also referred to as "this SNP set").

In the method of the present invention, a "single nucleotide polymorphism set" means a single group of multiple single nucleotide polymorphisms, and it is this group, taken as one set, that has been found to be associated with this disease.

Further, "genotype information" in the method of the present invention means information on the genotype of a single nucleotide polymorphism, classified into the two homozygous types (AA, BB) and the heterozygous type (AB). "Genotype information of this SNP set" means the collection of the genotype information of every single nucleotide polymorphism specified in this SNP set; in other words, it is the set of information on the polymorphic base of each SNP in the base sequence indicated by each rs number. The genotype information of this SNP set is shown in FIG. 1.

According to the present invention, the risk of this disease can be determined.

FIG. 1 shows the genotype information of this SNP set.
FIG. 2 shows an example of a conversion table that associates the genotype information of this SNP set with the value assigned to each zygosity type of each SNP.
FIG. 3 shows the ROC curve and AUC of a model using this SNP set.
FIG. 4 shows the ROC curve and AUC of a model using comparative SNP set 1.
FIG. 5 shows the ROC curve and AUC of a model using comparative SNP set 2.
FIG. 6 shows the ROC curve and AUC of a model using comparative SNP set 3.
FIG. 7 shows the ROC curve and AUC of a model using comparative SNP set 4.
FIG. 8 shows the ROC curve and AUC of a model using comparative SNP set 5.
FIG. 9 shows the ROC curve and AUC of a model using comparative SNP set 6.
FIG. 10 shows the ROC curve and AUC of a model using comparative SNP set 7.
FIG. 11 shows the ROC curve and AUC of a model using comparative SNP set 8.
FIG. 12 shows the ROC curve and AUC of a model using comparative SNP set 9.
FIG. 13 shows the ROC curve and AUC of a model using comparative SNP set 10.
Here, an SNP set containing N-1 SNPs, obtained by removing any one SNP from this SNP set containing N SNPs, is also referred to as a "comparative SNP set", and individual comparative SNP sets are denoted comparative SNP set 1, comparative SNP set 2, and so on.
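A comparative SNP set is this SNP set with any one SNP removed, so candidate subsets can be enumerated mechanically. The Python sketch below builds every leave-one-out subset of the nine SNPs named in the claims. It is illustrative only: the function name and the keying of each subset by the removed SNP are hypothetical, and which SNP was removed to form each numbered comparative SNP set is not stated in this passage.

```python
# Sketch: enumerate "comparative SNP sets" (this SNP set minus one SNP).
SNP_SET = [
    "rs12942547", "rs889472", "rs10883437", "rs3132613", "rs16998073",
    "rs889140", "rs4722404", "rs2307121", "rs371998411",
]


def comparative_snp_sets(snp_set):
    """Map each removed SNP to the remaining N-1 SNPs, keeping their order."""
    return {
        removed: [snp for snp in snp_set if snp != removed]
        for removed in snp_set
    }


if __name__ == "__main__":
    for removed, subset in comparative_snp_sets(SNP_SET).items():
        print(f"without {removed}: {len(subset)} SNPs")
```

The full set and each subset can then be fed to the same model and evaluation data, which is what makes their ROC curves and AUCs comparable.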

Embodiments of the present invention are described below. The following embodiments are examples for explaining the present invention, and the present invention is not intended to be limited to these embodiments. The present invention can be implemented in various forms without departing from its gist.

In this embodiment, attention deficit hyperactivity syndrome (ADHD) refers to a condition in which attention is poor or the attention span is short, a condition in which functioning or development is impaired by hyperactivity or impulsivity that is excessive for the person's age, or a condition that meets both descriptions.

Further, in this embodiment, this disease can generally be understood to mean at least one of the following: a disease diagnosed according to guidelines published by a medical society concerned with this disease; a disease listed in the indications section of the package insert of a prescription drug; or a disease as the term is commonly understood in the pharmaceutical and medical fields.

In the method of this embodiment, the risk of this disease is determined using a set of a predetermined number of single nucleotide polymorphisms, each of which at first glance appears unrelated to this disease.

The risk of this disease refers to the likelihood of developing this disease, such as a person's susceptibility or resistance to it. "Determining the risk" includes, for example, outputting the likelihood of having or developing this disease now or in the future as one of several levels, or as a numerical value. Determining the risk of this disease also includes evaluating genetic factors or genetic susceptibility, such as whether a person tends to be prone or resistant to the disease.
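To illustrate outputting the likelihood "as one of several levels", the sketch below bins a predicted probability into three levels. The level names and the cut-offs 0.33 and 0.66 are hypothetical placeholders; the specification does not fix the number of levels or their boundaries.

```python
def risk_level(probability, thresholds=(0.33, 0.66)):
    """Map a predicted probability to a coarse risk level.

    The thresholds are illustrative assumptions, not values from the source.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    low_cut, high_cut = thresholds
    if probability < low_cut:
        return "low"
    if probability < high_cut:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    for p in (0.12, 0.5, 0.81):
        print(p, risk_level(p))
```

Returning the raw probability instead corresponds to the "numerical value" form of output mentioned in the text.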

In determining the risk of this disease, it does not matter whether the subject whose risk is being determined actually has (has developed) this disease at the time of the determination.

In the method of this embodiment, the genotype information of this SNP set is used; that is, the set of genotypes obtained by classifying the genotype of each SNP specified in this SNP set as one of the two homozygous types (AA, BB) or the heterozygous type (AB). The subject's risk of this disease is then determined based on the genotype information of this SNP set.

This SNP set, used in the method of this embodiment, includes SNPs whose association with this disease had not previously been recognized. Ordinarily, therefore, analyzing the SNPs in this SNP set individually would not allow the risk of this disease to be determined. In the method of this embodiment, however, the risk of this disease can be determined by analyzing the genotype information of the SNPs in this SNP set together as a single set. Moreover, comparing the analysis of this SNP set with the analysis of the comparative SNP sets shows that the analysis of this SNP set gives the more statistically significant result. Thus, by analyzing this SNP set to determine the risk of this disease, the method of this embodiment can provide a risk determination method with high accuracy or high predictive ability.
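The comparison between this SNP set and the comparative SNP sets is reported as ROC curves and AUC values. As a minimal sketch of how an AUC can be computed from model scores, the function below uses the rank-based definition (the probability that a randomly chosen affected subject scores higher than a randomly chosen unaffected one, with ties counted as one half). The significance test used in the specification is not described here, so none is attempted.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: model outputs, higher meaning higher predicted risk.
    labels: 1 for affected subjects, 0 for unaffected subjects.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one affected and one unaffected subject")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # case ranked above control
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in sample size; for large cohorts a sort-based rank computation gives the same value faster.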

For each SNP in this SNP set, the following are listed below: the rs number, the number of the chromosome on which the SNP is located (X or Y for a sex chromosome), the position of the SNP on the chromosome, and the base sequence corresponding to the rs number. In each base sequence, the SNP is enclosed in square brackets []. For example, [A/G] indicates an A or G single nucleotide polymorphism at that position of the base sequence. Information such as the base sequence of each SNP and the diseases associated with it can be obtained, for example, by searching the NCBI SNP Database by rs number; such information can be consulted in that Database and is incorporated herein by reference. The chromosomal positions given below correspond to genome assembly version GRCh37.
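The bracket notation described above can be parsed with a regular expression. This sketch assumes each listed sequence has the form LEFT[X/Y]RIGHT using only the bases A, C, G, and T, and ignores spaces inside the brackets; the function names are hypothetical. The sample string is SEQ ID NO: 1 (rs12942547).

```python
import re

# LEFT flank, first allele, second allele, RIGHT flank.
_SNP_PATTERN = re.compile(r"([ACGT]+)\[([ACGT])/([ACGT])\]([ACGT]+)")

SEQ_ID_NO_1 = "GGCCCCTTTATTTATTATTTTTTAA[A/G]TTTAAAAGAAATAATAAGCAAACCG"


def parse_snp_context(sequence):
    """Split 'LEFT[X/Y]RIGHT' into (left, allele1, allele2, right)."""
    compact = "".join(sequence.split())  # tolerate forms such as "[A / G]"
    match = _SNP_PATTERN.fullmatch(compact)
    if match is None:
        raise ValueError(f"not in LEFT[X/Y]RIGHT form: {sequence!r}")
    return match.groups()


def allele_sequences(sequence):
    """Return the two full-length sequences, one carrying each allele."""
    left, allele1, allele2, right = parse_snp_context(sequence)
    return left + allele1 + right, left + allele2 + right


if __name__ == "__main__":
    left, a1, a2, right = parse_snp_context(SEQ_ID_NO_1)
    print(len(left), a1, a2, len(right))
```

Each sequence in the listing has 25 bases on either side of the SNP, so the polymorphic base sits at zero-based index 25 of either allele sequence.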

rs12942547
Chromosome number: 17
Position on the chromosome: 40527544
Nucleotide sequence: GGCCCCTTTATTTATTATTTTTTAA[A/G]TTTAAAAGAAATAATAAGCAAACCG (SEQ ID NO: 1)

rs889472
Chromosome number: 16
Position on the chromosome: 79645989
Nucleotide sequence: TCCTACAATACATAATAAAGGAAAG[T/G]AGTTGTGTGTGCAAACTTTGCAGCC (SEQ ID NO: 2)

rs10883437
Chromosome number: 10
Position on the chromosome: 101795361
Nucleotide sequence: TGGACCAAACCAATGTATATCTTAC[T/A]TATACTGATTCATGTCTCATGTCTC (SEQ ID NO: 3)

rs3132613
Chromosome number: 6
Position on the chromosome: 30537606
Nucleotide sequence: TAGGATCATCTCCCCTGCTGGAAAC[C/G]TAATCAGAGATCTTTATTTTATTCA (SEQ ID NO: 4)

rs16998073
Chromosome number: 4
Position on the chromosome: 81184341
Nucleotide sequence: CCTGGCCTCAGTTTAAACCCAGGGG[T/A]ATGTTGTAAATATTGAGACGGTCTC (SEQ ID NO: 5)

rs889140
Chromosome number: 19
Position on the chromosome: 33889000
Nucleotide sequence: TAGGGGCCAAGGCCTGGGGGGTGTT[T/C]GGTGGGAAAGCCATTGTCTTACCCC (SEQ ID NO: 6)

rs4722404
Chromosome number: 7
Position on the chromosome: 3128789
Nucleotide sequence: ACCACTAACTGAGTAGAGTTCAAGC[T/C]GGGGCAAGTCACTTGACCTCAATTT (SEQ ID NO: 7)

rs2307121
Chromosome number: 5
Position on the chromosome: 64625512
Nucleotide sequence: CTAGACATTAATATTTTTTGAACTG[T/C]AGTCAGTGCTGGAAAATGTCTAGAC (SEQ ID NO: 8)

rs371998411
Chromosome number: 6
Position on the chromosome: 135418635
Nucleotide sequence: AATTCACTCTGGACAGCAGATGTTA[T/C]TATATCAAAACCACAAAATGTTATC (SEQ ID NO: 9)

In the method of this embodiment, each SNP constituting this SNP set can be identified by referring to the base sequence specified by its rs number. If an rs number described in this specification is merged with another rs number and a new rs number is assigned, the rs number in this specification also refers to the post-merger rs number and to the other merged rs numbers. Likewise, if an rs number described in this specification was itself assigned by merging several rs numbers, it also refers to the original rs numbers.

Further, although the base sequences indicated by the rs numbers above are shown as specific sequences, the portions of each sequence other than the SNP in question may differ, for example because of differences between ethnic groups.

The method of this embodiment can be used for subjects of any ethnicity, but is particularly suitable for Asians and, among Asians, more suitable for East Asians such as Japanese. The method of this embodiment may also be used for subjects of either sex.

One aspect of a method for determining the risk of this disease by analyzing the genotype information of this SNP set is described below. The determination method is not, however, limited to the following.

First, the genotype of each SNP of this SNP set is identified in a sample from the subject. The sample used for SNP detection is not particularly limited as long as it contains chromosomal DNA. Examples include body fluid samples such as saliva, blood, and urine; cell samples such as oral mucosa; and body hair such as scalp hair. For SNP detection, chromosomal DNA isolated from these samples by a conventional method may be used directly, or the isolated chromosomal DNA may be amplified and the amplified DNA used.

SNPs can be detected by conventional gene polymorphism analysis methods, including, but not limited to, the DNA chip method (DNA microarray); sequence analysis using a conventional Sanger-method sequencer or a next-generation sequencer (NGS; Next Generation Sequencer); PCR (Polymerase Chain Reaction); hybridization; and the Invader method.

In the DNA chip method, a DNA chip on which many DNA fragments (probes) containing SNP sites are arranged on a substrate is used: chromosomal DNA is hybridized to the probes on the chip, and the binding sites are detected by fluorescence or electric current to analyze the sequence of the chromosomal DNA. Examples of DNA chips used for SNP analysis include chips carrying oligonucleotide probes capable of detecting base sequences that contain SNP sites.

Sequence analysis can be performed by the ordinary Sanger method. For example, a sequencing reaction is carried out using a primer placed several tens of bases on the 5' side of the polymorphic base, and the base at that position is determined from the results. Before the sequencing reaction, the fragment containing the SNP site is preferably amplified in advance by PCR or the like. For efficiency, NGS technology may also be used.
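When NGS is used, the reads covering each SNP position must be turned into a genotype call. The specification does not describe this step; the sketch below substitutes a deliberately simple allele-fraction heuristic, and the depth and fraction thresholds are assumptions, not validated values. Production pipelines normally use likelihood-based variant callers instead.

```python
def call_genotype_from_counts(count_a, count_b, min_depth=10,
                              het_low=0.2, het_high=0.8):
    """Call AA / AB / BB for one biallelic SNP from allele read counts.

    count_a, count_b: reads supporting allele A and allele B at the SNP.
    Returns None when the depth is below min_depth. All thresholds are
    illustrative assumptions.
    """
    depth = count_a + count_b
    if depth < min_depth:
        return None  # too few reads to call reliably
    fraction_b = count_b / depth
    if fraction_b < het_low:
        return "AA"
    if fraction_b > het_high:
        return "BB"
    return "AB"


if __name__ == "__main__":
    print(call_genotype_from_counts(18, 14))
```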

SNPs can also be detected, for example, by examining whether amplification occurs in conventional PCR. For example, primers are prepared that each have a sequence corresponding to the region containing the polymorphic base and whose 3' end corresponds to one of the polymorphic bases. PCR is performed with each primer, and the polymorphism type can be determined from the presence or absence of an amplification product. The presence or absence of amplification can also be examined by the LAMP method (Loop-Mediated Isothermal Amplification; Japanese Patent No. 3313358), the NASBA method (Nucleic Acid Sequence-Based Amplification; Japanese Patent No. 2843586), the ICAN method (Isothermal and Chimeric primer-initiated Amplification of Nucleic acids; Japanese Patent No. 3433929), or the like. A single-strand amplification method or an NGS-based analysis method may also be used.
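The allele-specific PCR logic described above can be sketched in two parts: forward primers whose 3'-terminal base is the polymorphic base, and a rule that reads the genotype off which reactions gave a product. This is a simplified illustration; the function names and the 20-nt default length are assumptions, and real primer design also needs a reverse primer and checks on melting temperature and specificity, which are omitted here.

```python
def allele_specific_primers(left_flank, allele_a, allele_b, length=20):
    """Forward primers ending (3' end) on each allele of the SNP.

    Each primer is the last (length - 1) bases of the 5' flank plus one allele.
    """
    if len(left_flank) < length - 1:
        raise ValueError("left flank is shorter than the requested primer")
    stem = left_flank[len(left_flank) - (length - 1):]
    return stem + allele_a, stem + allele_b


def genotype_from_amplification(product_with_a, product_with_b):
    """Interpret presence/absence of product in the two reactions."""
    if product_with_a and product_with_b:
        return "AB"
    if product_with_a:
        return "AA"
    if product_with_b:
        return "BB"
    return None  # no product in either reaction: treat as a failed assay


if __name__ == "__main__":
    # 5' flank of rs12942547 (SEQ ID NO: 1), alleles A/G
    primers = allele_specific_primers("GGCCCCTTTATTTATTATTTTTTAA", "A", "G")
    print(primers)
    print(genotype_from_amplification(True, False))
```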

A DNA fragment containing the SNP site can also be amplified and the polymorphism type determined from differences in the electrophoretic mobility of the amplification products. An example of such a method is the PCR-SSCP (single-strand conformation polymorphism) method (Genomics. 1992 Jan 1;12(1):139-146). Specifically, DNA containing the target SNP is first amplified, and the amplified DNA is dissociated into single strands. The single-stranded DNA is then separated on a non-denaturing gel, and the polymorphism type can be determined from differences in the mobility of the separated single strands on the gel.

Furthermore, when the polymorphic base lies within a restriction enzyme recognition sequence, the polymorphism can be analyzed by whether or not the DNA is cleaved by the restriction enzyme (RFLP (Restriction Fragment Length Polymorphism) method). In this case, the DNA sample is first digested with the restriction enzyme. The DNA fragments are then separated, and the polymorphism type can be determined from the sizes of the detected fragments.
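Whether a SNP is suitable for RFLP can be checked in silico by asking whether a recognition sequence overlapping the SNP is created by one allele but not the other. The sketch below does this for the short flanking window of the sequence listing only (other sites elsewhere in a real amplicon would also affect fragment sizes). As an example it uses CCGG, the recognition sequence of MspI, on rs4722404 (SEQ ID NO: 7), where the C allele creates the site; the choice of enzyme is illustrative, not from the specification.

```python
def site_overlaps_snp(left_flank, allele, right_flank, site):
    """True if `site` occurs at a position that covers the SNP base."""
    sequence = left_flank + allele + right_flank
    snp_index = len(left_flank)
    start = sequence.find(site)
    while start != -1:
        if start <= snp_index < start + len(site):
            return True
        start = sequence.find(site, start + 1)
    return False


def rflp_informative(left_flank, allele_a, allele_b, right_flank, site):
    """A site is informative when exactly one allele creates it at the SNP."""
    return (site_overlaps_snp(left_flank, allele_a, right_flank, site)
            != site_overlaps_snp(left_flank, allele_b, right_flank, site))


if __name__ == "__main__":
    left = "ACCACTAACTGAGTAGAGTTCAAGC"   # rs4722404 5' flank
    right = "GGGGCAAGTCACTTGACCTCAATTT"  # rs4722404 3' flank
    print(rflp_informative(left, "T", "C", right, "CCGG"))
```

The recognition sequence is checked on the listed strand only; that is sufficient for palindromic sites such as CCGG but not for non-palindromic ones.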

The polymorphism type can also be analyzed by examining whether hybridization occurs. That is, a probe corresponding to each base is prepared, and the base at the SNP can be identified by determining which probe the DNA hybridizes to.
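The probe-per-allele idea can be modeled in silico as an idealized perfect-match test: build one probe centered on the SNP for each allele and report which probes find a complementary partner in the sample's haplotypes. Real hybridization depends on stringency conditions and tolerates mismatches, so this exact-substring model is only an approximation; the 8-base flank and the function names are assumptions. The example uses rs2307121 (SEQ ID NO: 8).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    return sequence.translate(_COMPLEMENT)[::-1]


def allele_probes(left_flank, allele_a, allele_b, right_flank, flank=8):
    """One probe per allele, centered on the SNP with `flank` bases per side."""
    five_prime = left_flank[len(left_flank) - flank:]
    three_prime = right_flank[:flank]
    return {
        allele_a: five_prime + allele_a + three_prime,
        allele_b: five_prime + allele_b + three_prime,
    }


def hybridizing_alleles(haplotypes, probes):
    """Alleles whose probe has a perfect partner in double-stranded DNA.

    A probe binds the strand complementary to it, so finding either the probe
    or its reverse complement in the listed strand means a partner exists.
    """
    hits = set()
    for allele, probe in probes.items():
        probe_rc = reverse_complement(probe)
        if any(probe in h or probe_rc in h for h in haplotypes):
            hits.add(allele)
    return hits


if __name__ == "__main__":
    left, right = "CTAGACATTAATATTTTTTGAACTG", "AGTCAGTGCTGGAAAATGTCTAGAC"
    probes = allele_probes(left, "T", "C", right)
    sample = [left + "T" + right, left + "C" + right]  # heterozygous sample
    print(sorted(hybridizing_alleles(sample, probes)))
```

Both probes hybridizing indicates a heterozygote (AB); a single hybridizing probe indicates the corresponding homozygote.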

In this way, the subject's genotype data can be determined for each SNP of this SNP set. Here, "the subject's genotype data" means the genotype information that the subject has.

The risk of this disease is then determined based on the genotype information of this SNP set. Any model can be used for the risk determination. The model is not particularly limited; for example, a logistic regression model can be used that, using the genotype information of this SNP set, takes as input features calculated from the subject's genotype data and outputs the risk of this disease. The parameters of this logistic regression model have been learned in advance by machine learning, using as training data genotype data from people with this disease and genotype data from people without it.
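A minimal sketch of such a logistic regression, written in plain Python so that it runs without third-party packages, is shown below. It assumes one numeric feature per SNP (for example, a 0/1/2 genotype code) and fits the weights by unregularized batch gradient descent on the log-loss; the learning rate, epoch count, and function names are assumptions, and the actual features, training procedure, and hyperparameters used in the specification are not given in this passage. In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(weights, bias, features):
    """Predicted probability of this disease for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic_regression(samples, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    samples: feature vectors, one value per SNP.
    labels: 1 for affected, 0 for unaffected training subjects.
    """
    n_samples, n_features = len(samples), len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            error = predict_risk(weights, bias, x) - y  # d(loss)/d(logit)
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias
```

Usage: call `train_logistic_regression` on encoded genotype vectors of affected and unaffected subjects, then pass a new subject's vector to `predict_risk` to obtain a probability.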

Instead of the logistic regression model, various other models can be adopted to determine disease risk. Examples include the following:

- neural networks such as a multilayer perceptron, a CNN (Convolutional Neural Network), or an RNN (Recurrent Neural Network);
- support vector machines using any kernel function, such as a Gaussian kernel;
- random forests built from regression trees;
- multiple regression analysis;
- models using hidden Markov models;
- other statistical or probabilistic models.

A model that combines several such models to reach an overall judgment can also be adopted.

Next, an example of determining the risk of this disease with a model is described. First, the genotype data of the subject whose risk is to be determined are converted into features that can be input to the model. In the method of this embodiment, the feature for each SNP of this SNP set is, for example, a parameter indicating whether the subject's genotype is homozygous (AA), homozygous (BB), or heterozygous (AB). Genotypes are generally written as nucleotides. For example, "GG" indicates that the SNP on both homologous chromosomes is G (guanine), and "AG" indicates that one is G (guanine) and the other is A (adenine). The subject's genotype data are therefore converted into parameters that can be input to a model using the genotype information of this SNP set. If the model does not require such a conversion, the conversion is unnecessary.
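The genotype-to-zygosity step can be sketched as follows. The text does not fix which allele is labelled "A" and which is labelled "B" for each SNP, so the sketch takes both alleles as hypothetical arguments. The genotype string is treated as unordered, so "AG" and "GA" are the same heterozygote.

```python
def zygosity(genotype: str, allele_a: str, allele_b: str) -> str:
    """Classify a two-letter nucleotide genotype as 'AA', 'BB' or 'AB'."""
    g = genotype.upper()
    if len(g) != 2 or set(g) - {allele_a, allele_b}:
        raise ValueError(f"unexpected genotype {genotype!r}")
    if g == allele_a * 2:
        return "AA"   # homozygous for the allele labelled A
    if g == allele_b * 2:
        return "BB"   # homozygous for the allele labelled B
    return "AB"       # one copy of each allele


# Hypothetical labelling for a G/A SNP: G is called "A", A is called "B".
print(zygosity("GG", "G", "A"), zygosity("AG", "G", "A"), zygosity("AA", "G", "A"))
```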

The subject's genotype data can be converted into features by, for example, assigning a value to the subject's genotype for each SNP in this SNP set. For each SNP, a value (for example, 0 or 1) is assigned depending on whether the subject's genotype is homozygous (AA), homozygous (BB), or heterozygous (AB). This converts the subject's genotype data into features. The following description uses 0 or 1 as the value assigned to each SNP, but the assigned values are not limited to these two.

The value assigned to each zygosity can be chosen separately for each SNP. For example, one SNP may be assigned 1 when the subject's genotype is homozygous (AA) and 0 when it is homozygous (BB) or heterozygous (AB). Another SNP may be assigned 1 when the genotype is heterozygous (AB) and 0 when it is homozygous (AA) or homozygous (BB). Alternatively, 1 may be assigned when the genotype is heterozygous (AB) or homozygous (BB), and 0 when it is homozygous (AA).
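The three example assignments above can be written as sets of zygosity classes that are encoded as 1. The rule names are hypothetical. Which rule applies to which SNP is a design choice that the text leaves open.

```python
# Each rule lists the zygosity classes that are encoded as 1; all others become 0.
ENCODING_RULES = {
    "AA_only": {"AA"},          # first example: AA -> 1, BB/AB -> 0
    "AB_only": {"AB"},          # second example: AB -> 1, AA/BB -> 0
    "AB_or_BB": {"AB", "BB"},   # third example: AB/BB -> 1, AA -> 0
}


def encode(zygosity: str, rule: str) -> int:
    """Encode one SNP's zygosity ('AA', 'BB' or 'AB') as 0 or 1 under a rule."""
    return 1 if zygosity in ENCODING_RULES[rule] else 0


for rule in ENCODING_RULES:
    print(rule, [encode(z, rule) for z in ("AA", "AB", "BB")])
```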

In this way, the subject's genotype data can be converted into features. The values used for this assignment can be chosen freely. For example, based on the non-patent documents cited above, a value of 1 can be assigned to genotypes strongly associated with the disease originally linked to each SNP. A value of 0 can be assigned to genotypes only weakly associated with that disease.

The relationship between each SNP's zygosity and its assigned value can be expressed as a conversion table, for example the one in FIG. 2. This table is based on the genotype information of this SNP set shown in FIG. 1. In the conversion table of FIG. 2, a SNP is assigned 1 when the subject's genotype matches the shaded genotype and 0 otherwise. In the genotype notation of FIGS. 1 and 2, A denotes adenine, G guanine, C cytosine, and T thymine. The format of the feature conversion table is not limited to that of FIG. 2.
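FIG. 2 is not reproduced here, so the table below is an assumption. It takes as the shaded genotypes the ten genotypes named in the weighting example of this description. In that example rs10883437 appears twice, as TT and as TA, which is consistent with the N = 10 features used in the examples. The text states only that x_1 is rs12942547 (AG); the order of the other entries is a guess. Genotypes are compared as unordered pairs, so "TC" matches "CT".

```python
# Assumed shaded genotypes from FIG. 2 (inferred from the text, not copied from the figure).
# Feature order beyond x_1 = rs12942547 (AG) is hypothetical.
CONVERSION_TABLE = [
    ("rs12942547", "AG"),
    ("rs889472", "GG"),
    ("rs10883437", "TT"),
    ("rs10883437", "TA"),
    ("rs3132613", "CC"),
    ("rs16998073", "AA"),
    ("rs889140", "TC"),
    ("rs4722404", "TC"),
    ("rs2307121", "CC"),
    ("rs371998411", "TC"),
]


def same_genotype(a: str, b: str) -> bool:
    """Compare genotypes as unordered allele pairs ('TC' == 'CT')."""
    return sorted(a.upper()) == sorted(b.upper())


def to_features(subject: dict[str, str]) -> list[int]:
    """x_i = 1 if the subject's genotype matches the shaded genotype, else 0."""
    return [1 if same_genotype(subject[rs], shaded) else 0
            for rs, shaded in CONVERSION_TABLE]


subject = {  # hypothetical subject
    "rs12942547": "GA", "rs889472": "GG", "rs10883437": "AT",
    "rs3132613": "CT", "rs16998073": "AA", "rs889140": "CT",
    "rs4722404": "TT", "rs2307121": "CC", "rs371998411": "CC",
}
print(to_features(subject))  # [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
```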

Finally, the subject's risk of this disease is determined based on the genotype information of this SNP set. More specifically, the conversion table based on the genotype information of this SNP set is used to convert the subject's genotype data into features that can be input to the model. These features are then input to a predetermined determination model, which determines the subject's risk of this disease.
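With a standard logistic model, this final step reduces to a weighted sum of the features passed through the logistic function. The weight values, the bias, and the 0.5 cut-off below are purely hypothetical. Only the signs of the weights follow the positive and negative correlations stated for each genotype in this description. The feature names encode an assumed (rs number, genotype) order.

```python
import math

# Hypothetical trained parameters: one (feature, weight) pair per x_i.
# Signs follow the correlations described in the text; magnitudes are made up.
WEIGHTS = [
    ("rs12942547_AG", 0.8), ("rs889472_GG", 0.6), ("rs10883437_TT", 0.4),
    ("rs10883437_TA", -0.5), ("rs3132613_CC", -0.7), ("rs16998073_AA", 0.9),
    ("rs889140_TC", -0.3), ("rs4722404_TC", -0.4), ("rs2307121_CC", -0.6),
    ("rs371998411_TC", 0.5),
]
BIAS = 0.0


def risk(features: list[int]) -> float:
    """p = 1 / (1 + exp(-(b + sum_i w_i * x_i)))"""
    z = BIAS + sum(w * x for (_, w), x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(p: float, threshold: float = 0.5) -> str:
    """Hypothetical cut-off turning a probability into a risk label."""
    return "elevated risk" if p >= threshold else "not elevated"


x = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
p = risk(x)
print(round(p, 3), classify(p))
```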

In the determination model, each feature of this SNP set can be weighted to indicate either a positive or a negative correlation with the risk of this disease. For example:

- the values (features) assigned to rs889472, rs371998411, rs12942547, and rs16998073 are weighted to indicate a positive correlation with the risk of this disease;
- those assigned to rs2307121, rs3132613, rs4722404, and rs889140 are weighted to indicate a negative correlation;
- those assigned to rs10883437 are weighted to indicate either a positive or a negative correlation.

Specifically, when the features are weighted, weights indicating a positive correlation with the risk of this disease can be applied to the features for the following genotypes: rs889472 GG, rs371998411 TC, rs12942547 AG, rs16998073 AA, and rs10883437 TT. Weights indicating a negative correlation can be applied to the features for rs2307121 CC, rs3132613 CC, rs4722404 TC, rs889140 TC, and rs10883437 TA. A SNP genotype whose feature is assigned the value 0 can be regarded as uncorrelated with the risk of this disease, or as having a negligibly low correlation.

These weights, which represent correlation with the risk of this disease, are identified by machine learning of the model parameters. The training data are the genotype data of people affected by this disease and of people not affected by it. If one model weights a given SNP as positively correlated with the risk of this disease, other models will normally weight that SNP as positively correlated as well. In other words, the direction of a SNP's correlation with the risk of this disease is unlikely to reverse depending on the type of model. The specific weight values differ between models and are not particularly limited.

Here, the subset of this SNP set whose features are weighted to indicate a positive correlation with the risk of this disease is called the "positively correlated SNP set". The subset weighted to indicate a negative correlation is called the "negatively correlated SNP set". This SNP set contains both. Basing the determination on its genotype information therefore allows the subject's risk of this disease to be judged by weighing both the factors that raise the risk and the factors that lower it.
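Given learned weights, the positively and negatively correlated SNP sets follow directly from the signs of the weights. A subject's score can likewise be split into a risk-raising part and a risk-lowering part. The feature names and weight values below are hypothetical, with signs chosen to match those described above.

```python
FEATURE_NAMES = [
    "rs12942547_AG", "rs889472_GG", "rs10883437_TT", "rs10883437_TA",
    "rs3132613_CC", "rs16998073_AA", "rs889140_TC", "rs4722404_TC",
    "rs2307121_CC", "rs371998411_TC",
]
WEIGHTS = [0.8, 0.6, 0.4, -0.5, -0.7, 0.9, -0.3, -0.4, -0.6, 0.5]  # hypothetical


def split_by_sign(names, weights):
    """Return (positively correlated, negatively correlated) feature names."""
    positive = [n for n, w in zip(names, weights) if w > 0]
    negative = [n for n, w in zip(names, weights) if w < 0]
    return positive, negative


def contributions(features, weights):
    """Sum of w_i * x_i over the positive weights and over the negative weights."""
    up = sum(w * x for w, x in zip(weights, features) if w > 0)
    down = sum(w * x for w, x in zip(weights, features) if w < 0)
    return up, down


pos, neg = split_by_sign(FEATURE_NAMES, WEIGHTS)
print(pos)
print(neg)
print(contributions([1, 1, 0, 1, 0, 1, 1, 0, 1, 0], WEIGHTS))
```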

The determination result obtained as described above can also be used to assist a specialist in diagnosing this disease. The determination result may also be corrected based on the risk of this disease determined as described above and on the results of a questionnaire completed by the subject. In addition, advice on lifestyle improvement may be output to the subject based on the risk of this disease and the questionnaire results.

The present invention can also provide test reagents such as primers and probes. Examples of such probes include probes that contain the SNP site and allow the base at the SNP site to be identified by whether hybridization occurs. Examples of primers include primers usable in PCR to amplify the SNP site and primers usable for sequencing the SNP site. In addition to these primers and probes, the test reagent of this embodiment may include a PCR polymerase and buffer, hybridization reagents, and the like.

This embodiment is described below in more detail by way of examples. However, this embodiment is not limited to these examples.

The association between this SNP set and this disease was verified as follows.

Saliva samples and information on the incidence of various diseases were collected, with their consent, from more than 73,000 users of a genetic testing service. The disease-status information is, for example, a value of 1 if the user has this disease and 0 if not. Genotype data for each user were determined from the saliva samples. A database linking each user's genotype data to their disease-status information was then constructed. From this database, a case-control set of 187 subjects with this disease and 187 controls without it was constructed.
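The text does not say how the 187 controls were chosen from the unaffected users. One simple possibility, shown here purely as an assumption, is to sample as many unaffected users as there are cases, uniformly at random and with a fixed seed. The record layout (the `"has_disease"` key) is hypothetical. No matching on age, sex, or other covariates is attempted.

```python
import random


def build_case_control(records: list[dict], seed: int = 0) -> list[dict]:
    """All cases plus an equal number of randomly sampled controls."""
    cases = [r for r in records if r["has_disease"] == 1]
    controls = [r for r in records if r["has_disease"] == 0]
    rng = random.Random(seed)
    # random.sample raises ValueError if there are fewer controls than cases.
    return cases + rng.sample(controls, k=len(cases))


records = [{"id": i, "has_disease": 1 if i < 3 else 0} for i in range(20)]
cohort = build_case_control(records)
print(len(cohort), sum(r["has_disease"] for r in cohort))  # 6 3
```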

Next, for the cases and controls, the genotype of each SNP of this SNP set was classified as one of the two homozygous types (AA, BB) or the heterozygous type (AB). When the genotype matched the shaded genotype in the conversion table shown in FIG. 2, x_i was set to 1; otherwise it was set to 0. The values x_1 to x_N were used as the explanatory variables of the logistic regression model represented by formula (1) below. For example, for rs12942547, x_1 was set to 1 when the genotype was "AG" and to 0 when it was "AA" or "GG". In this example, N = 10. The response variable of the logistic regression model represented by the formula below was a value p between 0 and 1 representing the probability of having this disease (disease status).
α = 0.1

1. Model verification by AUC
The accuracy of the determination method using this SNP set is described below. A test data set linking users' genotype information to their disease status was created from the above database. For each user in the data set, the genotype of each SNP of this SNP set was classified as homozygous (AA, BB) or heterozygous (AB). The value x_i was set to 1 when the classified genotype matched the shaded genotype in FIG. 2 and to 0 otherwise, and x_1 to x_N were calculated as the features.

Each user's features for this SNP set were input to the above logistic regression model (hereinafter also called the "determination model") to predict whether that user has this disease. From these predictions, the false positive rate and true positive rate were calculated, and the ROC (Receiver Operating Characteristic) curve and AUC (Area Under the Curve) were obtained. More specifically, 5-fold cross-validation was performed on the determination model. This yielded five ROC curves (ROC fold 1 to ROC fold 5), from which the mean (Mean ROC) and standard deviation (±1 std. dev.) were obtained. The broken line (Luck) in FIG. 3 corresponds to outputting at random whether a user has this disease, that is, to the ROC curve of a model with no predictive ability.
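The evaluation loop can be sketched in plain Python. Here AUC is computed through its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half. This equals the area under the empirical ROC curve. The folds are stratified so that every fold contains both cases and controls. The text does not say whether its folds were stratified, so that detail, the seed, and the `fit`/`predict` callables are assumptions.

```python
import random
import statistics


def auc(labels: list[int], scores: list[float]) -> float:
    """P(score of a random case > score of a random control), ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Deal case and control indices round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(X, y, fit, predict, k=5, seed=0):
    """Mean and standard deviation of the per-fold AUCs, plus the AUCs themselves."""
    aucs = []
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [predict(model, X[i]) for i in test_idx]
        aucs.append(auc([y[i] for i in test_idx], scores))
    return statistics.mean(aucs), statistics.stdev(aucs), aucs


# Toy check: a "model" that simply reads back a perfectly informative feature.
y = [1] * 10 + [0] * 10
X = [[label] for label in y]
mean_auc, sd_auc, fold_aucs = cross_validated_auc(X, y, lambda X, y: None, lambda m, x: x[0])
print(mean_auc, sd_auc)  # 1.0 0.0
```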

Similarly, a comparative SNP set was formed for each SNP by removing that one SNP from this SNP set. For each comparative SNP set, a logistic regression model (hereinafter also called a "comparative determination model") was created in the same manner as above. The features of each comparative SNP set were input to the corresponding comparative determination model to predict whether each user has this disease. The false positive and true positive rates were then calculated, and the ROC curve and AUC were obtained for each model. The results are shown in FIG. 4 and subsequent figures.
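Generating the comparison sets is a leave-one-out loop over the full set. The p-values reported for these sets label each one by a genotype as well as an rs number, for example rs3132613 (CC). The sketch therefore drops one feature (rs number plus shaded genotype) at a time. This reading, and the feature list itself, are assumptions. Each reduced set would then be trained and evaluated in the same way as the full set.

```python
def leave_one_out_sets(features: list[tuple[str, str]]):
    """Yield (dropped feature, remaining features) for each feature in turn."""
    for i, dropped in enumerate(features):
        yield dropped, features[:i] + features[i + 1:]


FULL_SET = [  # hypothetical ordering of the ten (rs number, genotype) features
    ("rs12942547", "AG"), ("rs889472", "GG"), ("rs10883437", "TT"),
    ("rs10883437", "TA"), ("rs3132613", "CC"), ("rs16998073", "AA"),
    ("rs889140", "TC"), ("rs4722404", "TC"), ("rs2307121", "CC"),
    ("rs371998411", "TC"),
]

comparison_sets = list(leave_one_out_sets(FULL_SET))
print(len(comparison_sets), len(comparison_sets[0][1]))  # 10 9
```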

When this disease was determined using this SNP set, the AUC was 0.80 ± 0.06. This is significantly higher than the AUC of random output (AUC = 0.5), confirming that the determination model using this SNP set has high predictive ability.

In contrast, every comparative determination model, each using one comparative SNP set, had a lower AUC than the model using this SNP set. Although the AUCs of the comparative determination models exceeded that of random output (AUC = 0.5), they were generally lower than the AUC of the determination model using this SNP set (0.80 ± 0.06).

Thus, using all the SNPs in this SNP set predicts whether a user has this disease more accurately than using any of the comparative SNP sets, each of which omits one SNP from this SNP set.

2. Verification by the Wilcoxon rank-sum test
A Wilcoxon rank-sum test, which is a nonparametric test, was performed to confirm that the determination model using this SNP set is significantly better than the comparative determination models. The null hypothesis was that the AUC of the determination model using this SNP set does not differ from the AUC of the comparative determination model using each comparative SNP set. The test was performed at a significance level of 0.01.

As a result, the p-value was 4.20 × 10^-18 for the comparative SNP set excluding rs3132613 (CC), 4.08 × 10^-18 for the set excluding rs16998073 (AA), and 4.08 × 10^-18 for the set excluding rs12942547 (AG). It was 3.96 × 10^-18 for each of the other comparative SNP sets. Even the largest p-value, 4.20 × 10^-18, is below the significance level, so the null hypothesis was rejected. That is, the AUC of the determination model using this SNP set differs significantly from that of each comparative determination model. The determination model using this SNP set can therefore be said to be superior to the comparative determination models.

As described above, the method of this embodiment predicts whether a person has this disease significantly more accurately than random prediction does. In addition, determinations of this disease based on the genotype information of this SNP set differ significantly from those based on the genotype information of the comparative SNP sets. These effects are thought to arise from a latent correlation, not previously identified, between the genotype information of this SNP set and this disease. The logistic regression model and the other models exemplified above are obtained for the genotype information of this SNP set by, for example, machine learning their parameters. The training data are genotype and disease-status data from people with and without this disease. In other words, each model is merely one representation of this latent correlation, and the type of model used to carry out the method of this embodiment is not particularly limited.

The method of the present invention determines the risk of this disease and contributes to its prevention and/or treatment in fields related to medicine and healthcare.

Claims

A method for determining the risk of attention deficit hyperactivity syndrome based on the genotype information of a single nucleotide polymorphism set containing at least rs12942547, rs889472, rs10883437, rs3132613, rs16998073, rs889140, rs4722404, rs2307121, and rs371998411.