JP6929015B2 - Biomarker search device, biomarker search method and program

Info

Publication number: JP6929015B2
Application number: JP2016029120A
Authority: JP
Inventors: 滋真矢; 貴史小磯; 研植野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2016-02-18
Filing date: 2016-02-18
Publication date: 2021-09-01
Anticipated expiration: 2036-02-18
Also published as: JP2017146238A

Description

Embodiments of the present invention relate to a biomarker search device, a biomarker search method, and a program.

The human genome consists of two sequences of about 3 billion base pairs each, and is divided into 22 types of autosomes and the X and Y sex chromosomes. Although base pairs are almost identical within the same ethnic group, base pairs that differ from person to person exist at multiple positions. These differing base pairs are called SNPs (Single-Nucleotide Polymorphisms).

Some SNPs are known to affect the phenotypic expression of diseases. Here, a trait is, for example, the presence or absence of a disease. Although an individual SNP on its own rarely shows a clear relationship with trait expression, it has been suggested that a trait may be expressed through a combination of multiple SNPs. A combination of SNPs found to be related to the expression of a disease trait is called a biomarker candidate.

Among biomarker candidates, those whose causal relationship has been genuinely established through medical verification, biological causal analysis, and checks for the influence of environmental factors such as lifestyle and of other factors such as age are regarded as biomarkers. A biomarker is publicly recognized as information related to the expression of a certain trait, and as useful information that can be applied to actual services such as treatment based on that knowledge.

Techniques for detecting such SNP combinations have been made possible by recent genome analysis technology. Because the number of SNP combinations is enormous, it is not easy to examine the effect of every combination on the phenotypic expression of a disease. In practice, the search has to be restricted to a limited set of SNP combinations. One method performs an exhaustive search over combinations of up to two SNPs. Another creates a ranking for each SNP based on combinations of at most two SNPs and then searches for combinations. However, with an exhaustive search, computation time makes it difficult to search combinations of three or more SNPs. In addition, some SNPs are recognized from medical knowledge to be highly relevant to specific diseases, and it is desirable to detect biomarker candidates while taking such SNP information into account.

However, no efficient method has been proposed so far for searching for biomarker candidates among an enormous number of combinations while taking into account SNP information obtained from medical knowledge.

Japanese Unexamined Patent Publication No. 2010-224815
Japanese Unexamined Patent Publication No. 2013-175135

One embodiment of the present invention provides a biomarker search device, a biomarker search method, and a program capable of efficiently searching for biomarker candidates while taking medical knowledge into account.

According to the present embodiment, there is provided a biomarker candidate search device comprising:
a specific SNP designation unit that designates, from among a plurality of SNPs (Single-Nucleotide Polymorphisms) in a base sequence, a specific SNP presumed to be related to a specific disease;
a candidate search unit that searches, based on the specific SNP and trait information of samples, for biomarker candidates each containing two or more SNPs presumed to be related to the specific disease; and
a candidate output unit that outputs the biomarker candidates.

A block diagram showing the schematic configuration of a biomarker search device according to one embodiment.
A diagram showing an example of a disease presence/absence vector and a zygosity type matrix.
A diagram showing an example of an SNP combination matrix.
A flowchart showing a method of identifying the presence or absence of a disease in a target sample.
A more detailed block diagram of the biomarker search device according to one embodiment.
A diagram showing an example of the SNP trait DB.
A flowchart showing an example of the processing procedure of the sample information input unit.
A diagram showing an example of the sample information registration DB.
A diagram showing an example of the SNP information registration DB.
A diagram showing an example of the related SNP registration DB.
A diagram showing an example of a GUI screen that serves as both the sample information input unit and the search condition input unit.
A flowchart showing the processing procedure of the biomarker search device according to the present embodiment.
A diagram showing an example of the score of each zygote element of the SNP combination matrix in step S18.
A diagram showing the identification errors of three of the 2^V combinations (hereinafter, combinations c1 to c3).
A flowchart showing a method of updating the SNP combination matrix.
A diagram showing an example of the output form of step S22.
A plot of the odds value or -log(P value) indicating the discrimination accuracy of each SNP combination.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a biomarker search device 1 according to one embodiment. The biomarker search device 1 of FIG. 1 includes a specific SNP designation unit 2, a candidate search unit 3, and a candidate output unit 4.

The specific SNP designation unit 2 acquires, from among a plurality of SNPs (Single-Nucleotide Polymorphisms) in the genome (base sequence), a specific SNP designated as being related to a specific disease.

The candidate search unit 3 searches for biomarker candidates containing two or more SNPs presumed to be related to a specific disease, based on the specific SNP and the trait information of the samples. The trait information of the samples is registered in, for example, the search information registration DB described later. In more detail, the candidate search unit 3 uses the specific SNP designated by the specific SNP designation unit 2 and the search information registration DB to search for K combinations of SNPs (K is an integer of 2 or more), each including one or more of the designated SNPs.
FIG. 2 is a diagram showing an example of the information registered in the search information registration DB. As illustrated, a disease presence/absence vector is registered in the search information registration DB. The disease presence/absence vector of FIG. 2 records whether each sample has the disease: a value of 1 indicates that the corresponding sample has the disease, and a value of 0 indicates that it does not. Each SNP is composed of a combination of two bases, in which A (adenine) pairs with T (thymine) or G (guanine) pairs with C (cytosine). Of the two types of bases that appear at each SNP, the more frequent one is called the major allele and the less frequent one is called the minor allele. The combination of bases at each SNP can therefore be classified into three types of zygote: both major alleles (major homozygote, XX), one major and one minor allele (heterozygote, XY), and both minor alleles (minor homozygote, YY). As shown in FIG. 2, a zygosity type matrix representing the zygosity type of each sample is also registered in the search information registration DB.
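The classification into XX, XY, and YY described above can be illustrated with a minimal sketch. It assumes a biallelic SNP whose genotypes are given as two-letter strings; the function name `classify_zygosity`, the alphabetical tie-break, and the sample genotypes are hypothetical and are not taken from the embodiment.

```python
from collections import Counter


def classify_zygosity(genotypes):
    """Label each genotype of one SNP as XX, XY, or YY.

    Assumption: the SNP is biallelic and each genotype is a two-letter
    string such as "CC", "CT", or "TT".
    """
    allele_counts = Counter(base for g in genotypes for base in g)
    # Major allele = most frequent base; ties are broken alphabetically
    # (the tie-break rule is an assumption of this sketch).
    major = sorted(allele_counts, key=lambda b: (-allele_counts[b], b))[0]
    by_major_count = {2: "XX", 1: "XY", 0: "YY"}
    labels = [by_major_count[sum(base == major for base in g)] for g in genotypes]
    return major, labels


# Hypothetical genotypes of five samples at one SNP.
genotypes = ["CC", "CT", "CC", "TT", "CT"]
major, labels = classify_zygosity(genotypes)
print(major, labels)
```

Here C occurs six times and T four times, so C is treated as the major allele and T as the minor allele.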

FIG. 3 shows an example in which two SNP combinations were found. In FIG. 3, the matrix whose number of rows is the number of combinations (K) and whose number of columns is the number of SNP zygotes is called the SNP combination matrix, and it represents the output SNP combinations. Elements of the SNP combination matrix with a value of 1 are those used in the corresponding SNP combination. In FIG. 3, as indicated by T1, the number of combinations is 2: the first combination is (SNP-00001 is XX, SNP-00002 is YY) and the second is (SNP-00003 is XY, SNP-00004 is XY). The candidate search unit 3 typically searches for combinations that contain two or more SNPs, including the designated specific SNP. However, depending on the data set and conditions, the specific SNP may not be included, in which case an error is returned.

A method of identifying the presence or absence of a disease in each sample using multiple SNP combinations is represented by, for example, the flowchart of FIG. 4. A sample is identified as having the disease when it has all of the SNP zygotes used in any one of the K SNP combinations found. In the case of FIG. 3, a sample that satisfies (SNP-00001 is XX and SNP-00002 is YY) or (SNP-00003 is XY and SNP-00004 is XY) is identified as having the disease. The flowchart of FIG. 4 will be described later.
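The identification rule can be sketched as an OR over combinations of an AND over the zygotes in each combination. The sketch below assumes each sample is a one-hot row with three columns per SNP in the order XX, XY, YY; the function name `predict_disease`, the handling of an all-zero combination row, and the sample rows are hypothetical.

```python
def predict_disease(combination_matrix, sample_row):
    """Return 1 if the sample has every zygote used by at least one combination."""
    for combo in combination_matrix:
        selected = [j for j, flag in enumerate(combo) if flag == 1]
        # Assumption: a row with no selected element never matches.
        if selected and all(sample_row[j] == 1 for j in selected):
            return 1
    return 0


# Columns: SNP-00001 (XX, XY, YY), SNP-00002, SNP-00003, SNP-00004.
combination_matrix = [
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # SNP-00001 is XX and SNP-00002 is YY
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],  # SNP-00003 is XY and SNP-00004 is XY
]
sample_a = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1]  # satisfies the first combination
sample_b = [1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0]  # satisfies neither
sample_c = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]  # satisfies the second combination

results = [predict_disease(combination_matrix, s) for s in (sample_a, sample_b, sample_c)]
print(results)
```

The two rows of `combination_matrix` mirror the two combinations given for FIG. 3; the three sample rows are made up for illustration.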

As shown in FIG. 1, the candidate search unit 3 has an evaluation value calculation unit 5, an identification error calculation unit 6, and a minimum identification error selection unit 7. The evaluation value calculation unit 5 performs, V times (V is an integer of 2 or more), a process of calculating for each SNP an evaluation value indicating how likely that SNP is to be part of a biomarker candidate. More specifically, the evaluation value calculation unit 5 performs, V times, a process of calculating for each SNP an evaluation value indicating how likely each element of the SNP combination matrix of FIG. 3 is to be selected for an SNP combination. In this specification, the evaluation value is also called a score. Hereinafter, each element of the SNP combination matrix is also called a zygote element. As shown in FIG. 3, each SNP has a total of three zygote elements, for example XX, XY, and YY.

The identification error calculation unit 6 calculates, for any combination of SNPs drawn from the V SNPs that correspond to the maximum evaluation value in each round of the evaluation value calculation unit 5, an identification error indicating the degree of relevance to the specific disease. More specifically, the identification error calculation unit 6 collects the V zygote elements corresponding to the maximum evaluation value of each round and, for all 2^V combinations in which each of these zygote elements takes a value of 0 or 1 in the SNP combination matrix, calculates an identification error representing how well the SNP combination matrix correctly identifies the presence or absence of the disease in each sample. A value of 1 indicates that the corresponding zygote element is adopted in the SNP combination, and a value of 0 indicates that it is not. The minimum identification error selection unit 7 selects, from all of these cases, the SNP combination with the smallest identification error. The evaluation value calculation unit 5 and the identification error calculation unit 6 update the SNP combination matrix using the zygote elements selected by the minimum identification error selection unit 7. The processing from the evaluation value calculation unit 5 onward is then repeated U times (U is an integer of 2 or more), updating the SNP combination matrix each time. The SNP combination matrix output by the candidate search unit 3 typically corresponds to biomarker candidates containing two or more SNPs.

The candidate output unit 4 outputs the biomarker candidates found by the candidate search unit 3. More specifically, the candidate output unit 4 outputs, as the biomarker candidates, the SNP combination matrix representing the SNP combinations selected by the minimum identification error selection unit 7 after the U rounds of processing.

The candidate search unit 3 may have a matrix initialization unit 8 that sets the initial value of the SNP combination matrix.

The evaluation value calculation unit 5 may have a maximum zygote element selection unit 9 and a multiple zygote element selection unit 10. The maximum zygote element selection unit 9 acquires each zygote element in the SNP combination matrix, calculates its evaluation value, and selects the zygote element with the maximum evaluation value. The multiple zygote element selection unit 10 repeats the process of the maximum zygote element selection unit 9 V times, selecting a different zygote element each time, for a total of V zygote elements.
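The repeated selection can be sketched as follows. This passage does not define how the evaluation value is computed or whether it is recomputed between rounds, so the sketch simply assumes that a fixed score per zygote element is already available; the function name `select_top_elements` and the score values are hypothetical.

```python
def select_top_elements(scores, v):
    """Pick v distinct zygote elements, taking the best unused one each round."""
    remaining = dict(scores)
    chosen = []
    for _ in range(min(v, len(remaining))):
        best = max(remaining, key=remaining.get)
        chosen.append(best)
        del remaining[best]  # ensures a different element is chosen next round
    return chosen


# Hypothetical scores keyed by (SNP ID, zygote type).
scores = {
    ("SNP-00001", "XX"): 0.42,
    ("SNP-00002", "YY"): 0.77,
    ("SNP-00003", "XY"): 0.15,
    ("SNP-00004", "XY"): 0.60,
}
top = select_top_elements(scores, 3)
print(top)
```

Removing each chosen element from the pool is one simple way to guarantee that the V selected elements all differ.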

In this case, the identification error calculation unit 6 calculates the identification error for every combination of whether or not each of the V zygote elements selected by the multiple zygote element selection unit 10 is adopted in the SNP combination.
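The enumeration over all 2^V on/off assignments can be sketched as below. The definition of the identification error is not given in this passage, so the sketch assumes it is the plain misclassification rate, and it evaluates a single combination row (K = 1) for simplicity. The function names and the data are hypothetical.

```python
from itertools import product


def misclassification_rate(row, zygosity_matrix, disease_vector):
    """Assumed identification error: fraction of samples predicted wrongly."""
    selected = [j for j, flag in enumerate(row) if flag]
    errors = 0
    for sample, label in zip(zygosity_matrix, disease_vector):
        predicted = int(bool(selected) and all(sample[j] for j in selected))
        errors += int(predicted != label)
    return errors / len(disease_vector)


def best_assignment(candidates, n_columns, zygosity_matrix, disease_vector):
    """Try all 2^V on/off assignments of the candidate columns; keep the best."""
    best = None
    for bits in product((0, 1), repeat=len(candidates)):
        row = [0] * n_columns
        for col, bit in zip(candidates, bits):
            row[col] = bit
        err = misclassification_rate(row, zygosity_matrix, disease_vector)
        if best is None or err < best[0]:
            best = (err, bits)
    return best


# Two SNPs, columns ordered (XX, XY, YY) per SNP; four hypothetical samples.
zygosity_matrix = [
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
]
disease_vector = [1, 0, 0, 1]
candidates = [0, 5, 1]  # V = 3 candidate zygote elements (column indices)
error, bits = best_assignment(candidates, 6, zygosity_matrix, disease_vector)
print(error, bits)
```

With V candidates the loop evaluates 2^V rows, which is why V has to stay small in practice.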

FIG. 5 is a more detailed block diagram of the biomarker search device 1 according to one embodiment. In addition to the units shown in FIG. 1, the biomarker search device 1 of FIG. 5 includes an SNP trait DB 11, a sample information input unit 12, a sample information registration DB 13, an SNP information registration DB 14, a specific SNP registration DB 15, a related SNP registration DB 16, a search range SNP selection unit (search range acquisition unit) 17, a selected SNP registration DB 18, a search information collation unit 19, a search information registration DB 20, a search condition input unit 21, and a biomarker candidate registration DB 22.

The SNP trait DB 11 is a database that registers, in association with each other, the multiple SNPs contained in each sample (also called SNP series data) and information on whether each sample has a specific trait. In this specification, "database" is abbreviated as DB.

FIG. 6 is a diagram showing an example of the SNP trait DB 11. As shown in FIG. 6, the SNP trait DB 11 registers the identification number of each sample, information on the SNPs contained in each sample, and information on whether each sample has a specific disease.

Here, an SNP is a base pair in the gene sequence whose characteristics differ between individuals. For example, in FIG. 6, at gene sequence position SNP-00002, the genotype of samples P-001 to P-010 may be CC, CT, or TT, depending on the sample. A base pair whose genotype differs between samples in this way is called an SNP.

In the SNP trait DB 11 of FIG. 6, the presence or absence of the trait for two diseases, Trait-001 and Trait-002, is represented for each sample by 0 and 1: 0 means the trait is absent and 1 means it is present. The types and number of diseases registered in the SNP trait DB 11 are not particularly limited.

The sample information input unit 12 inputs attribute information and trait information about each sample, such as diagnostic results from medical interviews, past medical history, and the disease history of relatives. The input attribute information is registered in the sample information registration DB 13.

FIG. 7 is a flowchart showing an example of the processing procedure of the sample information input unit 12. The sample information input unit 12 inputs, in order, the sample number (step S1), the age of the sample (step S2), nationality (step S3), medical history (step S4), and body type (step S5). The input order of steps S1 to S5 is not particularly limited. Input to the sample information input unit 12 is performed using an information input device such as a keyboard.

Next, the sample information input unit 12 registers the information input in steps S1 to S5 in the sample information registration DB 13 (step S6). The process of step S6 may instead be performed after each of steps S1 to S5.

The information input in steps S1 to S5 is also called attribute information or sample information. Once the information has been registered in the sample information registration DB 13 through the process of FIG. 7, specifying any sample number among P-001 to P-010 retrieves all attribute information for that sample number from the sample information registration DB 13 in a single operation.

FIG. 8 is a diagram showing an example of the sample information registration DB 13. In the example of FIG. 8, the sample information registration DB 13 registers, for each sample, the systolic blood pressure, the diastolic blood pressure, whether the person corresponding to the sample has a medical history, and whether that person's relatives have a medical history. FIG. 8 is only an example, and the sample information registered in the sample information registration DB 13 is not particularly limited.

The SNP information registration DB 14 registers, for each sample, information on the SNPs that make up the genotype and information on the presence or absence of multiple diseases.

FIG. 9 is a diagram showing an example of the SNP information registration DB 14. In FIG. 9, each SNP is divided into three zygote types: major homozygote, minor homozygote, and heterozygote. The column for the zygote type that applies takes a value of 1.

As shown in FIG. 9, since there are three zygote types for each SNP, the number of columns is three times the number of SNPs. Of the three zygote columns for an SNP, exactly one is 1 and the other two are 0.
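This layout is a one-hot encoding with three columns per SNP. A minimal sketch follows, assuming the column order major homozygote (XX), heterozygote (XY), minor homozygote (YY) within each SNP; the helper name `one_hot_row` and the per-sample labels are hypothetical.

```python
ZYGOTE_ORDER = ("XX", "XY", "YY")  # assumed column order within each SNP block


def one_hot_row(zygosity_labels):
    """Encode one sample's zygote types (one label per SNP) as 0/1 columns."""
    row = []
    for label in zygosity_labels:
        row.extend(1 if label == z else 0 for z in ZYGOTE_ORDER)
    return row


# Hypothetical zygote types of two samples at two SNPs.
samples = {"P-001": ["XX", "YY"], "P-002": ["XY", "XX"]}
matrix = [one_hot_row(labels) for labels in samples.values()]
print(matrix)
```

Each block of three columns sums to 1, which matches the statement that exactly one of the three zygote columns is set for every SNP.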

In FIG. 9, for each sample, 1 indicates that the trait for a given disease is present and 0 indicates that it is absent. Multiple samples may have the trait for the same disease.

The related SNP registration DB 16 is a database that registers IDs assigned to groups sharing some commonality (hereinafter, group IDs) together with the IDs of the SNPs related to that commonality. The commonality refers to, for example, a group of SNPs found to be related to a common disease, or a group of SNPs on a chromosome that governs immunity. When narrowing the search target to samples that have correlated SNPs, specifying a group ID selects the related SNP IDs, so that the search for a specific SNP group can be performed using only the samples having those genotypes.

FIG. 10 is a diagram showing an example of the related SNP registration DB 16. In the related SNP registration DB 16 of FIG. 10, each group ID is registered in association with the IDs of its related SNPs. For example, for the group Chr-001, SNP-00001, SNP-00002, ..., SNP-01000 are registered as the related SNPs. Chr-001 to Chr-022 are chromosome numbers, and HLA-001 to HLA-003 are HLA region numbers.

The search range SNP selection unit 17 selects a group ID registered in the related SNP registration DB 16, designates the corresponding subset of SNPs, and registers the SNP numbers of this subset in the selected SNP registration DB 18. The data structure of the selected SNP registration DB 18 is the same as that of the related SNP registration DB 16, and part of the data registered in the related SNP registration DB 16 is registered in the selected SNP registration DB 18.
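Narrowing the search range by group ID amounts to copying a subset of one mapping into another with the same structure. The sketch below models both DBs as plain dictionaries; the function name `select_search_range` and the registered contents (which are shortened and partly made up) are hypothetical.

```python
def select_search_range(related_snp_db, group_ids):
    """Copy the entries for the chosen group IDs into a new mapping."""
    return {gid: list(related_snp_db[gid]) for gid in group_ids}


# Hypothetical, shortened contents keyed by group ID.
related_snp_db = {
    "Chr-001": ["SNP-00001", "SNP-00002"],
    "Chr-002": ["SNP-01001", "SNP-01002"],
    "HLA-001": ["SNP-00002", "SNP-00003"],
}
selected_snp_db = select_search_range(related_snp_db, ["Chr-001", "HLA-001"])

# Flattened, de-duplicated list of SNP IDs inside the search range.
search_range = sorted({snp for snps in selected_snp_db.values() for snp in snps})
print(search_range)
```

Keeping the same group-to-SNP structure in the result reflects the statement that the selected SNP registration DB 18 has the same data structure as the related SNP registration DB 16.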

The specific SNP designation unit 2, also shown in FIG. 1, designates a specific SNP from among those registered in the SNP information registration DB 14 of FIG. 5 and registers it in the specific SNP registration DB 15.

The search information collation unit 19 collates the specific SNP registered in the specific SNP registration DB 15 with the information registered in the SNP trait DB 11, the sample information registration DB 13, and the selected SNP registration DB 18, and registers the matching information in the search information registration DB 20.

The inputs for the specific SNP designation unit 2 and the search condition input unit 21 shown in FIG. 5 can be entered on a GUI (Graphical User Interface) screen displayed on a display device (not shown). FIG. 11 is a diagram showing an example of a GUI screen that serves as both the sample information input unit 12 and the search condition input unit 21. The GUI screen of FIG. 11 has windows w1 to w4. Of these, windows w1 and w2 correspond to the specific SNP designation unit 2, and window w4 corresponds to the search condition input unit 21.

Window w1 (first window) is used to designate the specific SNP handled by the specific SNP designation unit 2. Window w2 (second window) lists all of the designated specific SNPs. Window w3 (third window) is used to designate the type of the specific disease. Window w3 provides, for example, multiple disease names with a radio button for each, and the user selects a disease by checking the corresponding radio button. Window w4 is used to input the various parameters required by the candidate search unit 3. Specific examples include the parameter ε for correcting the evaluation value (score), the parameter α for correcting the identification error, the number K of SNP combinations, and the parameters U and V, which represent iteration counts within the candidate search unit 3.

When the selections and settings in windows w1 to w4 are complete, the user presses the submit button b1 at the lower right of the screen. This completes the processing of the specific SNP designation unit 2 and the search condition input unit 21.

The search condition input unit 21 may include a first correction constant input unit 21a and a second correction constant input unit 21b. The first correction constant input unit 21a inputs a first correction constant (ε) for correcting the evaluation value of the specific SNP. The evaluation value calculation unit 5 calculates the evaluation value of the specific SNP based on the first correction constant. This allows the evaluation value of the specific SNP to be raised preferentially over those of other SNPs.

The second correction constant input unit 21b inputs a second correction constant (α) for correcting the identification error of SNP combinations that include the specific SNP. The identification error calculation unit 6 calculates the identification error of SNP combinations that include the specific SNP based on the second correction constant. This allows the identification error of SNP combinations that include the specific SNP to be set smaller.

The search condition input unit 21 may include a K input unit 21c for inputting the number K of SNP combinations that serve as biomarker candidates.

The search condition input unit 21 may also include a U input unit 21d for inputting the above-mentioned U and a V input unit 21e for inputting the above-mentioned V. As described above, V is the number of zygote elements selected by the evaluation value calculation unit 5, and U is the number of times the minimum identification error selection unit 7 performs the process of selecting the SNP combination with the minimum identification error.

FIG. 12 is a flowchart showing the processing procedure of the biomarker search device 1 according to the present embodiment. First, the SNP series data within the search range and the trait information of the samples are acquired from the search information registration DB 20 (step S11). In the following, the SNP series data within the search range may be called the zygosity type matrix, the trait information of the samples may be called the trait vector, and the disease presence/absence information of the samples may be called the disease presence/absence vector.

As shown in FIG. 9, the junction type matrix has one row per sample. In the column direction, each SNP has three types of zygote (major homozygote, heterozygote, minor homozygote), so the total number of columns is 3 × the number of SNPs. The trait vector has one row per sample and one column per disease.

In the junction type matrix, each SNP is represented by three elements. For example, a major homozygote is represented by {1, 0, 0}, a heterozygote by {0, 1, 0}, and a minor homozygote by {0, 0, 1}. These three-element groups are arranged in the column direction, one group per SNP, and the rows are arranged in the row direction, one row per sample.
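The encoding described above can be sketched as follows. This is only an illustrative sketch: the integer genotype codes (0, 1, 2), the function name `build_junction_type_matrix`, and the sample data are assumptions made for the example and do not appear in the original description.

```python
# Hypothetical genotype codes (an assumption for this sketch):
# 0 = major homozygote, 1 = heterozygote, 2 = minor homozygote.
ONE_HOT = {0: (1, 0, 0), 1: (0, 1, 0), 2: (0, 0, 1)}


def build_junction_type_matrix(genotypes):
    """Return one row per sample with 3 * (number of SNPs) columns.

    genotypes: one list per sample, holding one genotype code per SNP.
    """
    matrix = []
    for sample in genotypes:
        row = []
        for code in sample:
            row.extend(ONE_HOT[code])  # three elements per SNP
        matrix.append(row)
    return matrix


# Three samples, two SNPs (made-up data).
genotypes = [[0, 1], [2, 0], [1, 2]]
junction_type_matrix = build_junction_type_matrix(genotypes)
for row in junction_type_matrix:
    print(row)
```

Each printed row has 3 × 2 = 6 columns, matching the "3 × the number of SNPs" column count of the junction type matrix.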

接合タイプ行列と疾病有無ベクトルは、探索情報登録ＤＢ２０に登録されており、ステップＳ１１では、この探索情報登録ＤＢ２０から探索範囲内のＳＮＰ系列データと検体の形質情報とを取得する。 The junction type matrix and the disease presence / absence vector are registered in the search information registration DB 20, and in step S11, SNP sequence data within the search range and sample trait information are acquired from the search information registration DB 20.

FIG. 2 is a diagram showing an example of the disease presence/absence vector and the junction type matrix. The disease presence/absence vector is obtained by designating one of the plurality of columns representing traits in the SNP trait DB 11 of FIG. 3. Alternatively, a product or sum operation may be performed over the trait information of a plurality of columns, and the resulting value converted to 0 or 1.
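One way to derive a 0/1 disease presence/absence vector from several trait columns is sketched below. The function name `combine_traits`, the `mode` parameter, and the rule that any non-zero result maps to 1 are assumptions for illustration; the description only states that a product or sum is taken and the result is converted to 0 or 1.

```python
def combine_traits(trait_rows, columns, mode="product"):
    """Build a 0/1 vector from the selected trait columns of each sample.

    mode="product" marks a sample only if every selected trait is present;
    mode="sum" marks a sample if at least one selected trait is present.
    """
    vector = []
    for row in trait_rows:
        values = [row[c] for c in columns]
        if mode == "product":
            combined = 1
            for value in values:
                combined *= value
        elif mode == "sum":
            combined = sum(values)
        else:
            raise ValueError("mode must be 'product' or 'sum'")
        # Assumption: any non-zero combined value is converted to 1.
        vector.append(1 if combined > 0 else 0)
    return vector


# Made-up trait table: one row per sample, one 0/1 column per disease.
traits = [[1, 0], [1, 1], [0, 0]]
by_product = combine_traits(traits, [0, 1], mode="product")
by_sum = combine_traits(traits, [0, 1], mode="sum")
print(by_product, by_sum)
```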

このように、図１２のステップＳ１１では、図９や図２の表データを用いて、探索範囲内の接合タイプ行列と形質ベクトルを取得する。 As described above, in step S11 of FIG. 12, the junction type matrix and the trait vector within the search range are acquired by using the table data of FIGS. 9 and 2.

Next, the specific SNP designated by the user in the specific SNP designation unit 2 and the various search conditions input by the user in the search condition input unit 21 are acquired (step S12). The search conditions acquired here include, for example, the number K of SNP combinations (the number of rows of the SNP combination matrix), the first correction constant for correcting the evaluation value of the specific SNP, the second correction constant for correcting the identification error of SNP combinations that include the specific SNP, the number of times V that the evaluation value calculation unit 5 calculates the evaluation value, and the number of times U that the minimum identification error selection unit 7 performs the process of selecting the SNP combination with the minimum identification error.

Next, the matrix initialization unit 8 initializes the SNP combination matrix (step S13). Each element of the SNP combination matrix is initialized to 0 or 1. Whether each zygote element of the SNP combination matrix is set to 0 or to 1 at initialization is arbitrary.

FIG. 3 is a diagram showing an example of the SNP combination matrix for K = 2. In step S13 of FIG. 12, each element of the SNP combination matrix is initialized. The SNP combination matrix of FIG. 3 contains 2 × 15 = 30 zygote elements. Because each SNP has three types of zygote, it has three zygote elements, and each zygote element takes the value 0 or 1. Each row of the final SNP combination matrix corresponds to one combination of SNPs.

Next, the variable u that counts the number of iterations is initialized to 0 (step S14). Subsequently, the variable v, which indicates that the v-th zygote element is being selected from the plurality of zygote elements in the SNP combination matrix, is initialized to 0 (step S15).

Next, the v-th zygote element is acquired from the plurality of zygote elements in the SNP combination matrix based on mutual information (step S16). Then a score, which serves as the evaluation value, is calculated according to the following procedure (step S17). The process of step S17 is performed by the maximum zygote element selection unit 9 in the evaluation value calculation unit 5.

For the calculation of step S17, suppose that the first v−1 zygote elements have already been selected and that the v-th element is now to be selected. Let S(k, i) be the score, that is, the evaluation value, of the i-th zygote element of the k-th combination in the SNP combination matrix. Let R be the set of the v−1 zygote elements already selected, and let one of those elements be the j-th element of the l-th SNP combination in the SNP combination matrix. The modified mutual information RI, which indicates the redundancy between the i-th element of the k-th SNP combination and the j-th element of the l-th SNP combination in the SNP combination matrix, is defined by the following equation (1).

Here, T_l is the set of samples identified as negative (that is, not identified as positive) by the K−1 SNP combinations other than the l-th SNP combination in the SNP combination matrix. T_{k,l} is the set of samples in the intersection of T_l and T_k.

I(X_{T_{k,l},j}, X_{T_{k,l},i}) is the mutual information between the j-th zygote element and the i-th zygote element over the samples belonging to T_{k,l}. S(k, i) is then calculated by the following equation (2), where I(Y_T, X_{T_k,i}) is the mutual information between the i-th zygote element and the presence or absence of disease over the samples belonging to T_k.
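Equations (1) and (2) are not reproduced here, so the following sketch covers only their common building block: the standard mutual information between two 0/1 vectors, evaluated over a chosen subset of samples such as T_k. The function names, the base-2 logarithm, and the data are assumptions for illustration and are not taken from the original description.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Standard mutual information I(X; Y) in bits for two equal-length sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with raw counts
        ratio = p_xy * n * n / (count_x[x] * count_y[y])
        mi += p_xy * log2(ratio)
    return mi


def restrict(values, sample_indices):
    """Keep only the entries for the samples in the given set (for example T_k)."""
    return [values[i] for i in sample_indices]


# Made-up data: one zygote-element column and a disease presence/absence vector.
element_column = [1, 1, 0, 0, 1, 0]
disease_vector = [1, 1, 0, 0, 0, 1]
t_k = [0, 1, 2, 3]  # hypothetical sample set T_k

mi_on_t_k = mutual_information(restrict(element_column, t_k),
                               restrict(disease_vector, t_k))
print(mi_on_t_k)
```

On the restricted samples the two vectors agree completely, so the mutual information equals one bit; on independent vectors it would be zero.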

Next, the zygote element with the highest score is selected from all the zygote elements in the SNP combination matrix (step S18). The process of step S18 is performed by the maximum zygote element selection unit 9 in the evaluation value calculation unit 5.

FIG. 13 is a diagram showing an example of the scores of the SNP combination matrix when it is determined in step S19 that the variable v has reached V. In the example of FIG. 13, the zygote element SNP-00003-YY in row k = 2, which has a score of 0.9, is selected.

Here, for an SNP that the user has specified in advance in the specific SNP designation unit 2, the score may be raised by ε (ε > 0) to S(k, i) + ε, intentionally increasing the score so that the SNP is more likely to be selected.
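The effect of the ε bonus on the selection of the highest-scoring zygote element can be sketched as follows. The dictionary layout, the function name `select_max_element`, the `already_selected` argument, and the score values other than 0.9 are assumptions made for this example.

```python
def select_max_element(scores, specific_snps, epsilon, already_selected=()):
    """Return the key and score of the highest-scoring zygote element.

    scores: dict mapping (k, snp_id, zygote) to the score S(k, i).
    Elements of user-specified SNPs receive a bonus of epsilon.
    Skipping already selected elements is an assumption of this sketch.
    """
    best_key, best_score = None, float("-inf")
    for key, score in scores.items():
        if key in already_selected:
            continue
        _, snp_id, _ = key
        if snp_id in specific_snps:
            score += epsilon  # S(k, i) + epsilon for a specific SNP
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score


# Made-up scores; only the 0.9 for SNP-00003-YY in row k = 2 echoes the text.
scores = {
    (1, "SNP-00001", "XX"): 0.7,
    (1, "SNP-00002", "YY"): 0.8,
    (2, "SNP-00003", "YY"): 0.9,
}
plain_choice, _ = select_max_element(scores, specific_snps=set(), epsilon=0.2)
boosted_choice, _ = select_max_element(scores, specific_snps={"SNP-00002"}, epsilon=0.2)
print(plain_choice, boosted_choice)
```

Without a bonus the element of SNP-00003 wins; once SNP-00002 is designated as a specific SNP, its boosted score overtakes it.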

In this way, the processes of steps S17 and S18 are performed for each value of the variable v, and each time the one zygote element with the maximum score from equation (2) is selected. The process of step S18 is performed by the multiple zygote element selection unit 10 in the evaluation value calculation unit 5.

次に、変数ｖが所定の制限数Ｖに達したか否かを判定する（ステップＳ１９）。まだ達していなければ、変数ｖを１インクリメントして（ステップＳ２０）、ステップＳ１６〜Ｓ１９の処理を繰り返す。 Next, it is determined whether or not the variable v has reached the predetermined limit number V (step S19). If it has not been reached yet, the variable v is incremented by 1 (step S20), and the processes of steps S16 to S19 are repeated.

Once the variable v has reached the predetermined limit V, V zygote elements have been selected. The identification error is then calculated for each of the 2^V combinations in which each of these V zygote elements of the SNP combination matrix takes the value 0 or 1 (step S21), the combination with the minimum identification error is found, and the SNP combination matrix is updated accordingly (step S22). The zygote elements of the SNP combination matrix other than these V elements keep their current values when the identification error is calculated. The process of step S21 is performed by the identification error calculation unit 6, and the process of step S22 by the minimum identification error selection unit 7.

To calculate the identification error, the presence or absence of disease must be identified for each target sample. FIG. 4 is a flowchart for identifying the presence or absence of disease in a target sample. First, the ID of the sample to be identified is acquired (step S31), and the variable Z used to identify the presence or absence of disease is initialized to 0 (step S32). Next, the variable k, which specifies the row number in the SNP combination matrix, is initialized to 1 (step S33). It is then determined whether the sample with the ID acquired in step S31 has all of the zygotes of the SNPs included in the k-th SNP combination of the SNP combination matrix (step S34). If the result of step S34 is YES, the variable Z is incremented by 1 (step S35).

If the result of step S34 is NO, or once the process of step S35 has finished, it is determined whether the variable k has reached the value obtained by adding 1 to the number of rows K of the SNP combination matrix (step S36). If the result of step S36 is NO, the variable k is incremented by 1 (step S37), and the processing from step S34 onward is repeated.

ステップＳ３６がＹＥＳの場合は、変数Ｚが１以上であれば、ステップＳ３１で取得したＩＤの検体は疾病ありと識別し、変数Ｚ＝０であれば、疾病なしと識別する（ステップＳ３８）。 When step S36 is YES, if the variable Z is 1 or more, the sample with the ID acquired in step S31 is identified as having a disease, and if the variable Z = 0, it is identified as having no disease (step S38).
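The identification rule of FIG. 4 can be sketched as below. The column order (XX, XY, YY for each SNP), the function name, and the choice to treat an all-zero row as matching no sample are assumptions of this sketch; the example matrix mirrors the rule "(SNP-00001 is XX and SNP-00002 is YY) or SNP-00003 is YY".

```python
def identify_disease(sample_row, combination_matrix):
    """Return True (disease present) if the sample satisfies at least one row.

    sample_row: the sample's row of the junction type matrix (0/1 values).
    combination_matrix: SNP combination matrix, one 0/1 row per SNP combination.
    """
    z = 0  # counts the SNP combinations that the sample fully satisfies
    for combo_row in combination_matrix:
        required = [s for s, c in zip(sample_row, combo_row) if c == 1]
        # Assumption: a row with no element set to 1 matches no sample.
        if required and all(value == 1 for value in required):
            z += 1
    return z >= 1


# Columns: SNP-00001 (XX, XY, YY), SNP-00002 (XX, XY, YY), SNP-00003 (XX, XY, YY)
combination_matrix = [
    [1, 0, 0, 0, 0, 1, 0, 0, 0],  # SNP-00001 is XX and SNP-00002 is YY
    [0, 0, 0, 0, 0, 0, 0, 0, 1],  # SNP-00003 is YY
]
sample_a = [1, 0, 0, 0, 0, 1, 1, 0, 0]  # XX, YY, XX
sample_b = [0, 1, 0, 0, 0, 1, 0, 0, 1]  # XY, YY, YY
sample_c = [0, 1, 0, 1, 0, 0, 1, 0, 0]  # XY, XX, XX

print(identify_disease(sample_a, combination_matrix))
print(identify_disease(sample_b, combination_matrix))
print(identify_disease(sample_c, combination_matrix))
```

Sample a satisfies the first row, sample b the second, and sample c neither, so only sample c is identified as having no disease.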

The identification error is the sum of the number of samples identified as positive that are actually negative and the number of samples identified as negative that are actually positive. However, when the combination includes a specific SNP designated by the user, the identification error is multiplied by α (0 < α < 1). Reducing the identification error in this way makes the specific SNP more likely to be selected.
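A minimal sketch of this error count and the α correction is shown below. The function signature and the sample data are assumptions for illustration.

```python
def identification_error(predicted, actual, contains_specific_snp=False, alpha=1.0):
    """Count false positives plus false negatives, scaled by alpha if applicable.

    predicted, actual: 0/1 sequences, one entry per sample.
    alpha: second correction constant, expected to satisfy 0 < alpha < 1.
    """
    false_positives = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    false_negatives = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    error = false_positives + false_negatives
    if contains_specific_snp:
        error = error * alpha  # a smaller error favors this combination
    return error


predicted = [1, 0, 1, 0]
actual = [1, 1, 0, 0]
plain_error = identification_error(predicted, actual)
corrected_error = identification_error(predicted, actual,
                                       contains_specific_snp=True, alpha=0.5)
print(plain_error, corrected_error)
```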

FIG. 14 is a diagram showing the identification errors of three of the 2^V combinations (hereinafter, combinations c1 to c3). The identification error is 4 for combination c1, 3 for combination c2, and 2 for combination c3. Therefore, in step S22 of FIG. 12, the zygote elements of combination c3, which has an identification error of 2, are finally selected, and a new SNP combination matrix containing these zygote elements is generated. For example, with the SNP combination matrix of c3 in FIG. 14, a sample is identified as positive if (SNP-00001 is XX and SNP-00002 is YY) or SNP-00003 is YY.

FIG. 15 is a flowchart showing an example of the procedure for updating the SNP combination matrix. First, the current SNP combination matrix is acquired (step S41), and the V zygote elements calculated by the evaluation value calculation unit 5 are acquired (step S42).

Next, the variable i is initialized to 0 (step S43). The SNP combination matrix is then updated to the i-th of the 2^V combinations, which are all the ways of setting each of the V zygote elements to 1 or 0 (step S44). For the SNP combination matrix selected in step S44, the process of step S21 of FIG. 12 is performed to calculate the identification error (step S45).

Next, it is determined whether the variable i has reached 2^V (step S46). If the result of step S46 is NO, the variable i is incremented by 1 (step S47), and the processing from step S44 onward is repeated. If the result of step S46 is YES, the SNP combination matrix is updated to the combination with the minimum identification error among the 2^V combinations (step S48).
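The exhaustive search of FIG. 15 can be sketched as follows. All function names, the 0-based row and column indices, the tie-breaking rule (the first minimum is kept), and the data are assumptions for illustration; as a further assumption, a row with no element set to 1 is treated as matching no sample.

```python
from itertools import product


def predict(sample_row, matrix):
    """Return 1 if the sample satisfies at least one row of the matrix, else 0."""
    for combo_row in matrix:
        required = [s for s, c in zip(sample_row, combo_row) if c == 1]
        if required and all(value == 1 for value in required):
            return 1
    return 0


def identification_error(matrix, samples, disease):
    """Number of samples whose predicted label differs from the actual label."""
    return sum(1 for row, d in zip(samples, disease) if predict(row, matrix) != d)


def update_matrix(matrix, selected, samples, disease):
    """Try all 2^V settings of the selected elements and keep the best one.

    selected: list of (row, column) positions of the V chosen zygote elements.
    Elements not listed in selected keep their current values.
    """
    best_matrix, best_error = None, None
    for bits in product((0, 1), repeat=len(selected)):
        candidate = [row[:] for row in matrix]
        for (k, col), bit in zip(selected, bits):
            candidate[k][col] = bit
        error = identification_error(candidate, samples, disease)
        if best_error is None or error < best_error:
            best_matrix, best_error = candidate, error
    return best_matrix, best_error


# Made-up data: four samples, two SNPs (three columns each).
samples = [
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
]
disease = [1, 0, 0, 1]
matrix = [[0] * 6]            # K = 1, all elements currently 0
selected = [(0, 0), (0, 5)]   # V = 2 selected zygote elements

best_matrix, best_error = update_matrix(matrix, selected, samples, disease)
print(best_matrix, best_error)
```

Because the loop visits 2^V candidates, the cost grows exponentially with V, which is presumably why V is kept as a small user-supplied search condition.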

図１２のステップＳ２２の処理が終了すると、次に変数ｕが所定の制限回数Ｕに達したか否かを判定する（ステップＳ２３）。まだ制限回数Ｕに達していなければ、変数ｕを１だけインクリメントして（ステップＳ２４）、ステップＳ２２で生成された新たなＳＮＰ組合せ行列を用いて、ステップＳ１５以降の処理を繰り返す。 When the process of step S22 of FIG. 12 is completed, it is next determined whether or not the variable u has reached the predetermined limit number of times U (step S23). If the limit number U has not been reached yet, the variable u is incremented by 1 (step S24), and the processing after step S15 is repeated using the new SNP combination matrix generated in step S22.

In this way, in the process of FIG. 12, the zygote elements are updated U times while the SNP combination matrix is updated.

When it is determined in step S23 that the variable u has reached the limit U, the combination of zygote elements finally found in step S22 is output as the biomarker candidate (step S25).

FIG. 16 is a diagram showing an example of the output form of step S22. Window w11 (first window) of FIG. 16 is used to designate the specific SNPs handled by the specific SNP designation unit 2. Window w12 (second window) lists all of the designated specific SNPs. Window w13 (third window) is used to designate the type of the specific disease. Window w14 (fourth window) displays each SNP in the biomarker candidate. After checking the biomarker candidate in window w14, the user can designate the specific SNPs again in window w11, press the resubmit button b2, and run the biomarker candidate search of FIG. 12 again.

The output form of step S22 is not limited to the screen display example shown in FIG. 16. For example, FIG. 17 is a plot of the odds value or −log(P value), which indicates the identification accuracy of each SNP combination. The horizontal axis is the number of specific SNPs included, and the vertical axis is the odds value or −log(P value). The broken line in FIG. 17 is the availability identification threshold. This threshold is generated by adding the threshold value input by the user in the search condition input unit 21 to the mean or standard deviation of the odds value or −log(P value). Among the biomarker candidates, those that contain at least one specific SNP designated by the user are displayed as having the specific SNP available; the others are displayed with the error indication "specific SNP unavailable 23".
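The threshold calculation described above might look like the following sketch. The function name, the `use` parameter, and the choice of the population standard deviation are assumptions; the description does not say which form of standard deviation is meant, and the data are made up.

```python
from statistics import mean, pstdev


def availability_threshold(user_value, significance_values, use="mean"):
    """Add the user-supplied value to the mean or standard deviation.

    significance_values: odds values or -log(P value) of the SNP combinations.
    """
    if use == "mean":
        statistic = mean(significance_values)
    elif use == "stdev":
        # Assumption: population standard deviation.
        statistic = pstdev(significance_values)
    else:
        raise ValueError("use must be 'mean' or 'stdev'")
    return user_value + statistic


neg_log_p = [2.0, 4.0, 6.0]  # made-up -log(P value) data
threshold_from_mean = availability_threshold(1.0, neg_log_p, use="mean")
threshold_from_stdev = availability_threshold(1.0, neg_log_p, use="stdev")
print(threshold_from_mean, threshold_from_stdev)
```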

Based on the result of FIG. 17, the user can change the search conditions, such as the specific SNPs, using the screen of FIG. 16 and search for biomarker candidates again.

As described above, in the present embodiment, a specific SNP presumed to be related to a specific disease is input in advance from among the plurality of SNPs in the genome, and a biomarker candidate containing one or more SNPs presumed to be related to the specific disease is searched for based on the input specific SNP and the trait information of the samples. Thus, when a doctor's knowledge indicates that a specific SNP is related to a specific disease, that information can be taken into account in the search for biomarker candidates.

また、本実施形態によれば、２以上のＳＮＰを含むバイオマーカー候補を探索できるため、複数のＳＮＰの組合せにより生じる疾病についても、そのＳＮＰの組合せを精度よく探索できる。 Further, according to the present embodiment, since the biomarker candidate including two or more SNPs can be searched, the combination of SNPs can be accurately searched for the disease caused by the combination of a plurality of SNPs.

Further, according to the present embodiment, the process of selecting the zygote element with the maximum evaluation value is performed V times, the identification error is calculated for each of the 2^V ways of selecting or not selecting each of the V chosen zygote elements as an SNP, and the SNP combination with the minimum identification error is selected as the final biomarker candidate. The relevant SNP combination can therefore be selected from a vast amount of SNP information without omission and in a short time.

上述した実施形態で説明したバイオマーカー探索装置の少なくとも一部は、ハードウェアで構成してもよいし、ソフトウェアで構成してもよい。ソフトウェアで構成する場合には、バイオマーカー探索装置の少なくとも一部の機能を実現するプログラムをフレキシブルディスクやＣＤ−ＲＯＭ等の記録媒体に収納し、コンピュータに読み込ませて実行させてもよい。記録媒体は、磁気ディスクや光ディスク等の着脱可能なものに限定されず、ハードディスク装置やメモリなどの固定型の記録媒体でもよい。 At least a part of the biomarker search apparatus described in the above-described embodiment may be configured by hardware or software. When configured by software, a program that realizes at least a part of the functions of the biomarker search device may be stored in a recording medium such as a flexible disk or a CD-ROM, read by a computer, and executed. The recording medium is not limited to a removable one such as a magnetic disk or an optical disk, and may be a fixed recording medium such as a hard disk device or a memory.

また、バイオマーカー探索装置の少なくとも一部の機能を実現するプログラムを、インターネット等の通信回線（無線通信も含む）を介して頒布してもよい。さらに、同プログラムを暗号化したり、変調をかけたり、圧縮した状態で、インターネット等の有線回線や無線回線を介して、あるいは記録媒体に収納して頒布してもよい。 Further, a program that realizes at least a part of the functions of the biomarker search device may be distributed via a communication line (including wireless communication) such as the Internet. Further, the program may be encrypted, modulated, compressed, and distributed via a wired line or wireless line such as the Internet, or stored in a recording medium.

本発明のいくつかの実施形態を説明したが、これらの実施形態は、例として提示したものであり、発明の範囲を限定することは意図していない。これら新規な実施形態は、その他の様々な形態で実施されることが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。これら実施形態やその変形は、発明の範囲や要旨に含まれるとともに、特許請求の範囲に記載された発明とその均等の範囲に含まれる。 Although some embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other embodiments, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and modifications thereof are included in the scope and gist of the invention, and are also included in the scope of the invention described in the claims and the equivalent scope thereof.

1 Biomarker search device, 2 Specific SNP designation unit, 3 Candidate search unit, 4 Candidate output unit, 5 Evaluation value calculation unit, 6 Identification error calculation unit, 7 Minimum identification error selection unit, 8 Matrix initialization unit, 9 Maximum zygote element selection unit, 10 Multiple zygote element selection unit, 11 SNP trait DB, 12 Sample information input unit, 13 Sample information registration DB, 14 SNP information registration DB, 15 Specific SNP registration DB, 16 Related SNP registration DB, 17 Search range SNP selection unit, 18 Selected SNP registration DB, 19 Search information collation unit, 20 Search information registration DB, 21 Search condition input unit, 22 Biomarker candidate registration DB

Claims

A biomarker search device comprising:
a specific SNP designation unit that designates in advance, from a plurality of SNPs (Single-Nucleotide Polymorphisms) in the genome, a specific SNP known to be related to a specific disease;
a candidate search unit that, based on the specific SNP, disease presence/absence information of each sample, and search information in which a junction type matrix representing the junction type of each SNP is registered, performs V times (V is an integer of 2 or more) a process of calculating, for each SNP, an evaluation value indicating how likely each zygote element is to be selected as part of a combination of SNPs, calculates, for each combination of the V zygote elements that had the maximum evaluation value in the respective rounds, an identification error representing the degree to which the presence or absence of disease in each sample is correctly identified, selects the combination of zygote elements with the smallest identification error, updates the biomarker candidate to the selected zygote elements, and repeats the update U times (U is an integer of 2 or more), thereby searching for a biomarker candidate containing two or more SNPs presumed to be related to the specific disease; and
a candidate output unit that outputs the biomarker candidate.

The biomarker search device according to claim 1, wherein the candidate search unit searches for the biomarker candidate containing two or more SNPs known to be related to the specific disease.

The biomarker search device according to claim 2, wherein the candidate output unit uses, as the biomarker candidate, the combination of zygote elements selected with the minimum identification error after the U rounds of processing.

The biomarker search device according to any one of claims 1 to 3, further comprising a search condition input unit for inputting the value of V and the value of U.

The biomarker search device according to any one of claims 1 to 4, wherein the candidate search unit comprises:
a matrix initialization unit that initializes an SNP combination matrix whose number of rows is the number of combinations of SNPs that can be biomarker candidates and whose number of columns is the number of zygote elements of the plurality of SNPs that can be biomarker candidates;
a maximum zygote element selection unit that acquires each zygote element in the SNP combination matrix, calculates the evaluation value, and selects the zygote element having the maximum evaluation value;
a multiple zygote element selection unit that repeats the processing of the maximum zygote element selection unit V times and selects a different zygote element in each round, for a total of V zygote elements; and
an identification error calculation unit that calculates the identification error for all combinations of the V zygote elements selected by the multiple zygote element selection unit.

The biomarker search device according to claim 5, wherein the maximum zygote element selection unit calculates the evaluation value for each zygote element in the k-th row (k is an integer of 1 or more) of the SNP combination matrix based on mutual information with each zygote element in rows other than the k-th row of the SNP combination matrix.

The biomarker search device according to any one of claims 1 to 6, wherein the candidate output unit displays, on a two-dimensional plane and for each SNP in the biomarker candidate, the correspondence between the number of the specific SNPs included in the biomarker candidate and a value representing the significance of the corresponding SNP.

The biomarker search device according to claim 7, wherein the value representing the significance includes at least one of a P value and an odds ratio.

The biomarker search device according to any one of claims 1 to 8, further comprising a display control unit that displays, on a display screen of a display device:
a first window for designating the specific SNP designated by the specific SNP designation unit;
a second window listing all of the designated specific SNPs;
a third window for designating the type of the specific disease; and
a fourth window for specifying the conditions of the candidate search unit.

The biomarker search device according to any one of claims 1 to 8, further comprising a display control unit that displays, on a display screen of a display device:
a first window for designating the specific SNP designated by the specific SNP designation unit;
a second window listing all of the designated specific SNPs;
a third window for designating the type of the specific disease; and
a fourth window displaying each SNP in the biomarker candidate,
wherein the display control unit highlights, among the specific SNPs designated in the first window, the SNPs included in the biomarker candidate.

A biomarker search method comprising:
designating in advance, from a plurality of SNPs (Single-Nucleotide Polymorphisms) in the genome, a specific SNP known to be related to a specific disease;
based on the specific SNP, disease presence/absence information of each sample, and search information in which a junction type matrix representing the junction type of each SNP is registered, performing V times (V is an integer of 2 or more) a process of calculating, for each SNP, an evaluation value indicating how likely each zygote element is to be selected as part of a combination of SNPs, calculating, for each combination of the V zygote elements that had the maximum evaluation value in the respective rounds, an identification error representing the degree to which the presence or absence of disease in each sample is correctly identified, selecting the combination of zygote elements with the smallest identification error, updating the biomarker candidate to the selected zygote elements, and repeating the update U times (U is an integer of 2 or more), thereby searching for a biomarker candidate containing two or more SNPs presumed to be related to the specific disease; and
outputting the biomarker candidate.

A program for causing a computer to execute:
a procedure for designating in advance, from a plurality of SNPs (Single-Nucleotide Polymorphisms) in the genome, a specific SNP known to be related to a specific disease;
a procedure for, based on the specific SNP, disease presence/absence information of each sample, and search information in which a junction type matrix representing the junction type of each SNP is registered, performing V times (V is an integer of 2 or more) a process of calculating, for each SNP, an evaluation value indicating how likely each zygote element is to be selected as part of a combination of SNPs, calculating, for each combination of the V zygote elements that had the maximum evaluation value in the respective rounds, an identification error representing the degree to which the presence or absence of disease in each sample is correctly identified, selecting the combination of zygote elements with the smallest identification error, updating the biomarker candidate to the selected zygote elements, and repeating the update U times (U is an integer of 2 or more), thereby searching for a biomarker candidate containing two or more SNPs presumed to be related to the specific disease; and
a procedure for outputting the biomarker candidate.