JPWO2019200328A5

Info

Publication number: JPWO2019200328A5
Application number: JP2020555454A
Authority: JP
Publication date: 2022-04-19
Anticipated expiration: 2039-04-12

Description

All patent applications, websites, other publications, accession numbers, and the like cited above or below are incorporated herein by reference in their entirety for all purposes, to the same extent as if each individual item were specifically and individually indicated to be so incorporated. If different versions of a sequence are associated with one accession number at different times, the version associated with the accession number at the effective filing date of this application is meant. The effective filing date means the earlier of the actual filing date or, where applicable, the filing date of a priority application that refers to the accession number. Likewise, if different versions of a publication, website, or the like are published at different times, the version most recently published at the effective filing date of this application is meant, unless otherwise indicated. Any feature, step, element, embodiment, or aspect of the present disclosure can be used in combination with any other, unless specifically indicated otherwise. Although the present disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims.
The present invention provides, for example, the following items.
(Item 1)
A method for detecting alignment errors in genetic sequence reads, at least in part using a computer, the method comprising:
(a) receiving, by the computer, sequence information comprising the genetic sequence reads obtained from cell-free nucleic acid molecules in a biological sample from a subject;
(b) aligning the genetic sequence reads to a reference sequence to generate aligned sequence reads;
(c) identifying, from the aligned sequence reads, a set of gene fusion reads comprising an intragenic fusion breakpoint; and
(d) detecting alignment errors by identifying a subset of one or more of the gene fusion reads that comprise a genetic variant within a region comprising the intragenic fusion breakpoint, wherein the region comprises one or more nucleotides flanking the intragenic fusion breakpoint.
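Step (d) of Item 1 can be illustrated with a minimal sketch. It assumes that aligned gene fusion reads have already been reduced to simple records; the `Variant` and `FusionRead` types, the function names, and the default window of 10 nucleotides are hypothetical choices made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int   # 1-based reference position (assumed convention)
    ref: str
    alt: str


@dataclass
class FusionRead:
    name: str
    chrom: str
    breakpoint: int            # reference position of the intragenic fusion breakpoint
    variants: List[Variant]    # variants observed on this read


def variants_near_breakpoint(read: FusionRead, window: int = 10) -> List[Variant]:
    """Return the variants lying within `window` nucleotides of the read's breakpoint."""
    return [
        v for v in read.variants
        if v.chrom == read.chrom and abs(v.pos - read.breakpoint) <= window
    ]


def detect_alignment_errors(reads: List[FusionRead], window: int = 10) -> Dict[str, List[Variant]]:
    """Map each read name to its breakpoint-adjacent variants (candidate alignment errors)."""
    flagged: Dict[str, List[Variant]] = {}
    for read in reads:
        hits = variants_near_breakpoint(read, window)
        if hits:
            flagged[read.name] = hits
    return flagged
```

Reads with no variant inside the window are simply left out of the result, so the returned mapping is the subset of gene fusion reads referred to in step (d).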
(Item 2)
A method for suppressing alignment errors in detecting true genetic variants in cell-free nucleic acid molecules from a biological sample of a subject, at least in part using a computer, the method comprising:
(a) receiving, by the computer, sequence information comprising sequence reads obtained from the cell-free nucleic acid molecules;
(b) aligning the sequence reads to a reference sequence to generate aligned sequence reads;
(c) identifying, from the aligned sequence reads, a set of gene fusion reads comprising an intragenic fusion breakpoint;
(d) detecting alignment errors by identifying a subset of one or more of the gene fusion reads that comprise a genetic variant within a region comprising the intragenic fusion breakpoint, wherein the region comprises one or more nucleotides flanking the intragenic fusion breakpoint;
(e) filtering at least a portion of the one or more detected alignment errors in the subset of the one or more gene fusion reads to generate filtered sequence reads; and
(f) detecting filtered sequence reads that comprise a true genetic variant relative to the reference sequence.
(Item 3)
A method for suppressing alignment errors in detecting true genetic variants in cell-free nucleic acid molecules from a sample of a subject, at least in part using a computer, the method comprising:
(a) receiving, by the computer, sequence information comprising sequencing reads obtained from the cell-free nucleic acid molecules;
(b) aligning the sequence reads to a reference sequence to generate aligned sequence reads;
(c) identifying, from the aligned sequence reads, a set of gene fusion reads comprising an intragenic fusion breakpoint;
(d) detecting alignment errors by identifying a subset of one or more of the gene fusion reads that comprise a genetic variant, wherein the subset of the one or more gene fusion reads comprises a gene sequence corresponding to SMAD4, TYRO3, and/or RAF1;
(e) filtering at least a portion of the one or more detected alignment errors in the subset of the one or more gene fusion reads to generate filtered sequence reads; and
(f) detecting filtered sequence reads that comprise a true genetic variant relative to the reference sequence.
(Item 4)
A method for detecting alignment errors in genetic sequence reads, at least in part using a computer, the method comprising:
(a) receiving, by the computer, sequence information comprising the genetic sequence reads obtained from cell-free nucleic acid molecules in a biological sample from a subject;
(b) aligning the genetic sequence reads to a reference sequence to generate aligned sequence reads;
(c) determining, from the aligned sequence reads, a set of gene fusion reads comprising an intragenic fusion breakpoint;
(d) determining a subset of one or more of the gene fusion reads that comprise a genetic variant within a region comprising the intragenic fusion breakpoint, wherein the region comprises one or more nucleotides flanking the intragenic fusion breakpoint; and
(e) identifying each genetic variant within the region that meets a predetermined criterion as an alignment error.
(Item 5)
A method for suppressing alignment errors in detecting true genetic variants in cell-free nucleic acid molecules from a sample of a subject, at least in part using a computer, the method comprising:
(a) receiving, by the computer, sequence information comprising sequencing reads obtained from the cell-free nucleic acid molecules;
(b) aligning the sequence reads to a reference sequence to generate aligned sequence reads;
(c) identifying, from the aligned sequence reads, a set of gene fusion reads comprising an intragenic fusion breakpoint;
(d) detecting alignment errors by identifying a subset of one or more of the gene fusion reads that comprise a genetic variant, wherein the subset of the one or more gene fusion reads comprises a gene sequence corresponding to SMAD4, TYRO3, and/or RAF1;
(e) filtering at least a portion of the one or more detected alignment errors in the subset of the one or more gene fusion reads to generate filtered sequence reads; and
(f) detecting filtered sequence reads that comprise a true genetic variant relative to the reference sequence.
(Item 6)
The method according to any one of items 1 to 5, wherein the set of the gene fusion reads corresponds to one or more processed pseudogenes (PPGs).
(Item 7)
The method of item 6, wherein the one or more PPGs comprise one or more sample-specific PPGs.
(Item 8)
The method of item 7, wherein the one or more sample-specific PPGs identify the subject in a population of subjects.
(Item 9)
The method of item 6, wherein the one or more PPGs are derived from the group consisting of SMAD4, GNAS, TP53, RAF1, CDK4, TYRO3, MAPK1, STK11, CCND1, HRAS, MET, MYC, and NRAS.
(Item 10)
The method of item 6, wherein the one or more PPGs comprise two or more PPGs derived from the group consisting of SMAD4, GNAS, TP53, RAF1, CDK4, TYRO3, MAPK1, STK11, CCND1, HRAS, MET, MYC, and NRAS.
(Item 11)
The method of item 6, wherein the one or more PPGs comprise three or more PPGs derived from the group consisting of SMAD4, GNAS, TP53, RAF1, CDK4, TYRO3, MAPK1, STK11, CCND1, HRAS, MET, MYC, and NRAS.
(Item 12)
The method of any one of items 1 to 11, wherein the genetic variant or true genetic variant comprises a single nucleotide variant (SNV) or an insertion or deletion (indel).
(Item 13)
The method of item 12, wherein the genetic variant comprises an SNV.
(Item 14)
The method of item 12, wherein the SNV is located at an intron-exon boundary.
(Item 15)
The method of item 12, wherein the SNV is located within a gene coding sequence (CDS).
(Item 16)
The method of item 12, wherein the genetic variant comprises an indel.
(Item 17)
The method of item 1, wherein the region comprises about 2, 4, 6, 8, 10, 15, or 20 nucleotides flanking the intragenic fusion breakpoint.
(Item 18)
The method of any of the preceding items, wherein a portion of the one or more detected alignment errors is filtered based on the detected alignment errors having a mutant allele fraction in the sample that is lower than or equal to the fraction of the intragenic fusion corresponding to the intragenic fusion breakpoint in the sample.
(Item 19)
The method of item 18, wherein a portion of the one or more detected alignment errors is filtered based on the gene fusion reads comprising a genetic variant that does not belong to a predefined set of clinically actionable variants.
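The filtering criteria of Items 18 and 19 can be sketched as a single predicate. This is only an illustration under stated assumptions: the function name, the tuple used as a variant key, and the way the two criteria are combined (filter only when both hold) are hypothetical and are not specified by the items themselves.

```python
from typing import FrozenSet, Tuple

# Hypothetical variant key: (gene, position, ref, alt)
VariantKey = Tuple[str, int, str, str]


def should_filter(
    variant: VariantKey,
    variant_maf: float,
    fusion_fraction: float,
    actionable: FrozenSet[VariantKey],
) -> bool:
    """Decide whether a breakpoint-adjacent variant is filtered as an alignment error.

    Assumed combination of criteria:
      * variants in the predefined clinically actionable set are never filtered;
      * otherwise the variant is filtered when its mutant allele fraction is
        lower than or equal to the fraction of the intragenic fusion.
    """
    if variant in actionable:
        return False
    return variant_maf <= fusion_fraction
```

A caller would typically build `actionable` once from a curated variant list and apply `should_filter` to every candidate alignment error detected near a breakpoint.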
(Item 20)
The method according to any one of the preceding items, wherein the sample is a body fluid sample selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretion, sputum, feces, and tears.
(Item 21)
The method according to any one of the preceding items, wherein the subject has a disease or disorder.
(Item 22)
The method of item 21, wherein the disease is cancer.
(Item 23)
The method according to any one of the preceding items, comprising the step of isolating cell-free nucleic acid molecules from the biological sample of the subject.
(Item 24)
The method of item 23, wherein the cell-free nucleic acid molecules comprise DNA, RNA, or a combination thereof.
(Item 25)
The method of item 24, wherein the cell-free nucleic acid molecules are double-stranded DNA.
(Item 26)
The method according to any one of the preceding items, further comprising, prior to sequencing, attaching one or more adapters comprising molecular barcodes to the cell-free nucleic acid molecules to generate tagged parent polynucleotides.
(Item 27)
The method of item 26, wherein the adapters are attached to both ends of the cell-free nucleic acid molecules.
(Item 28)
The method of item 26, wherein the cell-free nucleic acid molecules are uniquely barcoded.
(Item 29)
The method of item 26, wherein the cell-free nucleic acid molecules are non-uniquely barcoded.
(Item 30)
The method of item 29, wherein each barcode comprises a fixed or semi-random oligonucleotide sequence that, in combination with the diversity of molecules sequenced from a selected region, allows identification of unique molecules.
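The idea in Item 30, that a non-unique barcode identifies a molecule only in combination with other properties of the sequenced molecule, can be sketched as follows. The sketch assumes that the alignment start and end coordinates serve as the additional distinguishing information; that choice, the dictionary-based read records, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical molecule key: (barcode, chromosome, start, end)
MoleculeKey = Tuple[str, str, int, int]


def group_into_molecules(reads: List[dict]) -> Dict[MoleculeKey, List[str]]:
    """Group reads into putative original molecules.

    Each read is a dict with keys 'name', 'barcode', 'chrom', 'start', 'end'.
    Reads sharing a barcode AND the same alignment coordinates are treated as
    copies of one parent molecule; the barcode alone is not assumed unique.
    """
    families: Dict[MoleculeKey, List[str]] = defaultdict(list)
    for r in reads:
        key = (r["barcode"], r["chrom"], r["start"], r["end"])
        families[key].append(r["name"])
    return dict(families)
```

Two reads that carry the same barcode but align to different coordinates therefore fall into different families.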
(Item 31)
The method of item 26, further comprising the step of amplifying the tagged parent polynucleotides to generate progeny polynucleotides.
(Item 32)
The method of item 31, further comprising the step of selectively enriching the progeny polynucleotides for target sequences of interest, thereby generating enriched progeny polynucleotides.
(Item 33)
The method of item 32, further comprising the step of amplifying the enriched progeny polynucleotides.
(Item 34)
The method of any one of items 31 to 33, wherein the progeny polynucleotides or enriched progeny polynucleotides are tagged with a sample index sequence.
(Item 35)
The method according to any of the preceding items, wherein the sequence information is obtained from a nucleic acid sequencer.
(Item 36)
The method according to any one of the preceding items, wherein the set of gene fusion reads is identified by aligning and connecting sequenced paired end reads.
(Item 37)
The method according to any one of the preceding items, wherein the set of gene fusion reads is identified based on the discontinuity in coverage across the intron-exon boundary.
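Item 37 relies on a discontinuity in coverage across an intron-exon boundary. Processed pseudogenes are generally understood to lack introns, so reads derived from them can produce coverage that drops sharply at an exon edge. The sketch below is one possible way to test for such a step; the flank size, the ratio threshold, and the function name are assumptions made for illustration only.

```python
from typing import Sequence


def coverage_discontinuity(
    coverage: Sequence[float],
    boundary: int,
    flank: int = 5,
    min_ratio: float = 3.0,
) -> bool:
    """Report whether mean coverage differs sharply across `boundary`.

    `coverage` holds per-base depth; `boundary` is the 0-based index of the
    first base on the far side of the intron-exon boundary. The means of the
    `flank` bases on each side are compared, and a discontinuity is reported
    when the higher mean is at least `min_ratio` times the lower mean
    (with the lower mean floored at 1.0 to avoid division-like blow-ups).
    """
    left = coverage[max(0, boundary - flank):boundary]
    right = coverage[boundary:boundary + flank]
    if not left or not right:
        return False
    mean_left = sum(left) / len(left)
    mean_right = sum(right) / len(right)
    low, high = sorted((mean_left, mean_right))
    return high >= min_ratio * max(low, 1.0)
```

A production pipeline would more likely use a statistical test over many boundaries; the fixed ratio here only keeps the example short.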
(Item 38)
The method of item 19, wherein the predefined set comprises variants found in COSMIC, The Cancer Genome Atlas (TCGA), or the Exome Aggregation Consortium (ExAC).
(Item 39)
A method for generating a filtered read sequence information dataset, at least in part using a computer, the method comprising:
(a) identifying one or more split sequence reads in a set of test sequence reads obtained from cell-free nucleic acids (cfNA) in a biological sample obtained from a subject, wherein each split sequence read comprises at least one breakpoint; and
(b) in the set of test sequence reads, either (i) suppressing at least a portion of one or more of the split sequence reads and/or at least a portion of one or more of the test sequence reads that comprise at least one sequence variant within a selected number of nucleotides from a given breakpoint, thereby generating the filtered sequence information dataset, or (ii) suppressing one or more base calls of the split sequence reads and/or one or more base calls of the test sequence reads that comprise at least one sequence variant within a selected number of nucleotides from a given breakpoint, thereby generating the filtered sequence information dataset.
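Option (ii) of Item 39, suppressing individual base calls rather than whole reads, can be sketched as a masking step. The sketch assumes that the breakpoint and the variant positions are given as 0-based offsets within the read and that suppression is represented by replacing the base with `N`; both conventions and the function name are hypothetical.

```python
from typing import Iterable


def mask_calls_near_breakpoint(
    seq: str,
    breakpoint_idx: int,
    variant_idxs: Iterable[int],
    window: int = 10,
) -> str:
    """Mask variant base calls lying within `window` bases of the breakpoint.

    Only positions listed in `variant_idxs` are candidates; those farther than
    `window` from `breakpoint_idx`, or outside the read, are left untouched.
    """
    masked = list(seq)
    for i in variant_idxs:
        if 0 <= i < len(masked) and abs(i - breakpoint_idx) <= window:
            masked[i] = "N"
    return "".join(masked)
```

Masking keeps the rest of the split read available for downstream variant calling, which is the practical difference from option (i), where the read or a portion of it is suppressed.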

Claims

A method for detecting alignment errors in genetic sequence reads, at least in part using a computer, the method comprising:
(a) receiving, by the computer, sequence information comprising the genetic sequence reads obtained from cell-free nucleic acid molecules in a biological sample from a subject;
(b) aligning the genetic sequence reads to a reference sequence to generate aligned sequence reads;
(c) identifying, from the aligned sequence reads, a set of gene fusion reads comprising an intragenic fusion breakpoint; and
(d) detecting alignment errors by identifying a subset of one or more of the gene fusion reads that comprise a genetic variant within a region comprising the intragenic fusion breakpoint, wherein the region comprises one or more nucleotides flanking the intragenic fusion breakpoint.

A method for suppressing alignment errors in detecting true genetic variants in cell-free nucleic acid molecules from a biological sample of a subject, at least in part using a computer, the method comprising performing steps (a) to (d) of claim 1, and further comprising:
(e) filtering at least a portion of the one or more detected alignment errors in the subset of the one or more gene fusion reads to generate filtered sequence reads; and
(f) detecting filtered sequence reads that comprise a true genetic variant relative to the reference sequence.

The method according to claim 1 or 2, wherein the set of gene fusion reads corresponds to one or more processed pseudogenes (PPGs), for example wherein:
i) the one or more PPGs comprise one or more sample-specific PPGs, for example wherein the one or more sample-specific PPGs identify the subject in a population of subjects;
ii) the one or more PPGs are derived from the group consisting of SMAD4, GNAS, TP53, RAF1, CDK4, TYRO3, MAPK1, STK11, CCND1, HRAS, MET, MYC, and NRAS;
iii) the one or more PPGs comprise two or more PPGs derived from the group consisting of SMAD4, GNAS, TP53, RAF1, CDK4, TYRO3, MAPK1, STK11, CCND1, HRAS, MET, MYC, and NRAS; or
iv) the one or more PPGs comprise three or more PPGs derived from the group consisting of SMAD4, GNAS, TP53, RAF1, CDK4, TYRO3, MAPK1, STK11, CCND1, HRAS, MET, MYC, and NRAS.

The method according to any one of claims 1 to 3, wherein the genetic variant or true genetic variant comprises a single nucleotide variant (SNV) or an insertion or deletion (indel), for example wherein:
i) the genetic variant comprises an SNV;
ii) the SNV is located at an intron-exon boundary;
iii) the SNV is located within a gene coding sequence (CDS); or
iv) the genetic variant comprises an indel.

The method of any one of claims 1 to 4, wherein the region comprises about 2, 4, 6, 8, 10, 15, or 20 nucleotides flanking the intragenic fusion breakpoint.

The method according to any one of claims 1 to 5, wherein a portion of the one or more detected alignment errors is filtered based on the detected alignment errors having a mutant allele fraction in the sample that is lower than or equal to the fraction of the intragenic fusion corresponding to the intragenic fusion breakpoint in the sample, and optionally wherein a portion of the one or more detected alignment errors is filtered based on the gene fusion reads comprising a genetic variant that does not belong to a predefined set of clinically actionable variants, for example wherein the predefined set comprises variants found in COSMIC, The Cancer Genome Atlas (TCGA), or the Exome Aggregation Consortium (ExAC).

The method according to any one of claims 1 to 6, wherein:
i) the sample is a body fluid sample selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, feces, and tears; and/or
ii) the subject has a disease or disorder, for example wherein the disease is cancer.

The method according to any one of claims 1 to 7, comprising the step of isolating the cell-free nucleic acid molecules from the biological sample of the subject, optionally wherein the cell-free nucleic acid molecules comprise DNA, RNA, or a combination thereof, for example wherein the cell-free nucleic acid molecules are double-stranded DNA.

The method according to any one of claims 1 to 8, further comprising, prior to sequencing, attaching one or more adapters comprising molecular barcodes to the cell-free nucleic acid molecules to generate tagged parent polynucleotides, optionally wherein:
i) the adapters are attached to both ends of the cell-free nucleic acid molecules;
ii) the cell-free nucleic acid molecules are uniquely barcoded; or
iii) the cell-free nucleic acid molecules are non-uniquely barcoded, for example wherein each barcode comprises a fixed or semi-random oligonucleotide sequence that, in combination with the diversity of molecules sequenced from a selected region, allows identification of unique molecules.

The method of claim 9, further comprising the step of amplifying the tagged parent polynucleotides to generate progeny polynucleotides, optionally further comprising the step of selectively enriching the progeny polynucleotides for target sequences of interest, thereby generating enriched progeny polynucleotides, and optionally further comprising the step of amplifying the enriched progeny polynucleotides.

The method of claim 10, wherein the progeny polynucleotides or enriched progeny polynucleotides are tagged with a sample index sequence.

The method according to any one of claims 1 to 11, wherein:
i) the sequence information is obtained from a nucleic acid sequencer;
ii) the set of gene fusion reads is identified by aligning and connecting sequenced paired-end reads; and/or
iii) the set of gene fusion reads is identified based on a discontinuity in coverage across an intron-exon boundary.