WO2022186673A1

WO2022186673A1 - Next-generation-sequencing-based rna sequencing panel for targeted genes, and analysis algorithm

Info

Publication number: WO2022186673A1
Application number: PCT/KR2022/003196
Authority: WO
Inventors: 신명근; 임하진
Original assignee: 전남대학교산학협력단; 케이블루바이오 주식회사
Priority date: 2021-03-05
Filing date: 2022-03-07
Publication date: 2022-09-09

Abstract

The present invention provides a targeted RNA-seq panel covering 84 genes associated with hematologic malignancies, integrated for the clinical diagnostic setting with stepwise filtering, prioritization strategies, and a bioinformatics pipeline. Across various clinical samples, the system identifies gene fusions with higher sensitivity than conventional molecular methods. Expression profiles and clinically significant variants in the transcriptome can be investigated directly and simultaneously from the RNA-seq data, without additional parallel testing. The invention thus provides a comprehensive tool for analyzing hematologic malignancies in the clinical laboratory and demonstrates the advantages of a clinical laboratory-oriented targeted RNA-seq system, which increases the diagnostic yield of gene fusion detection and simplifies the diagnostic steps.

Description

Next-generation sequencing-based target gene RNA sequencing panel and analysis algorithm

The present invention relates to a next-generation sequencing panel for leukemia diagnosis and a method for providing information for leukemia diagnosis using the same.

Cancer can develop in any tissue in the body. Cancer cells usually invade and destroy adjacent tissues, then gradually enter the circulatory system and metastasize to parts of the body distant from the primary site, eventually killing the host (e.g., a human). Cancer cells divide abnormally and, when observed under a microscope, have lost the shape of normal tissues or cells and exhibit abnormal functions. Cancer can be accompanied by various genetic mutations depending on the type of tumor, and cell mutations have been reported to have a significant effect on cancer development and progression.

Therefore, various methods for detecting gene mutations in cancer cells are being studied, and the detected mutation information can substantially aid the diagnosis of cancer patients and the selection of precisely tailored anticancer drugs.

Existing genomic mutation detection methods use amplicons and probes designed to detect only one genomic mutation. Additional experiments are therefore required to detect others, and mutations that have not been previously discovered cannot be found.

In addition, existing methods require a separate detection method for each type of genomic mutation (e.g., real-time PCR or direct sequencing for SNVs; microarray or quantitative real-time PCR for expression level analysis; FISH for translocations). Detecting all kinds of mutations in the cancer tissue of one patient therefore takes a long time and incurs a large cost.

Recently, with the introduction of next-generation sequencing (NGS) techniques, it has become possible to analyze several cancer-related genes simultaneously. However, the occurrence of a considerable number of false positive results remains a significant obstacle to diagnosing diseases and predicting prognosis, which means that many challenges remain in the application of bioinformatics.

An object of the present invention is to provide a next-generation sequencing panel for leukemia diagnosis targeting a specific gene and a method for providing information for leukemia diagnosis using the same.

1. A next-generation sequencing panel for leukemia diagnosis, comprising probes that specifically bind to PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.

2. The next-generation sequencing panel for leukemia diagnosis according to 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BAALC, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, DUSP22, EBF1, FGFR3, FIP1L1, FUS, GATA2, GUSB, IKZF1, IL3, KMT2A, MECOM, MEF2D, MLF1, MLLT3, MRTFA, MYH11, NUP98, PCM1, PICALM, PML, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TRA and WT1.

3. The next-generation sequencing panel for leukemia diagnosis according to 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CREBBP, DEK, DUSP22, EP300, ERG, FGFR3, FIP1L1, HBS1L, HPRT1, IGK, IGL, IKZF1, KMT2A, MAF, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, PAX5, PBX1, PCM1, PPIA, RAB7A, TCF3 and ZNF384.

4. The next-generation sequencing panel for leukemia diagnosis according to 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of ALK, BCL2, BCL6, BCL9, BCR, CBFB, DEK, DUSP22, FGFR3, HPRT1, IKZF1, IL2RB, IRF4, KMT2A, MAF, MAFA, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, NUP214, PCM1, TBP, TCL1A, TRB, TRG and TYK2.

5. The next-generation sequencing panel for leukemia diagnosis according to 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, DEK, DUSP22, FGFR3, IGK, IGL, IKZF1, KMT2A, NTRK3, PAX5, PBX1, PPIA, RAB7A, TCF3, TP63 and ZNF384.

6. The next-generation sequencing panel for leukemia diagnosis according to 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, PSMB2 and TP63.

7. The next-generation sequencing panel for leukemia diagnosis according to 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GATA2, IKZF1, MAFA, MAFB and PCM1.

8. The next-generation sequencing panel for leukemia diagnosis according to 1 above, further comprising a probe that specifically binds to PDGFRA.

9. A next-generation sequencing panel for leukemia diagnosis, comprising probes that specifically bind to ABL1, ABL2, AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, CRLF2, CSF1R, DEK, DUSP22, EBF1, EP300, EPOR, ERG, ETV6, FGFR1, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, JAK2, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYC, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PDGFRB, PHB, PHB2, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384.

10. A method of providing information for leukemia diagnosis, comprising: selecting and sequencing target genes by target capture hybridization with the sequencing panel of any one of 1 to 9 above to obtain read data;

checking from the read data whether PHB and PHB2 are overexpressed; and

detecting, from the read data, a fusion containing any one gene selected from the group consisting of IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.

11. The method of providing information for leukemia diagnosis according to 10 above, wherein the overexpression is checked by aligning the read data to a reference sequence with HISAT2 to obtain SAM/BAM data, calculating the expression of each gene with StringTie to obtain GTF data, and normalizing the GTF data with DESeq2.

12. The method of providing information for leukemia diagnosis according to 10 above, wherein the gene fusion is detected by aligning the read data to a reference sequence with Bowtie, STAR, Blat or Bowtie2 and identifying the fusion with the STAR-Fusion or FusionCatcher fusion gene identification tool.

13. The method of providing information for leukemia diagnosis according to 10 above, further comprising the step of determining whether any one gene selected from the group consisting of AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, DEK, DUSP22, EBF1, EP300, ERG, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384 is overexpressed, fused or mutated.

14. A method of providing information for leukemia diagnosis, comprising:

(a) binding cDNA synthesized from RNA isolated from an individual to each probe of the sequencing panel according to any one of 1 to 9 above and performing next-generation sequencing (NGS) to obtain raw read data;

(b) adjusting the raw read data to data having a quality score of Q10 or higher;

(c) detecting the fusion of each gene in the adjusted data;

(d) detecting a mutation relative to a reference sequence in each gene of the adjusted data; and

(e) confirming the expression of each gene from the adjusted data,

wherein the fusion is detected by aligning the adjusted data to a reference sequence with Bowtie, STAR, Blat or Bowtie2 and identifying the fusion with a fusion gene identification tool (STAR-Fusion, FusionCatcher),

wherein the mutation is detected by aligning the adjusted data with STAR to obtain SAM/BAM data, classifying and marking duplicates in the BAM data with Picard, and calling SNVs and indels with Freebayes on the aligned, classified, and deduplicated BAM data, and

wherein the expression of each gene is confirmed by aligning the adjusted data to a reference sequence with HISAT2 to obtain SAM/BAM data, calculating the expression of each gene with StringTie to obtain GTF data, and normalizing the GTF data with DESeq2.

The present invention can provide a next-generation sequencing (NGS) panel for diagnosing leukemia, comprising an RNA-seq panel that targets specific genes.

Fig. 1 shows the bioinformatics pipeline for targeted RNA-sequencing data analysis, covering fusion detection, variant detection, and expression profiling.

Fig. 2 shows the analytical validation of targeted RNA sequencing for carryover (A-B), repeatability (C-D) and linearity (E-F). A-B: All known fusions described in Table 4 were detected as true fusions in Run 1, with trace levels of carryover fusions in both STAR-Fusion (A) and FusionCatcher (B). Carryover fusions had significantly lower FFPM and read counts than true fusions (P < 0.001). C-D: Known fusions within replicates showed reliable repeatability of read counts (C), and repeatability increased when normalized FFPM values were used (D). E-F: The FFPM of serially diluted samples carrying BCR-ABL1 (E) and PML-RARA (F) fusions showed a linear log2-fold change (r² = 0.9852 and 0.9447, respectively) (FFPM: fusion fragments per million).

Fig. 3 is a heat map and hierarchical clustering of 30 patients with hematologic malignancies and 3 normal controls analyzed by targeted RNA sequencing. The heat map shows the normalized log2-fold change in gene expression in color, with both the rows (target genes in the panel) and the columns (samples from patients and normal controls) clustered. The top color bar shows the disease group of each sample and divides the disease groups into 4 distinct clusters (CS: clinical sample; NC: normal control; AML: acute myeloid leukemia; B-ALL: B-lymphoblastic leukemia/lymphoma; T-ALL: T-lymphoblastic leukemia/lymphoma; MBN: mature B-cell neoplasm; MPN: myeloproliferative neoplasm; CML-BP: chronic myeloid leukemia, blast phase; MDS/MPN: myelodysplastic/myeloproliferative neoplasm; MLN: myeloid/lymphoid neoplasm).

Figs. 4 and 5 show the gene fusion frequencies detected in various types of leukemia using the next-generation sequencing panel of the present invention. In Fig. 4, black bars indicate the frequency of patients with a detected gene fusion in each leukemia type. Gene fusions were found in 77% (72/93) of leukemia patients. Gene fusions were observed in 94% (33/35) of adult B-ALL patients and 83% (25/30) of pediatric B-ALL patients. Fig. 5 shows gene fusion patterns and frequencies for each type of leukemia. Among the gene fusions (n=35) found in adult B-ALL, the most common fusion gene was BCR-ABL1 (24/33, 73%), and the most common fusion gene in pediatric B-ALL was ETV6-RUNX1 (4/26, 15%).

Fig. 6 shows a comparative evaluation of analysis using the next-generation sequencing panel of the present invention and an existing commercial targeted RNA-seq assay (B-ALL: B-lymphoblastic leukemia/lymphoma; APL: acute promyelocytic leukemia; AML: acute myeloid leukemia; T-ALL: T-lymphoblastic leukemia/lymphoma; FISH: fluorescence in situ hybridization).

The present invention provides a next-generation sequencing panel for leukemia diagnosis.

"Leukemia" of the present invention is a type of blood cancer in which blood cells, particularly white blood cells, proliferate abnormally, and may be, for example, acute myeloid leukemia, acute lymphocytic leukemia, chronic myelogenous leukemia or chronic lymphocytic leukemia.

"Diagnosis" of the present invention refers to any act of discovering and confirming that abnormal blood cells proliferate excessively without inhibition and that the production of normal white blood cells, red blood cells, and platelets is suppressed. It includes determining an individual's susceptibility to leukemia, determining whether an individual currently has leukemia, and determining the prognosis of an individual afflicted with leukemia.

As used herein, the term “individual” refers to all animals, including humans, rats, mice, and livestock that have or can develop leukemia. As a specific example, it may be a mammal including a human.

The "probe" of the present invention refers to a single-stranded DNA or RNA fragment used in genetic engineering to search for a specific gene or other DNA sequence. It is produced through enzymatic or chemical separation and purification or through a synthetic process, and may be a nucleic acid from several bases to several hundred bases in length that is capable of specifically binding to the mRNA of a target gene. The presence or absence of the mRNA can be checked by labeling the probe with a radioactive isotope or an enzyme, and the probe can be designed and modified by known methods.

The "panel" of the present invention refers to a gene panel or gene probe panel constructed using any combination of probes that bind to multiple genes for leukemia diagnosis. The combination includes, for example, the entire set of probes for 13, 14, 25, 29, 35, 41, 49 or 84 genes, or any subset or subcombination thereof.

The present invention provides a next-generation sequencing panel for diagnosing leukemia comprising a probe that specifically binds to PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.

By detecting mutations, fusions, and abnormal expression of the PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC genes selected by the sequencing panel of the present invention, leukemia can be diagnosed with high sensitivity and specificity. For example, by detecting overexpression of PHB and PHB2 together with a fusion containing the IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB or MYC gene, Philadelphia chromosome-like acute lymphoblastic leukemia (Ph-like ALL) can be diagnosed with high sensitivity and specificity.

The next-generation sequencing panel of the present invention may further comprise a probe that specifically binds to at least one selected from the group consisting of AFF1, BAALC, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, DUSP22, EBF1, FGFR3, FIP1L1, FUS, GATA2, GUSB, IKZF1, IL3, KMT2A, MECOM, MEF2D, MLF1, MLLT3, MRTFA, MYH11, NUP98, PCM1, PICALM, PML, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TRA and WT1. By selecting the genes to which the probes specifically bind and detecting mutations, fusions, and expression abnormalities of each gene, acute myeloid leukemia (AML) can be diagnosed with high sensitivity and specificity.

The next-generation sequencing panel of the present invention may further comprise a probe that specifically binds to at least one selected from the group consisting of AFF1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CREBBP, DEK, DUSP22, EP300, ERG, FGFR3, FIP1L1, HBS1L, HPRT1, IGK, IGL, IKZF1, KMT2A, MAF, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, PAX5, PBX1, PCM1, PPIA, RAB7A, TCF3 and ZNF384. By selecting the genes to which the probes specifically bind and detecting mutations, fusions, and expression abnormalities of each gene, B-lymphoblastic leukemia/lymphoma (B-ALL) can be diagnosed with high sensitivity and specificity.

The next-generation sequencing panel of the present invention may further comprise a probe that specifically binds to at least one selected from the group consisting of ALK, BCL2, BCL6, BCL9, BCR, CBFB, DEK, DUSP22, FGFR3, HPRT1, IKZF1, IL2RB, IRF4, KMT2A, MAF, MAFA, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, NUP214, PCM1, TBP, TCL1A, TRB, TRG and TYK2. By selecting the genes to which the probes specifically bind and detecting mutations, fusions, and expression abnormalities of each gene, T-lymphoblastic leukemia/lymphoma (T-ALL) can be diagnosed with high sensitivity and specificity.

The next-generation sequencing panel of the present invention may further comprise a probe that specifically binds to at least one selected from the group consisting of BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, DEK, DUSP22, FGFR3, IGK, IGL, IKZF1, KMT2A, NTRK3, PAX5, PBX1, PPIA, RAB7A, TCF3, TP63 and ZNF384. By selecting the genes to which the probes specifically bind and detecting mutations, fusions, and expression abnormalities of each gene, mature B-cell neoplasm (MBN) can be diagnosed with high sensitivity and specificity.

The next-generation sequencing panel of the present invention may further comprise a probe that specifically binds to at least one selected from the group consisting of AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, PSMB2 and TP63. By selecting the genes to which the probes specifically bind and detecting mutations, fusions, and expression abnormalities of each gene, myeloproliferative neoplasms (MPN) can be diagnosed with high sensitivity and specificity.

The next-generation sequencing panel of the present invention may further comprise a probe that specifically binds to at least one selected from the group consisting of AFF1, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GATA2, IKZF1, MAFA, MAFB and PCM1. By selecting the genes to which the probes specifically bind and detecting mutations, fusions, and expression abnormalities of each gene, myelodysplastic/myeloproliferative neoplasm (MDS/MPN) can be diagnosed with high sensitivity and specificity.

The next-generation sequencing panel of the present invention may further comprise a probe that specifically binds to PDGFRA. By selecting the gene to which the probe specifically binds and detecting its mutations, fusions, and expression abnormalities, myeloid/lymphoid neoplasm with eosinophilia and gene rearrangement (MLN) can be diagnosed with high sensitivity and specificity.

The present invention provides a next-generation sequencing panel for diagnosing leukemia comprising probes that specifically bind to ABL1, ABL2, AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, CRLF2, CSF1R, DEK, DUSP22, EBF1, EP300, EPOR, ERG, ETV6, FGFR1, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, JAK2, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYC, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PDGFRB, PHB, PHB2, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384 (84 genes).

Leukemia can be diagnosed with high sensitivity and specificity by detecting mutations, fusions and abnormal expression of 84 genes selected by the sequencing panel of the present invention. More specifically, fusion genes such as ABL1-ETV6 and CSF1R-MEF2D described in Table 1, in particular, IGH-CRLF2, a fusion gene found in patients with Ph-like ALL, can be detected, thereby effectively diagnosing Ph-like ALL.

Table 1. Philadelphia chromosome-like acute lymphoblastic leukemia (Ph-like ALL) related fusion genes.

Kinase	Fusion partner genes
ABL1	ETV6, NUP214
CSF1R	MEF2D
PDGFRB	EBF1, ETV6
PDGFRA	FIP1L1
CRLF2	IGH
JAK2	BCR, EBF1, ETV6, PAX5, PCM1
EPOR	IGH, IGK
NTRK3	ETV6
FGFR1	BCR
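
As an illustration only, the kinase and partner pairs of Table 1 can be held in a lookup structure so that a fusion reported by a caller is checked against the table in either gene order. The names `PH_LIKE_FUSIONS` and `is_table1_fusion` are hypothetical and do not come from the specification.

```python
# Table 1 as a mapping: kinase gene -> set of fusion partner genes.
PH_LIKE_FUSIONS = {
    "ABL1": {"ETV6", "NUP214"},
    "CSF1R": {"MEF2D"},
    "PDGFRB": {"EBF1", "ETV6"},
    "PDGFRA": {"FIP1L1"},
    "CRLF2": {"IGH"},
    "JAK2": {"BCR", "EBF1", "ETV6", "PAX5", "PCM1"},
    "EPOR": {"IGH", "IGK"},
    "NTRK3": {"ETV6"},
    "FGFR1": {"BCR"},
}


def is_table1_fusion(gene_a, gene_b):
    """Return True if the pair matches a Table 1 entry in either orientation."""
    return (
        gene_b in PH_LIKE_FUSIONS.get(gene_a, ())
        or gene_a in PH_LIKE_FUSIONS.get(gene_b, ())
    )
```

For example, `is_table1_fusion("IGH", "CRLF2")` matches the CRLF2 row, whereas a BCR and ABL1 pair does not, because Table 1 lists only ETV6 and NUP214 as ABL1 partners.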

The present invention provides a method of providing information for leukemia diagnosis, comprising the steps of: selecting and sequencing target genes by target capture hybridization with the next-generation sequencing panel to obtain read data; checking from the read data whether PHB and PHB2 are overexpressed; and detecting, from the read data, a fusion containing any one gene selected from the group consisting of IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.

The sequencing panel of the present invention is as described above.

The target capture hybridization method of the present invention is a method of selecting target genes before detecting an abnormality (mutation, gene fusion, or expression abnormality) of a target gene, and probes that specifically bind to the target genes may be used. For example, target genes may be selected with a sequencing panel including probes that specifically bind to specific genes.

The target capture hybridization method of the present invention is performed after RNA extraction, cDNA synthesis, adapter ligation, and PCR from the genome to be analyzed.

The read data of the present invention is raw read data or adjusted data obtained by adjusting raw read data.

The adjustment of the raw read data may filter the raw read data to retain only data whose quality score meets or exceeds a given threshold. The quality score is a numerical value representing the estimated error probability in the raw data; specifically, it may be a Phred score, an index of the quality of each base.

Confirmation of overexpression may include obtaining SAM/BAM data by aligning the read data with a reference sequence; obtaining GTF data by calculating the expression of each gene from the SAM/BAM data; and normalizing the GTF data.

HISAT2 may be used to obtain the SAM/BAM data by aligning the read data to a reference sequence, StringTie may be used to obtain the GTF data by calculating the expression of each gene from the SAM/BAM data, and DESeq2 may be used to normalize the GTF data.

The fusion may be detected by comparing the read data with a reference sequence to identify the fusion gene. The reference sequence may be, for example, the reference sequence within each alignment algorithm or software program, such as Bowtie, STAR, Blat or Bowtie2. STAR-Fusion or FusionCatcher can be used as the fusion gene identification tool.
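
A minimal sketch of how the two checks described above (PHB/PHB2 overexpression and detection of a fusion containing one of the eleven listed genes) might be combined once expression values and fusion calls are available. The specification does not state a numeric overexpression threshold, so `LOG2FC_CUTOFF` is a hypothetical value, and the function name and input shapes are likewise assumptions made for illustration.

```python
# Genes named in the method: two expression markers and eleven fusion genes.
EXPRESSION_MARKERS = {"PHB", "PHB2"}
FUSION_GENES = {"IGH", "ABL1", "ABL2", "CRLF2", "CSF1R", "EPOR",
                "ETV6", "FGFR1", "JAK2", "PDGFRB", "MYC"}

# Hypothetical cutoff: the source gives no numeric overexpression threshold.
LOG2FC_CUTOFF = 1.0


def summarize_findings(log2_fold_change, fusions):
    """Report marker overexpression and fusions involving the listed genes.

    log2_fold_change: dict of gene -> normalized log2 fold change vs. controls.
    fusions: iterable of (gene_a, gene_b) pairs reported by a fusion caller.
    """
    overexpressed = sorted(
        gene for gene in EXPRESSION_MARKERS
        if log2_fold_change.get(gene, 0.0) >= LOG2FC_CUTOFF
    )
    # Keep only fusions in which at least one partner is a listed gene.
    relevant = [pair for pair in fusions if FUSION_GENES & set(pair)]
    return {
        "phb_phb2_overexpressed": overexpressed == sorted(EXPRESSION_MARKERS),
        "overexpressed_markers": overexpressed,
        "relevant_fusions": relevant,
    }
```

The function only collects the two pieces of information; how they are weighed in a diagnosis is left to the method described in the text.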

In addition, the present invention provides a method of providing information for leukemia diagnosis, comprising the steps of: obtaining raw data by performing next-generation sequencing using the next-generation sequencing panel for leukemia diagnosis; adjusting the raw data; detecting the fusion of each gene; detecting a mutation in each gene; and confirming the expression of each gene.

Next-generation sequencing (NGS) is a method for rapidly decoding vast amounts of genomic information by dividing the genome into numerous fragments, reading the nucleotide sequence of each fragment, and combining the results. Here it consists of the steps of RNA extraction, cDNA synthesis, adapter ligation, target capture hybridization, and sequencing. Each step may be performed by a method known in the art. Specifically, cDNA is synthesized from RNA extracted from a patient's blood sample, and adapter ligation, PCR, and target capture hybridization are performed on it. Sequencing may then be performed on the library thus prepared.

The adjusting step may filter the raw data to retain only data whose quality score meets or exceeds a given threshold. The quality score is a numerical value representing the estimated error probability in the raw data; specifically, it may be a Phred score, an index of the quality of each base. A file in which the nucleotide sequence and Phred scores of each sequencing read are recorded together is called a FASTQ file.

A Phred score of 20 (Q20) corresponds to a 1% probability that the base call is an error, and a score of 30 (Q30) corresponds to an error probability of 0.1%. Bases that meet the quality threshold are judged to have excellent sequencing quality and are used for further analysis.
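
The relationship between a Phred score Q and the error probability P stated above is P = 10^(-Q/10). The sketch below applies it to a FASTQ quality string. The Phred+33 character encoding and the rule of comparing a read's mean error probability with the cutoff are assumptions chosen for illustration; the specification does not say how the threshold is applied to a read.

```python
def phred_to_error_probability(q):
    """Convert a Phred score Q to the error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_quality(quality_string, min_q=10):
    """Assumed filter: keep a read whose mean per-base error probability
    does not exceed the error probability of the cutoff score min_q."""
    scores = decode_qualities(quality_string)
    mean_error = sum(phred_to_error_probability(q) for q in scores) / len(scores)
    return mean_error <= phred_to_error_probability(min_q)
```

With these definitions, Q20 maps to an error probability of 0.01 and Q30 to 0.001, matching the values given in the text.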

The step of detecting the gene fusion may include aligning the data to the reference sequence within each program using the sequence alignment algorithm or software Bowtie, STAR, Blat or Bowtie2, and discovering gene fusions with a fusion gene identification tool (STAR-Fusion, FusionCatcher).

The sequence alignment algorithms Bowtie and Bowtie2 are available at http://bowtie-bio.sourceforge.net/bowtie2/index.shtml, STAR at https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/03_alignment.html, and Blat at https://genome.ucsc.edu/goldenPath/help/blatSpec.html.

STAR-Fusion, a fusion gene identification tool, is a program that discovers fusion transcript candidates using the STAR sequence aligner, and is available at https://github.com/STAR-Fusion/STAR-Fusion/wiki#RunnningStarF. FusionCatcher is software that identifies somatic fusion genes, translocations, and chimeras from RNA-seq data, and is available at https://github.com/ndaniel/fusioncatcher.
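
Fig. 2 reports fusion support as FFPM (fusion fragments per million). The following is a minimal sketch of that normalization, assuming that the supporting evidence is the sum of junction-spanning reads and spanning fragments divided by the total number of sequenced fragments. The function names, the input tuple layout, and the default cutoff are hypothetical and are not taken from the specification.

```python
def ffpm(junction_reads, spanning_fragments, total_fragments):
    """Fusion fragments per million total fragments."""
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return (junction_reads + spanning_fragments) * 1_000_000 / total_fragments


def filter_fusions(candidates, total_fragments, min_ffpm=0.1):
    """Keep candidates whose FFPM reaches min_ffpm (illustrative cutoff).

    candidates: list of (name, junction_reads, spanning_fragments) tuples.
    Returns a list of (name, ffpm_value) tuples in the input order.
    """
    kept = []
    for name, junction_reads, spanning_fragments in candidates:
        value = ffpm(junction_reads, spanning_fragments, total_fragments)
        if value >= min_ffpm:
            kept.append((name, value))
    return kept
```

Normalizing by library size in this way is what allows fusion support to be compared between runs of different depth, which is consistent with the improved repeatability reported for FFPM values in Fig. 2.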

The step of detecting the gene mutation may include aligning the adjusted data to obtain SAM/BAM data, classifying and marking duplicates in the SAM/BAM data with Picard, and calling SNVs and indels with Freebayes on the aligned, classified, and deduplicated BAM data.

SAM data is a tab-delimited text file containing sequence alignment data, including alignment and mapping information. It is the file produced by mapping the FASTQ reads obtained through next-generation sequencing back to the transcriptome or genome sequence. BAM data is a compressed file containing the same information as SAM data. Because it is smaller than SAM data, the BAM format is the one mainly used by major programs that handle large next-generation sequencing datasets.
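
To make the tab-delimited layout concrete, the sketch below parses the eleven mandatory fields of one SAM alignment line and reads two bits of the FLAG field. The field order and flag values follow the public SAM format specification; the class and function names are illustrative only.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (Phred+33)


def parse_sam_line(line):
    """Parse the 11 mandatory tab-separated fields; optional tags are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


def is_unmapped(record):
    return bool(record.flag & 0x4)      # 0x4: segment unmapped


def is_duplicate(record):
    return bool(record.flag & 0x400)    # 0x400: PCR or optical duplicate
```

The duplicate bit (0x400) is the flag that duplicate-marking tools set, so downstream variant callers can skip those reads.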

Picard is a tool for controlling the technical bias caused by meaningless reads, i.e., duplicates, which arise when a single read or fragment is abnormally amplified during PCR in the library preparation process. It is available at https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates.
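
The idea of duplicate marking can be sketched as follows. This is a deliberately simplified illustration and not Picard's actual algorithm: reads are grouped by chromosome, start position, and strand, and every read in a group except the one with the highest summed base quality is flagged. The input dictionary keys and the function name are hypothetical.

```python
from collections import defaultdict


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    reads: list of dicts with keys name, chrom, pos, strand, qual_sum.
    Simplification: reads sharing (chrom, pos, strand) are treated as copies
    of one original fragment; the highest-quality copy is kept.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["pos"], read["strand"])].append(read)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda r: r["qual_sum"])
        duplicates.update(r["name"] for r in group if r is not best)
    return duplicates
```

Flagged reads are marked rather than deleted in this sketch, so a later step can decide whether to ignore them.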

Freebayes is a haplotype-based genetic variant detection tool useful for calling variants in a population, and is available at https://github.com/freebayes/freebayes.

SNV (single nucleotide variant) refers to a variation at a single nucleotide and is a concept that encompasses SNP (single nucleotide polymorphism, a variant present at a frequency of 1% or more in a population). Indel (insertion/deletion) means that a short nucleotide sequence is inserted into or deleted from the genome.
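
The distinction between an SNV and an indel can be illustrated by comparing the reference and alternate alleles of a called variant, as they appear in the REF and ALT columns of a VCF record. The function name and the returned labels are illustrative choices, not part of the specification.

```python
def classify_variant(ref, alt):
    """Classify a variant from its reference and alternate allele strings."""
    if len(ref) == 1 and len(alt) == 1:
        # One base replaced by another base.
        return "SNV" if ref != alt else "no change"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    # Same length, more than one base: a multi-nucleotide substitution.
    return "MNV"
```

For example, REF `A` and ALT `G` is an SNV, while REF `A` and ALT `ATC` is an insertion of two bases.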

The step of confirming the expression of the gene may include: aligning the adjusted data to a reference sequence with HISAT2 to obtain SAM/BAM data; obtaining GTF data by calculating the expression of each gene with StringTie; and normalizing the GTF data with DESeq2.

HISAT2 is an alignment program for mapping next-generation sequencing reads to a population of human genomes as well as to a single reference genome provided by the program, and is available at http://daehwankimlab.github.io/hisat2/.

StringTie is a program that efficiently assembles RNA-seq data into potential transcripts; specifically, it can assemble and quantify full-length transcripts representing multiple splice variants for each locus. It is available at http://ccb.jhu.edu/software/stringtie/index.shtml.

The GTF (Gene transfer format) data means data including annotation information on genes.
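
The alignment and quantification steps described above might be invoked as sketched below. The function only builds the command lines and does not run them. The file names, the output layout, and the use of `samtools sort` to convert SAM to sorted BAM are assumptions, since the specification names only HISAT2, StringTie and DESeq2; DESeq2 is an R package and is therefore not shown.

```python
from pathlib import Path


def expression_commands(sample, r1, r2, hisat2_index, annotation_gtf,
                        outdir, threads=4):
    """Build (without running) the commands for one paired-end sample."""
    out = Path(outdir)
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"
    gtf = out / f"{sample}.gtf"
    return [
        # HISAT2: align paired-end reads to the reference index, writing SAM.
        ["hisat2", "-p", str(threads), "-x", str(hisat2_index),
         "-1", str(r1), "-2", str(r2), "-S", str(sam)],
        # samtools (assumed helper, not named in the text): sort into BAM.
        ["samtools", "sort", "-@", str(threads), "-o", str(bam), str(sam)],
        # StringTie: estimate expression of annotated transcripts (-e),
        # writing the per-sample GTF.
        ["stringtie", str(bam), "-G", str(annotation_gtf), "-e",
         "-o", str(gtf)],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order, so that the output of one step becomes the input of the next.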

DESeq2 is an analysis method that performs internal normalization: the geometric mean of each gene is calculated across all samples, the count of that gene in each sample is divided by this mean, and the median of these ratios within a sample serves as that sample's scaling factor. Normalization is an essential step in gene expression analysis, addressing the problem that raw data may not reflect the actual expression level of a gene because of artifacts such as data duplication.
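
The normalization described above can be written out in a few lines. This is a plain-Python sketch of the median-of-ratios idea and not the DESeq2 implementation itself; the nested-dictionary input format and the function names are assumptions made for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factor for each sample.

    counts: dict of sample -> dict of gene -> raw read count.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio would be undefined.
    """
    samples = list(counts)
    genes = list(counts[samples[0]])
    log_geo_mean = {}
    for gene in genes:
        values = [counts[s][gene] for s in samples]
        if all(v > 0 for v in values):
            log_geo_mean[gene] = sum(math.log(v) for v in values) / len(values)

    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - m for g, m in log_geo_mean.items()]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return {
        s: {g: c / factors[s] for g, c in gene_counts.items()}
        for s, gene_counts in counts.items()
    }
```

If one sample is sequenced exactly twice as deeply as another, its size factor is twice as large, and the normalized counts of the two samples coincide.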

Hereinafter, the present invention will be described in more detail through examples.

Example 1. Next-generation sequencing panel targeting 84 genes

1. Experimental method

(1) Sample collection and preparation

Diagnostic samples included 1 human reference RNA (Cat. no. 740000, Agilent Technologies), 1 human reference genome (NA12878), 4 validation samples with recurrent gene fusions, 30 clinical samples with or without recurrent gene fusions, and 14 normal peripheral blood (PB) samples without clonal hematologic disorders. All validation and clinical samples were from patients with hematologic malignancies and were included when the patient's diagnosis was well characterized. Patient diagnoses were made according to the WHO classification, based on microscopic findings of bone marrow (BM) aspirates and trephine biopsy sections, tissue immunostaining, immunophenotyping, chromosomal analysis, FISH, multiplex RT-PCR, real-time PCR, and NGS DNA sequencing commonly used in clinical laboratories. The collection of patient and normal samples for the study was approved by the Institutional Review Board of Chonnam National University Hwasun Hospital (Approval No. CNUHH-2020-091).

Patient samples were collected in ethylenediaminetetraacetic acid (EDTA) tubes. For the 4 validation samples, 8 patient BM aspirate samples with a high leukemic cell fraction (43% to 96% of cells) and recurrent gene fusions were pooled in pairs carrying identical fusions. Validation samples included the BCR-ABL1, PML-RARA, RUNX1-RUNX1T1 and CBFB-MYH11 fusions. Clinical samples comprised 27 BM aspirates and 3 PB samples, consisting of 6 acute myeloid leukemia (AML), 9 B-lymphoblastic leukemia/lymphoma (B-ALL), 4 T-lymphoblastic leukemia/lymphoma (T-ALL), 3 mature B-cell neoplasms, 6 MPN, 1 myelodysplastic/myeloproliferative neoplasm (MDS/MPN), and 1 myeloid/lymphoid neoplasm with eosinophilia and gene rearrangement. Mononuclear cell layers were isolated from blood samples using Lymphoprep (Alere Technologies AS). RNA was extracted with an RNAqueous Isolation kit (Thermo Fisher Scientific) according to the manufacturer's instructions.

(2) Design and evaluation of target capture panels

A total of 84 genes associated with hematologic cancers (AML, ALL, lymphoma, MPN, and myeloid/lymphoid tumors with gene rearrangements) were selected based on previous literature.

The 84 genes and the types of leukemia associated with each gene are listed in Tables 2 and 3.

Table 2. 84 genes targeted by the present panel
ABL1	CCND1	EP300	GUSB	JAK2	MRTFA	PDGFRA	RBM15	TRG
ABL2	CCND2	EPOR	HBS1L	KMT2A	MYC	PDGFRB	RUNX1	TYK2
AFF1	CCND3	ERG	HPRT1	MAF	MYH11	PHB	RUNX1T1	WT1
ALK	CRBN	ETV6	IGH	MAFA	NSD2	PHB2	SDHA	ZNF384
BAALC	CREBBP	FGFR1	IGK	MAFB	NTRK3	PICALM	TBP
BCL2	CRLF2	FGFR3	IGL	MECOM	NUP214	PML	TCF3
BCL6	CSF1R	FIP1L1	IKZF1	MEF2D	NUP98	PPIA	TCL1A
BCL9	DEK	FUS	IL2RB	MLF1	PAX5	PSMB2	TP63
BCR	DUSP22	GAPDH	IL3	MLLT10	PBX1	RAB7A	TRA
CBFB	EBF1	GATA2	IRF4	MLLT3	PCM1	RARA	TRB

No.	Subtype of leukemia	Related genes
1	Acute myeloid leukemia (AML)	ABL2, AFF1, BAALC, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, CRLF2, DEK, DUSP22, EBF1, EPOR, ETV6, FGFR3, FGFR1, FIP1L1, FUS, GATA2, GUSB, IKZF1, IL3, KMT2A, MECOM, MEF2D, MLF1, MLLT3, MRTFA, MYC, MYH11, NUP98, PCM1, PHB, PHB2, PICALM, PML, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TRA, WT1
2	B-lymphoblastic leukemia/lymphoma (B-ALL)	ABL1, ABL2, AFF1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CREBBP, CRLF2, DEK, DUSP22, EP300, EPOR, ERG, FGFR1, FGFR3, FIP1L1, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, KMT2A, MAF, MEF2D, MLLT10, MLLT3, MYC, NSD2, NTRK3, PAX5, PBX1, PCM1, PHB, PHB2, PPIA, RAB7A, TCF3, ZNF384
3	T-lymphoblastic leukemia/lymphoma (T-ALL)	ABL1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CRLF2, DEK, DUSP22, EPOR, FGFR3, HPRT1, IKZF1, IL2RB, IRF4, KMT2A, MAF, MAFA, MEF2D, MLLT10, MLLT3, MYC, NSD2, NTRK3, NUP214, PCM1, PHB, PHB2, TBP, TCL1A, TRB, TRG, TYK2
4	Mature B-cell neoplasm (MBN)	ABL1, BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, CSF1R, TP63, DEK, DUSP22, ETV6, FGFR1, FGFR3, IGH, IGK, IGL, IKZF1, KMT2A, MYC, NTRK3, PAX5, PBX1, PHB, PHB2, PPIA, RAB7A, TCF3, TP63, ZNF384
5	Myeloproliferative neoplasms (MPN)	ABL1, AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, EPOR, ETV6, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, MYC, PHB, PHB2, PSMB2, TP63
6	Myelodysplastic/myeloproliferative neoplasm (MDS/MPN)	ABL1, ABL2, AFF1, BCR, CBFB, CRBN, CREBBP, DEK, ETV6, FGFR3, GATA2, IKZF1, MAFA, MAFB, MYC, PCM1, PHB, PHB2
7	Myeloid/lymphoid neoplasm with eosinophilia and gene rearrangement (MLN)	FGFR1, JAK2, PDGFRB, MYC, PDGFRA, PHB, PHB2
8	Philadelphia chromosome-like ALL	ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB, MYC

Custom oligonucleotide probes were designed to capture the target genes. A DNA template (human genome reference NA12878) was sequenced to assess whether the probes uniformly captured the 84 genes of the panel. The overall average coverage plot was visually inspected using deepTools, and coverage uniformity (%) was calculated as the percentage of target positions whose coverage exceeded 0.2 times the average coverage of the target region.
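The coverage uniformity metric described above can be illustrated with a minimal sketch. The function name, the depth values, and the choice of an inclusive threshold (depth at least 0.2 times the mean) are assumptions made for illustration and are not taken from the pipeline of the present invention.

```python
from statistics import mean


def coverage_uniformity(depths, fraction=0.2):
    """Return the percentage of target positions whose depth is at least
    `fraction` times the mean depth over the target region.

    `depths` is a list of per-base read depths (hypothetical input format).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    threshold = fraction * mean(depths)
    covered = sum(1 for depth in depths if depth >= threshold)
    return 100.0 * covered / len(depths)


# Hypothetical per-base depths for a very small target region.
example_depths = [100, 120, 90, 110, 5, 95, 105, 100, 15, 110]
print(f"uniformity = {coverage_uniformity(example_depths):.1f}%")
```

In this toy input the mean depth is 85, so the threshold is 17, and two of the ten positions fall below it.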

(3) Analysis validation matrix and comparative analysis

An assay validation matrix was prepared before the experiment (Table 4). One reference RNA, one normal sample, 4 validation samples with high tumor burden and 2 replicates from cancer cell lines were tested in a single run (Run 1) to evaluate within-run repeatability and carryover. For the dilution test, two validation samples containing BCR-ABL1 and PML-RARA were serially diluted 2-fold (1:2, 1:4, 1:8) from an initial input of 1,500 ng (Run 2). Each replicate was tested again for inter-run validation (Run 3). Then, 30 clinical samples and 13 normal samples were analyzed, compared with conventional FISH or RT-PCR methods for fusion gene detection, and further tested for expression and mutation analysis.

Table 4. Assay validation matrix of target RNA sequencing for gene fusion detection

Run no.	Sample index 1	2	3	4	5	6	7	8
1	Human reference RNA*	Normal sample	VS1 (BCR-ABL1)	VS1 replicate (BCR-ABL1)	VS2 (PML-RARA)	VS2 replicate (PML-RARA)	VS3 (RUNX1-RUNX1T1)	VS4 (CBFB-MYH11)
2	VS1-D1 [1:2 dilution]	VS2-D1 [1:2 dilution]	VS1-D2 [1:4 dilution]	VS2-D2 [1:4 dilution]	VS1-D3 [1:8 dilution]	VS2-D3 [1:8 dilution]	VS1 replicate (BCR-ABL1)	VS2 replicate (PML-RARA)
3	VS1-D1 replicate [1:2 dilution]	VS2-D1 replicate [1:2 dilution]	VS1-D2 replicate [1:4 dilution]	VS2-D2 replicate [1:4 dilution]	VS1-D3 replicate [1:8 dilution]	VS2-D3 replicate [1:8 dilution]	VS1 replicate (BCR-ABL1)	VS2 replicate (PML-RARA)

* Universal human reference RNA (Cat no. 740000, Agilent Technologies). Gene fusions in parentheses are known fusions in validation samples previously detected by multiplex RT-PCR or fluorescence in situ hybridization. VS (validation sample with known fusion); D (diluted sample).

(4) Library preparation and target RNA-seq

cDNA synthesis, library preparation, and capture hybridization were performed using the HEMEaccuTest RNA kit (NGenBio, Seoul, Korea). After ribosomal RNA was removed from 800 to 1,500 ng of extracted total RNA using the NEBNext® rRNA Depletion kit (NEB), cDNA was synthesized and purified. Adapter ligation, PCR enrichment and target capture hybridization were performed according to the manufacturer's instructions. The concentration and size of the library were measured using a Qubit 2.0 Fluorometer (Invitrogen) and a 4200 TapeStation system (Agilent Technologies), respectively. Libraries were sequenced as 150 bp paired-end reads on a MiSeqDx (Illumina) using the MiSeq Reagent Kit v3 (300 cycles).

(5) Bioinformatics pipeline

The bioinformatics pipeline used in this study is summarized in Figure 1. Sequencing output files of paired-end reads in FASTQ format were trimmed at a quality score of Q10. After trimming, fusion transcripts were identified using both the STAR-Fusion and FusionCatcher algorithms.

In fusion detection, predicted fusions were investigated using two parameters: fusion read counts and FFPM (fusion fragments per million). FFPM is a normalized measure of the number of fusion-supporting reads and is reported by STAR-Fusion. To detect nucleotide variants, the aligned BAM file generated by STAR was processed with Picard, and variant calling was then performed with FreeBayes. Variants were annotated using ANNOVAR and filtered based on the annotated information, including region (exonic and splicing), function (non-synonymous, missense, nonsense, frameshift) and frequency (less than 1% in population databases, and pathogenic or likely pathogenic in disease databases). Filtered variants were graded according to the level of evidence for clinical significance, and tier 1 and tier 2 variants with clinically significant evidence were finally selected.
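As an illustration of the FFPM normalization mentioned above, the following minimal sketch scales the number of fusion-supporting fragments by the total number of sequenced fragments in the sample. The function name and the example counts are hypothetical, and the sketch only approximates the value that STAR-Fusion itself reports.

```python
def fusion_fragments_per_million(fusion_fragments, total_fragments):
    """Normalize a fusion fragment count to fragments per million (FFPM).

    Both arguments are plain counts; the names are illustrative and do not
    correspond to STAR-Fusion output column names.
    """
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return fusion_fragments * 1_000_000 / total_fragments


# Hypothetical example: 50 fusion-supporting fragments in a library of 5 million fragments.
print(fusion_fragments_per_million(50, 5_000_000))

# A single supporting fragment in 20 million falls below an FFPM cutoff of 0.1.
print(fusion_fragments_per_million(1, 20_000_000) >= 0.1)
```

Because the value is scaled by library size, FFPM allows samples sequenced to different depths to be compared on the same footing.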

In expression analysis, trimmed reads were aligned using HISAT2, and StringTie was then used for transcript assembly and quantification of expression levels. The obtained alignment files were converted to BAM format using Samtools. Then, the read data were normalized using DESeq2. The log2-fold-change was calculated using the normalized number of reads from clinical samples and the average of 14 normal controls.

When determining the dysregulated genes, an arbitrary log2-fold-change cutoff was set to ±2.0. Data from the whole process were mapped to the human reference genome (GRCh37/hg19).
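The log2-fold-change calculation and the ±2.0 cutoff can be sketched as follows. The pseudocount of 1, the inclusive cutoff, the function names and the example counts are assumptions added for illustration; the source states only that normalized read counts of clinical samples were compared with the average of the normal controls.

```python
import math
from statistics import mean


def log2_fold_change(sample_count, control_counts, pseudocount=1.0):
    """log2 ratio of a sample's normalized count to the mean of the controls.

    The pseudocount avoids division by zero and is an assumption of this sketch.
    """
    control_mean = mean(control_counts)
    return math.log2((sample_count + pseudocount) / (control_mean + pseudocount))


def classify_expression(lfc, cutoff=2.0):
    """Label a gene as dysregulated when |log2 fold change| reaches the cutoff."""
    if lfc >= cutoff:
        return "overexpressed"
    if lfc <= -cutoff:
        return "underexpressed"
    return "unchanged"


# Hypothetical normalized counts for one gene in three normal controls.
controls = [99, 99, 99]
for sample in (799, 120, 24):
    lfc = log2_fold_change(sample, controls)
    print(sample, round(lfc, 2), classify_expression(lfc))
```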

(6) Filtering and prioritization of fusion candidates

Predicted fusion candidates were filtered to exclude false-positive results and then classified using a tiered grading system according to clinical relevance, prioritized by evidence in the literature. Applying the stepwise criteria, fusion candidates were considered true fusions if they: i) were supported by a minimum number of reads (FFPM ≥ 0.1 and junction reads ≥ 1); ii) were not short repeats, pseudogenes or read-throughs, and were not found in healthy populations or normal samples; iii) affected the expression levels of fusion partner genes or caused in-frame fusions; and iv) were detected by the two fusion detection algorithms (FusionCatcher final result file, and STAR-Fusion preliminary file and final result file).
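The four stepwise criteria can be expressed as a simple predicate. The following is a minimal sketch only: the record fields, the function name, and the interpretation of criterion iv) as requiring a call from both algorithms (with either the preliminary or the final STAR-Fusion file counting as a call) are assumptions, and the example candidates carry hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    name: str
    ffpm: float
    junction_reads: int
    is_artifact: bool                # short repeat, pseudogene, read-through, or seen in normal samples
    in_frame: bool
    alters_partner_expression: bool
    called_by_star_fusion: bool      # present in the preliminary or final file (assumption)
    called_by_fusioncatcher: bool    # present in the final result file


def is_true_fusion(c, min_ffpm=0.1, min_junction_reads=1):
    """Apply criteria i) to iv) in order and return True only if all pass."""
    enough_support = c.ffpm >= min_ffpm and c.junction_reads >= min_junction_reads
    functional = c.in_frame or c.alters_partner_expression
    both_callers = c.called_by_star_fusion and c.called_by_fusioncatcher
    return enough_support and not c.is_artifact and functional and both_callers


# Hypothetical candidates for illustration.
supported = FusionCandidate("GENE_A-GENE_B", ffpm=2.5, junction_reads=12, is_artifact=False,
                            in_frame=True, alters_partner_expression=False,
                            called_by_star_fusion=True, called_by_fusioncatcher=True)
carryover = FusionCandidate("GENE_C-GENE_D", ffpm=0.05, junction_reads=1, is_artifact=False,
                            in_frame=True, alters_partner_expression=False,
                            called_by_star_fusion=True, called_by_fusioncatcher=True)
print(is_true_fusion(supported), is_true_fusion(carryover))
```

Keeping each criterion as a named boolean makes it straightforward to report which step removed a given candidate.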

Predicted true-positive fusions were classified according to previous guidelines for interpreting NGS results in cancer.

Tier 1 (well-powered studies with consensus of experts in the field) and tier 2 (several small published studies with some agreement; preclinical trials; or several case reports without agreement) fusions were selected using a previous tier grading system. For grading, expert consensus studies and fusion databases including the ChimerDB and Mitelman databases were used.

If a predicted true-positive fusion was not associated with the patient's cancer type, the fusion was not considered tier 1 or tier 2, even if it was found in well-known studies and disease databases. Of all predicted true fusions in tiers 1 and 2, fusions verified by multiplex RT-PCR, FISH, or direct sequencing were considered confirmed fusions, and fusions not otherwise verified were considered putative fusions.

(7) Fusion and mutation detection methods

Multiplex RT-PCR was performed using the HemaVision kit (DNA Technology) targeting 28 translocations and 145 breakpoints. FISH was performed using dual-fusion probes targeting IGH-CCND1 (MetaSystems), BCR-ABL1 (MetaSystems), RUNX1-RUNX1T1 (Abbott Molecular), PML-RARA (Abbott Molecular) and ETV6-RUNX1 (Abbott Molecular), and break-apart probes targeting PDGFRB (MetaSystems), CBFB (MetaSystems) and KMT2A (Abbott Molecular). Direct sequencing was attempted if a fusion predicted by target RNA-seq was not verified by multiplex RT-PCR or FISH. cDNA was synthesized from 500 to 1,000 ng of total RNA using the PrimeScript™ II 1st strand cDNA synthesis kit (Takara). Using Takara ExTaq (Takara), 1 μL of cDNA was amplified with the following primers.

Primer	Sequence	SEQ ID NO:
PAX5-F	5'-AGATGCGGGGAGACTTGTT-3'	1
ARHGAP22-R	5'-CTGCACCCAGTCCTCCATGT-3'	2
DACH1-R	5'-GCTCATTGCCATGGTGACAG-3'	3
PICALM-F	5'-ACCCCCTGTAATGGCCTATC-3'	4
MLLT10-R	5'-CAGTGGCTGCTTTGCTTTCTC-3'	5
MECOM-F	5'-CTGCATAGATGCCAGTCAACCA-3'	6
MBNL1-R	5'-CAGGCATCATGGCATTGGCTA-3'	7
MLLT3-R	5'-TCGTGCAAGTGGAAGACGAC-3'	8
CCDC6-F	5'-TCCGAGAGTGAGTCCAGCTT-3'	9
PDGFRB-R	5'-CGGATCTCGTAACGTGGCTT-3'	10

The size of the PCR product was measured using a 4200 TapeStation system (Agilent Technologies). All PCR steps used a 548 bp GAPDH amplicon as a positive control.

Direct sequencing of the PCR products was performed by Macrogen (Seoul, Korea) using the same forward and reverse primers. Sequencing files were analyzed with SeqMan software (DNASTAR). Where DNA-based PCR or sequencing results were available, all variants detected by target RNA-seq were confirmed by comparison with the results of DNA-based methods, including quantitative real-time PCR (JAK2 MutaQuant assay kit, Ipsogen) and sequencing (HEMEaccuTest DNA kit; NGenBio).

(8) Statistical analysis

The average values of carryover fusions and true fusions were compared using the Wilcoxon rank-sum test. Linear regression was performed to evaluate repeatability and linearity. Hierarchical clustering was performed with complete linkage, using Euclidean distance as the proximity measure. All statistical analyses were performed using RStudio (RStudio, Inc.).

2. Experimental results

(1) Assay validation using validation samples

To investigate the target gene coverage included in the panel, a DNA template (human genome reference NA12878) was used because RNA samples can show various patterns depending on the gene expression and fusion of each sample.

The coverage plots showed uniform average coverage from the beginning to the end of the target transcripts. The uniformity of coverage (percentage of bases with coverage above 0.2× the overall mean depth) was calculated to be 99.8%, showing uniform coverage of the target genes within the panel. Figure 2 shows the analytical performance of the targeted RNA-seq of the present invention. In the within-run test (Run 1 in Table 4), all expected fusions were reliably detected from 6 positive samples and 1 reference RNA with known fusions after applying the filtering strategy. Before filtering, carryover fusions including BCR-ABL1, PML-RARA, RUNX1-RUNX1T1 and CBFB-MYH11 were observed in all 8 samples, including those that did not harbor the respective fusion.

The average log2 FFPMs for carryover fusions and true fusions were -0.37 and 5.04 in STAR-Fusion, respectively, and the average log2 fusion-supporting reads for carryover fusions and true fusions were 2.30 and 9.62 in FusionCatcher, respectively. Carryover fusions showed significantly lower log2 FFPM and log2 fusion-supporting read values than true fusions (P < 0.001, FIGS. 2A and 2B), and were filtered out because of their lower number of reads.

In both the intra-run and inter-run tests (Runs 1-3 in Table 4), the number of reads from all replicates showed reliable repeatability (r² = 0.9655; FIG. 2C). When the normalized FFPM value provided by STAR-Fusion was used, the result showed higher repeatability than when only the number of reads was used (r² = 0.9874; FIG. 2D). In the 2-fold dilution test (Runs 2 and 3 in Table 4), the two known fusions (BCR-ABL1 and PML-RARA) were stably detected with high FFPM (>9.0) up to the third 2-fold dilution (1:8 dilution). The FFPM of the diluted samples containing BCR-ABL1 and PML-RARA showed a linear log2 fold change (r² = 0.9852 and 0.9447, respectively; FIGS. 2E and 2F), and, assuming an FFPM cutoff of 0.1, the limit of detection was predicted to be reached after four to five 2-fold dilutions (1:16 to 1:32).
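The extrapolation of a limit of detection from a dilution series can be sketched as a straight-line fit of log2 FFPM against the number of 2-fold dilution steps, solved for the step at which the fitted line crosses the FFPM cutoff. The data points below are hypothetical and do not reproduce the measurements of FIGS. 2E and 2F; the function names are likewise illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dilution_steps_to_cutoff(steps, log2_ffpm, cutoff_ffpm=0.1):
    """Number of 2-fold dilution steps at which the fitted line reaches the cutoff."""
    slope, intercept = fit_line(steps, log2_ffpm)
    if slope >= 0:
        raise ValueError("FFPM does not decrease with dilution; cannot extrapolate")
    return (math.log2(cutoff_ffpm) - intercept) / slope


# Hypothetical series: undiluted, 1:2, 1:4 and 1:8 correspond to steps 0 to 3.
steps = [0, 1, 2, 3]
log2_ffpm = [8.0, 6.0, 4.0, 2.0]
print(round(dilution_steps_to_cutoff(steps, log2_ffpm), 2))
```

The extrapolated step count depends entirely on the fitted slope, so it should be read as an estimate rather than a measured detection limit.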

(2) Gene fusion detection using clinical samples

In the first step, approximately 227 million transcript sequence reads were generated from 30 clinical samples. A total of 1,243 and 3,363 fusion transcripts meeting the minimum number of supporting reads were predicted by STAR-Fusion and FusionCatcher, respectively. After applying the filtering and prioritization strategies described in the experimental methods section, 40 and 211 fusion transcripts, including isoforms and reciprocal fusions, were selected as clinically significant reportable fusions (tier 1 and tier 2) in STAR-Fusion and FusionCatcher, respectively. After selecting the dominant isoform and disregarding reciprocal fusions, a total of 30 fusions were finally curated.

Table 6 shows the final results of target RNA-seq compared with the conventional methods using 30 clinical samples, comprising 6 AML, 9 B-ALL, 4 T-ALL, 3 mature B-cell neoplasms, 6 MPN, 1 MDS/MPN and 1 myeloid/lymphoid neoplasm with PDGFRB rearrangement. Of the 13 known fusions, targeted RNA-seq detected 12 identical fusions and 1 reciprocal fusion, CCND1-IGH. In one sample with PDGFRB rearrangement, the partner gene was identified as CCDC6 by target RNA-seq, unlike with conventional FISH, and this was also confirmed by direct sequencing.

Table 6. Comparison of results between conventional methods (FISH or multiplex RT-PCR) and target RNA-seq using 30 clinical samples of hematological malignancies

Sample no.	Diagnosis	FISH or multiplex RT-PCR	Target RNA-seq*: confirmed fusion◈	Target RNA-seq*: putative fusion▣	Target RNA-seq*: variant
CS1	AML	KMT2A-MLLT3	KMT2A-MLLT3
CS2	AML	PML-RARA	PML-RARA		GATA2 p.I379Gfs*85; WT1 p.T363Nfs*27; WT1 p.P271Rfs*20
CS3	AML	PML-RARA	PML-RARA	NUP98-TOP2B	WT1 p.K250Qfs*3
CS4	AML	Negative	Negative
CS5	AML	Negative	Negative
CS6	AML	Negative	Negative
CS7	B-ALL	BCR-ABL1	BCR-ABL1		ABL1 p.E255K
CS8	B-ALL	BCR-ABL1	BCR-ABL1
CS9	B-ALL	BCR-ABL1	BCR-ABL1	P2RY8-CRLF2†§
CS10	B-ALL	KMT2A-AFF1	KMT2A-AFF1
CS11	B-ALL	ETV6-RUNX1	ETV6-RUNX1	ERG†-DYRK1A§; IGH-PAX5†§
CS12	B-ALL	Negative	PAX5†-ARHGAP22	IGH-PAX5†§
CS13	B-ALL	Negative	PAX5†-DACH1
CS14	B-ALL	Negative	Negative	IGH-CRLF2†§; P2RY8-CRLF2†§	JAK2 p.R683G
CS15	B-ALL	Negative	Negative
CS16	T-ALL	Negative	PICALM-MLLT10
CS17	T-ALL	Negative	Negative
CS18	T-ALL	Negative	Negative	NUP214-ABL1§	RUNX1 p.R162K
CS19	T-ALL	Negative	Negative
CS20	MCL	IGH-CCND1	CCND1†-IGH
CS21	B-CLL	NT	Negative	IGH-BCL2†§; IGH-PAX5†§
CS22	B-CLL	NT	Negative	IGH-BCL2†§; IGH-PAX5†§
CS23	CML, BP (myeloid BP)	BCR-ABL1	BCR-ABL1; MECOM†-MBNL1		ABL1 p.Y253H; ABL1 p.V299L; ABL1 p.T315I; IKZF1 p.S442fs
CS24	CML, BP (lymphoid BP)	BCR-ABL1	BCR-ABL1; PAX5†-MLLT3		ABL1 p.M244V; ABL1 p.E255V
CS25	CML, CP	BCR-ABL1	BCR-ABL1
CS26	PV	Negative	Negative		JAK2 p.V617F
CS27	PV	Negative	Negative		JAK2 p.V617F
CS28	PMF	NT	Negative		JAK2 p.V617F
CS29	MDS/MPN-U	Negative	Negative
CS30	MLN with PDGFRB rearrangement	PDGFRB gene rearrangement	CCDC6-PDGFRB

* All fusions and variants detected in target RNA-seq were sorted by a grading system according to level of evidence to determine clinical significance, and only tier 1 and tier 2 fusions and variants were selected.

◈ Gene fusions detected by target RNA-seq and confirmed by multiplex RT-PCR, FISH or direct sequencing.

▣ Gene fusions detected by target RNA-seq but not confirmed by multiplex RT-PCR, FISH, or direct sequencing.

§ Fusion filtered out of the final result by the STAR-Fusion algorithm.

† Partner gene overexpressed in the fusion.

- FISH (fluorescence in situ hybridization); RT-PCR (reverse transcriptase-PCR); RNA-seq (RNA sequencing); CS (clinical sample); AML (acute myeloid leukemia); NOS (not otherwise specified); B-ALL (B-lymphoblastic leukemia/lymphoma); T-ALL (T-lymphoblastic leukemia/lymphoma); MCL (mantle cell lymphoma); B-CLL (B-cell chronic lymphocytic leukemia); CML (chronic myeloid leukemia); BP (blast phase); CP (chronic phase); PV (polycythemia vera); PMF (primary myelofibrosis); MDS/MPN-U (myelodysplastic/myeloproliferative neoplasm, unclassifiable); MLN (myeloid/lymphoid neoplasm); NT (not tested).

Five fusion transcripts were newly detected as tier 1 or tier 2 fusions in target RNA-seq using the tiered grading system, and their breakpoints were all confirmed by direct sequencing. These five additional fusions included PAX5-ARHGAP22 and PAX5-DACH1 in two B-ALL samples, PICALM-MLLT10 in one T-ALL sample, and MECOM-MBNL1 and PAX5-MLLT3 in two CML-BP samples. Among these additional fusions, ARHGAP22, DACH1 and MBNL1 were non-target genes and could be identified because the target probes hybridized to the targeted fusion partner within the chimeric transcript. In addition, 12 putative disease-associated fusions were found, but these could not be confirmed by direct sequencing.

Most putative fusions (10 out of 12 fusions) were shown to increase expression of partner genes, of which 7 putative fusions were IGH rearrangements (4 IGH-PAX5, 1 IGH-CRLF2 and 2 IGH-BCL2).

The remaining two putative fusions were predicted as disease-associated in-frame fusions (one NUP98-TOP2B in the AML sample, one NUP214-ABL1 in the T-ALL sample) and could not be detected by direct sequencing due to low expression.

(3) Detection of mutations in clinical samples

Furthermore, targeted RNA-seq identified 16 variants (tier 1 or 2) in the expressed transcripts of 10 samples (Table 6). Four frameshift mutations in GATA2 and WT1 were found in two AML cases (clinical samples [CS] 2 and 3). In three samples from one B-ALL and two CML-BP cases (CS7 and CS23-24), the M244V, Y253H, E255K/V, V299L and T315I mutations of ABL1 were identified by target RNA-seq; these mutations are associated with tyrosine kinase inhibitor (TKI) resistance. Disease-associated variants, including the JAK2 R683G, RUNX1 R162K and IKZF1 S442fs mutations, were detected in one B-ALL, one T-ALL and one CML-BP case (CS14, CS18 and CS23), respectively.

Three JAK2 V617F mutations were identified in three BCR-ABL1-negative MPN samples, comprising two polycythemia vera samples and one primary myelofibrosis sample (CS26-28). All 15 variants from cases with available DNA-based results were confirmed by DNA-based NGS or real-time PCR.

(4) Expression analysis of clinical samples

Figure 3 shows a heatmap of the hierarchical clustering of 30 hematological malignancies and 3 normal controls. Hierarchical clustering generated four subtrees according to the underlying structure of the expression data: cluster 1 (1 T-ALL and AML), cluster 2 (B-cell leukemia and lymphoma), cluster 3 (2 T-ALL and AML), and cluster 4 (MPN, other myeloid neoplasms and normal controls). With the exception of one case of B-ALL, clustering showed reliable divisions consistent with the cancer subtypes and lineages of the malignant cells.

(5) Interpretation of results

The present invention developed and validated a clinically applicable target RNA-seq system for 84 genes related to hematological malignancies, selected on the basis of data as well as previous literature. The platform of the present invention showed stable performance in assay validation and efficiently detected both known and novel gene fusions. In addition, using 30 clinical samples from patients with hematological malignancies, the targeted RNA-seq system proved applicable to the detection of clinically significant sequence variants as well as expression features.

With regard to assay validation, target RNA-seq showed reliable performance in tests of target gene coverage, repeatability between and within runs, and linearity. However, trace levels of carryover fusions were observed in the test. A plausible explanation is index hopping (index swapping), which was recently reported as an incorrect assignment of sequencing reads caused by residual primers or adapters during cluster amplification on the Illumina platform. Although not every fusion was detected as a suspected index-hopping fusion, fusion transcripts that were top hits in the pool (mean FFPM = 105.7) were falsely detected in the results of other samples. Suspected index-hopping fusions had much lower read counts than true fusions (p<0.001), so they were filtered out based on their low supporting read counts. In this context, RNA-seq data generated on the Illumina platform in a clinical setting should be interpreted with caution when trace-level reads show exactly the same breakpoints and sequences as the top-hit fusions of other samples in the same pool and are inconsistent with the patient's clinical and pathological findings.

So far, the performance of oncogenic fusion detection using the NGS method has been significantly improved. This is largely due to a powerful bioinformatics tool for sorting short reads as well as a multi-layer filtering strategy to rule out false-positive fusions. In the present invention, a stepwise filtering strategy is used to remove false positive calls.

Of the 1,243 and 3,363 fusion candidates predicted by STAR-Fusion and FusionCatcher, respectively, 83 (6.7%) and 477 (14.2%) fusion transcripts were first selected as true-positive fusions after the 4 filtering steps. In particular, considering the oncogenic function of fusions, candidates mostly found in short repeats, pseudogenes, read-throughs or healthy populations were removed from further evaluation, whereas fusions inducing aberrant expression of partner genes and in-frame fusions were retained. Even so, unlike in a research setting, not all of these fusions were suitable for clinical reporting, because many had not been demonstrated in hematological malignancies. To solve this problem, a tiered grading system was adopted that classifies aberrations according to the level of evidence and determines their significance. By prioritizing fusions according to this tiered grading system, the number of fusion transcripts was narrowed down to 40 (3.2%) and 211 (6.3%) in the STAR-Fusion and FusionCatcher results, respectively, which represent reportable fusions of clinical significance. Thus, proper filtering and prioritization strategies are essential for managing RNA-seq data in a clinical setting.
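The proportions quoted for each stage of the filtering funnel follow directly from the candidate counts given above. The short sketch below recomputes them; the dictionary layout and key names are illustrative only.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Candidate counts as stated in the text for each fusion caller.
funnel = {
    "STAR-Fusion": {"predicted": 1243, "after_filtering": 83, "reportable": 40},
    "FusionCatcher": {"predicted": 3363, "after_filtering": 477, "reportable": 211},
}

for caller, counts in funnel.items():
    filtered = percent(counts["after_filtering"], counts["predicted"])
    reportable = percent(counts["reportable"], counts["predicted"])
    print(f"{caller}: {filtered}% after filtering, {reportable}% reportable")
```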

Finally, 18 confirmed fusions and 12 putative fusions were curated from the target RNA-seq results of 30 clinical samples. Five of the 18 confirmed fusions had not been detected by the FISH or multiplex RT-PCR tests commonly used in clinical laboratories. Three of the five fusions were identified by targeting only one of the partner genes using the probe-hybridization method. Similarly, target RNA-seq identified the non-target gene CCDC6 as the partner gene in one case with a known PDGFRB rearrangement. As these four cases show, the hybrid capture method can efficiently improve the diagnostic yield of fusion detection because it can capture the flanking regions adjacent to the transcript of interest. An advantage of this targeting method is that it reduces the cost of oligonucleotide probe sets, because probes for only one partner can be used to readily detect chimeric transcripts, especially for CRLF2, ETV6, KMT2A, NUP98, PAX5 and PDGFRA/B fusions, which have multiple partner genes.

Although the putative fusion could not be confirmed experimentally, surrogate overexpression of the partner gene in the ten putative fusions appears to indicate a rearrangement between the two genetic regions. Most of these cases were IGH rearrangements with breakpoints in introns, and it was likely that a small residual DNA fraction could be detected by the FusionCatcher algorithm. In the STAR-Fusion algorithm, these IGH rearrangements were identified in the preliminary file, but filtered in the final result. Therefore, the use of target RNA-seq only as a fusion assay was insufficient to directly detect DNA-level rearrangements, but it could be supplemented with expression analysis. This suggests that the putative results of target RNA-seq with enhanced expression may guide further tests such as FISH in clinical settings.

While the identification of genomic variants relies primarily on DNA-based sequencing by NGS techniques, the use of RNA-seq for this purpose is difficult because of the inherent complexity of transcripts spanning multiple exons at genomic locations. Although this hurdle has been overcome using splice-aware aligners such as HISAT2, TopHat and STAR, only a few RNA-seq studies have investigated the detection of variants in clinical diagnostic settings. In the present invention, clinically significant mutations were identified in RNA-seq data. Above all, the simultaneous detection of ABL1 mutations associated with TKI resistance and the BCR-ABL1 fusion in B-ALL and CML-BP patients demonstrated the advantage of targeted RNA-seq in enabling faster diagnostic and therapeutic decisions.

In addition, negative results for the BCR-ABL1 fusion and positive results for the JAK2 V617F mutation were simultaneously confirmed in three MPN cases. Other prognostic and diagnostic variants were also found in 2 AML, 2 ALL and 1 CML-BP patients. Therefore, the results of the present invention show that the use of target RNA-seq combined with reliable computational bioinformatics tools can simplify the diagnostic steps without the need to perform additional DNA-based sequencing in parallel.

Targeted RNA-seq data can also be used to measure molecular characteristics through expression analysis, similar to what has been done with microarray technology over the past two decades. As mentioned above, expression data can help characterize driver events due to gene fusion. Of the 30 fusions detected by target RNA-seq, 6 and 9 showed overexpression of the 5' and 3' partner genes, respectively. These results may be explained by structural and functional mechanisms of oncogenic fusions. In oncogenic fusions causing dysregulation, the 5' fusion partner may be overexpressed through the contribution of a 3' partner with a highly stable UTR region, and the 3' fusion partner may be overexpressed through the regulatory elements of the 5' partner gene.

In addition, subsequent analysis of expression data revealed distinct clustering of clinical samples according to cancer subtypes and cell lineages. Classification was based on several representative molecular features that identify each subtype (e.g., MECOM and EPOR overexpression in MPN, and EBF1, PAX5 and TCL1A overexpression in B-ALL). This may support the diagnosis of subtypes or lineages in ambiguous cases, or the further discovery of new disease subtypes.

Example 2. Analysis of genetic mutations in leukemia patients using the next-generation sequencing panel of the present invention

Bone marrow specimens obtained at diagnosis from 93 patients with confirmed leukemia (15 acute myeloid leukemia, 35 adult B-acute lymphoid leukemia, 30 pediatric B-acute lymphoid leukemia, and 13 T-acute lymphoid leukemia) were used. Among all 93 leukemia patients, tier 1 or tier 2 gene fusions were observed in 72 (77%). In pediatric B-acute lymphoid leukemia (B-ALL), gene fusions were detected in 83% (25/30) of patients, and in adult B-ALL, gene fusions were observed in 94% (33/35). In patients with acute myeloid leukemia (AML) and T-acute lymphoid leukemia (T-ALL), gene fusions were observed in 53% (8/15) and 46% (6/13), respectively (see FIGS. 4 and 5).
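The detection rates reported for Example 2 can be checked against the per-subtype counts, which also sum to the stated overall figure of 72 of 93 patients. The sketch below performs this arithmetic; the variable names are illustrative.

```python
# (patients with a tier 1 or tier 2 gene fusion, patients tested), as stated in the text.
cohort = {
    "pediatric B-ALL": (25, 30),
    "adult B-ALL": (33, 35),
    "AML": (8, 15),
    "T-ALL": (6, 13),
}


def detection_rate(positive, tested):
    """Detection rate as a whole-number percentage."""
    return round(100 * positive / tested)


for subtype, (positive, tested) in cohort.items():
    print(f"{subtype}: {detection_rate(positive, tested)}% ({positive}/{tested})")

total_positive = sum(positive for positive, _ in cohort.values())
total_tested = sum(tested for _, tested in cohort.values())
print(f"overall: {detection_rate(total_positive, total_tested)}% ({total_positive}/{total_tested})")
```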

Example 3. Comparative evaluation of the next-generation sequencing panel of the present invention and the existing panel

A commercial targeted RNAseq analysis system, developed on the basis of a method for constructing cDNA libraries for specific genes using anchored multiplex PCR (Engvall M, Cahill N, Jonsson BI, Hoglund M, Hallbook H, Cavelier L: Detection of leukemia gene fusions by targeted RNA-sequencing in routine diagnostics. BMC Med Genomics 2020, 13:106), was compared with the analysis system using the next-generation sequencing panel of the present invention. For this comparative evaluation, one B-ALL patient sample, two AML patient samples, and one patient sample each for acute promyelocytic leukemia and T-ALL were used. As a result, in the B-ALL patient sample carrying the IGH-CRLF2 gene fusion, the fusion was detected only by analysis using the next-generation sequencing panel of the present invention.

Philadelphia chromosome-like acute lymphoblastic leukemia (Ph-like ALL) is found in 20-25% of B-ALL, and fusions involving the CRLF2 gene are known to account for about 61% of the causative alterations. Ph-like B-ALL has a very poor prognosis and has been classified as a new subtype of ALL. Diagnosis of Ph-like B-ALL through gene analysis in ALL, especially B-ALL, is therefore very important for treating leukemia from the point of view of precision medicine. It can thus be seen that the next-generation sequencing panel of the present invention is more useful for detecting Ph-like B-ALL than the cDNA library-based targeted RNAseq using anchored multiplex PCR that is used in some clinical test sites (FIG. 6). In FIG. 6, "Our targeted RNA seq system" indicates the gene fusions detected with the next-generation sequencing panel of the present invention, which targets 84 genes, and "Commercial targeted RNA seq system" indicates the gene fusions detected with the previously commercialized targeted RNAseq analysis system.

Claims

A next-generation sequencing panel for diagnosing leukemia comprising a probe that specifically binds to PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
A next-generation sequencing panel for leukemia diagnosis further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BAALC, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, DUSP22, EBF1, FGFR3, FIP1L1, FUS, GATA2, GUSB, IKZF1, IL3, KMT2A, MECOM, MEF2D, MLF1, MLLT3, MRTFA, MYH11, NUP98, PCM1, PICALM, PML, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TRA and WT1.
A next-generation sequencing panel for leukemia diagnosis further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CREBBP, DEK, DUSP22, EP300, ERG, FGFR3, FIP1L1, HBS1L, HPRT1, IGK, IGL, IKZF1, MAF, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, PAX5, PBX1, PCM1, PPIA, RAB7A, TCF3 and ZNF384.
A next-generation sequencing panel for leukemia diagnosis further comprising a probe that specifically binds to at least one selected from the group consisting of ALK, BCL2, BCL6, BCL9, BCR, CBFB, DEK, DUSP22, FGFR3, HPRT1, IKZF1, IL2RB, IRF4, KMT2A, MAF, MAFA, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, NUP214, PCM1, TBP, TCL1A, TRB, TRG and TYK2.
The next-generation sequencing panel for leukemia diagnosis according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, TP63, DEK, DUSP22, FGFR3, IGK, IGL, IKZF1, KMT2A, NTRK3, PAX5, PBX1, PPIA, RAB7A, TCF3, TP63 and ZNF384.
The next-generation sequencing panel for leukemia diagnosis according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, PSMB2 and TP63.
The next-generation sequencing panel for leukemia diagnosis according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GATA2, IKZF1, MAFA, MAFB and PCM1.
The next-generation sequencing panel for leukemia diagnosis according to claim 1, further comprising a probe that specifically binds to PDGFRA.
A next-generation sequencing panel for leukemia diagnosis, comprising a probe that specifically binds to ABL1, ABL2, AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, CRLF2, CSF1R, DEK, DUSP22, EBF1, EP300, EPOR, ERG, ETV6, FGFR1, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, JAK2, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT3, MLLT10, MRTFA, MYC, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PDGFRB, PHB, PHB2, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384.
A method for providing information for leukemia diagnosis, comprising:

obtaining read data by selecting and sequencing target genes through target capture hybridization with the sequencing panel of any one of claims 1 to 9;

checking, from the read data, whether PHB and PHB2 are overexpressed; and

detecting, from the read data, a fusion containing any one gene selected from the group consisting of IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
The method for providing information for leukemia diagnosis according to claim 10, wherein overexpression is checked by aligning the read data to a reference sequence with HISAT2 to obtain SAM/BAM data, calculating the expression of each gene with StringTie to obtain GTF data, and normalizing the GTF data with DESeq2.
The method for providing information for leukemia diagnosis according to claim 10, wherein the gene fusion is detected by aligning the read data to a reference sequence with Bowtie, STAR, Blat or Bowtie2 and identifying the fusion with the STAR-Fusion or FusionCatcher fusion gene identification tool.
A method for providing information for leukemia diagnosis, further comprising the step of determining whether any one gene selected from the group consisting of AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, DEK, DUSP22, EBF1, EP300, ERG, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384 is overexpressed, fused or mutated.
A method for providing information for leukemia diagnosis, comprising:

(a) binding cDNA synthesized from RNA isolated from an individual to each probe of the sequencing panel of any one of claims 1 to 9 and performing next-generation sequencing (NGS) to obtain raw read data;

(b) adjusting the raw read data to data having a quality score of Q10 or higher;

(c) detecting the fusion of each gene in the adjusted data;

(d) detecting a mutation compared to a reference sequence in each gene of the adjusted data; and

(e) confirming the expression of each gene from the adjusted data,

wherein the fusion is detected by aligning the adjusted data to a reference sequence with Bowtie, STAR, Blat or Bowtie2 and identifying the fusion with a fusion gene identification tool (STAR-Fusion, FusionCatcher),

wherein the mutation is detected by aligning the adjusted data to a reference sequence with STAR to obtain SAM/BAM data, sorting the BAM data and marking duplicates with Picard, and calling SNVs and indels with FreeBayes from the aligned, sorted and duplicate-marked BAM data, and

wherein the expression of each gene is confirmed by aligning the adjusted data to a reference sequence with HISAT2 to obtain SAM/BAM data, calculating the expression of each gene with StringTie to obtain GTF data, and normalizing the GTF data with DESeq2.
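The steps recited above can be sketched as follows. This is a hedged illustration, not the applicants' pipeline. It makes several assumptions: "Q10 or higher" is read as a mean per-read Phred score with Phred+33 encoding, the reads are paired-end, all file and index names are hypothetical, and `samtools sort` (not named in the source) is added because StringTie expects a coordinate-sorted BAM. The commands are only assembled as argument lists and are not executed. DESeq2 normalization runs in R and is left out.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_q10(records, threshold=10):
    """Keep (name, sequence, quality) records with mean quality >= threshold."""
    return [r for r in records if r[2] and mean_phred(r[2]) >= threshold]


def pipeline_commands(r1, r2, ref_fasta, gtf, hisat2_index, star_index, ctat_lib):
    """Return the three analysis branches as argument lists; nothing is run."""
    return {
        # Fusion branch: STAR-Fusion runs its own STAR alignment internally.
        "fusion": [
            ["STAR-Fusion", "--genome_lib_dir", ctat_lib,
             "--left_fq", r1, "--right_fq", r2, "--output_dir", "fusion_out"],
        ],
        # Variant branch: align, mark duplicates, call SNVs and indels.
        "variant": [
            ["STAR", "--genomeDir", star_index, "--readFilesIn", r1, r2,
             "--outSAMtype", "BAM", "SortedByCoordinate",
             "--outFileNamePrefix", "var_"],
            ["picard", "MarkDuplicates",
             "I=var_Aligned.sortedByCoord.out.bam",
             "O=var_dedup.bam", "M=var_dup_metrics.txt"],
            # freebayes writes the VCF to standard output.
            ["freebayes", "-f", ref_fasta, "var_dedup.bam"],
        ],
        # Expression branch: align, sort, quantify against the annotation.
        "expression": [
            ["hisat2", "-x", hisat2_index, "-1", r1, "-2", r2, "-S", "expr.sam"],
            ["samtools", "sort", "-o", "expr.bam", "expr.sam"],
            ["stringtie", "expr.bam", "-G", gtf, "-e", "-o", "expr.gtf"],
        ],
    }


# Hypothetical reads: 'I' encodes Q40, '+' encodes Q10, '!' encodes Q0.
records = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "!!!!"), ("r3", "ACGT", "++++")]
kept = [name for name, _, _ in filter_q10(records)]

commands = pipeline_commands(
    "s_1.fq", "s_2.fq", "ref.fa", "ref.gtf", "hisat2_idx", "star_idx", "ctat_lib"
)
for branch, steps in commands.items():
    for step in steps:
        print(branch, ":", " ".join(step))
```

Keeping the three branches as separate lists mirrors the wording above, in which fusion detection, mutation detection and expression analysis each start from the same quality-adjusted reads.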