WO2016076524A1 - Standardized quantitative analysis method for nucleic acid, applying SiNG-PCRseq method

Publication number
WO2016076524A1
Authority
WO
WIPO (PCT)
Application number
PCT/KR2015/009823
Inventor
정상균
오수아
Original Assignee
Korea Institute of Oriental Medicine (한국한의학연구원)
Application filed by Korea Institute of Oriental Medicine (한국한의학연구원)

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • The present invention relates to a method of quantifying nucleic acids in a sample by spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq), in which the genome of a closely related species is mixed with the sample to be quantified, the two are co-amplified using the sequence similarity between the related species, and the resulting competitive PCR amplicons are sequenced.
  • The RNA-seq method, which was developed as a nucleic acid quantification method, can generate hundreds of millions of sequence reads in a single run, and thus allows abundance ratios and sequence variants to be examined at the whole-transcript level.
  • RNA-seq methods allow quantitative comparison of the same sequence between different samples with high efficiency and reproducibility.
  • However, relative quantitative values between different transcripts in the same cell are difficult to handle when bias appears, so nucleic acids cannot be analyzed quantitatively by the RNA-seq method alone.
  • RNA-seq methods that target specific groups of transcripts have been developed, including methods that use tiling arrays (Mercer, R. et al., Nature Biotechnology, 30: 99-104, 2012), methods that construct a library of target sequences captured with soluble probes (Levin, J. Z. et al., Genome Biology, 10: R115, 2009), and methods that include a spike-in step using multiplexed competitive PCR.
  • The present inventors expanded the concept of previously developed nucleic acid quantification methods and sought to develop an accurate, simple, and high-throughput nucleic acid quantification method.
  • As a result, they developed the spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq) method, which uses the genome of the closest related species as the competitor, co-amplifies it with the sample, and quantifies the competing sequences within each amplicon by next-generation high-throughput sequencing.
  • iPSC: induced pluripotent stem cells
  • The SiNG-PCRseq method of the present invention is useful as a standardized analysis method for quantifying nucleic acids in a sample.
  • An object of the present invention is to provide a high-throughput nucleic acid quantification method that accurately and easily obtains the relative quantitative values of different nucleic acids in a sample, regardless of the experimental conditions or the type of sample, by the spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq) method, in which the genome of a closely related species is mixed with the sample to be quantified, the two are co-amplified using the sequence similarity between the related species, and the competitive PCR amplicons are sequenced.
  • step ii) amplifying the DNA sample mixed in step i) by multiplex PCR to obtain amplicons;
  • step iii) sequencing the sequence library comprising the amplicons obtained in step ii) and comparing it with a reference sequence to quantify the read count;
  • step iv) calculating the fraction (⁇A) of the target sequence (sequence A) in the amplicons obtained in step ii) using the read count quantified in step iii);
  • step ii) calculating a standardized ratio between different genes in one species of interest using the standardized quantitative value of step i).
  • step i) calculating, using the standardized quantitative method of the present invention, a standardized quantitative value of a target sequence in a gene transcript or genome that characterizes a disease, or a disease state with transitional characteristics to the disease, for target cells isolated from an individual who has or is suspected of having the disease;
  • step ii) comparing the standardized quantitative value of step i) with the standardized quantitative value of the same nucleotide sequence obtained from the transcript or genome of cells isolated from a normal individual of the same species. The present invention thereby provides a method of providing information for diagnosing a gene-related disease.
  • The SiNG-PCRseq method of the present invention, in which the genome of a related species is mixed with the sample to be quantified, co-amplified using the sequence similarity between the related species, and the competitive PCR amplicons are sequenced, provides standardized relative DNA quantitative values even under experimental conditions that vary in elements such as cell line and polymerase. Compared with the RNA-seq method used as a conventional analytical method, these standardized values give the relative quantitative values of different nucleic acids in a sample more accurately, regardless of the experimental conditions or the type of sample, so the SiNG-PCRseq method of the present invention is useful as a standardized assay for quantifying nucleic acids.
  • FIG. 1 is a schematic diagram illustrating the steps of the quantitative method of the present invention:
  • FIG. 1A is a schematic diagram of the steps of the cDNA quantification method of the present invention; to show step by step how the relative quantitative values of the human (HS) and orangutan (PA) sequences are derived, four example genes (Gene A to Gene D) are shown with arbitrary read numbers;
  • FIG. 1B is a diagram showing the experimental sets into which the sample groups of the cDNA quantification method of the present invention are divided.
  • FIG. 2 shows the results of electrophoresis confirming the integrity of the intermediate amplicon DNA prepared during construction of the cDNA amplicon library for sequence analysis.
  • FIG. 3 shows the read size for each sample obtained from the constructed cDNA amplicon library.
  • The total read size and the informative reads used in quantitative analysis for each sample are shown as bars, the usage rate of the sequenced reads in the present invention is shown as diamonds, and the usage rate of the sequenced reads in the ideal case, which excludes the absence of inter-species variation (ISV), is shown as circles.
  • ISV: inter-species variation
  • FIG. 4 shows a pairwise comparison of Pearson correlations between samples for the constructed cDNA amplicon libraries.
  • FIG. 5 is a plot of the fraction of human sequence (⁇HS) in the amplified PCR amplicons for mixed samples of human and orangutan genes.
  • FIG. 6 is a scatter plot of ⁇HS for each amplicon in the sample, showing the correlation between the ⁇HS values obtained with different polymerases.
  • FIG. 7 is a box plot examining the relationship between the type of orangutan sequence variation and the procedural bias in the ⁇HS calculated in quantitative analysis using SiNG-PCRseq.
  • FIG. 8 is a scatter plot of ⁇HS C, the bias-corrected value of ⁇HS, for each amplicon in the sample.
  • FIG. 9 is a scatter plot of the root mean square deviation (RMSD) obtained for ⁇HS C, the bias-corrected value of ⁇HS, of the amplicons in the sample.
  • RMSD: root mean square deviation
  • FIG. 10 is a plot showing standardized quantities (StQs) for each amplicon amplified in five gDNA samples.
  • The region marked in black on the lower axis represents amplicons of autosomal genes, and the region marked in gray at its right end represents amplicons of X-chromosome-linked genes.
  • FIG. 11 shows the steps for quantifying cDNA by the SiNG-PCRseq method from cDNA samples derived from human fibroblasts (Fib; F1 to F3) and induced pluripotent stem cell lines (iPSC; I1 to I3):
  • FIG. 11A shows the relative abundance (RA H/P) of the human sequence (HS) with respect to the competing orangutan sequence (PA) in the iPSC-derived and Fib-derived cDNA samples.
  • * denotes the portion where the coefficient of variation (CV) changes significantly;
  • FIG. 11B shows the distribution of standardized quantitative values (StQs) for cDNA samples derived from iPSC and cDNA samples derived from Fib;
  • FIG. 11C shows the distribution of the coefficient of variation (CV) for each amplicon, obtained from the StQs values calculated for the three samples of each cell line (iPSC and Fib).
  • FIG. 12 is a scatter plot showing the difference in StQs values between the hTaq and fTaq polymerases when cDNA samples derived from various cell lines are quantified using the SiNG-PCRseq method of the present invention.
  • FIG. 13 shows the difference in quantitative analysis between the SiNG-PCRseq method of the present invention and an existing quantitative method:
  • FIG. 13A is a scatter plot showing the distribution of StQs values quantified by RNA-seq method and SiNG-PCRseq method for the same sample of iPSC and Fib cDNA;
  • FIG. 13B shows the fold difference (I / F) of StQs values quantified by RNA-seq method and SiNG-PCRseq method in samples of iPSC and Fib cDNA.
  • I/F: fold difference
  • The present invention provides a standardized quantitative method for nucleic acids comprising the following steps:
  • step ii) amplifying the DNA sample mixed in step i) by multiplex PCR to obtain amplicons;
  • step iii) sequencing the sequence library comprising the amplicons obtained in step ii) and comparing it with a reference sequence to quantify the read count;
  • step iv) calculating the fraction (⁇A) of the target sequence (sequence A) in the amplicons obtained in step ii) using the read count quantified in step iii);
  • The term "reference sequence" refers to a known sequence whose identity can be determined from the species and/or gene name from which it is derived.
  • The reference sequence may be, but is not limited to, a sequence retrieved from the NCBI reference sequence database.
  • The term "read count" (read value) refers to the number of sequencing reads obtained for each reference sequence.
  • The term "representative value" may be the relative abundance ratio of one specific base sequence, or a value derived from the relative abundance ratios of all or some of the nucleotide sequences.
  • the representative value may mean, but is not limited to, an arithmetic mean, geometric mean, harmonic mean, variance, and standard deviation of two or more relative abundances.
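
As a rough illustration of the means named above, the following minimal sketch summarizes a set of relative abundances with Python's standard `statistics` module. The function name `representative_value` and the example ratios are hypothetical and are not taken from the patent.

```python
from statistics import geometric_mean, harmonic_mean, mean


def representative_value(relative_abundances, kind="arithmetic"):
    """Summarize two or more relative abundance ratios into one value."""
    if len(relative_abundances) < 2:
        raise ValueError("need at least two relative abundances")
    summarizers = {
        "arithmetic": mean,
        "geometric": geometric_mean,
        "harmonic": harmonic_mean,
    }
    return summarizers[kind](relative_abundances)


# Made-up relative abundance ratios for three amplicons.
ra_values = [1.0, 2.0, 4.0]
arithmetic = representative_value(ra_values, "arithmetic")
geometric = representative_value(ra_values, "geometric")
harmonic = representative_value(ra_values, "harmonic")
```

Which summary is appropriate depends on how the ratios are distributed; the text leaves the choice open.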
  • The subject of step i) may be any microorganism, yeast, mammal, or the like from whose cells genes are extracted and quantified, and any species reported to be genetically related to the subject may be used as the related species.
  • When the subject is a human, the related species may be, but is not limited to, an orangutan, a chimpanzee, a monkey, or a gorilla.
  • The cells of step i) may be any cells from which DNA can be extracted or from whose RNA cDNA can be synthesized.
  • When the cell of interest is a human-derived cell, it may be a human lymphoblast, a human fibroblast, or a human induced pluripotent stem cell (iPSC), but is not limited thereto.
  • The multiplex PCR of step ii) may be multiplex competitive PCR, but is not limited thereto.
  • The fraction ⁇A of step iv) may be calculated using Equation 7 below, but is not limited thereto;
  • The bias-corrected value ⁇A C of step v) may be calculated using Equation 4 below, but is not limited thereto;
  • ⁇A and ⁇B represent the fractions of the target sequence A and the related species sequence B in the amplicons amplified from the mixed sample;
  • pA and pB represent the mixing ratios of the target sequence A and the related species sequence B in reference samples in which the two are mixed in different molar amounts;
  • ⁇A Ref and ⁇B Ref represent the fractions of the target sequence A and the related species sequence B in the amplicons amplified from those reference samples;
  • n is an integer of 1 or more and represents the number of mixed samples used for the bias correction.
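
Equation 4 itself is not reproduced in this text, so the following is only one plausible way to combine the quantities defined above (pA, pB, and the reference fractions from n mixed samples); it should not be read as the patent's formula. The sketch assumes that bias acts multiplicatively on the A-to-B ratio, estimates it as a geometric mean over the reference mixtures, and divides it out. All function names are hypothetical.

```python
from math import exp, log


def amplification_bias(reference_mixes):
    """Estimate the multiplicative bias of A over B (assumed model).

    Each item is (p_a, p_b, frac_a_ref, frac_b_ref): the known mixing
    ratio and the observed amplicon fractions in one reference sample.
    """
    log_biases = [
        log((frac_a / frac_b) / (p_a / p_b))
        for p_a, p_b, frac_a, frac_b in reference_mixes
    ]
    # Geometric mean over the n reference mixtures.
    return exp(sum(log_biases) / len(log_biases))


def corrected_fraction(frac_a, frac_b, bias):
    """Remove the estimated bias from an observed fraction of A."""
    corrected_odds = (frac_a / frac_b) / bias
    return corrected_odds / (1.0 + corrected_odds)


# Made-up reference mixtures in which A is over-represented two-fold.
references = [(1.0, 1.0, 2.0 / 3.0, 1.0 / 3.0), (1.0, 3.0, 0.4, 0.6)]
bias = amplification_bias(references)
frac_a_corrected = corrected_fraction(2.0 / 3.0, 1.0 / 3.0, bias)
```

In this toy case the estimated bias is about 2, and an observed fraction of 2/3 is corrected back to about 0.5.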
  • The relative abundance ratio RA A/B of step vi) may be calculated using Equation 8 below, but is not limited thereto;
  • The standardized quantitative values (StQs) of step vii) may be calculated using Equation 9, but are not limited thereto;
  • The representative RA A/B value in the sample in [Equation 9] is the representative value of step vii) of the standardized quantitative method of the present invention; it may be the relative abundance of one specific base sequence, or the average of the relative abundances derived for all or some of the base sequences.
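
Equations 7 to 9 are not reproduced in this text, so the sketch below only illustrates the chain described in words: a sequence fraction from read counts, a relative abundance of A to B, and a standardized quantity relative to a representative value. It assumes that the fraction is reads of A divided by reads of A plus B, skips the bias-correction step, and uses the arithmetic mean as the representative value. The function names and read counts are hypothetical.

```python
from statistics import mean


def sequence_fraction(reads_a, reads_b):
    """Fraction of sequence A among the reads of one amplicon (assumed form)."""
    total = reads_a + reads_b
    if total == 0:
        raise ValueError("amplicon has no reads")
    return reads_a / total


def relative_abundance(reads_a, reads_b):
    """Relative abundance of A to B as the ratio of the two fractions."""
    if reads_b == 0:
        raise ValueError("competitor sequence has no reads")
    return sequence_fraction(reads_a, reads_b) / sequence_fraction(reads_b, reads_a)


def standardized_quantities(ra_by_amplicon):
    """Express each relative abundance relative to the mean over amplicons."""
    representative = mean(list(ra_by_amplicon.values()))
    return {amp: ra / representative for amp, ra in ra_by_amplicon.items()}


# Made-up (target reads, related-species reads) per amplicon.
read_counts = {"gene_A": (300, 100), "gene_B": (100, 100)}
ra = {amp: relative_abundance(a, b) for amp, (a, b) in read_counts.items()}
stq = standardized_quantities(ra)
```

Because every amplicon is divided by the same representative value, the standardized quantities do not depend on how much competitor genome was spiked in, which is the property the method relies on.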
  • The inventors developed the spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq) method, in which the genome of a closely related species is mixed with the sample to be quantified, the two are co-amplified using the sequence similarity between the related species, and the competing amplicons are sequenced.
  • Orangutan-derived and human-derived cell lines were each cultured, and gDNA or cDNA was obtained from the cultured cells.
  • The present inventors selected 263 gene sequences as reference sequences for human genes and prepared primer sequences for multiplex PCR (20 primer pairs), which were divided into 24 groups in total.
  • Amplicons were amplified by multiplex competitive PCR, and intermediate amplicon DNA was prepared to construct a sequence library. The integrity of the intermediate amplicon DNA and its consistency between samples were confirmed to be maintained across the various samples, regardless of mixing ratio and polymerase type (see FIG. 2).
  • The present inventors sequenced the amplified amplicon library and compared it with the reference sequences to obtain read counts for each amplicon.
  • The inventors used two different polymerases (hTaq and fTaq) for amplicon amplification. The samples within the hTaq set and within the fTaq set were highly correlated with one another, but the correlation between the two sets was lower (see FIGS. 3 and 4).
  • To apply the SiNG-PCRseq method of the present invention to gDNA samples, the present inventors obtained the fraction of human sequence (⁇HS) in the amplified PCR amplicons for each sample. Within a sample, the values of the individual amplicons did not fall on a straight line but showed a non-flat plot pattern, indicating procedural bias (see FIG. 5). This quantitative deviation arises during the repeated PCR reactions and purifications and is essentially related to the variant structure of the gene sequence (see FIGS. 6 and 7).
  • To correct the bias in amplicon ⁇HS associated with sequence variation, the present inventors modified a previously reported method and calculated the bias-corrected value ⁇HS C for the ⁇HS of each amplicon in the sample. The bias seen in ⁇HS was corrected efficiently, giving an evenly arranged quantitative pattern (see FIGS. 8 and 9).
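
The text does not state what the RMSD in FIG. 9 is measured against. The minimal sketch below assumes it is the deviation of the corrected fractions from the fraction expected from the known mixing ratio; the function name and the numbers are hypothetical.

```python
from math import sqrt


def rmsd(observed, expected):
    """Root mean square deviation between two equal-length sequences."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return sqrt(sum(squared) / len(squared))


# Made-up corrected fractions for two amplicons in a 1:1 mixture.
corrected = [0.6, 0.4]
expected = [0.5, 0.5]
deviation = rmsd(corrected, expected)
```

A smaller RMSD after correction than before would indicate that the correction flattened the per-amplicon pattern.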
  • To calculate the final quantitative values, the present inventors obtained the relative abundance ratio (RA H/P) of the human sequence to the competing orangutan sequence in the sample, and calculated the standardized quantities (StQs) as each RA H/P value relative to the average RA H/P of the autosomal sequences. The StQs accurately reflected the relative quantitative ratio of each sequence, and this accuracy was maintained when the SiNG-PCRseq of the present invention was repeated under different experimental conditions (see FIG. 10).
  • The SiNG-PCRseq method was applied, using two different polymerases (hTaq and fTaq), to cDNA samples prepared from Fib and iPSC. When the RA H/P values calculated from sequencing analysis were normalized by the average RA H/P value of all base sequences, constant normalized values were obtained regardless of the relative amount of orangutan gDNA mixed into the sample (see FIG. 11).
  • The correlation (R²) between the standardized quantitative values obtained with different polymerases was 0.92 or more, showing high reproducibility (see FIG. 12 and Table 6).
  • The correlation (R²) between the StQs values quantified by the RNA-seq method and by the SiNG-PCRseq method was very low, confirming that the two assays differ in the quantitative comparison of different sequences within one sample (see FIG. 13A).
  • As shown in FIG. 13B, there was no significant difference between the two quantification methods in revealing the quantitative difference between different samples for a specific nucleotide sequence.
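
The between-sample comparison described here can be sketched as a per-gene fold difference of StQs values between the iPSC and Fib samples (the I/F value of FIG. 13B). The function name and the values are hypothetical, and restricting the comparison to genes present in both samples is an assumption.

```python
def fold_differences(stq_ipsc, stq_fib):
    """Fold difference (iPSC over Fib) for genes quantified in both samples."""
    shared = stq_ipsc.keys() & stq_fib.keys()
    return {gene: stq_ipsc[gene] / stq_fib[gene] for gene in sorted(shared)}


# Made-up standardized quantities for two cell types.
stq_ipsc = {"gene_1": 2.0, "gene_2": 0.5}
stq_fib = {"gene_1": 1.0, "gene_2": 1.0, "gene_3": 1.0}
fold = fold_differences(stq_ipsc, stq_fib)
```

Computing the same ratio from RNA-seq values and from SiNG-PCRseq values would allow the two methods to be compared gene by gene.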
  • Accordingly, the SiNG-PCRseq method of the present invention provides standardized relative DNA quantitative values under variable experimental conditions such as different cell lines and polymerases. Compared with the RNA-seq method used as a conventional nucleic acid quantitative assay, these standardized values give the relative quantitative values of nucleic acids in a sample accurately and easily, regardless of the experimental conditions or the type of sample, so the method is useful as a standardized quantitative method.
  • step ii) calculating a standardized ratio between different genes in one species of interest using the standardized quantitative value of step i).
  • The method can be useful for determining cytological and histological identity when applied to the analysis of unknown biological samples.
  • step i) calculating, using the standardized quantitative method of the present invention, a standardized quantitative value of a target sequence in a gene transcript or genome that characterizes a disease, or a disease state with transitional characteristics to the disease, for target cells isolated from an individual who has or is suspected of having the disease;
  • step ii) comparing the standardized quantitative value of step i) with the standardized quantitative value of the same nucleotide sequence obtained from the transcript or genome of cells isolated from a normal individual of the same species. The present invention thereby provides a method of providing information for diagnosing a gene-related disease.
  • A "normal individual" may refer to a healthy individual who does not have the disease or is unlikely to develop it.
  • the standardized quantitative value used in the method for providing information for diagnosing the gene-related disease may be a distribution of the calculated standardized quantitative value, but is not limited thereto.
  • Cells of any microorganism, yeast, mammal, or the like may be used as the cells from which genes are extracted and quantified, and any species reported to be genetically related to the subject may be used as the related species.
  • When the subject is a human, the related species may be an orangutan, a chimpanzee, a monkey, or a gorilla, but is not limited thereto.
  • the cell of step i) can be any cell that can be used to extract DNA from the cell of interest.
  • the cells may be fibroblasts or induced pluripotent stem cells, but are not limited thereto.
  • The "gene transcript characterizing the disease" of step i) may mean a gene transcript known to be associated with the disease, for example one whose expression increases or decreases when a particular disease occurs.
  • A "disease state with transitional characteristics to the disease" may refer to a state characterized by a gene transcript known to be associated with prognostic or early symptoms that appear before the onset of the disease.
  • The multiplex PCR of step ii) may be multiplex competitive PCR, but is not limited thereto.
  • After step ii), the method may further include determining that the disease has developed or is likely to develop when the standardized quantitative value of step i) differs from the standardized quantitative value of the same genes calculated from cells isolated from normal individuals of the same species.
  • the method of providing information for diagnosing a disease related to a gene using a standardized quantitative value of the gene according to the present invention is not limited to a specific gene, and may be applied regardless of the type of gene.
  • The SiNG-PCRseq method of the present invention provides standardized relative DNA quantitative values under various experimental conditions that include variable elements such as cell lines and polymerases. Compared with the RNA-seq method used as a conventional nucleic acid quantitative assay, these standardized values give the relative quantitative values of nucleic acids in a sample accurately and simply, regardless of the experimental conditions or the type of sample, so the standardized quantitative values can provide useful information for diagnosing gene-related diseases.
  • By providing the method of providing information for diagnosing a gene-related disease, the present invention may also provide a method for treating the disease, comprising administering a therapeutic agent for the disease to an individual diagnosed with the disease.
  • The therapeutic agent for the disease may be any known agent.
  • The subject may be any animal, including a human, that has or is likely to develop the diagnosed disease.
  • The animal may be a mammal that requires treatment of similar symptoms, such as, but not limited to, a human, a cow, a horse, a sheep, a pig, a goat, a camel, an antelope, a dog, or a cat.
  • the therapeutic agent may be administered in a pharmaceutically effective amount.
  • orangutan-derived cell lines and human-derived cell lines were cultured, respectively, and gDNA or cDNA was obtained from the cultured cells.
  • A male orangutan-derived lymphoblastoid cell line (Public Health England, UK) was prepared and cultured in RPMI 1640 medium containing 15% fetal bovine serum (FBS).
  • FBS: fetal bovine serum
  • Human fibroblasts (System Biosciences) derived from human male foreskin were prepared and cultured in DMEM medium containing 15% FBS.
  • Human induced pluripotent stem cells (iPSCs) were derived from a human unknown lymphoblastoid cell line (I-1210/002/002-02) provided under the approval of the Institutional Review Board of the Korea Institute of Oriental Medicine.
  • mTeSR1 conditioned medium (Stemcell Technologies Inc.) containing 0.5 µM sodium butyrate (Sigma, USA) and 25 µM SB431542 (Sigma, USA).
  • gDNA: genomic DNA
  • G-DEX™ IIc (iNtRON Biotechnology, Korea)
  • Total cellular RNA was extracted from the human fibroblast line (Fib) and the human induced pluripotent stem cells (iPSC) using TRIzol® Reagent (Life Technologies, USA) following the manufacturer's protocol.
  • Human cDNA samples were prepared from 1 µg of the extracted RNA as a template using a cDNA synthesis kit (BioRad, USA) according to the manufacturer's protocol.
  • The prepared gDNA or cDNA was mixed in the compositions shown in [Table 1] to [Table 3] below to prepare DNA samples.
  • Tables 1 to 3 show the mixing ratios of human genomic DNA (gDNA) samples, human fibroblast line-derived cDNA samples, and human induced pluripotent stem cell-derived cDNA samples, respectively.
  • Example 2: Preparation of primer sequences for multiplex PCR and amplicon amplification by multiplex competitive PCR
  • To select the genes used in the present invention, cDNAs prepared from human-derived cells were searched for stretches homologous to orangutan genome sequences.
  • Gene mRNA sequences were retrieved from the NCBI reference sequence database. The retrieved mRNA sequences were then aligned and compared by BLAST with the genome sequence of the Sumatran orangutan (Pongo abelii; NCBI Pongo_pygmaeus_abelii-2.0.2 assembly). By searching for homologous sequences that include inter-species variation (ISV), 263 gene sequences were selected as reference sequences for human genes.
  • primer sequences were selected and divided into groups.
  • The G10 to G0 human gDNA samples, the F1 to F3 human Fib cDNA samples, and the I1 to I3 human iPSC cDNA samples prepared in Example 1, each mixed with orangutan-derived gDNA, were prepared. Each DNA sample was then used as a template, with either Solg™ h-Taq DNA polymerase (hTaq; Solgent, Korea) or FastStart Taq polymerase (fTaq; Roche, Switzerland) as the polymerase.
  • Each of the amplicons obtained by multiplex PCR in Example 2-3 was purified using an Expin™ PCR kit (GeneAll, South Korea). 1 µg of the purified amplicon group was phosphorylated at the 5'-end using T4 polynucleotide kinase (Promega, USA) according to the manufacturer's protocol. 6 ng of the phosphorylated amplicon was then ligated to 10 µg of a Y-shaped adapter molecule (15 µM) using T4 ligase (Promega) for the Illumina sequencing platform.
  • PCR was performed to attach a sequence module to each purified amplicon.
  • 4 pg of the purified amplicon, the MP1 and MP2 primers, and hTaq or fTaq polymerase were mixed for the first PCR reaction; the reaction conditions are described in Table 5 below.
  • A 0.05-fold volume of the first PCR product was then mixed with one of the IdxP primers and the MP1 primer, and subjected to a second PCR under the reaction conditions of Table 5 below to attach the sequence module.
  • the sequence library was constructed by preparing intermediate amplicon DNA.
  • Sequence libraries amplified and constructed using hTaq or fTaq were obtained from the various samples prepared in Examples 2-3 and 2-4. The amplicon products amplified by ⁇21-plexing PCR in Example 2-3, the mixed amplicon pool, and the primary and secondary PCR products for preparing the intermediate amplicon DNA in Example 2-4 were then resolved by electrophoresis on a 12% polyacrylamide gel.
  • As shown in FIG. 2, the sequence libraries constructed from the various samples using the hTaq or fTaq polymerase were confirmed to be consistent between samples (FIG. 2).
  • An amplicon sequence library was constructed by the method of Example 2 using the G10 or G0 sample prepared in Example 1 as the template.
  • The constructed sequence library was then divided into groups in equal amounts and sequenced by massively parallel sequencing on the Illumina HiSeq 2500 platform, reading 100 bp single-end sequences.
  • Each sequence was analyzed.
  • BLAST was performed to compare the sequenced amplicon reads with the NCBI reference sequence database. Reads that aligned to a reference sequence over at least 75 bp with at least 90% query coverage were retained.
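
The read filter described above can be sketched as follows. The hit records are plain dictionaries with hypothetical field names (they are not the column names of any BLAST output format), and query coverage is assumed to be the aligned span of the read divided by the read length.

```python
def query_coverage(hit):
    """Percentage of the read covered by the alignment (assumed definition)."""
    covered = hit["query_end"] - hit["query_start"] + 1
    return 100.0 * covered / hit["query_length"]


def passes_filter(hit, min_aligned_bp=75, min_query_coverage=90.0):
    """Keep hits aligned over >= 75 bp with >= 90% query coverage."""
    return (
        hit["alignment_length"] >= min_aligned_bp
        and query_coverage(hit) >= min_query_coverage
    )


# Made-up hits for 100 bp single-end reads.
hits = [
    {"read": "r1", "alignment_length": 98, "query_start": 1, "query_end": 98, "query_length": 100},
    {"read": "r2", "alignment_length": 80, "query_start": 11, "query_end": 90, "query_length": 100},
]
kept = [hit["read"] for hit in hits if passes_filter(hit)]
```

Here the second read is aligned over 80 bp but covers only 80% of the read, so it is dropped.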
  • The reads of each amplicon target were multiply aligned to determine the matched sequence structure. Here, sequences occupying at least 25% of each nucleotide position of the alignment were included.
  • The constructed reference sequences corresponded to amplicons of 248 genes containing 425 ISVs; the remaining 70 sequences comprised 4 genes that were not amplified and 66 genes that did not include ISVs.
  • The reads of the mixed-sample amplicon sequence libraries constructed in Example 2 were compared by BLAST with the sample-specific reference sequences constructed in Example 3-1. For each amplicon target, the number of reads derived from the human mRNA and from the orangutan genome was counted, considering only reads with 100% sequence identity over 75 bp or more and at least 90% query coverage.
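
The per-amplicon counting step can be sketched with a `Counter` keyed by amplicon and species. The record fields (`amplicon`, `species`, `percent_identity`, and so on) are hypothetical names chosen for this illustration, with `HS` for human and `PA` for orangutan as in the figure legends.

```python
from collections import Counter


def count_species_reads(hits, min_aligned_bp=75, min_query_coverage=90.0):
    """Count perfectly matching reads per (amplicon, species)."""
    counts = Counter()
    for hit in hits:
        if (
            hit["percent_identity"] == 100.0
            and hit["alignment_length"] >= min_aligned_bp
            and hit["query_coverage"] >= min_query_coverage
        ):
            counts[(hit["amplicon"], hit["species"])] += 1
    return counts


# Made-up hits against sample-specific reference sequences.
hits = [
    {"amplicon": "gene_A", "species": "HS", "percent_identity": 100.0, "alignment_length": 100, "query_coverage": 100.0},
    {"amplicon": "gene_A", "species": "HS", "percent_identity": 100.0, "alignment_length": 95, "query_coverage": 95.0},
    {"amplicon": "gene_A", "species": "PA", "percent_identity": 100.0, "alignment_length": 100, "query_coverage": 100.0},
    {"amplicon": "gene_A", "species": "PA", "percent_identity": 99.0, "alignment_length": 100, "query_coverage": 100.0},
]
read_counts = count_species_reads(hits)
```

Requiring 100% identity is what lets a read be assigned to one species: a read carrying the other species' variant at an ISV position no longer matches perfectly.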
  • Pearson correlation was calculated.
  • The calculated Pearson correlation coefficients are shown for the various samples, comparing the hTaq set, in which Solg™ hTaq was used as the polymerase for amplicon amplification, with the fTaq set, in which FastStart Taq was used.
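For reference, a Pearson correlation coefficient between two sets of per-amplicon read counts can be computed as in the following minimal sketch. The function name and the example numbers are illustrative only and are not data from the patent.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up read counts for four amplicons in two library sets.
htaq_counts = [120, 450, 300, 980]
ftaq_counts = [100, 400, 310, 900]
r = pearson(htaq_counts, ftaq_counts)
```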
  • Example 4: Application of spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq), in which gDNA of a species related to the sample to be quantified is mixed in and co-amplified using the sequence similarity between the species
  • The fraction of human sequence (γHS) in the amplified PCR amplicons was determined.
  • Multiplex competitive PCR using hTaq or fTaq was performed on the G10, G9, G7, G5, G3, G1 and G0 samples prepared in Example 1, by the methods of Example 2 and Example 3, to amplify the amplicons and quantify the reads of the sequence library. Based on the quantified read counts, amplicons with an average of 200 reads or fewer per sample and amplicons showing an impurity of 5% or more in the G0 and G10 samples were removed.
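The following Python sketch illustrates the γHS calculation and the two exclusion rules. It rests on assumptions: γHS is taken as human reads divided by human plus orangutan reads for one amplicon, the impurity values are passed in as already-computed fractions (their exact definition is not given in this passage), and an amplicon is dropped if either the G0 or the G10 impurity reaches 5%. All names are hypothetical.

```python
def gamma_hs(human_reads, orangutan_reads):
    """Fraction of human sequence among the reads of one amplicon."""
    total = human_reads + orangutan_reads
    if total == 0:
        raise ValueError("amplicon has no reads")
    return human_reads / total


def keep_amplicon(reads_per_sample, impurity_g0, impurity_g10,
                  min_mean_reads=200, max_impurity=0.05):
    """Return True if the amplicon survives both exclusion rules.

    reads_per_sample: total read count of this amplicon in each sample.
    impurity_g0, impurity_g10: impurity fractions (0..1) in the G0 and G10
    samples. Assumption: failing either one removes the amplicon.
    """
    mean_reads = sum(reads_per_sample) / len(reads_per_sample)
    if mean_reads <= min_mean_reads:      # "200 reads or fewer" is removed
        return False
    if impurity_g0 >= max_impurity or impurity_g10 >= max_impurity:
        return False
    return True


fraction = gamma_hs(300, 700)
kept = keep_amplicon([300, 250, 400], impurity_g0=0.01, impurity_g10=0.02)
```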
  • As shown in FIGS. 5 and 6, within a single sample the values of the individual amplicons differ from one another and do not fall on a straight line, giving a non-flat plot pattern. The same pattern was confirmed in the G10, G9, G7, G5, G3, G1 and G0 samples containing the human sequence (FIG. 5). This confirms that a large proportion of the amplicons amplified in a sample show a procedural bias when the human or orangutan sequence variant is quantified directly (FIG. 5).
  • Specifically, γHS of the amplicons in each of the G10, G9, G7, G5, G3, G1 and G0 samples prepared in Example 1 was obtained by the method of Example 4-1 and averaged. The sequence variations of the orangutan in the amplicons were then identified and divided into three groups: variations to guanine (G) or cytosine (C) nucleotides, which form stronger hydrogen bonds (Stronger; S); variations to adenine (A) or thymine (T) nucleotides, which form weaker hydrogen bonds (Weaker; W); and variations that are neutral with respect to hydrogen bonding (Neutral; N).
  • The S, W and N groups were compared by Student's t-test using the average γHS, and a box plot was drawn to examine the distributions.
  • The box plot gave a p value of 2.54 × 10⁻⁹. When the variant was a guanine (G) or cytosine (C) residue (S), a relatively high average γHS was observed compared with the case where the variant was an adenine (A) or thymine (T) residue (W) (FIG. 7). This difference, confirmed in the quantitative analysis by SiNG-PCRseq of the present invention, appears to arise through the repeated PCR reactions and purification steps.
  • The correction was calculated on the assumption that the degree of bias for one sequence is proportional to its fraction of the total amplicons.
  • Procedural bias between sequence variants produces differences in γHS among amplicons. This bias can be corrected efficiently using the γHS obtained from a reference amplicon in which the two competing sequences are present in the same molar composition (Jeong, S. et al., Genome Res., 17: 1093-1100, 2007); here it was calculated using a formula modified to suit the present invention.
  • γA and γB represent the fractions of sequences A and B in the amplicons amplified from the mixed sample.
  • γA^H and γB^H represent the fractions of sequences A and B in the amplicons amplified from the reference sample in which the A and B sequences were mixed in equimolar amounts.
  • γA and γB represent the fractions of sequences A and B in the amplicons amplified from the mixed sample; pA and pB represent the mixing ratios of sequences A and B in the reference sample in which they were mixed at different ratios; γA^Ref and γB^Ref represent the fractions of sequences A and B in the amplicons amplified from that reference sample.
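The correction equations themselves are not reproduced in this text, so the following Python sketch should be read as an assumption rather than as the patent's formula. It uses a simple multiplicative bias model: the observed ratio γA/γB is taken to equal the true ratio multiplied by a constant bias factor, and that factor is estimated from the reference sample. The function name `correct_bias` is hypothetical; the variable names follow the symbol definitions above.

```python
def correct_bias(gamma_a, gamma_a_ref, p_a=0.5):
    """Bias-corrected fraction of sequence A (assumed multiplicative model).

    gamma_a:     observed fraction of A in the mixed sample (gamma_B = 1 - gamma_A)
    gamma_a_ref: observed fraction of A in the reference sample
    p_a:         true mixing fraction of A in the reference sample;
                 0.5 corresponds to the equimolar reference (gamma_A^H case)
    """
    if not 0.0 <= gamma_a <= 1.0:
        raise ValueError("gamma_a must be in [0, 1]")
    if not 0.0 < gamma_a_ref < 1.0 or not 0.0 < p_a < 1.0:
        raise ValueError("gamma_a_ref and p_a must be strictly between 0 and 1")
    gamma_b = 1.0 - gamma_a
    gamma_b_ref = 1.0 - gamma_a_ref
    p_b = 1.0 - p_a
    # True A:B ratio = (gamma_A / gamma_B) / bias,
    # with bias = (gamma_A_ref / gamma_B_ref) / (p_A / p_B).
    numerator = gamma_a * gamma_b_ref * p_a
    denominator = numerator + gamma_b * gamma_a_ref * p_b
    return numerator / denominator


# An amplicon that reads 0.6 human in an equimolar reference is biased toward
# the human sequence; an observed 0.6 in a test sample then corrects to 0.5.
corrected = correct_bias(0.6, 0.6)
```

Under this model an unbiased reference (γA^Ref equal to pA) leaves the observed fraction unchanged, which is a quick sanity check on any implementation.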
  • [Equation 4]
  • Starting from the bias-corrected γHS^C values of the amplicons, the quantitative value was adjusted to reflect the difference in copy number between autosomal sequences and X-linked sex-chromosome sequences, which arises because the genome is male.
  • γHS^C was calculated by the method of Example 4-2 for the G9, G7, G5, G3 and G1 samples prepared in Example 1. The calculated γHS^C was then applied to Equation 5 to convert it to the relative abundance of human sequences (RA_H/P) in each sample.
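Equation 5 is not reproduced in this text. As a hedged illustration only, the sketch below uses the usual way of turning a two-component fraction into a ratio, RA_H/P = γHS^C / (1 − γHS^C), which is an assumption and not a quotation of the patent's equation. The function name is hypothetical.

```python
def relative_abundance(gamma_hs_c):
    """Convert a bias-corrected human fraction into a human:orangutan ratio.

    Assumption: the amplicon contains only human and orangutan sequence, so
    the orangutan fraction is 1 - gamma_hs_c.
    """
    if not 0.0 <= gamma_hs_c < 1.0:
        raise ValueError("gamma_hs_c must be in [0, 1)")
    return gamma_hs_c / (1.0 - gamma_hs_c)


# A corrected human fraction of 0.75 means three human copies per orangutan copy.
ratio = relative_abundance(0.75)
```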
  • Human Fib cDNA samples F1 to F3 and human iPSC cDNA samples I1 to I3, prepared in Example 1 and mixed with orangutan-derived gDNA, were used. The cDNA samples were then subjected to the SiNG-PCRseq method by the methods of Example 3 to Example 7, using hTaq or fTaq polymerase, and the γHS, γHS^C, RA_H/P and StQs values were determined for the amplicons amplified in each sample.
  • When γHS^C was calculated for bias correction, amplicons with a read count of 200 or fewer in each sample and amplicons whose sequence differed from the gDNA sequence were excluded from the quantitative analysis. After the calculations, for the iPSC and Fib samples, the amplicons were plotted from the left in descending order of RA_H/P and StQs values based on the hTaq set.
  • The SiNG-PCRseq method of the present invention can therefore be applied to various samples by correcting for the proportion of gDNA amplified in the cDNA sample.
  • The γHS, γHS^C, RA_H/P and StQs values for the amplicons amplified in each sample of Fib and iPSC cDNA were obtained by the method of Example 8-1.
  • Amplicons whose read counts were not high enough were often excluded from the γHS calculation. As a result, the hTaq set covered 365 amplicons (86%), whereas in the fTaq set quantitative analysis was performed on only 240 amplicons (56%). The relationship between the hTaq set and the fTaq set was then shown as scatter plots, using the StQs values calculated from I1 to I3 or from F1 to F3.
  • The StQs values of the iPSC samples showed a correlation (R²) of 0.93 with a regression slope of 1.02.
  • The StQs values of the Fib samples showed a regression slope of 0.93 and an R² of 0.92, confirming that the SiNG-PCRseq method of the present invention can be used reproducibly under different reaction conditions (FIG. 12).
  • The γHS, γHS^C, RA_H/P and StQs values for the amplicons amplified in each sample of Fib and iPSC cDNA were obtained by the method of Example 8-1.
  • Quantitative analysis of the samples was also performed by RNA-seq, a relative quantification between RNA sequences, according to a previously reported method (Blomquist, T.M. et al., PLoS ONE, 8: e79120, 2013).
  • Read counts for the reference mRNAs, the templates of the cDNA used for amplicon amplification, were normalized by length to obtain RPKM (Reads Per Kilobase per Million mapped reads) values.
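RPKM has a standard definition, shown here as a short Python sketch. The function and argument names are illustrative, and the numbers in the usage line are made up.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# 100 reads on a 2,000 bp transcript in a library of 1,000,000 mapped reads:
# 100 / 2 kb / 1 million reads = 50 RPKM.
value = rpkm(100, 2000, 1_000_000)
```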
  • Genes showing an RA_H/P of 0.05 or more in all samples by the SiNG-PCRseq method and a read count of 40 or more by the RNA-seq method were used.
  • The RA_H/P and RPKM values were each standardized to the average of the values obtained from the selected genes.
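The selection and standardization described above can be sketched as follows. This is a minimal illustration assuming that the standardizing value is the arithmetic mean over the selected genes and that standardization means dividing by it; the function names and the example values are hypothetical and are not data from the patent.

```python
def select_genes(ra_by_sample, rnaseq_reads, min_ra=0.05, min_reads=40):
    """Genes with RA_H/P >= min_ra in every sample and enough RNA-seq reads.

    ra_by_sample: {gene: [RA_H/P in each sample]}
    rnaseq_reads: {gene: RNA-seq read count}
    """
    return [
        gene for gene, ras in ra_by_sample.items()
        if all(ra >= min_ra for ra in ras)
        and rnaseq_reads.get(gene, 0) >= min_reads
    ]


def standardize(values):
    """Divide each value by the mean of all values (assumed representative value)."""
    mean = sum(values.values()) / len(values)
    if mean == 0:
        raise ValueError("cannot standardize when the mean is zero")
    return {gene: v / mean for gene, v in values.items()}


ra = {"GeneA": [0.5, 0.6], "GeneB": [1.5, 1.4], "GeneC": [0.01, 0.2]}
reads = {"GeneA": 120, "GeneB": 800, "GeneC": 500}
selected = select_genes(ra, reads)
stq = standardize({g: ra[g][0] for g in selected})
```

Because both RA_H/P and RPKM are rescaled so that their mean over the selected genes is 1, the two methods can be compared on a common, unitless scale.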

Abstract

The present invention relates to a quantitative analysis method for nucleic acids in a sample. The method applies spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq), in which a neighbor genome is mixed with the sample to be quantified and co-amplification is then carried out using the similarity between the neighboring sequences. Specifically, the SiNG-PCRseq method of the present invention can provide standardized relative DNA quantitative values under various experimental conditions that include variable elements such as the cell line and the polymerase. Compared with RNA-seq, a conventional method for quantitative analysis of nucleic acids, the standardized values accurately and conveniently provide relative quantitative values for the nucleic acids in a sample irrespective of the experimental conditions or the type of sample. The SiNG-PCRseq method of the present invention is therefore useful as a standardized quantitative analysis method for nucleic acids in a sample.

Description

Standardized Quantitative Analysis Method for Nucleic Acids Applying the SiNG-PCRseq Method
The present invention relates to a standardized method for quantifying nucleic acids in a sample by applying spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq), in which the genome of a closely related species (a neighbor genome) is mixed into the sample to be quantified and the competitive PCR amplicons, co-amplified by exploiting the sequence similarity between the related species, are sequenced.
Accurate quantitative analysis of a cell's transcriptome is necessary to provide a fundamental view for studying the physiological characteristics and functions of the cell. The analytical basis for precise transcriptome analysis is currently shifting from microarray technology to next generation sequencing (NGS) (Ozsolak, F. & Milos, P.M., Nat. Rev. Genet., 12: 87-98, 2011; Metzker, M.L., Nat. Rev. Genet., 11: 31-46, 2010; Blencowe, B.J. et al., Genes Dev., 23: 1379-1386, 2009). This shift is due to technical advances that allow NGS to quantify a large part of the transcriptome at the single-nucleotide level.
RNA-seq, a recently developed nucleic acid quantification method, can generate hundreds of millions of sequence reads in a single run, and therefore has the advantage of providing information on abundance ratios between sequences and on sequence variants at the whole-transcriptome level (Mortazavi, A. et al., Nat. Methods, 5: 621-628, 2008; Wang, Z. et al., Nat. Rev. Genet., 10: 57-63, 2009). RNA-seq allows the quantitative values of the same sequence to be compared between different samples with high efficiency and reproducibility. However, when relative quantities are compared between different transcripts within the same cell, any bias that arises is difficult to handle, so quantitative analysis of nucleic acids by RNA-seq alone cannot provide reliable quantification.
Such bias is introduced during analytical steps such as random hexamer (6-mer) priming for both reverse transcription and double-stranded cDNA synthesis, fragmentation, and colony formation on the flow cell (Hansen, K.D. et al., Nucleic Acids Res., 38: e131, 2010; Oshlack, A. & Wakefield, M.J., Biol. Direct, 4: 14, 2009; Roberts, A. et al., Genome Biol., 12: R22, 2011), and it distorts the quantitative landscape of the members of the transcriptome. In addition to whole-transcriptome analysis, RNA-seq methods that target specific groups of transcripts have been developed. These include methods in which a library of target sequences is constructed by capture with a tiling array (Mercer, T.R. et al., Nat. Biotechnol., 30: 99-104, 2012) or soluble probes (Levin, J.Z. et al., Genome Biol., 10: R115, 2009), or by spike-in amplification with multiplexed competitive PCR.
In terms of inter-sequence quantitative representation, capture approaches still have the limitation that procedural biases similar to those of RNA-seq are involved. In contrast, the competitive PCR approach provides a platform for correcting procedural bias by calibrating the quantitative value of a target sequence against a corresponding competitor template of known quantity. Because the competitor template undergoes the same biochemical reactions as its target sequence, it can be assumed to be subject to the same degree of bias (Blomquist, T.M. et al., PLoS ONE, 8: e79120, 2013). However, this method requires strict quantitative relationships among the competitor templates to be maintained when their mixture is prepared, and there are limited means of guaranteeing that stringency when the process is carried out by different experimenters.
Thus, although statistical or experimental adjustments using spiked-in standard molecules have been made for accurate quantitative analysis (Jiang, L. et al., Genome Res., 21: 1543-1551, 2011; Schwartz, S. et al., PLoS ONE, 6: e16685, 2011; Zook, J.M. et al., PLoS ONE, 7: e41356, 2012), it remains uncertain whether these biases can be corrected completely, and a reliable alternative method for quantitative analysis of nucleic acids is needed.
In this regard, a method for determining the relative quantities of different sequences using a quantitative competitive PCR strategy has been reported (Jeong, S. et al., DNA Res., 19: 209-217, 2012). In that method, the competitor templates were assembled into a competitor array by insertion into a plasmid so that their relative quantitative ratios could be determined precisely. That is, recombining the competitor templates into a single plasmid with known copy numbers provides an accurate quantitative relationship among the competitor templates. When combined with melting analysis for allele quantification, the method has the advantage of being highly accurate and precise for quantifying a small number of sequences (Jeong, S. et al., Genome Res., 17: 1093-1100, 2007).
Accordingly, the present inventors extended the concept of the previously developed nucleic acid quantification methods in an effort to develop an accurate, simple, high-throughput method. As a result, they developed spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq), which uses the genome of an evolutionarily neighboring species (a neighbor genome) as the competitor template and quantifies the competitor sequences within the amplicons, enabling high-throughput next-generation sequencing. In the present invention, SiNG-PCRseq quantitative analysis of human cDNA was performed using samples in which cDNA obtained from a human fibroblast cell line or from induced pluripotent stem cells (iPSCs) was mixed with gDNA of the orangutan, a species closely related to humans. It was confirmed that the method can provide standardized relative DNA quantitative values under a variety of variable experimental conditions, such as the cell line and the polymerase, and that the standardized values provide accurate quantitative information when compared with RNA-seq, a conventional method for quantitative analysis of nucleic acids. The SiNG-PCRseq method of the present invention can therefore be usefully employed as a standardized analysis method for quantifying nucleic acids in a sample.
[Prior Art Documents]
[Non-Patent Documents]
1. Ozsolak, F. & Milos, P.M., RNA sequencing: advances, challenges and opportunities, Nat. Rev. Genet., 12: 87-98 (2011).
2. Metzker, M.L., Sequencing technologies-the next generation, Nat. Rev. Genet., 11: 31-46 (2010).
3. Blencowe, B.J. et al., Current-generation high-throughput sequencing: deepening insights into mammalian transcriptomes, Genes Dev., 23: 1379-1386 (2009).
4. Mortazavi, A. et al., Mapping and quantifying mammalian transcriptomes by RNA-seq, Nat. Methods, 5: 621-628 (2008).
5. Wang, Z. et al., RNA-seq: a revolutionary tool for transcriptomics, Nat. Rev. Genet., 10: 57-63 (2009).
6. 't Hoen, P.A. et al., Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories, Nat. Biotechnol., 31: 1015-1022 (2013).
7. Hansen, K.D. et al., Biases in Illumina transcriptome sequencing caused by random hexamer priming, Nucleic Acids Res., 38: e131 (2010).
8. Oshlack, A. & Wakefield, M.J., Transcript length bias in RNA-seq data confounds systems biology, Biol. Direct, 4: 14 (2009).
9. Roberts, A. et al., Improving RNA-seq expression estimates by correcting for fragment bias, Genome Biol., 12: R22 (2011).
10. Jiang, L. et al., Synthetic spike-in standards for RNA-seq experiments, Genome Res., 21: 1543-1551 (2011).
11. Schwartz, S. et al., Detection and removal of biases in the analysis of next-generation sequencing reads, PLoS ONE, 6: e16685 (2011).
12. Zook, J.M. et al., Synthetic spike-in standards improve run-specific systematic error analysis for DNA and RNA sequencing, PLoS ONE, 7: e41356 (2012).
13. Mercer, T.R. et al., Targeted RNA sequencing reveals the deep complexity of the human transcriptome, Nat. Biotechnol., 30: 99-104 (2012).
14. Levin, J.Z. et al., Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts, Genome Biol., 10: R115 (2009).
15. Blomquist, T.M. et al., Targeted RNA-sequencing with competitive multiplex-PCR amplicon libraries, PLoS ONE, 8: e79120 (2013).
16. Jeong, S. et al., Accurate measurement of the relative abundance of different DNA species in complex DNA mixtures, DNA Res., 19: 209-217 (2012).
17. Jeong, S. et al., Accurate quantitation of allele-specific expression patterns by analysis of DNA melting, Genome Res., 17: 1093-1100 (2007).
18. Untergasser, A. et al., Primer3-new capabilities and interfaces, Nucleic Acids Res., 40: e115 (2012).
An object of the present invention is to provide a high-throughput nucleic acid quantification method that, through spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq), in which the genome of a closely related species is mixed into the sample to be quantified and the competitive PCR amplicons co-amplified by exploiting the sequence similarity between the related species are sequenced, can accurately and simply obtain relative quantitative values for the different nucleic acids in a sample regardless of the experimental conditions or the type of sample.
In order to achieve the above object, the present invention provides a standardized method for quantifying nucleic acids in a sample, comprising:
i) mixing DNA or cDNA isolated from cells derived from a subject with genomic DNA (gDNA) isolated from cells derived from a species closely related to the subject;
ii) amplifying the DNA sample mixed in step i) by multiplex PCR to obtain amplicons;
iii) sequencing a sequence library comprising the amplicons obtained in step ii), and comparing the result with reference sequences to quantify the read counts;
iv) calculating, from the read counts quantified in step iii), the fraction of the subject sequence (sequence A) in the amplicons obtained in step ii) (fraction of sequence, γA);
v) calculating, from the fraction (γA) obtained in step iv), a bias-corrected value (γA^c) in which procedural bias has been corrected;
vi) calculating, from the bias-corrected value (γA^c) obtained in step v), the relative abundance of the subject sequence (sequence A) to the closely related species sequence (sequence B) (relative abundance of A to B; RA_A/B); and
vii) calculating standardized quantities (StQs) by dividing the relative abundance (RA_A/B) obtained in step vi) by its representative value.
The present invention also provides a method of providing information on the standardized ratios between different genes within the same species, comprising:
i) calculating standardized quantitative values for different genes derived from a subject, using the standardized quantification method of the present invention; and
ii) calculating, from the standardized quantitative values of step i), the standardized ratios between the different genes within the single species of the subject.
In addition, the present invention provides a method of providing information for diagnosing a gene-related disease, comprising:
i) calculating, using the standardized quantification method of the present invention, standardized quantitative values of the gene transcripts or target nucleotide sequences in the genome that characterize the disease, or a pathological state transitional to the disease, in target cells isolated from an individual who has or is suspected of having the disease; and
ii) comparing the standardized quantitative values of step i) with the standardized quantitative values of the same nucleotide sequences calculated from the transcriptome or genome of cells isolated from a normal individual of the same species.
The spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq) method of the present invention, in which the genome of a closely related species is mixed into the sample to be quantified and the competitive PCR amplicons co-amplified by exploiting the sequence similarity between the related species are sequenced, can provide standardized relative DNA quantitative values even under various experimental conditions that include variable elements such as the cell line and the polymerase. Compared with RNA-seq, a conventional method for quantitative analysis of nucleic acids, the standardized values provide more accurate relative quantitative values for the different nucleic acids in a sample, regardless of the experimental conditions or the type of sample. The SiNG-PCRseq method of the present invention can therefore be usefully employed as a standardized analysis method for quantifying nucleic acids in a sample.
FIG. 1 is a schematic diagram showing the steps of the quantification method of the present invention:
FIG. 1a is a schematic diagram of the steps of the cDNA quantification method of the present invention. To show how the relative quantities of the human (HS) and orangutan (PA) sequences are represented at each step, arbitrary read numbers are given for four experimental genes (Gene A to Gene D);
FIG. 1b shows the experimental sets configured to divide the samples of the cDNA quantification method of the present invention into groups.
FIG. 2 shows electrophoresis results confirming the integrity of the intermediary amplicon DNA during preparation of the cDNA amplicon libraries for sequencing.
FIG. 3 shows the read size for each sample obtained from construction of the cDNA amplicon libraries. The total read size and the informative reads used for quantitative analysis in each sample are shown as bars; the usage rate of sequenced reads in the present invention is shown as diamonds; and the usage rate of sequenced reads in the ideal case, which excludes reads that contain no inter-species variation (ISV), is shown as circles.
FIG. 4 shows pairwise comparisons of the Pearson correlation between samples obtained from construction of the cDNA amplicon libraries.
FIG. 5 is a distribution plot of the fraction of human sequence (γHS) for the PCR amplicons amplified in mixed samples of human and orangutan genes.
FIG. 6 shows scatter plots of γHS for each amplicon in the samples, illustrating the correlation between γHS values obtained with different polymerases.
FIG. 7 is a box plot examining, in quantitative analysis by SiNG-PCRseq, the relationship between the type of variation in the orangutan sequence and the procedural bias that appears when γHS is calculated.
FIG. 8 is a scatter plot of γHSC, calculated by correcting the bias in γHS of the amplicons in each sample.
FIG. 9 is a scatter plot of the root mean square deviation (RMSD) of γHSC, calculated by correcting the bias in γHS of the amplicons in each sample.
FIG. 10 is a plot of the standardized quantities (StQs) for each amplicon amplified in five gDNA samples. The region marked in black on the lower axis represents amplicons of autosomal genes, and the region marked in gray at its right end represents amplicons of X-chromosome-linked genes.
FIG. 11 shows the steps of cDNA quantification by the SiNG-PCRseq method using cDNA samples derived from a human fibroblast line (Fib; F1 to F3) and an induced pluripotent stem cell line (iPSC; I1 to I3):
FIG. 11a shows the relative abundance of the human sequence to the competing orangutan sequence (relative abundance of HS to PA; RA_H/P) in the iPSC-derived and Fib-derived cDNA samples. Here, * marks where the coefficient of variation (CV) changes significantly;
FIG. 11b shows the distribution of the standardized quantities (StQs) in the iPSC-derived and Fib-derived cDNA samples;
FIG. 11c shows the distribution of the coefficient of variation (CV) for each amplicon, obtained by averaging the StQs calculated from the three samples of each of the iPSC and Fib cell lines.
FIG. 12 is a scatter plot of the differences in StQs caused by the difference between the hTaq and fTaq polymerases when cDNA samples derived from various cell lines are quantified by the SiNG-PCRseq method of the present invention.
FIG. 13 shows the differences in quantitative analysis between the SiNG-PCRseq method of the present invention and a conventional quantification method:
FIG. 13a is a scatter plot of the StQs quantified by the RNA-seq method and by the SiNG-PCRseq method for the same iPSC and Fib cDNA samples;
FIG. 13b shows the fold difference (I/F) of the StQs quantified by the RNA-seq method and by the SiNG-PCRseq method in the iPSC and Fib cDNA samples. In the I/F fold comparison, genes showing a difference of two-fold or more are marked with empty squares, and the kernel density analysis based on them is shown by the solid line and the right-hand axis.
The present invention provides a method for standardized quantification of nucleic acids in a sample, the method comprising:
i) mixing DNA or cDNA isolated from cells of a subject with genomic DNA (gDNA) isolated from cells of a species closely related to the subject;
ii) amplifying the DNA sample mixed in step i) by multiplex PCR to obtain amplicons;
iii) sequencing a sequence library comprising the amplicons obtained in step ii) and comparing the reads with reference sequences to determine read counts;
iv) calculating, from the read counts determined in step iii), the fraction of the target sequence (sequence A) among the amplicons obtained in step ii) (fraction of sequence, γA);
v) calculating, from the fraction (γA) obtained in step iv), a bias-corrected value (γAC) in which procedural bias has been corrected;
vi) calculating, from the bias-corrected value (γAC) obtained in step v), the relative abundance of the target sequence (sequence A) to the related-species sequence (sequence B) (relative abundance of A to B; RA_A/B); and
vii) calculating standardized quantities (StQs) by dividing the relative abundance (RA_A/B) obtained in step vi) by its representative value.
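Steps iv) to vii) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented formulas, which appear in the source only as images: it assumes that the fraction is reads of A divided by reads of A plus B, that the relative abundance is γ / (1 − γ), and that the representative value is the arithmetic mean. The bias correction of step v) is left as an identity placeholder, and all function names and read counts are hypothetical.

```python
from statistics import mean


def fraction_of_a(reads_a, reads_b):
    """Assumed form of gamma_A: share of target-sequence reads in one amplicon."""
    return reads_a / (reads_a + reads_b)


def relative_abundance(gamma_c):
    """Assumed form of RA(A/B): odds of A versus B; undefined when gamma_c == 1."""
    return gamma_c / (1.0 - gamma_c)


def standardized_quantities(read_counts, correct=lambda g: g):
    """Map amplicon name -> StQ.

    read_counts maps amplicon name -> (reads_A, reads_B).
    `correct` stands in for the bias correction of step v); the identity
    default is a placeholder, not the correction described in the text.
    """
    ra = {
        name: relative_abundance(correct(fraction_of_a(a, b)))
        for name, (a, b) in read_counts.items()
    }
    representative = mean(ra.values())  # arithmetic mean as the representative value
    return {name: value / representative for name, value in ra.items()}


# Hypothetical read counts for four amplicons (target reads, related-species reads).
counts = {
    "GeneA": (300, 100),
    "GeneB": (100, 100),
    "GeneC": (50, 150),
    "GeneD": (200, 200),
}
stq = standardized_quantities(counts)
```

Because every relative abundance is divided by the same representative value, the resulting StQs no longer depend on how much related-species gDNA was spiked into the sample, which is the property the method relies on.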
As used herein, the term "reference sequence" may be a known sequence whose identity can be determined from the species of origin and/or the gene name. For example, the reference sequence may be a sequence retrieved from the NCBI reference sequence database, but is not limited thereto.
As used herein, the term "read count" may mean the number of times sequencing reads were obtained for each reference sequence.
As used herein, the term "representative value" may be the relative abundance, with respect to the related-species gene, of one specific sequence, or the average of the relative abundances derived for all or some of the sequences. The representative value may be the arithmetic mean, geometric mean, harmonic mean, variance, or standard deviation of two or more relative abundances, but is not limited thereto.
The subject of step i) may be any organism, such as a microorganism, yeast, or mammal, from whose cells genes are to be extracted and quantified, and the closely related species may be any species reported in the art to be genetically closely related to the subject. For example, the subject may be a human and the closely related species may be an orangutan, chimpanzee, monkey, or gorilla, but they are not limited thereto.
The cells of step i) may be any cells from which DNA can be extracted or from whose RNA cDNA can be synthesized. Specifically, when the subject-derived cells are human cells, they may be human lymphoblasts, human fibroblasts, or human induced pluripotent stem cells (iPSC), but are not limited thereto.
The multiplex PCR of step ii) may be multiplex competitive PCR, but is not limited thereto.
The fraction (γA) of step iv) may be calculated using [Equation 7] below, but is not limited thereto:
[Equation 7]
Figure PCTKR2015009823-appb-I000001
The bias-corrected value (γAC) of step v) may be calculated using [Equation 4] below, but is not limited thereto:
[Equation 4]
Figure PCTKR2015009823-appb-I000002
In the above equation, γA and γB denote the fractions of target sequence A and related-species sequence B, respectively, among the amplicons amplified in the mixed sample;
pA and pB denote the mixing proportions of target sequence A and related-species sequence B in reference samples in which the two are mixed in different molar amounts;
γARef and γBRef denote the fractions of target sequence A and related-species sequence B, respectively, among the amplicons amplified in those reference samples; and
n is an integer of 1 or more and denotes the number of mixed samples used for the bias correction.
The relative abundance (RA_A/B) of step vi) may be calculated using [Equation 8] below, but is not limited thereto:
[Equation 8]
Figure PCTKR2015009823-appb-I000003
The standardized quantities (StQs) of step vii) may be calculated using [Equation 9] below, but are not limited thereto:
[Equation 9]
Figure PCTKR2015009823-appb-I000004
In [Equation 9], the representative RA_A/B value in the sample is the representative value of step vii) of the standardized quantification method of the present invention. It may be the relative abundance of one specific sequence, or the average of the relative abundances derived for all or some of the sequences. The representative value may be the arithmetic mean, geometric mean, harmonic mean, variance, or standard deviation of two or more relative abundances, but is not limited thereto.
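The choice of representative value changes the scale of the StQs but not their ratios to one another. The following sketch, which assumes only that StQ is the relative abundance divided by the chosen representative value as stated in step vii), compares three of the listed options using Python's standard `statistics` module. The function name `standardize` and the input values are hypothetical.

```python
from statistics import mean, geometric_mean, harmonic_mean


def standardize(ra_values, representative=mean):
    """Divide each relative abundance by the chosen representative value."""
    rep = representative(ra_values)
    return [value / rep for value in ra_values]


# Hypothetical relative abundances for three amplicons in one sample.
ra = [4.0, 1.0, 0.25]

by_arithmetic = standardize(ra)                 # representative value 1.75
by_geometric = standardize(ra, geometric_mean)  # representative value 1.0
by_harmonic = standardize(ra, harmonic_mean)    # representative value 3 / 5.25
```

Whichever mean is used, the ratio between any two StQs in the same sample equals the ratio of their relative abundances, so the within-sample comparison is unaffected.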
In a specific embodiment of the present invention, to obtain the genomic DNA used in the spiking-in neighbor genome-coupled competitive PCR amplicon sequencing (SiNG-PCRseq) method, in which the genome of a closely related species is mixed into the sample to be quantified and co-amplified by exploiting the sequence similarity between the related species, the inventors cultured an orangutan-derived cell line and human-derived cell lines and obtained gDNA or cDNA from the cultured cells.
To design primer sequences for multiplex PCR, the inventors selected 263 gene sequences as reference sequences for human genes and divided the corresponding primer pairs into a total of 24 groups of 20 pairs. They then amplified amplicons by multiplex competitive PCR, prepared intermediary amplicon DNA, and constructed sequence libraries. The integrity of the intermediary amplicon DNA and the consistency between samples were maintained across samples with different mixing ratios and polymerases (see FIG. 2).
To quantify the amplicon DNA libraries, the inventors compared the amplified amplicon libraries with the reference sequences and obtained a read count for each amplicon. Within the hTaq set and within the fTaq set, named for the two different polymerases used for amplification, the samples were highly correlated with one another, but the correlation between the two sets was low (see FIGS. 3 and 4).
To apply the SiNG-PCRseq method to gDNA samples, the inventors calculated the fraction of human sequence (γHS) for the amplified PCR amplicons. Within a single sample, the values for the individual amplicons did not fall on a straight line but formed an uneven plot pattern, indicating procedural bias (see FIG. 5). The inventors confirmed that this quantitative deviation arises during the repeated PCR and purification steps and is intrinsically linked to the variation structure of the gene sequences (see FIGS. 6 and 7).
To correct the bias in γHS that arises from sequence variation, the inventors modified a previously reported method and calculated bias-corrected values (γHSC) for the amplicons in each sample. The correction efficiently removed the bias seen in γHS and yielded an evenly aligned quantitative pattern (see FIGS. 8 and 9).
To derive final quantities, the inventors calculated the relative abundance of the human sequence to the competing orangutan sequence (RA_H/P) in each sample and expressed each value relative to the mean RA_H/P of the autosomal sequences, giving standardized quantities (StQs). The StQs accurately reflected the relative quantitative ratios among all sequences, and this accuracy was maintained when SiNG-PCRseq was repeated under different experimental conditions (see FIG. 10).
The inventors also tested whether SiNG-PCRseq is reproducible under different experimental conditions with cDNA samples derived from various cell lines. The SiNG-PCRseq method was applied to cDNA samples synthesized from Fib and iPSC using two different polymerases (hTaq and fTaq), and the RA_H/P values obtained by sequencing were standardized by the mean RA_H/P of all sequences. Consistent standardized values were obtained regardless of the relative amount of orangutan gDNA mixed into the sample (see FIG. 11). The standardized quantities obtained with the different polymerases were also highly reproducible, with a correlation (R²) of 0.92 or higher (see FIG. 12 and Table 6).
To determine whether quantification by SiNG-PCRseq differs from a conventional gene quantification method, the inventors performed RNA-seq-based quantification of the genes in the samples and compared the results. The correlation (R²) between the StQs obtained by RNA-seq and by SiNG-PCRseq was very low, revealing a difference between the two methods in the quantitative comparison of different sequences within one sample (see FIG. 13a). For detecting quantitative differences in a given sequence between different samples, however, the two methods did not differ greatly (see FIG. 13b).
Accordingly, the SiNG-PCRseq method of the present invention provides standardized relative DNA quantities under experimental conditions that vary in factors such as the cell line and the polymerase. Compared with RNA-seq, a conventional nucleic acid quantification method, it provides relative quantities of the nucleic acids in a sample accurately and simply, regardless of experimental conditions or sample type. The SiNG-PCRseq method is therefore useful as a standardized analytical method for quantifying nucleic acids in a sample.
The present invention also provides a method of providing information on the standardized ratios between different genes within the same species, the method comprising:
i) calculating standardized quantities for different genes of a subject using the standardized quantification method of the present invention; and
ii) calculating, from the standardized quantities of step i), the standardized ratios between different genes within the single species that is the subject.
The method can be applied to the analysis of unknown biological specimens and is useful for determining their cytological or histological identity.
In addition, the present invention provides a method of providing information for diagnosing a gene-related disease, the method comprising:
i) using the standardized quantification method of the present invention to calculate, for target cells isolated from an individual who has or is suspected of having a disease, standardized quantities of the gene transcripts, or of the target sequences in the genome, that characterize the disease or a pathological state with transitional features leading to the disease; and
ii) comparing the standardized quantities of step i) with the standardized quantities of the same sequences calculated from the transcriptome or genome of cells isolated from a normal individual of the same species.
As used herein, the term "normal individual" may mean a healthy individual in whom the disease has not developed and is unlikely to develop.
The standardized quantities used in the method of providing information for diagnosing the gene-related disease may be the distribution of the calculated standardized quantities, but are not limited thereto.
The subject of step i) may be cells of a microorganism, yeast, mammal, or the like from which genes are to be extracted and quantified, and the closely related species may be any species reported in the art to be genetically closely related to the subject. Specifically, the subject may be a human and the closely related species may be an orangutan, chimpanzee, monkey, or gorilla, but they are not limited thereto.
The cells of step i) may be any cells from which DNA can be extracted. For example, the cells may be fibroblasts or induced pluripotent stem cells, but are not limited thereto.
The "gene transcripts characterizing the disease" of step i) may mean gene transcripts (transcriptome) whose association with the disease is known, for example because their expression increases or decreases when the disease occurs.
The "pathological state with transitional features leading to the disease" may refer to a state marked by gene transcripts known to be associated with the prodromal or early symptoms that appear before onset of the disease.
The multiplex PCR of step ii) may be multiplex competitive PCR, but is not limited thereto.
For example, the method may further comprise, after step ii), a step of determining that the disease has developed or is likely to develop when the standardized quantities of step i) differ from the standardized quantities of the same genes calculated from cells isolated from a normal individual of the same species.
Specifically, when the distribution of standardized quantities derived from the target cells has shifted to a lower or higher region than the distribution of standardized quantities of the same genes calculated from cells isolated from a normal individual of the same species, it may be determined that a disease related to those genes has developed or is likely to develop.
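The comparison of distributions described above could be sketched as follows. The text does not specify a statistic or a cut-off, so everything here is an assumption made for illustration: the shift is measured as the difference between the median log2 StQ of the subject and of the normal individual, and a two-fold threshold (1.0 on the log2 scale) is used only because two-fold differences are highlighted in FIG. 13b. The function names and values are hypothetical, and the sketch is not a diagnostic criterion.

```python
from math import log2
from statistics import median


def distribution_shift(subject_stqs, normal_stqs):
    """Median log2 StQ of the subject minus that of the normal individual."""
    subject_median = median(log2(v) for v in subject_stqs)
    normal_median = median(log2(v) for v in normal_stqs)
    return subject_median - normal_median


def classify_shift(shift, threshold=1.0):
    """Label the direction of the shift; the threshold is an assumed two-fold cut-off."""
    if shift >= threshold:
        return "higher"
    if shift <= -threshold:
        return "lower"
    return "unchanged"


# Hypothetical StQs of the same gene set in a normal and a subject sample.
normal = [0.9, 1.0, 1.1]
subject = [3.8, 4.0, 4.4]

shift = distribution_shift(subject, normal)
label = classify_shift(shift)
```

Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically, which suits ratio-type quantities such as StQs.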
That is, the method of providing information for diagnosing a gene-related disease using the standardized quantities of genes according to the present invention is not limited to any specific gene and can be applied regardless of the type of gene.
The SiNG-PCRseq method of the present invention provides standardized relative DNA quantities under experimental conditions that include variable factors such as the cell line and the polymerase. Compared with RNA-seq, a conventional nucleic acid quantification method, it provides relative quantities of the nucleic acids in a sample accurately and simply, regardless of experimental conditions or sample type. The relative standardized quantities within a subject obtained by the standardized nucleic acid quantification method of the present invention can therefore provide useful information for diagnosing gene-related diseases.
In addition, the present invention may provide a method of treating a disease, the method comprising:
i) administering a therapeutic agent for the disease to an individual diagnosed with the disease by the method of providing information for diagnosing the gene-related disease.
Any known therapeutic agent may be used as the therapeutic agent for the disease.
The individual may be any animal, including a human, in which the diagnosed disease has developed or is likely to develop. The animal may be a human or another mammal in need of treatment for similar symptoms, such as a cow, horse, sheep, pig, goat, camel, antelope, dog, or cat, but is not limited thereto.
The therapeutic agent may be administered in a pharmaceutically effective amount.
Hereinafter, the present invention is described in more detail by way of examples. These examples are intended to illustrate the present invention more specifically, and the scope of the present invention is not limited by them.
<Example 1> Culture of orangutan- and human-derived cell lines and preparation of genomic DNA
To obtain the genomic DNA used in the present invention, an orangutan-derived cell line and human-derived cell lines were cultured, and gDNA or cDNA was obtained from the cultured cells.
Specifically, a lymphoblastoid cell line derived from a male orangutan (Public Health England, UK) was cultured in RPMI 1640 medium containing 15% fetal bovine serum (FBS). As the human fibroblast line, a fibroblast line derived from human male foreskin (System Biosciences) was cultured in DMEM medium containing 15% FBS. Human induced pluripotent stem cells (iPSC) were generated from an immortalized human lymphoblastoid cell line according to a protocol (I-1210/002/002-02) approved by the institutional review board of the Korea Institute of Oriental Medicine, and were cultured in mTeSR1 conditioned medium (Stemcell Technologies Inc.) containing 0.5 mM sodium butyrate (Sigma, USA) and 25 μM SB431542 (Sigma, USA).
Orangutan genomic DNA (gDNA) was isolated from the cultured orangutan lymphoblastoid cell line using G-DEX™ IIc (iNtRON Biotechnology, Korea) according to the manufacturer's protocol. Total cellular RNA was extracted from the human fibroblast line (Fib) and the human induced pluripotent stem cells (iPSC) using TRIzol® Reagent (Life Technologies, USA) according to the manufacturer's protocol, and human cDNA samples were prepared from 1 μg of the extracted RNA as template using a cDNA synthesis kit (Bio-Rad, USA) according to the manufacturer's protocol. The prepared gDNA or cDNA was mixed in the compositions shown in [Table 1] to [Table 3] below to prepare the DNA samples. [Table 1] to [Table 3] show the mixing ratios for the human genomic DNA (gDNA) samples, the human fibroblast-derived cDNA samples, and the human iPSC-derived cDNA samples, respectively.
Table 1
Sample name              G10  G9  G7  G5  G3  G1  G0
Human (HS) (20 ng)       10   9   7   5   3   1   0
Orangutan (PA) (20 ng)   0    1   3   5   7   9   10
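The mixing ratios in Table 1 fix the fraction of human sequence that each gDNA sample would show in the absence of procedural bias. The sketch below computes that expected fraction as human parts over total parts. It assumes that equal masses of human and orangutan gDNA contain equal numbers of copies of each locus, so that the mass ratio can stand in for the molar ratio; the variable names are hypothetical.

```python
# Human parts per sample, taken from Table 1; each sample has 10 parts in total.
human_parts = {"G10": 10, "G9": 9, "G7": 7, "G5": 5, "G3": 3, "G1": 1, "G0": 0}
TOTAL_PARTS = 10

# Expected fraction of human sequence per sample, assuming mass ratio ~ molar ratio.
expected_fraction = {name: hs / TOTAL_PARTS for name, hs in human_parts.items()}

for name, fraction in expected_fraction.items():
    print(f"{name}: expected human fraction = {fraction:.1f}")
```

Comparing the measured γHS of each amplicon against these expected values is one way to see the per-amplicon bias that the correction step is meant to remove.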
Table 2
Sample name              F1  F2  F3
Human (HS) (2.5 ng)      4   2   1
Orangutan (PA) (1 ng)    1   1   1
Table 3
Sample name              I1  I2  I3
Human (HS) (2.5 ng)      4   2   1
Orangutan (PA) (1 ng)    1   1   1
<Example 2> Design of primer sequences for multiplex PCR and amplification of amplicons by multiplex competitive PCR
<2-1> Selection of genes to be quantified
인간 유래 세포로부터 제조한 cDNA 중에서, 본 발명의 대상이 되는 유전자를 선별하기 위해서, 오랑우탄 유전체 서열과 상동 서열(homologous sequence)인 구간(stretches)을 검색하였다.Among the cDNAs prepared from human-derived cells, in order to select genes of the present invention, stretches were searched for orangutan genome sequences and homologous sequences.
구체적으로, NCBI 레퍼런스 서열 데이터 베이스(NCBI reference sequence database)로부터 유전자 mRNA 서열을 검색하였다. 그런 다음, 검색한 mRNA 서열을 BLAST를 통해 수마트라 오랑우탄(퐁고 아벨리, Pongo abelli)의 유전체 서열(NCBI Pongo_pygmaeus_abelii-2.0.2 assembly)과 배열 비교하여, 소수의 뉴클레오티드(nucleotide)를 이종간변이(inter-species variation, ISV)로 포함하는 상동서열구간을 검색하여, 인간 유전자에 대한 기준 서열(reference sequence)로서 263개 유전자 서열을 선별하였다.Specifically, gene mRNA sequences were retrieved from the NCBI reference sequence database. Then, the searched mRNA sequence is arranged and compared with the genome sequence (NCBI Pongo_pygmaeus_abelii-2.0.2 assembly) of Sumatra orangutan (Pongo abelli) through BLAST. By searching for homologous sequences including species variation (ISV), 263 gene sequences were selected as reference sequences for human genes.
<2-2> Construction of primer sequence groups
To amplify the selected gene sequences by multiplex PCR, primer sequences were selected and divided into groups.
Specifically, for the 263 reference-sequence genes selected in Example <2-1>, primer pairs were designed with the previously reported Primer3 program (Untergasser, A. et al., Nucleic Acids Res., 40: e115, 2012). Each pair binds both the human cDNA sequence and the orangutan gDNA sequence in a sample, so that both are amplified by multiplex PCR into fragments averaging 67 bp (standard deviation, SD = 10.4), and each amplified region contains a small number of ISV nucleotides by which the species of origin of an amplicon can be identified. To avoid the unintended amplicons that can arise when several primers act on a single transcript, the primer pairs were divided into a total of 24 groups of 20 pairs.
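The grouping constraint described above (no two primer pairs for the same transcript in one multiplex reaction) can be illustrated with a greedy assignment. This is a sketch under stated assumptions, not the procedure used in the source, which does not describe its grouping algorithm; the function name, the `(pair_id, transcript_id)` input shape, and the default group size are all hypothetical.

```python
def group_primer_pairs(pairs, group_size=20):
    """Greedy sketch: assign primer pairs to multiplex groups.

    pairs: list of (pair_id, transcript_id) tuples (hypothetical format).
    A pair joins the first group that still has room and contains no
    other pair targeting the same transcript; otherwise a new group opens.
    """
    groups = []  # each entry: {"pairs": [...], "transcripts": set()}
    for pair_id, transcript in pairs:
        for group in groups:
            has_room = len(group["pairs"]) < group_size
            if has_room and transcript not in group["transcripts"]:
                break
        else:
            # No existing group fits (or there are none yet).
            group = {"pairs": [], "transcripts": set()}
            groups.append(group)
        group["pairs"].append(pair_id)
        group["transcripts"].add(transcript)
    return [group["pairs"] for group in groups]


demo = group_primer_pairs(
    [("p1", "T1"), ("p2", "T1"), ("p3", "T2")], group_size=2
)
print(demo)
```

A greedy pass does not guarantee the minimum number of groups, but it is enough to show how the constraint separates primer pairs that share a transcript.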
<2-3> Simultaneous amplification of reference sequences by multiplex PCR
To simultaneously amplify the human gDNA or cDNA and the orangutan gDNA mixed with it in the same sample, amplicons were amplified by multiplex PCR.
Specifically, the human gDNA samples G10 to G0, the human Fib cDNA samples F1 to F3, and the human iPSC cDNA samples I1 to I3, prepared in <Example 1> and mixed with orangutan gDNA, were used. With each DNA sample as the template and either Solg™ h-Taq DNA polymerase (hTaq; Solgent, Korea) or FastStart Taq polymerase (fTaq; Roche, Switzerland) as the polymerase, multiplex competitive PCR was performed with each of the 24 primer groups from Example <2-2>, for a total of 24 rounds per DNA sample. The reaction conditions of the multiplex PCR are given in [Table 4] below.
Table 4
Temperature (°C) | Time | Remarks
95 | 15 min | Enzyme activation
95 | 20 s | Repeated for 40 cycles
55 | 25 s |
65 | 2 min |
<2-4> Preparation of intermediary amplicons for sequencing
To sequence the amplified amplicon products, the amplicons were 5'-end phosphorylated and ligated to adaptors, and two additional rounds of PCR amplification were performed to attach sequence modules that enable flow cell attachment, sequencing primer binding, and barcoding.
Specifically, each amplicon set obtained by multiplex PCR in Example <2-3> was purified with the Expin™ PCR kit (GeneAll, Korea). 1 μg of the purified amplicon pool was taken, and the 5'-ends of the amplicons were phosphorylated with T4 polynucleotide kinase (Promega, USA) according to the manufacturer's protocol. Then, 10 μg of Y-shaped adaptor molecule (15 μM) was ligated to 6 ng of the phosphorylated amplicons with T4 ligase (Promega) to make them compatible with the Illumina sequencing platform.
After ligation, PCR was performed to attach the sequence modules to each purified amplicon pool. For the first PCR, 4 pg of the purified amplicons, the MP1 and MP2 primers, and hTaq or fTaq polymerase were mixed and reacted under the conditions given in [Table 5] below. Then, a 0.05-fold volume of the first PCR product was mixed with MP1 and one of the IdxP primers, and a second PCR was performed under the conditions in [Table 5] to attach the sequence modules, yielding the intermediary amplicon DNA that constitutes the sequence library.
Table 5
First PCR | | | Second PCR | |
Temperature (°C) | Time | Remarks | Temperature (°C) | Time | Remarks
95 | 10 min | Enzyme activation | 95 | 10 min | Enzyme activation
95 | 10 s | Repeated for 20 cycles | 95 | 10 s | Repeated for 10 cycles
65 | 30 s | | 65 | 30 s |
72 | 2 min | | 72 | 2 min |
<2-5> Confirmation of the integrity of the intermediary amplicon DNA
To check whether the integrity of the intermediary amplicon DNA and the consistency between samples are maintained across the different mixing ratios and polymerase types used to build the sequence libraries, the amplicons at each stage of the multiplex competitive PCR were examined by electrophoresis.
Specifically, the sequence libraries amplified and constructed with hTaq or fTaq from the various samples in Examples <2-3> and <2-4> were obtained. Then, the amplicon products of the ~21-plex PCR of Example <2-3>, the pooled amplicons, and the first and second PCR products prepared in Example <2-4> were resolved by electrophoresis on a 12% polyacrylamide gel.
As a result, as shown in FIG. 2, sequence libraries were constructed from the various samples with either hTaq or fTaq polymerase, and they were consistent across samples (FIG. 2).
<Example 3> Sequencing of the amplicon DNA libraries
<3-1> Sequencing of sample-specific reference sequences
Because the human and orangutan reference sequences retrieved from the NCBI database can differ from the sample sequences through sequence variation, they cannot be used directly in the present invention. Amplicon sequence libraries constructed from pure human gDNA and pure orangutan gDNA were therefore sequenced.
Specifically, amplicon sequence libraries were constructed by the method of <Example 2> using the G10 or G0 sample prepared in <Example 1> as the template. The libraries were pooled in equal amounts by group and sequenced in parallel on the Illumina HiSeq 2500 platform with 100 bp single-end reads. The amplicon reads were then compared by BLAST against the NCBI reference sequence database, and reads matching a reference sequence with an aligned length of at least 75 bp and a query coverage of at least 90% were retained. The reads of each amplicon target were then multiply aligned to determine a consensus sequence; at each nucleotide position of the alignment, bases accounting for at least 25% of the reads were retained.
Sequences that were monomorphic after removal of heterozygous positions and had a sequencing depth of at least 100 were identified, confirming that human/orangutan sequence pairs differing between the two species had been established as sample-specific reference sequences. The resulting reference sequences corresponded to amplicons of 248 genes containing 425 ISVs; the remaining 70 sequences comprised 4 genes that were not amplified and 66 genes that contained no ISV.
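The consensus step above (keep every base that reaches at least 25% of reads at a position, then require a single base per position) can be sketched as follows. This is a minimal illustration that assumes the reads are already aligned to equal length; the function names and the list-of-strings input are hypothetical and not taken from the source.

```python
from collections import Counter


def consensus_columns(aligned_reads, min_fraction=0.25):
    """Return, per alignment position, the bases seen in at least
    min_fraction of the reads (reads must have equal length)."""
    n_reads = len(aligned_reads)
    columns = []
    for bases in zip(*aligned_reads):
        counts = Counter(bases)
        kept = sorted(
            base for base, count in counts.items()
            if count / n_reads >= min_fraction
        )
        columns.append(kept)
    return columns


def is_monomorphic(columns):
    """True when exactly one base survives at every position."""
    return all(len(kept) == 1 for kept in columns)


reads = ["ACGT", "ACGT", "ACGA", "ACGT"]
cols = consensus_columns(reads)
print(cols, is_monomorphic(cols))
```

A position that keeps two bases, as in the last column of this toy input, would be treated as a candidate heterozygous position and excluded from a monomorphic reference.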
<3-2> Identification of reads from amplicons amplified by multiplex competitive PCR
To calculate the fractions of human cDNA and orangutan gDNA in the mixed samples, the reads of the amplicon products amplified by multiplex competitive PCR were identified.
Specifically, the reads of the mixed-sample amplicon libraries constructed and verified in <Example 2> were compared by BLAST against the sample-specific reference sequences established in Example <3-1>. Reads showing 100% sequence identity over at least 75 bp and a query coverage of at least 90% were taken as matches, and for each amplicon target the numbers of reads derived from the human mRNA and from the orangutan genome were counted.
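The read-counting rule above (100% identity, aligned length of at least 75 bp, query coverage of at least 90%) can be expressed as a simple filter over alignment records. The sketch below assumes each hit has already been parsed into a dictionary; the key names (`read_len`, `aln_len`, `pct_identity`, `amplicon`, `species`) are hypothetical and do not correspond to any specific BLAST output format.

```python
from collections import Counter


def count_species_reads(hits, min_len=75, min_cov=0.90):
    """Count reads per (amplicon, species) that pass the match criteria.

    hits: iterable of dicts with hypothetical keys read_len, aln_len,
    pct_identity, amplicon, and species ("HS" or "PA").
    """
    counts = Counter()
    for hit in hits:
        coverage = hit["aln_len"] / hit["read_len"]
        if (
            hit["pct_identity"] == 100.0
            and hit["aln_len"] >= min_len
            and coverage >= min_cov
        ):
            counts[(hit["amplicon"], hit["species"])] += 1
    return counts


example_hits = [
    {"read_len": 100, "aln_len": 95, "pct_identity": 100.0,
     "amplicon": "amp1", "species": "HS"},
    {"read_len": 100, "aln_len": 95, "pct_identity": 98.9,
     "amplicon": "amp1", "species": "HS"},   # rejected: identity < 100%
    {"read_len": 100, "aln_len": 80, "pct_identity": 100.0,
     "amplicon": "amp1", "species": "PA"},   # rejected: coverage < 90%
    {"read_len": 80, "aln_len": 76, "pct_identity": 100.0,
     "amplicon": "amp1", "species": "PA"},
]
read_counts = count_species_reads(example_hits)
print(dict(read_counts))
```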
As a result, as shown in FIG. 3, for the samples I1 to I3, F1 to F3, G10, G9, G7, G5, G3, G1, and G0, between 1.4 and 2.9 million reads per sample amplified with hTaq polymerase, and between 1.2 and 1.9 million reads per sample amplified with fTaq polymerase, aligned to the reference sequences with 100% identity. These reads corresponded to about 61% of all sequences read (FIG. 3).
<3-3> Correlation between polymerase types for the same sample
To determine whether technical factors, including the polymerase, can change the results when the same sample is quantified by the method of the present invention, Pearson correlations were calculated between the amplicon read counts of different samples. The Pearson correlation coefficients were compared, across the various samples, between the hTaq set, in which Solg™ hTaq was used as the polymerase for amplicon amplification, and the fTaq set, in which FastStart Taq was used.
As a result, as shown in FIG. 4, the correlation between the hTaq set and the fTaq set was low even for the same sample, whereas the read depths of the amplicons were very highly correlated between different samples amplified with the same polymerase. The two polymerases therefore have different amplification properties, but each polymerase performs consistently across samples (FIG. 4).
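The comparison above rests on the Pearson correlation between per-amplicon read counts of two libraries. A minimal, dependency-free sketch of that statistic follows; the function name and the toy read counts are illustrative only and are not data from the source.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy per-amplicon read counts for two libraries (made-up numbers).
library_a = [1200, 300, 4500, 800]
library_b = [2400, 600, 9000, 1600]
print(round(pearson(library_a, library_b), 6))
```

In the setting described above, one such coefficient would be computed for every pair of libraries, and the resulting matrix compared within and between the hTaq and fTaq sets.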
<Example 4> Application to gDNA samples of the SiNG-PCRseq method (spiking-in neighbor genome-coupled competitive PCR amplicon sequencing), in which the genome of a closely related species is spiked into the sample to be quantified and co-amplified by exploiting the sequence similarity between the related species
<4-1> Calculation of the fraction of human sequence in PCR amplicons
To determine the proportion of human-derived sequence in the amplicons amplified from mixed human and orangutan gDNA samples, the fraction of human sequence (γHS) in the amplified PCR amplicons was calculated.
Specifically, amplicons were amplified from the G10, G9, G7, G5, G3, G1, and G0 samples prepared in <Example 1> by multiplex competitive PCR with hTaq or fTaq according to <Example 2> and <Example 3>, and the reads of the sequence libraries were counted. Amplicons averaging 200 reads or fewer per sample, and amplicons showing an impurity of 5% or more in the G0 and G10 samples, were removed. The read counts of each amplicon in each sample were then entered into [Equation 1] below to obtain the fraction of human-derived sequence (γHS) in each amplicon. For the hTaq set and the fTaq set, the amplicons were plotted from the left in descending order of mean γHS in the hTaq set. In addition, the γHS values of amplicons containing the same sequence were averaged within the hTaq set and within the fTaq set and plotted as a scatter diagram.
[Equation 1]
Figure PCTKR2015009823-appb-I000005
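The equation itself is available only as an image and is not reproduced here. As a hedged illustration, the sketch below assumes that γHS is the human read count divided by the sum of human and orangutan read counts for an amplicon, which matches the verbal definition of a "fraction of human sequence". It also assumes that the 200-read threshold applies to total reads per sample and that "impurity" means human reads in the pure-orangutan G0 sample or orangutan reads in the pure-human G10 sample. All names are hypothetical.

```python
def gamma_hs(hs_reads, pa_reads):
    """Assumed form of the human fraction: HS / (HS + PA)."""
    total = hs_reads + pa_reads
    if total == 0:
        raise ValueError("no reads for this amplicon")
    return hs_reads / total


def passes_filters(counts, min_mean_reads=200, max_impurity=0.05):
    """counts: {sample: (hs_reads, pa_reads)} for a single amplicon.

    Assumptions: the read threshold uses total reads averaged over
    samples; impurity is the off-species fraction in G0 and G10.
    """
    totals = [hs + pa for hs, pa in counts.values()]
    if sum(totals) / len(totals) <= min_mean_reads:
        return False
    impurity_g0 = gamma_hs(*counts["G0"])           # human reads in G0
    impurity_g10 = 1.0 - gamma_hs(*counts["G10"])   # orangutan reads in G10
    return impurity_g0 < max_impurity and impurity_g10 < max_impurity


amplicon_counts = {"G10": (1000, 10), "G5": (480, 520), "G0": (5, 995)}
print(passes_filters(amplicon_counts), gamma_hs(*amplicon_counts["G5"]))
```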
As a result, as shown in FIGS. 5 and 6, the values of the amplicons within a sample did not fall on a flat line but formed an uneven plot pattern, and the same pattern appeared in the G10, G9, G7, G5, G3, G1, and G0 samples, which contain different amounts of human sequence (FIG. 5). This indicates that a substantial proportion of the amplicons amplified in a sample are quantified preferentially toward either the human or the orangutan sequence variant, that is, with a procedural bias (FIG. 5).
In addition, when the mean γHS of each amplicon containing the same sequence across all of the G10, G9, G7, G5, G3, G1, and G0 samples was plotted as a scatter diagram, the points followed a linear distribution with a slope of 1.03, showing a very close relationship between the hTaq set and the fTaq set. The extent and preference of the procedural bias are therefore not random but are intrinsically linked to the variant structure of the gene sequence (FIG. 6).
<4-2> Distribution of mean γHS by variation type
To identify the cause of the procedural bias in γHS in quantification by SiNG-PCRseq, the relationship between the bias and the type of variation in the orangutan sequence was examined.
Specifically, γHS in the amplicons of each of the G10, G9, G7, G5, G3, G1, and G0 samples prepared in <Example 1> was obtained by the method of Example <4-1> and averaged. The orangutan sequence variations in the amplicons were then classified into variations at guanine (G) and cytosine (C) nucleotides, which form strong hydrogen bonds (Stronger; S), variations at adenine (A) and thymine (T) nucleotides, which form weak hydrogen bonds (Weaker; W), and variations that are neutral in hydrogen-bond strength (Neutral; N). The mean γHS values of the S, W, and N groups were analyzed by Student's t-test, and box plots were drawn to examine their distributions. As a result, as shown in FIG. 7, the p value was 2.54×10^-9, and the mean γHS was relatively higher when the variation involved guanine (G) and cytosine (C) residues (S) than when it involved adenine (A) and thymine (T) residues (N) (FIG. 7). Such a difference can arise through the repeated PCR reactions and purifications in quantification by the SiNG-PCRseq of the present invention.
<Example 5> Bias correction of γHS of amplicons in samples
To correct the bias in amplicon γHS that arises from sequence variation in quantification by the SiNG-PCRseq of the present invention, the rule that the extent of bias for a given sequence is proportional to its fraction of the total amplicon was applied. Under this rule, procedural bias between sequence variants produces the differences in γHS among amplicons, and it is known that this bias can be corrected efficiently using γHS obtained from a reference amplicon in which the two competing sequences are present in equimolar amounts (Jeong, S. et al., Genome Res., 17: 1093-1100, 2007). A formula adapted to the present invention was therefore used for the calculation.
Specifically, [Equation 2] below was used. As previously reported, it gives γA_c, the bias-corrected value of γA (the fraction of sequence A in a sample), from a single reference mixture in which two different competing sequences, A and B, are mixed in equimolar amounts and therefore contribute equally.
[Equation 2]
Figure PCTKR2015009823-appb-I000006
In the above equation, γA and γB are the fractions of sequences A and B in the amplicons amplified from the mixed sample, and γA_H and γB_H are the fractions of sequences A and B in the amplicons amplified from the reference sample in which A and B are mixed in equimolar amounts.
In the SiNG-PCRseq of the present invention, the competing sequences in the reference sample, the human (HS) and orangutan (PA) genes, are not mixed in equimolar amounts, but the mixing ratio of the two sequences is known. [Equation 2] was therefore modified to reflect this, giving [Equation 3] below.
[Equation 3]
Figure PCTKR2015009823-appb-I000007
In the above equation, γA and γB are the fractions of sequences A and B in the amplicons amplified from the mixed sample; pA and pB are the proportions in which sequences A and B are mixed in the reference sample, where they are present in unequal amounts; and γA_Ref and γB_Ref are the fractions of sequences A and B in the amplicons amplified from that reference sample.
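The equations referred to above are available only as images, so their exact form is not reproduced here. The sketch below shows one correction that is consistent with the stated definitions, under the explicit assumption of a proportional bias model: for a given amplicon, the observed A-to-B ratio equals the true ratio multiplied by a constant factor k. The reference sample then gives k = (γA_Ref/γB_Ref) / (pA/pB), and dividing the observed ratio by k yields the corrected fraction. With pA = pB this reduces to the equimolar-reference case. The function names are hypothetical, and this is not claimed to be the source's formula.

```python
def corrected_fraction(gamma_a, gamma_a_ref, p_a=0.5):
    """Bias-corrected fraction of sequence A (assumed proportional bias).

    gamma_a:     observed fraction of A in the sample amplicon
    gamma_a_ref: observed fraction of A in the reference amplicon
    p_a:         known input proportion of A in the reference sample
    """
    gamma_b = 1.0 - gamma_a
    gamma_b_ref = 1.0 - gamma_a_ref
    p_b = 1.0 - p_a
    numerator = gamma_a * p_a * gamma_b_ref
    return numerator / (numerator + gamma_b * p_b * gamma_a_ref)


def observed_fraction(true_fraction, k):
    """Simulate an observation under the same proportional bias model."""
    biased = k * true_fraction
    return biased / (biased + (1.0 - true_fraction))


# Simulated check: an amplicon that over-represents A by a factor of 1.5.
k = 1.5
sample_obs = observed_fraction(0.9, k)
reference_obs = observed_fraction(0.3, k)
print(round(corrected_fraction(sample_obs, reference_obs, p_a=0.3), 6))
```

Under this model the correction recovers the simulated true fraction regardless of k, which is the property the reference sample is meant to provide.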
Furthermore, because multiplex PCR amplification in the SiNG-PCRseq of the present invention yields many amplicons in both the reference samples and the mixed samples, [Equation 3] was modified into [Equation 4] below so that the correcting power of each reference sample is reflected equally.
[Equation 4]
Figure PCTKR2015009823-appb-I000008
Then, γHS in the amplicons of each of the G10, G9, G7, G5, G3, G1, and G0 samples prepared in <Example 1> was obtained by the method of Example <4-1> and entered into [Equation 4] to calculate γHS_C, the value of γHS corrected for bias. To correct the bias of each sample, the other samples were used as reference samples, and for those samples pA and pB were taken as the mean of the HS values and the mean of the PA values of the sample concerned. This is justified because the bias of each amplicon occurs randomly with respect to the human and orangutan sequences, the types of variant sequences in the two related species are random, and the number of such amplicons is large enough that their mean approximates the original mixing ratio of the two gDNAs. The amplicons were plotted from the left in descending order of mean γHS_C. In addition, the root mean square deviation (RMSD) of the calculated γHS_C was obtained and shown as a scatter diagram.
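The procedure above (correct each sample against every other sample, with the reference mixing proportion estimated as the mean γHS over amplicons) can be sketched as follows. Because the source equation is available only as an image, two things here are assumptions: the per-reference correction uses a proportional bias model, and the estimates from different references are combined by a simple mean. All function and variable names are hypothetical.

```python
from statistics import mean


def corrected_fraction(gamma_a, gamma_a_ref, p_a):
    """Per-reference correction under an assumed proportional bias."""
    numerator = gamma_a * p_a * (1.0 - gamma_a_ref)
    return numerator / (
        numerator + (1.0 - gamma_a) * (1.0 - p_a) * gamma_a_ref
    )


def correct_sample(gamma, target):
    """gamma: {sample: {amplicon: gamma_HS}}; returns corrected values
    for the target sample, averaging over all usable reference samples."""
    corrected = {}
    for amplicon, g in gamma[target].items():
        estimates = []
        for ref, ref_values in gamma.items():
            if ref == target:
                continue
            p_hs = mean(ref_values.values())  # estimated mixing proportion
            g_ref = ref_values[amplicon]
            # Pure or degenerate references carry no ratio information.
            if not (0.0 < g_ref < 1.0 and 0.0 < p_hs < 1.0):
                continue
            estimates.append(corrected_fraction(g, g_ref, p_hs))
        corrected[amplicon] = mean(estimates)
    return corrected
```

As a usage note, `correct_sample(gamma, "G7")` would treat every other sample in `gamma` as a reference; the sketch raises an error if no usable reference remains for an amplicon.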
As a result, as shown in FIG. 8, γHS_C, in which the bias observed in γHS had been corrected, was obtained for all samples, and the correction efficiently removed the bias to give an evenly aligned quantification pattern (FIG. 8). The coefficient of variation of γHS_C for each sample ranged from 0.1 to 0.2. When the γHS_C values of the hTaq set and the fTaq set were compared, the deviation from the expected value was slight in the hTaq set but was strongly affected by read depth in the fTaq set (FIG. 9).
<Example 6> Calculation of the relative abundance of the human sequence to the competing sequence in a sample (relative abundance of HS to PA; RA_H/P)
To reflect the difference in copy number that can arise between autosomal sequences and X-linked sex-chromosome sequences in a male genome, the bias-corrected γHS_C values were converted into the relative abundance of the human sequence to the orangutan sequence, the competing sequence in the sample (RA_H/P).
Specifically, γHS_C was calculated for the G9, G7, G5, G3, and G1 samples prepared in <Example 1> by the method of Example <4-2>. The calculated γHS_C was then entered into [Equation 5] below and converted into the relative abundance of the human sequence (RA_H/P) in each sample.
[Equation 5]
Figure PCTKR2015009823-appb-I000009
<Example 7> Calculation of standardized quantities of human sequences in a sample
To obtain final quantitative values from the relative abundance of the human sequence to the orangutan sequence, the competing sequence in the sample (RA_H/P), standardized quantities (StQs) were calculated.
Specifically, the relative abundance of the human sequence (RA_H/P) in each of the G9, G7, G5, G3, and G1 samples prepared in <Example 1> was calculated by the method of <Example 6>. The RA_H/P values were then entered into [Equation 6] below to obtain the standardized quantities (StQs), defined as quantities relative to the mean RA_H/P of all autosomal sequences in each sample.
[Equation 6]
Figure PCTKR2015009823-appb-I000010
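Equations 5 and 6 are available only as images. The sketch below illustrates the two conversions as they are described in words, under stated assumptions: the relative abundance is taken as the corrected human fraction divided by the corrected orangutan fraction, and the standardized quantity is each amplicon's relative abundance divided by the mean relative abundance of the autosomal amplicons in the same sample. The function names and the toy values are hypothetical.

```python
def relative_abundance(gamma_hs_c):
    """Assumed conversion: human fraction over orangutan fraction."""
    return gamma_hs_c / (1.0 - gamma_hs_c)


def standardized_quantities(ra, autosomal):
    """Divide each RA by the mean RA of the autosomal amplicons.

    ra:        {amplicon: RA value} for one sample
    autosomal: collection of amplicon ids located on autosomes
    """
    autosomal_mean = sum(ra[a] for a in autosomal) / len(autosomal)
    return {amplicon: value / autosomal_mean for amplicon, value in ra.items()}


# Toy sample: two autosomal amplicons and one at half their abundance.
corrected = {"auto1": 0.75, "auto2": 0.75, "x1": 0.6}
ra_values = {a: relative_abundance(g) for a, g in corrected.items()}
stq = standardized_quantities(ra_values, autosomal=["auto1", "auto2"])
print(stq)
```

Normalizing by the autosomal mean removes the dependence on the overall mixing ratio, so values from samples with different spike-in proportions become comparable.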
<실시예 8> 다양한 인간 세포주 유래 cDNA 시료에 대한 SiNG-PCRseq 방법의 적용Example 8 Application of SiNG-PCRseq Method to Various Human Cell Line-derived cDNA Samples
<8-1> 인간 섬유아세포주(Fib) 및 유도만능줄기세포주(iPSC) 유래의 cDNA 시료로부터 SiNG-PCRseq 방법을 적용한 cDNA 정량<8-1> Determination of cDNA by SiNG-PCRseq Method from cDNA Samples Derived from Human Fibroblast Line and Fibroblast Cell Line
본 발명의 SiNG-PCRseq 방법이 다양한 세포주 유래의 cDNA 시료에서 동일하게 효과적으로 적용할 수 있는지 확인하기 위해, Fib 및 iPSC로부터 합성한 cDNA 시료를 SiNG-PCRseq 방법에 적용하였다.In order to confirm that the SiNG-PCRseq method of the present invention can be equally effectively applied to cDNA samples derived from various cell lines, cDNA samples synthesized from Fib and iPSC were applied to the SiNG-PCRseq method.
구체적으로, 상기 <실시예 1>에서 제조하고 오랑우탄 유래의 gDNA와 혼합한, F1 내지 F3의 인간 Fib cDNA 시료 및 I1 내지 I3의 인간 iPSC cDNA 시료를 준비하였다. 그런 다음, 상기 cDNA 시료는 각각 hTaq 또는 fTaq 중합효소를 사용하면서 상기 <실시예 3> 내지 <실시예 7>의 방법으로 SiNG-PCRseq 방법을 수행하여, 각각의 시료 내에서 증폭된 앰플리콘에 대한 γHS, γHSC, RAH/P 및 StQs 값을 구하였다. 편향의 보정을 위한 γHSC 값의 계산 시, 각 시료 내에서 200 이하의 리드값을 나타내는 앰플리콘 및, 배열된 서열이 gDNA 서열과 상이한 앰플리콘을 제외하여 정량분석을 수행하였다. 계산 후에는, iPSC 시료 및 Fib 시료에 대하여, hTaq 셋트를 기준으로 더 높은 RAH/P 및 StQs 값을 나타내는 앰플리콘을 왼편으로부터 시작하여 순서대로 점적하였다.Specifically, human Fib cDNA samples F1 to F3 and human iPSC cDNA samples I1 to I3 prepared in Example 1 and mixed with gDNA derived from orangutans were prepared. Then, the cDNA samples were subjected to the SiNG-PCRseq method using the method of <Example 3> to <Example 7> using hTaq or fTaq polymerase, respectively, for the amplicons amplified in each sample. γHS, γHS C , RA H / P and StQs values were determined. In calculating the γHS C value for the correction of bias, quantitative analysis was performed excluding amplicons having a read value of 200 or less in each sample, and amplicons whose sequence was different from the gDNA sequence. After the calculations, for iPSC samples and Fib samples, amplicons showing higher RA H / P and StQs values based on the hTaq set were dropped in order starting from the left.
As a result, as shown in Figure 11 and [Table 6] below, the RA_H/P values calculated for the iPSC and Fib cDNA samples of the hTaq set significantly reflected the composition of the DNA mixed into each sample (Figure 11a and Table 5). When the StQ values of the amplicons in each sample were calculated from these, the StQ values of the iPSC or Fib samples obtained from the three samples showed the same scatter-plot distribution over a wide quantitative range with RA_H/P of 0.05 or more, confirming that they can each be used as standardized quantitative values (Figure 11b). In addition, when the StQs values calculated from the three samples of each cell line (F1 to F3, or I1 to I3) were averaged and the coefficient of variation (CV) was determined for each amplicon, the iPSC samples showed a mean CV of 0.13 and the Fib samples a mean CV of 0.16, so the CV remained low, below 0.5 (Figure 11c). These results confirm that the SiNG-PCRseq method of the present invention can be used for a variety of samples by correcting for the proportion of gDNA amplified in the cDNA sample.
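As an illustration of the reproducibility check, the per-amplicon coefficient of variation across three replicate samples and its mean over all amplicons could be computed as below. This is a sketch under stated assumptions: the data layout and function names are hypothetical, the values are made up for illustration, and the sample standard deviation is used because the text does not say whether the sample or population form was applied.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)


def mean_cv(stq_by_amplicon):
    """Average the per-amplicon CVs over all amplicons.

    stq_by_amplicon: {amplicon: [StQ in replicate 1, 2, 3]} (hypothetical layout).
    """
    return mean(coefficient_of_variation(v) for v in stq_by_amplicon.values())


# Illustrative values only, not data from the patent.
replicates = {
    "amp1": [1.0, 1.0, 1.0],
    "amp2": [0.9, 1.0, 1.1],
}
overall = mean_cv(replicates)
```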
Table 6
Cell line                                   Sample   RA_H/P mean (Std)
Fibroblast line (Fib)                       F1       3.0
                                            F2       1.1
                                            F3       0.5
Induced pluripotent stem cell line (iPSC)   I1       5.3
                                            I2       2.5
                                            I3       1.0
<8-2> Effect of the polymerase on the quantification of cDNA samples derived from various human cell lines
It was examined whether consistent representation of the quantitative values is maintained despite the difference between the hTaq and fTaq polymerases when cDNA samples derived from various cell lines are quantified with the SiNG-PCRseq method of the present invention.
Specifically, the γHS, γHSc, RA_H/P and StQs values of the amplicons amplified in each sample of Fib and iPSC cDNA were obtained by the method of Example <8-1>. In the fTaq set, the read counts of the amplified amplicon sequences were often too low, so many amplicons were excluded from the γHS calculation. Accordingly, 365 amplicons (86%) were analyzed in the hTaq set, whereas only 240 amplicons (56%) were analyzed in the fTaq set. The relationship between the hTaq set and the fTaq set was then examined by regression analysis, using the StQs values calculated from I1 to I3 or from F1 to F3, and shown as scatter plots.
As a result, as shown in Figure 12, for the 240 amplicons analyzed in both the hTaq set and the fTaq set, the StQs values of the iPSC samples showed a regression slope of 1.02 and a correlation (R²) of 0.93, and the StQs values of the Fib samples showed a regression slope of 0.93 and an R² of 0.92. This confirms that the SiNG-PCRseq method of the present invention can be used reproducibly even under different reaction conditions (Figure 12).
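The slope and R² reported for the hTaq versus fTaq comparison are the kind of values an ordinary least-squares fit produces. The following is a minimal sketch with a hypothetical function name and made-up inputs; it assumes a simple linear fit with an intercept on untransformed StQs values, which the text does not specify.

```python
def slope_and_r2(x, y):
    """Ordinary least-squares slope of y on x, and the squared Pearson correlation.

    x, y: equal-length sequences of StQs values for the same amplicons,
    e.g. x from the hTaq set and y from the fTaq set (hypothetical usage).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, r2


# Illustrative values only, not data from the patent.
slope, r2 = slope_and_r2([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

A slope near 1 together with a high R² would indicate that the two polymerase sets yield nearly the same StQs for each amplicon, which is the reading the example draws from its own figures.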
<Example 9> Comparison of quantitative analysis between the SiNG-PCRseq method of the present invention and an existing quantification method
To determine whether the quantification of genes in a sample by the SiNG-PCRseq method of the present invention differs from that of a gene quantification method already in use, the genes in the samples were also quantified by an RNA-sequencing-based inter-sequence quantification method, and the results were compared.
Specifically, the γHS, γHSc, RA_H/P and StQs values of the amplicons amplified in each sample of Fib and iPSC cDNA were obtained by the method of Example <8-1>. The same samples were also quantified by RNA-seq, a relative quantification between RNA sequences, following a previously reported method (Blomquist, T.M. et al., PLoS ONE 8, e79120, 2013). The read counts of the reference mRNAs, which are the templates of the cDNA used for amplicon amplification, were normalized by length to obtain RPKM (Reads Per Kilobase per Million mapped reads) values. Kernel density plots were then drawn to show the significance of the comparison of inter-sample quantification between the SiNG-PCRseq method and the RNA-seq method.
To compare the results of the two methods effectively, only genes showing an RA_H/P of 0.05 or more in all samples with the SiNG-PCRseq method and a read count of 40 or more with the RNA-seq method were included. The RA_H/P and RPKM values were normalized by dividing by the mean of the values obtained for the selected genes.
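For reference, the length normalization and the mean normalization used in this comparison can be sketched as follows. The RPKM formula is the standard definition (reads per kilobase of transcript per million mapped reads); the function names, the dictionary layout and the numbers are hypothetical and serve only as an illustration.

```python
from statistics import mean


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def normalize_by_mean(values):
    """Divide every value by the mean over the selected genes.

    values: {gene: RA_H/P or RPKM} for genes that passed the selection
    thresholds (hypothetical layout).
    """
    m = mean(values.values())
    return {gene: v / m for gene, v in values.items()}


# Illustrative numbers only: 100 reads on a 2,000 bp transcript
# in a library of 1,000,000 mapped reads.
example_rpkm = rpkm(100, 2000, 1_000_000)
normalized = normalize_by_mean({"geneA": 1.0, "geneB": 3.0})
```

Putting both quantities on a mean-normalized scale makes the SiNG-PCRseq and RNA-seq values directly comparable gene by gene, regardless of their original units.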
As a result, as shown in Figure 13, the gene-to-gene correlation (R²) between the StQs values quantified by the RNA-seq method and by the SiNG-PCRseq method was 0.46 for iPSC and 0.43 for Fib. This indicates a considerable difference between the two methods in obtaining relative quantitative values between different sequences, which is attributable entirely to the absence of bias correction among different sequences in RNA-seq (Figure 13a). By contrast, the difference in standardized quantitative values (StQs) between the two cell lines, iPSC and Fib, showed an R² of 0.88 between the two methods, confirming that they do not differ greatly in the comparative quantification of the same sequence across different samples (Figure 13b).

Claims (10)

  1. A method for the standardized quantification of nucleic acids in a sample, comprising:
    i) mixing DNA or cDNA isolated from cells derived from a subject with genomic DNA (gDNA) isolated from cells derived from a closely related species thereof;
    ii) amplifying the DNA sample mixed in step i) by multiplex PCR to obtain amplicons;
    iii) sequencing a sequence library comprising the amplicons obtained in step ii) and comparing the result with a reference sequence to determine read counts;
    iv) calculating, from the read counts determined in step iii), the fraction of sequence (γA) of the target sequence (sequence A) for the amplicons obtained in step ii);
    v) calculating, from the fraction (γA) calculated in step iv), a bias-corrected value (γAc) in which procedural bias is corrected;
    vi) calculating, from the bias-corrected value (γAc) calculated in step v), the relative abundance of the target sequence (sequence A) to the related-species sequence (sequence B) (relative abundance of A to B; RA_A/B); and
    vii) calculating standardized quantities (StQs) by dividing the relative abundance (RA_A/B) calculated in step vi) by a representative value thereof.
  2. The method of claim 1, wherein the subject of step i) is a human and the closely related species thereof is an orangutan, a chimpanzee, a monkey or a gorilla.
  3. The method of claim 1, wherein the cells of step i) are lymphoblasts, fibroblasts or induced pluripotent stem cells (iPSC).
  4. The method of claim 1, wherein the multiplex PCR of step ii) is multiple competitive PCR.
  5. The method of claim 1, wherein the fraction (γA) of step iv) is calculated using [Equation 7] below:
    [Equation 7]
    Figure PCTKR2015009823-appb-I000011
  6. The method of claim 1, wherein the bias-corrected value (γAc) of step v) is calculated using [Equation 4] below:
    [Equation 4]
    Figure PCTKR2015009823-appb-I000012
    wherein γA and γB denote the fractions of the target sequence A and the related-species sequence B, respectively, in the amplicons amplified in the mixed sample;
    pA and pB denote the mixing ratios of the target sequence A and the related-species sequence B in a reference sample in which they are mixed in different molar amounts;
    γARef and γBRef denote the fractions of the target sequence A and the related-species sequence B, respectively, in the amplicons amplified in the reference sample in which the target sequence A and the related-species sequence B are mixed in different molar amounts; and
    n is an integer of 1 or more.
  7. The method of claim 1, wherein the relative abundance (RA_A/B) of step vi) is calculated using [Equation 8] below:
    [Equation 8]
    Figure PCTKR2015009823-appb-I000013
  8. The method of claim 1, wherein the standardized quantities (StQs) of step vii) are calculated using [Equation 9] below:
    [Equation 9]
    Figure PCTKR2015009823-appb-I000014
    wherein the representative RA_A/B value in the sample is any one of the mean of the RA_A/B values, the RA_A/B of a specific sequence, or the mean RA_A/B of a subset of sequences.
  9. A method for providing information on standardized ratios between different genes within the same species, comprising:
    i) calculating standardized quantitative values among different genes derived from a subject using the standardized quantification method of claim 1; and
    ii) calculating, from the standardized quantitative values of step i), standardized ratios between different genes within the single species that is the subject.
  10. A method for providing information for diagnosing a gene-related disease, comprising:
    i) calculating, using the standardized quantification method of the present invention, for target cells isolated from an individual having or suspected of having a disease, standardized quantitative values of the gene transcripts or target genomic sequences that characterize the disease or a transitional disease state progressing to the disease; and
    ii) comparing the standardized quantitative values of step i) with standardized quantitative values of the same sequences calculated from the transcriptome or genome of cells isolated from a normal individual of the same species.
PCT/KR2015/009823 2014-11-14 2015-09-18 Standardized quantitative analysis method for nucleic acid, applying sing-pcrseq method WO2016076524A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020140158851A KR101738481B1 (en) 2014-11-14 2014-11-14 A method of standarized nucleic acid quantitation through spiking-in neighbor genome-coupled competitive PCR amplicon sequencing
KR10-2014-0158851 2014-11-14

Publications (1)

Publication Number Publication Date
WO2016076524A1 true WO2016076524A1 (en) 2016-05-19

Family

ID=55954574

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2015/009823 WO2016076524A1 (en) 2014-11-14 2015-09-18 Standardized quantitative analysis method for nucleic acid, applying sing-pcrseq method

Country Status (2)

Country Link
KR (1) KR101738481B1 (en)
WO (1) WO2016076524A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117643245B (en) * 2023-12-26 2024-06-18 中国农业科学院烟草研究所 Method for preventing and controlling diseases and insect pests of flue-cured tobacco seedlings

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120164652A1 (en) * 2010-12-27 2012-06-28 Ibis Biosciences, Inc. Quantitating high titer samples by digital pcr

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BLOMQUIST ET AL.: "Targeted RNA-sequencing with competitive multiplex-PCR amplicon libraries", PLOS ONE, vol. 8, no. Issue 11, 2013, pages 1 - 14 *
JEONG ET AL.: "Accurate measurement of the relative abundance of different DNA species in complex DNA mixtures", DNA RESEARCH, vol. 19, 2012, pages 209 - 217 *
JEONG ET AL.: "Accurate quantitation of allele-specific expression patterns by analysis of DNA melting", GENOME RESEARCH, vol. 17, 2007, pages 1093 - 1100 *
OH ET AL.: "J-07) SiNG-PCRseq: Accurate inter sequence quantification achieved by spiking-in neighbor genome for competitive PCR amplicon sequencing", 2014 KOGO 23RD ANNUAL CONFERENCE INTEGRATIVE PRECISION GENOMICS , THE KOREA SCIENCE TECHNOLOGY CENTER, 18 September 2014 (2014-09-18), Seoul, Korea *
OSHLACK ET AL.: "Transcript length bias in RNA-seq data confounds systems biology", BIOLOGY DIRECT, vol. 4, 2009, pages 1 - 10 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111564178A (en) * 2020-04-15 2020-08-21 圣湘生物科技股份有限公司 Method, apparatus, device and storage medium for generating gene polymorphism analysis report
CN111564178B (en) * 2020-04-15 2023-07-21 圣湘生物科技股份有限公司 Method, device, equipment and storage medium for generating gene polymorphism analysis report

Also Published As

Publication number Publication date
KR101738481B1 (en) 2017-05-22
KR20160057792A (en) 2016-05-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15858856

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15858856

Country of ref document: EP

Kind code of ref document: A1