WO2014023167A1 - METHOD AND SYSTEM FOR DETECTING α-GLOBIN GENE COPY NUMBER - Google Patents

Info

Publication number
WO2014023167A1
Authority
WO
WIPO (PCT)
Prior art keywords
primer
sequencing
globin gene
gene
nucleic acid
Prior art date
Application number
PCT/CN2013/080223
Other languages
French (fr)
Chinese (zh)
Inventor
陈仕平
李剑
张现东
甄贺富
陈彩粉
张涛
王俊
Original Assignee
深圳华大基因健康科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳华大基因健康科技有限公司
Publication of WO2014023167A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6869 Methods for sequencing
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C12Q2600/158 Expression markers

Definitions

  • This invention relates to the field of biomedicine and, in particular, to methods, primer compositions, label compositions and systems for determining the α-globin gene copy number.
  • Background art
  • Thalassemia is a common hemolytic monogenic genetic disease that is prevalent in the Middle East, Central Asia, Africa, Southeast Asia and southern China.
  • The molecular mechanism of thalassemia is as follows: defects in a globin gene reduce or abolish one or more of the peptide chains it encodes, which unbalances the composition of hemoglobin and in turn makes hemoglobin unstable.
  • Thalassemia is mainly divided into α-thalassemia and β-thalassemia.
  • α-Thalassemia is caused mainly by deletion of the α-globin gene and partly by mutations, with deletions accounting for more than 90% of cases; most β-thalassemia is caused by mutations or small insertions or deletions in the β-globin gene, and part is caused by large-fragment deletion or duplication of the β-globin gene.
  • The α-globin gene cluster comprises two HBA1 genes and two HBA2 genes. Copy number variation (deletion and duplication) of these genes not only causes α-thalassemia but can also lead to β-thalassemia, so detecting the α-globin gene copy number is of great significance for the diagnosis of thalassemia.
  • The present invention aims to solve at least one of the technical problems existing in the prior art.
  • One aspect of the present invention provides a method for determining the copy number of the α-globin gene in a nucleic acid sample.
  • Another aspect of the present invention provides a system for determining the copy number of the α-globin gene in a nucleic acid sample that can effectively carry out the method.
  • The method for determining the α-globin gene copy number comprises the steps of: amplifying a nucleic acid sample to obtain an amplification product; constructing a sequencing library for the amplification product; sequencing the sequencing library to obtain a sequencing result consisting of a plurality of sequencing data; determining the sequencing data from the α-globin gene in the sequencing result; and, based on the number of sequencing data from the α-globin gene, determining the copy number of the α-globin gene in the nucleic acid sample.
  • The above method may also have the following additional technical features:
  • The nucleic acid sample is isolated from at least one of plasma, serum, whole blood and oral exfoliated cells of the subject.
  • The subject is a human.
  • The α-globin gene is at least one selected from the group consisting of the HBA1 gene and the HBA2 gene.
  • The nucleic acid sample is amplified using a specific primer set, wherein the specific primer set comprises a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2.
  • The 5' end of at least one of the first primer and the second primer further comprises a tag sequence, the tag sequence being at least one nucleotide sequence selected from SEQ ID NOs: 5-100.
  • The sequencing is performed using at least one selected from the group consisting of HiSeq 2000, SOLiD, 454 and a single molecule sequencing device.
  • The specific primer set further includes a third primer and a fourth primer that are specific for an internal reference gene, and the method further comprises determining the sequencing data from the internal reference gene in the sequencing result.
  • The internal reference gene is FLNB, the third primer has the nucleotide sequence shown in SEQ ID NO: 3, and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
  • The 5' end of at least one of the third primer and the fourth primer further comprises a tag sequence, the tag sequence being at least one nucleotide sequence selected from SEQ ID NOs: 5-100.
  • The sequencing data from the α-globin gene in the sequencing result are determined by aligning the sequencing result with a reference sequence.
  • Determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene further comprises: counting the sequencing data from the α-globin gene in the sequencing result to obtain a value H; counting the sequencing data from the internal reference gene in the sequencing result to obtain a value C; calculating the ratio of H to C to obtain a first parameter H/C; comparing the first parameter with a first reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value.
  • The first reference value is the first parameter obtained by performing parallel experiments on nucleic acid samples from individuals of known α-globin gene copy number.
  • Specifically, the first reference value is the first parameter obtained by performing a parallel experiment on a nucleic acid sample from a normal individual.
  • The α-globin genes are HBA1 and HBA2; the number of sequencing data from HBA1 in the sequencing result is H1 and the number from HBA2 is H2. Determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene then further comprises: calculating the ratio of H2 to H1 to obtain a second parameter H2/H1; comparing the second parameter with a second reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
  • The second reference value is the second parameter obtained by performing parallel experiments on nucleic acid samples from individuals of known α-globin gene copy number.
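  • The comparison of the first parameter with the first reference value can be illustrated with a minimal sketch. It assumes that the reference sample is a normal individual carrying four α-globin copies (two HBA1 and two HBA2) and that the read ratio scales linearly with copy number; the function names, the rounding step and the example counts are hypothetical and are not taken from the description.

```python
def first_parameter(h_reads: int, c_reads: int) -> float:
    """Return H/C: alpha-globin read count divided by internal-reference read count."""
    if c_reads <= 0:
        raise ValueError("internal reference read count must be positive")
    return h_reads / c_reads


def estimate_copy_number(h_reads: int, c_reads: int,
                         reference_value: float,
                         reference_copies: int = 4) -> int:
    """Estimate the alpha-globin copy number of a test sample.

    reference_value is H/C measured in parallel on the reference sample.
    reference_copies (assumed 4 for a normal individual) is the copy number
    of that reference sample. Linear scaling and rounding are assumptions.
    """
    relative = first_parameter(h_reads, c_reads) / reference_value
    return round(relative * reference_copies)


# Hypothetical counts: the normal control gives H/C = 2.0.
print(estimate_copy_number(1000, 1000, reference_value=2.0))  # test H/C = 1.0
```

  • The same pattern applies to the second parameter H2/H1, with the second reference value in place of the first.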
  • The present invention provides a primer composition.
  • The primer composition comprises a first primer having the nucleotide sequence shown in SEQ ID NO: 1 and a second primer having the nucleotide sequence shown in SEQ ID NO: 2.
  • The 5' end of at least one of the first primer and the second primer further comprises a tag sequence, the tag sequence being at least one nucleotide sequence selected from SEQ ID NOs: 5-100.
  • The primer composition of the present invention further comprises a third primer and a fourth primer, wherein the third primer has the nucleotide sequence shown in SEQ ID NO: 3 and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
  • The 5' end of at least one of the third primer and the fourth primer further comprises a tag sequence, the tag sequence being at least one nucleotide sequence selected from SEQ ID NOs: 5-100.
  • The present invention provides the use of the above primer composition for determining the copy number of the α-globin gene in a nucleic acid sample.
  • The invention provides a label composition.
  • The label composition consists of the tags shown in SEQ ID NOs: 5-100.
  • The present invention provides a system for determining the copy number of the α-globin gene in a nucleic acid sample.
  • The system includes: an amplification device for amplifying the nucleic acid sample to obtain an amplification product; a library construction device, connected to the amplification device and adapted to construct a sequencing library for the amplification product; a sequencing device, connected to the library construction device and adapted to sequence the sequencing library to obtain a sequencing result consisting of a plurality of sequencing data; and an analyzing device, connected to the sequencing device and adapted to determine the sequencing data from the α-globin gene in the sequencing result and, based on the number of sequencing data of the α-globin gene, to determine the copy number of the α-globin gene in the nucleic acid sample.
  • The system for determining the copy number of the α-globin gene in a nucleic acid sample may also have the following additional technical features:
  • The system further comprises a nucleic acid sample separation device adapted to isolate a nucleic acid sample from at least one of plasma, serum, whole blood and oral exfoliated cells of a subject.
  • The α-globin gene is at least one selected from the group consisting of the HBA1 gene and the HBA2 gene.
  • The amplification device is provided with a specific primer set, wherein the specific primer set comprises a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2.
  • The 5' end of at least one of the first primer and the second primer further comprises a tag sequence, the tag sequence being at least one nucleotide sequence selected from SEQ ID NOs: 5-100.
  • The sequencing device is at least one selected from the group consisting of HiSeq 2000, SOLiD, 454 and a single molecule sequencing device.
  • The specific primer set further includes a third primer and a fourth primer that are specific for an internal reference gene, and the analyzing device is adapted to determine the sequencing data from the internal reference gene in the sequencing result.
  • The internal reference gene is FLNB, the third primer has the nucleotide sequence shown in SEQ ID NO: 3, and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
  • The 5' end of at least one of the third primer and the fourth primer further comprises a tag sequence, the tag sequence being at least one nucleotide sequence selected from SEQ ID NOs: 5-100.
  • The analyzing device is adapted to determine the sequencing data from the α-globin gene in the sequencing result by aligning the sequencing result with a reference sequence.
  • The analyzing device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by: counting the sequencing data from the α-globin gene in the sequencing result to obtain a value H; counting the sequencing data from the internal reference gene in the sequencing result to obtain a value C; calculating the ratio of H to C to obtain a first parameter H/C; comparing the first parameter with a first reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value.
  • The α-globin genes are HBA1 and HBA2; the number of sequencing data from HBA1 in the sequencing result is H1 and the number from HBA2 is H2. The analyzing device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by: calculating the ratio of H2 to H1 to obtain a second parameter H2/H1; comparing the second parameter with a second reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
  • Figure 1 shows the co-amplified region of α1 and α2 according to one embodiment of the present invention;
  • Figure 2 shows the sequences of the HBA1 and HBA2 primer candidate regions according to one embodiment of the present invention;
  • Figure 3 shows the amplification efficiency of the α-globin gene (HBA) primers and the internal reference gene (Control) primers according to one embodiment of the present invention; the left panel shows the HBA amplification curve and amplification efficiency, and the right panel shows the Control amplification curve and amplification efficiency;
  • Figure 4 is a schematic diagram of the PCR products after primer index and adaptor index labeling according to one embodiment of the present invention;
  • Figure 5 shows a flow chart for sorting sequencing data according to one embodiment of the present invention;
  • Figure 6 shows the α-globin gene copy number analysis procedure according to one embodiment of the present invention;
  • Figure 7 shows the results of electrophoretic detection of samples No. 1-64 according to one embodiment of the present invention;
  • Figure 8 shows the results of HBA1 and HBA2 quantitative PCR for samples with discordant results according to one embodiment of the present invention;
  • Figure 9 shows the results of PCR detection with Anti-3.7 and Anti-4.2 specific primers for samples with discordant results according to one embodiment of the present invention.
  • Detailed description of the invention
  • The present invention provides a method for efficiently determining the copy number of the α-globin gene in a nucleic acid sample.
  • The method for determining the copy number of the α-globin gene in a nucleic acid sample according to an embodiment of the present invention includes the following steps:
  • The source of the nucleic acid sample is not limited.
  • The nucleic acid sample may be isolated from at least one of plasma, serum, whole blood and oral exfoliated cells of the subject.
  • The source of the subject is not particularly limited.
  • Subjects that may be employed include mammals, preferably humans.
  • The α-globin gene in the nucleic acid sample is at least one selected from the HBA1 gene and the HBA2 gene.
  • The present invention amplifies the nucleic acid sample using a specific primer set, wherein the specific primer set comprises a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2.
  • The primers for HBA1 and HBA2 are required to have essentially the same amplification efficiency. Given that HBA1 and HBA2 are highly homologous and that pseudogenes with high sequence similarity (ψα1, ψα2) exist, the highly conserved primer set provided by the present invention specifically co-amplifies the HBA1 and HBA2 gene regions. Its design includes the following steps:
  • The gDNA sequences of the HBA1 and HBA2 genes were imported into the MegAlign program for sequence alignment, and regions in which the HBA1 and HBA2 sequences are identical over 18 or more consecutive bases were taken as primary co-amplification candidate regions.
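  • The selection of primary co-amplification candidate regions can be sketched as a scan over a pairwise alignment. The following is a minimal illustration, not the MegAlign procedure itself: it assumes the two sequences have already been aligned to equal length with "-" as the gap character, and the function name and toy sequences are hypothetical.

```python
def identical_runs(aligned_a: str, aligned_b: str, min_len: int = 18):
    """Return half-open (start, end) column ranges in which both aligned
    sequences carry the same base for at least min_len consecutive columns.
    Gap columns ("-") never count as identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        match = a == b and a != "-"
        if match and start is None:
            start = i                      # a new identical run begins
        elif not match and start is not None:
            if i - start >= min_len:       # keep the run only if long enough
                runs.append((start, i))
            start = None
    if start is not None and len(aligned_a) - start >= min_len:
        runs.append((start, len(aligned_a)))
    return runs


# Toy alignment: two 20-base identical blocks separated by one mismatch.
seq_a = "ACGT" * 5 + "A" + "ACGT" * 5
seq_b = "ACGT" * 5 + "C" + "ACGT" * 5
print(identical_runs(seq_a, seq_b))
```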
  • SNP and mutation information recorded in the dbSNP and HbVar databases was annotated onto the primary co-amplification candidate regions, and regions of more than 18 consecutive bases without common gene mutations (0.1% frequency threshold) were selected as secondary co-amplification candidate regions.
  • The sequences of the secondary co-amplification candidate regions of α1 and α2 (here, the HBA1 gene and HBA2 gene, respectively) and the sequences of the corresponding regions of the pseudogenes ψα1 and ψα2 were imported into the MegAlign program and aligned; base sites at which α1 and α2 differ from the pseudogenes serve as sequence-specific candidate sites.
  • Primers were designed with the sequence-specific candidate sites at their 3' ends, and the primer sequences were subjected to genome-wide BLAST analysis; primers that aligned exactly to α1 and α2 and to no other position in the genome were retained as candidate primers.
  • The amplified regions of the candidate primer pairs were compared with the deleted regions of all α deletion mutations recorded in the HbVar database, and the primer pair whose amplified region covers all known deletion types was taken as the final selected primer pair.
  • The final selected primer pair is HBA-F and HBA-R (Table 1); the co-amplified α1 and α2 regions are shown in Figure 1.
  • Figure 2 shows the sequences of the HBA1 and HBA2 primer candidate regions, where HBA1-Q is the α1 region sequence, HBA2-Q is the α2 region sequence, and the boxed bases are the sequence differences between the two.
  • The present invention further provides internal reference primers, that is, a third primer and a fourth primer.
  • The internal reference primers are specific for the internal reference gene.
  • The internal reference gene may be FLNB (bone dysplasia), whose deletion produces a significant clinical phenotype; the third primer has the nucleotide sequence shown in SEQ ID NO: 3 and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
  • The internal reference gene must have a constant copy number, so a gene whose mutation or deletion causes obvious symptoms or death, such as FLNB (bone dysplasia), is generally used as the internal reference.
  • A series of candidate primer pairs was designed against sequence-conserved regions of FLNB, with PCR product lengths of 80-150 bp and the same primer annealing temperature as HBA-F and HBA-R.
  • The amplification efficiencies of the candidate primer pairs and of HBA-F/HBA-R were determined by quantitative PCR on a template concentration gradient (serial dilution).
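  • The description does not state how efficiency is computed from the dilution series. A common approach, shown here only as an assumed illustration, is the standard-curve method: fit Ct against log10 of template concentration and take E = 10^(-1/slope) - 1, so that E = 1.0 corresponds to perfect doubling per cycle. The function name and the example values are hypothetical.

```python
import math


def amplification_efficiency(concentrations, ct_values):
    """Estimate qPCR amplification efficiency from a dilution series.

    Fits Ct = slope * log10(concentration) + intercept by least squares
    and returns E = 10 ** (-1 / slope) - 1.
    """
    if len(concentrations) != len(ct_values) or len(concentrations) < 2:
        raise ValueError("need at least two (concentration, Ct) pairs")
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return 10 ** (-1 / slope) - 1


# Hypothetical 10-fold dilution series with ideal doubling per cycle.
ideal_slope = -1 / math.log10(2)
concs = [1000, 100, 10, 1]
cts = [30 + ideal_slope * math.log10(c) for c in concs]
print(round(amplification_efficiency(concs, cts), 3))
```

  • Primer pairs for the target and internal reference genes can then be compared on this single number when selecting a reference pair whose efficiency matches HBA-F/HBA-R.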
  • The invention provides specific tag sequences.
  • The 5' ends of the first primer and the second primer further comprise a tag sequence, the tag sequence being at least one nucleotide sequence selected from SEQ ID NOs: 5-100.
  • The 5' ends of the third primer and the fourth primer further comprise a tag sequence, the tag sequence being at least one nucleotide sequence selected from SEQ ID NOs: 5-100.
  • Based on primer-tag DNA molecular labeling technology, the present invention labels the PCR products of multiple samples separately, so that multiple samples can be mixed in a single library construction experiment.
  • According to the primer sequences designed in Tables 1 and 2, the present invention designed and screened 96 sets of primer tag sequences, as shown in Table 3.
  • The primer tag length is 6-8 bp.
  • Each sequencing library can be labeled by adding a different adaptor tag (adaptor index), giving a sequencing library double-labeled with the primer tag and the adaptor tag (Fig. 4).
  • Multiple sequencing libraries with different adaptor tags are mixed and sequenced simultaneously (the primer tags may be identical between sequencing libraries labeled with different adaptor tags).
  • The DNA sequence information of each sample can be obtained by screening the sequencing result for the adaptor tag and primer tag sequences.
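  • The double-label screening can be sketched as a two-key binning of reads. This is a minimal illustration under stated assumptions: each read is given as an (adaptor index, sequence) pair, the primer tag sits at the 5' end of the sequence, and matching is exact. The function name, the tag sequences and the library names are hypothetical and are not the sequences of SEQ ID NOs: 5-100.

```python
from collections import defaultdict


def demultiplex(reads, primer_tags):
    """Group reads by (adaptor index, primer tag).

    reads: iterable of (adaptor_index, sequence) pairs.
    primer_tags: known 5' primer tags (6-8 bp in the description).
    Longer tags are tried first so a short tag that is a prefix of a longer
    one cannot shadow it. Reads without a known tag are dropped.
    """
    bins = defaultdict(list)
    tags_by_length = sorted(primer_tags, key=len, reverse=True)
    for adaptor_index, sequence in reads:
        for tag in tags_by_length:
            if sequence.startswith(tag):
                # Store the read with its primer tag trimmed off.
                bins[(adaptor_index, tag)].append(sequence[len(tag):])
                break
    return dict(bins)


reads = [
    ("LIB1", "ACGTACGGATTACA"),
    ("LIB2", "ACGTACGGATTACA"),  # same primer tag, different library
    ("LIB1", "TTGCAAGCCCC"),
    ("LIB1", "GGGGGGGG"),        # no known tag: discarded
]
print(demultiplex(reads, ["ACGTAC", "TTGCAAGC"]))
```

  • Because the bin key combines both labels, the same primer tag can be reused across libraries without samples colliding, which is the property noted above.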
  • The invention combines primer tags with a multiplex PCR amplification strategy so that the target gene and the internal reference gene are amplified simultaneously in the same reaction system. This not only eliminates the quantitative inaccuracy caused by differences in the starting amount of DNA, but also simplifies the experimental operation.
  • The present invention keeps the number of PCR amplification cycles within the early to middle part of the exponential amplification phase, that is, 22-26 cycles.
  • Each set of tagged primers is designed with a normal control and a positive control, that is, each set of tagged primers includes one sample with normal copy number and one with abnormal copy number.
  • The normal control is a sample with a normal α-globin gene copy number as determined by GAP-PCR and quantitative PCR.
  • The positive control is a sample with two copies of the α-globin gene as determined by GAP-PCR and quantitative PCR.
  • Constructing a sequencing library for the amplification product; sequencing the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data;
  • The method of sequencing the α-globin gene is not particularly limited.
  • Those skilled in the art can select different methods for constructing a sequencing library according to the specific scheme of the genome sequencing technology adopted.
  • For details of constructing the sequencing library, refer to the protocol provided by the manufacturer of the sequencing instrument, such as Illumina; see, for example, the Illumina Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or the Paired-End Sample Prep Guide (Part #1005063; Feb 2010), which are incorporated herein by reference in their entirety.
  • Instruments for sequencing the α-globin gene include but are not limited to HiSeq 2000, SOLiD, 454 and single molecule sequencing devices.
  • Determining sequencing data from the α-globin gene in the sequencing result; and determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene.
  • The sequencing result obtained includes a plurality of sequencing data.
  • The sequencing data from the α-globin gene in the sequencing result are determined by aligning the sequencing result with a reference sequence. Those skilled in the art will understand that any known method can be used to align the sequencing result with the reference sequence.
  • Determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene further comprises: counting the sequencing data from the α-globin gene in the sequencing result to obtain a value H; counting the sequencing data from the internal reference gene in the sequencing result to obtain a value C; calculating the ratio of H to C to obtain a first parameter H/C; comparing the first parameter with a first reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value.
  • The first reference value described herein is the first parameter obtained by performing parallel experiments on nucleic acid samples from individuals of known α-globin gene copy number. Specifically, it is the first parameter obtained by performing parallel experiments on nucleic acid samples from normal individuals.
  • Determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene further comprises: calculating the ratio of H2 to H1 to obtain a second parameter H2/H1; comparing the second parameter with a second reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
  • The second reference value described herein is the second parameter obtained by performing parallel experiments on nucleic acid samples from individuals of known α-globin gene copy number.
  • For result analysis, the sequencing data can be assigned to each locus of the corresponding sample according to the library tag, the primer tag and the primer.
  • The invention calls results using multiple discriminant analysis and a Bayesian approach. Following the principle of relative quantification, it calculates the relative ratio (target gene/internal reference gene) of the sample to be tested, expresses it relative to the normal control sample, and calculates distances on the result.
  • Figure 5 shows a flow chart for sorting sequencing data, in which: (1) library discrimination: sequencing reads are assigned to each library according to the library tag sequence;
  • HBA1 and HBA2 discrimination: according to the internal difference sequence between HBA1 and HBA2 (Fig. 2), the HBA reads are divided into HBA1 reads and HBA2 reads.
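  • The HBA1/HBA2 discrimination step can be sketched as a lookup of gene-specific bases inside each co-amplified read. The signature strings below are hypothetical placeholders, not the actual difference sequence of Fig. 2, and exact substring matching is an assumption made to keep the example short.

```python
def classify_hba_read(read: str, hba1_signature: str, hba2_signature: str) -> str:
    """Assign a co-amplified HBA read to HBA1 or HBA2 by its difference sequence.

    A read is assigned only if it contains exactly one of the two signatures;
    anything else is reported as "unassigned".
    """
    has_hba1 = hba1_signature in read
    has_hba2 = hba2_signature in read
    if has_hba1 and not has_hba2:
        return "HBA1"
    if has_hba2 and not has_hba1:
        return "HBA2"
    return "unassigned"


def count_hba_reads(reads, hba1_signature: str, hba2_signature: str) -> dict:
    """Count reads per class; the HBA1 and HBA2 counts correspond to H1 and H2."""
    counts = {"HBA1": 0, "HBA2": 0, "unassigned": 0}
    for read in reads:
        counts[classify_hba_read(read, hba1_signature, hba2_signature)] += 1
    return counts


# Hypothetical signatures and reads, for illustration only.
reads = ["TTTGGCATAAA", "TTTGGCTTAAA", "TTTGGCATCCC", "TTTTTTTTTTT"]
print(count_hba_reads(reads, hba1_signature="GGCAT", hba2_signature="GGCTT"))
```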
  • T1: the ratio of the number of HBA1 reads to the number of Control reads in the sample to be tested;
  • T2: the ratio of the number of HBA2 reads to the number of Control reads in the sample to be tested;
  • T3: the ratio of the number of HBA2 reads to the number of HBA1 reads in the sample to be tested;
  • N1: the ratio of the number of HBA1 reads to the number of Control reads in the normal sample;
  • N2: the ratio of the number of HBA2 reads to the number of Control reads in the normal sample;
  • N3: the ratio of the number of HBA2 reads to the number of HBA1 reads in the normal sample;
  • R1 = T1/N1: the HBA1 gene copy number of the sample to be tested as a multiple of the normal control HBA1 copy number; used to determine the copy number of HBA1;
  • R2 = T2/N2: the HBA2 gene copy number of the sample to be tested as a multiple of the normal control HBA2 copy number; used to determine the copy number of HBA2;
  • R3 = T3/N3: HBA2/HBA1 of the sample to be tested as a multiple of HBA2/HBA1 of the normal sample; used to check the accuracy of the HBA1 and HBA2 copy numbers.
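  • The three relative ratios can be computed directly from read counts, as in the following minimal sketch. The function names and example counts are hypothetical, and taking each R as the test ratio divided by the normal ratio is an assumption drawn from the definitions of T1-T3 and N1-N3 above.

```python
def read_ratios(hba1: int, hba2: int, control: int):
    """Return (HBA1/Control, HBA2/Control, HBA2/HBA1) for one sample."""
    if hba1 <= 0 or control <= 0:
        raise ValueError("HBA1 and Control read counts must be positive")
    return hba1 / control, hba2 / control, hba2 / hba1


def relative_ratios(test_counts: dict, normal_counts: dict):
    """Return (R1, R2, R3): test-sample ratios divided by normal-control ratios."""
    t1, t2, t3 = read_ratios(**test_counts)
    n1, n2, n3 = read_ratios(**normal_counts)
    if n2 == 0:
        raise ValueError("normal control must contain HBA2 reads")
    return t1 / n1, t2 / n2, t3 / n3


# Hypothetical counts: the test sample has half the normal HBA1 signal.
test = {"hba1": 500, "hba2": 1000, "control": 1000}
normal = {"hba1": 1000, "hba2": 1000, "control": 1000}
r1, r2, r3 = relative_ratios(test, normal)
print(r1, r2, r3)
```

  • When all three ratios come from the same read counts, the Control counts cancel and R3 equals R2/R1, which is why R3 can serve as a cross-check on the other two values.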
  • A Mahalanobis distance set is formed: for each sample, the Mahalanobis distances of R1, R2 and R3 are calculated, and the shortest distance is selected by multiple discriminant analysis. The copy numbers of HBA1 and HBA2 are determined from the shortest distances of R1, R2 and R3. When one of R1, R2 and R3 does not match the call given by the other two distance values, a Bayesian prior is used for adjustment: the P value of the corresponding R1, R2 or R3 is modified, the distance is recalculated, and the final call is made.
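  • A nearest-class call by Mahalanobis distance can be sketched in one dimension, where the distance reduces to |x - mean| / standard deviation. Everything numeric below is an assumption for illustration: the class means (copy number divided by the two copies per gene of a normal individual), the standard deviations and the tolerance are hypothetical, and the Bayesian prior adjustment is not implemented; the sketch only flags the case in which it would be needed.

```python
def mahalanobis_1d(x: float, mean: float, std: float) -> float:
    """Mahalanobis distance of a scalar from a class with given mean and std."""
    return abs(x - mean) / std


# Hypothetical class models: per-gene copy number -> (expected R, assumed std).
CLASSES = {0: (0.0, 0.1), 1: (0.5, 0.1), 2: (1.0, 0.1), 3: (1.5, 0.1)}


def nearest_copy_number(r: float, classes=CLASSES) -> int:
    """Return the copy-number class with the shortest distance to r."""
    return min(classes, key=lambda cn: mahalanobis_1d(r, *classes[cn]))


def call_copy_numbers(r1: float, r2: float, r3: float,
                      classes=CLASSES, tolerance: float = 0.3):
    """Call HBA1 from R1 and HBA2 from R2, then cross-check with R3.

    Returns (hba1_copies, hba2_copies, consistent). consistent is False when
    R3 disagrees with the ratio of the two calls, which is the situation in
    which the description applies its Bayesian adjustment.
    """
    hba1 = nearest_copy_number(r1, classes)
    hba2 = nearest_copy_number(r2, classes)
    consistent = hba1 > 0 and abs(r3 - hba2 / hba1) <= tolerance
    return hba1, hba2, consistent


print(call_copy_numbers(0.52, 1.04, 2.0))
```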
  • The present invention provides an α-globin gene copy number detection method based on a next-generation sequencing platform that can simultaneously detect the various types of thalassemia caused by α-globin gene copy number variation. It offers high throughput and high accuracy, and the detection process is easy to automate. The method can be used for large-scale thalassemia screening, such as premarital and prenatal testing.
  • the methods employed in the present invention are equally applicable to the detection of various beta globin gene deletions and other gene copy number variations with similar patterns.
  • System for determining the copy number of the α-globin gene in a nucleic acid sample. According to still another aspect, the present invention also provides a system for determining the copy number of the α-globin gene in a nucleic acid sample.
  • A system for determining α-globin gene copy number includes: an amplification device for amplifying the nucleic acid sample to obtain an amplification product; a library construction device, coupled to the amplification device and adapted to construct a sequencing library from the amplification product; a sequencing device, coupled to the library construction device and adapted to sequence the sequencing library to obtain a sequencing result consisting of a plurality of sequencing data; and an analysis device, coupled to the sequencing device and adapted to: determine the sequencing data in the sequencing result that originate from the α-globin gene; and determine the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene.
  • The system for determining the copy number of the α-globin gene further comprises a nucleic acid sample separation device adapted to isolate a nucleic acid sample from at least one of plasma, serum, whole blood, and oral exfoliated cells of the subject.
  • The α-globin gene is at least one selected from the group consisting of an HBA1 gene and an HBA2 gene.
  • The amplification device is further provided with a specific primer set, wherein the specific primer set comprises a first primer and a second primer, the first primer having the nucleotide sequence set forth in SEQ ID NO: 1 and the second primer having the nucleotide sequence set forth in SEQ ID NO: 2.
  • The 5' end of at least one of the first primer and the second primer further comprises a tag sequence, the tag sequence being a nucleotide sequence set forth in at least one of SEQ ID NOs: 5-100.
  • the sequencing device is at least one selected from the group consisting of Hiseq2000, SOLID, 454, and a single molecule sequencing device.
  • The specific primer set further includes a third primer and a fourth primer, wherein the third primer and the fourth primer are specific for an internal reference gene, and the analysis device is adapted to determine the sequencing data in the sequencing result that originate from the internal reference gene.
  • the internal reference gene is FLNB
  • the third primer has a nucleotide sequence as shown in SEQ ID NO: 3
  • The fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
  • The 5' end of at least one of the third primer and the fourth primer further comprises a tag sequence, the tag sequence being a nucleotide sequence set forth in at least one of SEQ ID NOs: 5-100.
  • The analysis device is adapted to identify the sequencing data in the sequencing result that originate from the α-globin gene by comparing the sequencing result with a reference sequence.
  • The analysis device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by: counting the sequencing data in the sequencing result that originate from the α-globin gene to obtain a value H; counting the sequencing data in the sequencing result that originate from the internal reference gene to obtain a value C; calculating the ratio of H to C to obtain a first parameter H/C, and comparing the first parameter with a first reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value.
  • The α-globin genes are HBA1 and HBA2, the number of sequencing data in the sequencing result originating from HBA1 is H1, and the number originating from HBA2 is H2, wherein the analysis device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by: calculating the ratio of H2 to H1 to obtain a second parameter H2/H1, and comparing the second parameter with a second reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
  • the PCR amplification process is divided into three stages: exponential amplification phase, linear amplification phase and plateau phase.
  • the amount of PCR product in the exponential amplification phase is linearly related to the amount of PCR starting template.
  • Real-time quantitative PCR is used to measure, in a sample whose target gene copy number is unknown (hereinafter the unknown sample), the ratio of the amounts of PCR product of the target gene and the internal reference gene (target gene/internal reference gene) during the exponential amplification phase. Comparing this ratio with the corresponding ratio measured in a known sample (a sample with a known target gene copy number) gives the target gene content of each test sample relative to the known sample. This is the principle of relative quantification in quantitative PCR.
  • Fluorescence quantitative PCR tracks the change in the amount of PCR product through the fluorescence signal intensity accumulated in real time during the PCR reaction; that is, the fluorescence signal intensity is linearly related to the amount of PCR product.
  • Experimental studies have shown that, once sequencing reaches a certain depth, the amount of starting template in next-generation sequencing technologies such as Hiseq is directly proportional to the number of sequencing reads obtained.
  • The method rests on two principles: in next-generation sequencing the amount of starting template is directly proportional to the number of sequencing reads obtained, and quantitative PCR provides relative quantification against a reference.
  • On this basis, the present invention uses the next-generation sequencing Hiseq-2000 platform to achieve high-throughput, low-cost and accurate detection of the copy number of the α-thalassemia genes (HBA1, HBA2).
  • The system of the present invention for determining the copy number of the α-globin gene in a nucleic acid sample can effectively carry out the aforementioned method for determining the copy number of the α-globin gene in a nucleic acid sample.
  • The present invention is described below by way of specific examples. It should be noted that these examples are for illustrative purposes only and are not to be construed as limiting the invention in any way.
  • The magnetic bead method was used to extract DNA automatically from peripheral blood. Each batch comprised 94 samples and 2 negative controls. The DNA concentration must exceed 30 ng/μl, in a volume of 100 μl, with an A260/A280 ratio of 1.8-2.0.
  • DNA was extracted from 952 samples (one of which was a normal control and one a positive control) using a KingFisher automatic extractor.
  • The main steps are as follows: take three deep-well plates and one shallow-well plate for the KingFisher automatic extractor, add the specified amounts of the matching reagents according to the instructions, and label the plates. Place all plates containing reagents in their required positions, select the program "Bioeasy_200ul BloodDNA_KF.msz", and press "start" to run the nucleic acid extraction program. At the end of the program, collect the eluted product from the Elution plate as the extracted DNA, which serves as the template for the subsequent PCR.
  • The 96 sets of tag primers correspond to the wells of a 96-well PCR reaction plate. In each batch, the test samples, the normal control and the positive control are processed in parallel, so each set of tag primers is used in at least 3 PCR amplifications per batch: one or more test samples, one normal control and one positive control.
  • The 952 DNA samples obtained in the extraction step were numbered 1-952 (951 being the normal control and 952 the positive control), and 96 sets of HBA and Control tag primers (Tables 1, 2, 3) were used to amplify them; the 96th primer set served as a negative control with no template added.
  • The PCR reactions were carried out in 96-well plates, including a normal control (sample 951, one sample for 96 reactions) and a positive control (sample 952, one sample for 96 reactions).
  • There are 12 plates in total: ten sample plates numbered Q1 to Q10, a normal control plate N and a positive control plate P. Q1 corresponds to templates 1-95, Q2 to templates 96-190, and so on in order; each plate includes a negative control with no template added.
  • the PCR procedure is as follows:
  • The PCR reaction was run on a Bio-Rad PTC-200 PCR machine. After the PCR was completed, 2 μl of PCR product was checked by 2.0% agarose gel electrophoresis (see Fig. 7).
  • The products amplified with the 96 sets of tag primers were pooled: the PCR products of each group of samples (one 96-well PCR reaction plate, in which every well carries a different primer tag) were combined in a single EP tube and purified.
  • From the remaining PCR products of the 12 plates (Q1-Q10, N and P), 15 μl was taken from each well of a plate and combined in a correspondingly labeled 2 ml EP tube (this step is called pooling), and the tube was shaken to mix.
  • From each tube, 1250 μl of pooling product was purified with the Qiagen DNA Purification kit (see the kit instructions for the specific purification steps), and 37.5 μl of purified product was obtained.
  • The concentrations of the purified products of the 12 plates were determined with a Nanodrop 8000 (Thermo Fisher Scientific).
  • A library was constructed from the purified product following the library preparation workflow for next-generation sequencing. The raw cluster density of the library was determined so as to ensure an average sequencing depth of more than 1000× for the internal reference gene, and the library was then sequenced.
  • The reaction conditions were: incubation in a Thermomixer at 20 °C for 30 min.
  • The reaction product was recovered and purified with the Qiagen DNA Purification Kit and dissolved in 32 μL of EB.
  • The reaction conditions were: incubation in a Thermomixer at 37 °C for 30 min.
  • The reaction product was recovered and purified with the Qiagen DNA Purification Kit (QIAGEN) and dissolved in 38 μL of EB.
  • The reaction conditions were: incubation in a Thermomixer at 16 °C for 16 h.
  • The reaction product was purified with 60 μl of Ampure Beads (Beckman Coulter Genomics) and dissolved in 30 μL of deionized water.
  • the library concentration was determined by real-time PCR (QPCR) as shown in Table 6:
  • Fig. 8 shows the results of HBA1 and HBA2 quantitative PCR on the samples with discordant results, where Normal is a normal control containing 2 HBA1 and 2 HBA2 genes.
  • The height of each bar indicates the copy number of the test sample relative to Normal: 0.5 indicates that the copy number is half that of Normal, 1 that it is the same as Normal, and 1.5 that it is 1.5 times that of Normal.
  • Fig. 9 shows the results of PCR with Anti-3.7 and Anti-4.2 specific primers on the samples with discordant results: the left panel shows the Anti-3.7 results and the right panel the Anti-4.2 results. Anti-3.7 and Anti-4.2 indicate that there are two HBA1 or HBA2 genes on one chromosome; the presence of a band indicates that this multi-copy variation exists.
  • The above verification results (Fig. 8, Fig. 9) are consistent with the results of the present method, which indicates that the present method has an advantage over the conventional method. The detection results for the first 60 samples are as follows:
  • The method and system of the present invention for detecting the copy number of the α-globin gene can be effectively applied to detecting the copy number of the α-globin gene in a nucleic acid sample, with high throughput and high accuracy.
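
The plate layout described in the example above (Q1 holding templates 1-95, Q2 templates 96-190, and so on, with one no-template negative control per plate) can be sketched in Python as follows. This is a minimal illustration rather than part of the disclosure: the function name `plate_and_well` is hypothetical, and the sketch assumes that the well index equals the tag primer set number, with the 96th set reserved for the negative control.

```python
SAMPLES_PER_PLATE = 95      # tag primer sets 1-95 carry test samples
NEGATIVE_CONTROL_SET = 96   # the 96th set is the no-template control
TOTAL_TEST_SAMPLES = 950    # plates Q1-Q10; samples 951 (N) and 952 (P) use their own plates


def plate_and_well(sample_no: int) -> tuple[str, int]:
    """Map a test sample number (1-950) to (plate name, tag primer set number)."""
    if not 1 <= sample_no <= TOTAL_TEST_SAMPLES:
        raise ValueError("test samples are numbered 1-950")
    # Zero-based arithmetic: 95 consecutive samples share one plate.
    plate_index, offset = divmod(sample_no - 1, SAMPLES_PER_PLATE)
    return f"Q{plate_index + 1}", offset + 1


if __name__ == "__main__":
    for n in (1, 95, 96, 190, 950):
        print(n, plate_and_well(n))
```

Keeping the negative control at a fixed position on every plate means the same lookup works for all ten sample plates.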

Abstract

Provided are a method and system for detecting α-globin gene copy number in a nucleic acid sample, the method comprising: amplifying the nucleic acid sample to obtain an amplification product; constructing a sequencing library from the amplification product; sequencing the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; determining the sequencing data in the sequencing result that originate from the α-globin gene; and determining the α-globin gene copy number in the nucleic acid sample based on the number of sequencing data of the α-globin gene. The method can effectively determine the α-globin gene copy number in a nucleic acid sample.

Description

Method and system for detecting α-globin gene copy number
Priority information
This application claims priority to and the benefit of Patent Application No. 201210277141.9, filed with the State Intellectual Property Office of China on August 6, 2012, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of biomedicine and, in particular, to a method, a primer composition, a tag composition and a system relating to α-globin gene copy number.
Background
Thalassemia is a common hemolytic monogenic hereditary disease that is prevalent in the Middle East, Central Asia, Africa, Southeast Asia and southern China. Its molecular mechanism is as follows: defects in the globin genes reduce or abolish the synthesis of one or more of the peptide chains they encode, which unbalances the proportions of the hemoglobin components and in turn makes hemoglobin unstable. Depending on which globin gene is defective, thalassemia is mainly divided into α-thalassemia and β-thalassemia. α-Thalassemia is mostly caused by deletions of the α-globin gene and partly by mutations, with the deletion type accounting for more than 90% of cases. β-Thalassemia is mostly caused by mutations or small insertions or deletions in the β-globin gene, and partly by large deletions or by duplication of the α-globin gene. The α-globin genes comprise two HBA1 genes and two HBA2 genes. Copy number variation of these genes (deletion and duplication) can lead not only to α-thalassemia but also to β-thalassemia, so detecting α-globin gene copy number is important for the diagnosis of thalassemia.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, one aspect of the present invention provides a method capable of determining the copy number of the α-globin gene in a nucleic acid sample. Another aspect provides a system for determining the copy number of the α-globin gene in a nucleic acid sample that can effectively carry out the method.
According to an embodiment of the present invention, the method for determining α-globin gene copy number comprises the following steps: amplifying a nucleic acid sample to obtain an amplification product; constructing a sequencing library from the amplification product; sequencing the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; determining the sequencing data in the sequencing result that originate from the α-globin gene; and determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene.
According to some embodiments of the present invention, the above method for determining α-globin gene copy number may also have the following additional technical features:
According to one embodiment of the present invention, the nucleic acid sample is isolated from at least one of plasma, serum, whole blood and oral exfoliated cells of a subject, where the subject is a human. These samples can be obtained conveniently from the organism, and different samples can be taken for particular diseases so that specific analytical approaches can be applied to them.
According to one embodiment of the present invention, the α-globin gene is at least one selected from the HBA1 gene and the HBA2 gene.
According to one embodiment of the present invention, the nucleic acid sample is amplified using a specific primer set comprising a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2.
According to one embodiment of the present invention, the 5' end of at least one of the first primer and the second primer further carries a tag sequence, the tag sequence being a nucleotide sequence shown in at least one of SEQ ID NOs: 5-100.
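
As a rough illustration of how a 5' tag could be used once tagged amplicons have been pooled and sequenced, the sketch below assigns reads back to samples by their leading tag. It is a minimal Python sketch under stated assumptions: the function `demultiplex`, the 6-nt tag length and the tag sequences themselves are hypothetical placeholders (the actual tags are those of SEQ ID NOs: 5-100), and exact tag matching is assumed.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_len):
    """Group reads by the sample whose tag starts the read.

    Returns (assigned, unassigned): `assigned` maps a sample name to its reads
    with the tag stripped; `unassigned` lists reads that carry no known tag.
    """
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        sample = tag_to_sample.get(read[:tag_len])
        if sample is None:
            unassigned.append(read)
        else:
            assigned[sample].append(read[tag_len:])
    return dict(assigned), unassigned


# Hypothetical tags and reads, invented for illustration only.
tags = {"AAGGTC": "sample_1", "CTTGAC": "sample_2"}
reads = ["AAGGTCACGTACGT", "CTTGACACGTTTGA", "GGGGGGACGTACGT"]
assigned, unassigned = demultiplex(reads, tags, tag_len=6)
print(assigned)
print(unassigned)
```

Reads without a recognised tag are kept separately rather than discarded, so the fraction of unassignable reads can be inspected.
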
According to one embodiment of the present invention, the sequencing is performed using at least one selected from Hiseq2000, SOLID, 454 and a single-molecule sequencing device.
According to one embodiment of the present invention, the specific primer set further comprises a third primer and a fourth primer that are specific for an internal reference gene, and the method further comprises determining the sequencing data in the sequencing result that originate from the internal reference gene.
According to one embodiment of the present invention, the internal reference gene is FLNB, the third primer has the nucleotide sequence shown in SEQ ID NO: 3, and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
According to one embodiment of the present invention, the 5' end of at least one of the third primer and the fourth primer further carries a tag sequence, the tag sequence being a nucleotide sequence shown in at least one of SEQ ID NOs: 5-100.
According to one embodiment of the present invention, the sequencing data originating from the α-globin gene are identified by aligning the sequencing result against a reference sequence.
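
The idea of assigning reads to a gene by comparison with reference sequences can be sketched as below. This is a deliberately simplified Python illustration, not the alignment procedure of the source: the reference sequences are invented toy strings, the helper names are hypothetical, each read is assumed to start at the beginning of its amplicon, and a simple mismatch count stands in for a real aligner.

```python
from collections import Counter


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_read(read: str, references: dict[str, str], max_mismatches: int = 2):
    """Return the name of the closest reference, or None if none is close enough."""
    best_name, best_dist = None, max_mismatches + 1
    for name, ref in references.items():
        if len(read) > len(ref):
            continue  # read is longer than this toy reference; skip it
        dist = mismatches(read, ref[:len(read)])
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Toy reference sequences, invented for illustration only.
references = {
    "HBA1": "ACGTACGTAA",
    "HBA2": "ACGTACGTCC",
    "FLNB": "TTGGCCAATT",
}
reads = ["ACGTACGTAA", "ACGTACGTCC", "ACGTACGTCC", "TTGGCCAATT", "GGGGGGGGGG"]
counts = Counter(classify_read(r, references) for r in reads)
print(counts)
```

The per-gene counts produced this way play the role of the read numbers used in the copy number calculation.
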
According to one embodiment of the present invention, determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene further comprises: counting the sequencing data in the sequencing result that originate from the α-globin gene to obtain a value H; counting the sequencing data in the sequencing result that originate from the internal reference gene to obtain a value C; calculating the ratio of H to C to obtain a first parameter H/C, and comparing the first parameter with a first reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value.
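
The calculation in this embodiment can be sketched numerically as follows. This is a minimal Python sketch under stated assumptions: the read counts are made up, the function names are hypothetical, the reference is taken to be a normal control carrying four α-globin copies in total (2 HBA1 and 2 HBA2), and rounding the scaled value to the nearest integer is an illustrative choice rather than a rule stated in the source.

```python
def first_parameter(h: int, c: int) -> float:
    """Return H/C: alpha-globin read count over internal reference read count."""
    if c <= 0:
        raise ValueError("internal reference read count C must be positive")
    return h / c


def estimate_copy_number(sample_hc: float, reference_hc: float,
                         reference_copies: int) -> tuple[float, int]:
    """Scale the reference copy number by the sample/reference ratio of H/C."""
    ratio = sample_hc / reference_hc
    return ratio, round(ratio * reference_copies)


# Made-up read counts for illustration only.
normal_hc = first_parameter(h=4000, c=2000)   # parallel normal control
sample_hc = first_parameter(h=3000, c=2000)   # test sample
ratio, copies = estimate_copy_number(sample_hc, normal_hc, reference_copies=4)
print(ratio, copies)
```

Dividing by the internal reference count C first removes differences in sequencing depth between samples, so that only the ratio to the reference value carries copy number information.
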
According to one embodiment of the present invention, the first reference value is the first parameter obtained in a parallel experiment on a nucleic acid sample from an individual with a known α-globin gene copy number.
According to one embodiment of the present invention, the first reference value is the first parameter obtained in a parallel experiment on a nucleic acid sample from a normal individual.
According to one embodiment of the present invention, the α-globin genes are HBA1 and HBA2, the number of sequencing data in the sequencing result originating from HBA1 is H1, and the number originating from HBA2 is H2, wherein determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene further comprises: calculating the ratio of H2 to H1 to obtain a second parameter H2/H1, and comparing the second parameter with a second reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
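
The second parameter can be illustrated in the same way. The Python sketch below uses made-up read counts and hypothetical function names; the tolerance of 0.25 used to label a shift is an arbitrary illustrative value and is not taken from the source.

```python
def second_parameter(h1: int, h2: int) -> float:
    """Return H2/H1: HBA2 read count over HBA1 read count."""
    if h1 <= 0:
        raise ValueError("HBA1 read count H1 must be positive")
    return h2 / h1


def shift_relative_to_reference(param: float, reference: float,
                                tolerance: float = 0.25) -> tuple[float, str]:
    """Compare H2/H1 with the second reference value and label the direction."""
    relative = param / reference
    if relative < 1 - tolerance:
        label = "lower"
    elif relative > 1 + tolerance:
        label = "higher"
    else:
        label = "similar"
    return relative, label


# Made-up read counts for illustration only.
reference_value = second_parameter(h1=2000, h2=2000)  # parallel known sample
sample_value = second_parameter(h1=2000, h2=1000)     # test sample
relative, label = shift_relative_to_reference(sample_value, reference_value)
print(relative, label)
```

Because H2/H1 compares the two genes within one sample, it can indicate that their relative copy numbers differ from the reference even without the internal reference gene.
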
According to one embodiment of the present invention, the second reference value is the second parameter obtained in a parallel experiment on a nucleic acid sample from an individual with a known α-globin gene copy number.
According to yet another aspect, the present invention provides a primer composition. According to an embodiment of the present invention, the primer composition comprises a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2.
According to an embodiment of the present invention, the 5' end of at least one of the aforementioned first primer and second primer further carries a tag sequence, the tag sequence being a nucleotide sequence shown in at least one of SEQ ID NOs: 5-100.
According to one embodiment of the present invention, the primer composition further comprises a third primer and a fourth primer, the third primer having the nucleotide sequence shown in SEQ ID NO: 3 and the fourth primer having the nucleotide sequence shown in SEQ ID NO: 4.
According to one embodiment of the present invention, the 5' end of at least one of the third primer and the fourth primer further carries a tag sequence, the tag sequence being a nucleotide sequence shown in at least one of SEQ ID NOs: 5-100.
According to yet another aspect, the present invention provides the use of the above primer composition in determining the copy number of the α-globin gene in a nucleic acid sample.
According to yet another aspect, the present invention provides a tag composition. According to an embodiment of the present invention, the tag composition consists of the tags shown in SEQ ID NOs: 5-100.
According to yet another aspect, the present invention provides a system for determining the copy number of the α-globin gene in a nucleic acid sample. According to an embodiment of the present invention, the system comprises: an amplification device for amplifying the nucleic acid sample to obtain an amplification product; a library construction device connected to the amplification device and adapted to construct a sequencing library from the amplification product; a sequencing device connected to the library construction device and adapted to sequence the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; and an analysis device connected to the sequencing device and adapted to determine the sequencing data in the sequencing result that originate from the α-globin gene and to determine the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data of the α-globin gene.
According to some embodiments of the present invention, the system for determining the copy number of the α-globin gene in a nucleic acid sample may also have the following additional technical features:
According to one embodiment of the present invention, the system further comprises a nucleic acid sample separation device adapted to isolate a nucleic acid sample from at least one of plasma, serum, whole blood and oral exfoliated cells of a subject.
According to one embodiment of the present invention, the α-globin gene is at least one selected from the HBA1 gene and the HBA2 gene.
According to one embodiment of the present invention, the amplification device is provided with a specific primer set comprising a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2.
According to one embodiment of the present invention, the 5' end of at least one of the first primer and the second primer further carries a tag sequence, the tag sequence being a nucleotide sequence shown in at least one of SEQ ID NOs: 5-100.
According to one embodiment of the present invention, the sequencing device is at least one selected from Hiseq2000, SOLID, 454 and a single-molecule sequencing device.
According to one embodiment of the present invention, the specific primer set further comprises a third primer and a fourth primer that are specific for an internal reference gene, and the analysis device is adapted to determine the sequencing data in the sequencing result that originate from the internal reference gene.
According to one embodiment of the present invention, the internal reference gene is FLNB, the third primer has the nucleotide sequence shown in SEQ ID NO: 3, and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
According to one embodiment of the present invention, the 5' end of at least one of the third primer and the fourth primer further carries a tag sequence, the tag sequence being a nucleotide sequence shown in at least one of SEQ ID NOs: 5-100.
According to one embodiment of the present invention, the analysis device is adapted to identify the sequencing data in the sequencing result that originate from the α-globin gene by comparing the sequencing result with a reference sequence.
According to one embodiment of the present invention, the analysis device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by the following steps: counting the sequencing data in the sequencing result that originate from the α-globin gene to obtain a value H; counting the sequencing data in the sequencing result that originate from the internal reference gene to obtain a value C; calculating the ratio of H to C to obtain a first parameter H/C, and comparing the first parameter with a first reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value.
According to one embodiment of the present invention, the α-globin genes are HBA1 and HBA2, the number of sequencing data in the sequencing result originating from HBA1 is H1, and the number originating from HBA2 is H2, wherein the analysis device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by the following steps: calculating the ratio of H2 to H1 to obtain a second parameter H2/H1, and comparing the second parameter with a second reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
Additional aspects and advantages of the present invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows the co-amplified region of α2 and α1 according to one embodiment of the present invention;
Fig. 2 shows the sequence of the HBA1 and HBA2 primer candidate regions according to one embodiment of the present invention;
Fig. 3 shows the amplification efficiency of the α-globin gene (HBA) and internal reference gene (Control) primers according to one embodiment of the present invention, where the left panel shows the HBA amplification curve and amplification efficiency and the right panel shows the Control amplification curve and amplification efficiency;
Fig. 4 is a schematic diagram of a PCR product labeled with a primer index and an adaptor index according to one embodiment of the present invention;
Fig. 5 shows a flow chart for classifying sequencing data according to one embodiment of the present invention;
Fig. 6 shows the α-globin gene copy number analysis workflow according to one embodiment of the present invention;
Fig. 7 shows the electrophoresis results for samples 1-64 according to one embodiment of the present invention;
Fig. 8 shows the results of HBA1 and HBA2 quantitative PCR on samples with discordant results according to one embodiment of the present invention; and
Fig. 9 shows the results of PCR with Anti-3.7 and Anti-4.2 specific primers on samples with discordant results according to one embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are illustrative, are intended only to explain the invention, and are not to be construed as limiting it.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. Further, in the description of the present invention, "a plurality of" means two or more unless otherwise stated.
According to one aspect of the present invention, a method is provided that can effectively determine the copy number of the α-globin gene in a nucleic acid sample. According to embodiments of the present invention, the method for determining the copy number of the α-globin gene in a nucleic acid sample includes the following steps:
Amplifying the nucleic acid sample to obtain an amplification product;
According to embodiments of the present invention, the source of the nucleic acid sample is not limited. According to some embodiments, the nucleic acid sample may be isolated from at least one of the plasma, serum, whole blood, and exfoliated oral cells of a subject. The source of the subject is not particularly limited either. According to some specific examples, suitable subjects include mammals, preferably humans. According to embodiments of the present invention, the α-globin gene in the nucleic acid sample is at least one selected from the HBA1 gene and the HBA2 gene.
According to embodiments of the present invention, the nucleic acid sample is amplified with a specific primer set comprising a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2.
Relative quantification by quantitative PCR requires the HBA1 and HBA2 primers to amplify with essentially the same efficiency. Given that HBA1 and HBA2 are highly homologous, and that both have pseudogenes (ψα1, ψα2) of very similar sequence, the present invention provides a highly conserved primer set that specifically co-amplifies the HBA1 and HBA2 gene regions. The primers are designed by the following steps:
1.1 Determining the primary co-amplification candidate regions for the HBA1 and HBA2 primers
The gDNA sequences of the HBA1 and HBA2 genes are imported into the MegAlign program for sequence alignment, and regions in which the HBA1 and HBA2 sequences share at least 18 consecutive identical bases are taken as primary co-amplification candidate regions.
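Step 1.1 can be illustrated with a minimal sketch. It assumes the two gene sequences have already been aligned (for example by MegAlign) and exported as two equal-length strings with `-` for gaps; the function name `identical_runs` and the input format are illustrative assumptions, not part of the described method.

```python
def identical_runs(aligned_a, aligned_b, min_len=18):
    """Return (start, end) alignment-column ranges, end exclusive, in which
    two aligned sequences are identical and gap-free for >= min_len columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    start = None  # start column of the identical run currently being scanned
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        same = a == b and a != "-"
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(aligned_a) - start >= min_len:
        runs.append((start, len(aligned_a)))
    return runs
```

The default of 18 columns mirrors the 18 bp threshold stated above; any real use would pass the actual aligned HBA1 and HBA2 sequences.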
1.2 Sequence conservation analysis of the primary co-amplification candidate regions
The SNP and mutation information recorded in the dbSNP and HbVar databases for the candidate regions is annotated onto the corresponding primary co-amplification candidate regions, and regions with at least 18 consecutive bases free of common gene mutations (<0.1%) are selected as secondary co-amplification candidate regions.
1.3 Sequence specificity analysis of the secondary co-amplification candidate regions
The sequences of the secondary co-amplification candidate regions of α1 and α2 (referring herein to the HBA1 gene and the HBA2 gene, respectively) and of the corresponding regions of their pseudogenes ψα1 and ψα2 are imported into the MegAlign program for sequence alignment, and the base positions at which the sequences differ are taken as sequence-specific candidate sites.
1.4 Specific primer design
Primers are designed with a sequence-specific candidate site at the 3' end. Each primer sequence is aligned against the whole genome (BLAST); primers that align exactly to α1 and α2 and have no exact match elsewhere in the genome are candidate primers. Forward and reverse candidate primers are then paired, and pairs whose amplicon length is in the range of 80-150 bp and whose co-amplified α1 and α2 sequences contain differing bases are designated candidate primer pairs.
1.5 The primer amplification region covers all known deletion types
The amplified region of each candidate primer pair is compared with the deleted regions of all α deletion mutations recorded in the HbVar database; the primer pair whose amplified region covers all known deletion types is the final selected primer pair.
According to the above design principles, the primer pair finally selected is HBA-F and HBA-R (Table 1). The α1 and α2 regions they co-amplify are shown in Figure 1: the box at the top is the α-globin gene sequence, the lines beneath it are the deleted regions corresponding to the various deletion types of α-thalassemia, and the regions in the dashed boxes (A1 and A2) are the regions of the HBA1 or HBA2 gene that the various deletion types delete in common; that is, A1 and A2 are the co-amplified regions on HBA1 and HBA2, respectively. Figure 2 shows the sequences of the HBA1 and HBA2 primer candidate regions, where HBA1-Q is the A1 region sequence, HBA2-Q is the A2 region sequence, and the boxed bases are where the two sequences differ.
Table 1. HBA1 and HBA2 primer sequences
[Table 1 is an image (imgf000007_0001) in the original publication and is not reproduced here.]
According to embodiments of the present invention, the invention further provides internal reference primers, namely a third primer and a fourth primer, which are specific for an internal reference gene. According to some specific examples, the internal reference gene may be FLNB, whose deletion produces an obvious clinical phenotype (bone dysplasia); the third primer has the nucleotide sequence shown in SEQ ID NO: 3, and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
An internal reference gene must have a constant copy number, so genes whose mutation or deletion causes obvious symptoms or death in the individual are commonly used. The present invention selects FLNB, whose deletion produces an obvious clinical phenotype (bone dysplasia), as the internal reference gene. For the conserved sequence regions of FLNB, a series of candidate primer pairs were designed with PCR product lengths of 80-150 bp and the same annealing temperature as HBA-F and HBA-R. The amplification efficiencies of the candidate primer pairs and of HBA-F/HBA-R were measured by quantitative PCR on serially diluted template. The Control-F/Control-R primer pair (Table 2), whose amplification efficiency is closest to that of HBA-F/HBA-R (Figure 3), was finally selected as the internal reference gene amplification primers.
Table 2. Internal reference primer sequences
[Table 2 is an image (imgf000008_0001) in the original publication and is not reproduced here.]
According to embodiments of the present invention, the invention provides specific tag sequences. According to some specific examples, the 5' ends of the first primer and the second primer further contain a tag sequence, the tag sequence being a nucleotide sequence shown in at least one of SEQ ID NO: 5-100. According to some specific examples, the 5' ends of the third primer and the fourth primer further contain a tag sequence, the tag sequence being a nucleotide sequence shown in at least one of SEQ ID NO: 5-100.
To increase detection throughput, the present invention uses DNA molecular tagging based on primer tags (primer index) to label the PCR products of multiple samples individually, so that multiple samples can be pooled into one library during library construction. Combined with the library tag (adaptor index) technique of next-generation sequencing, this allows thousands of samples to be tested in a single sequencing run. Each sample's result is finally retrieved through its unique tag (index) sequences, which simplifies the experimental procedure. Based on the primer sequences designed in Table 1 and Table 2, and following the primer tag design principles, the present invention designed and screened 96 sets of primer tag sequences (Table 3).
Table 3. 96 sets of primer tag sequences
[Table 3 is a pair of images (imgf000009_0001, imgf000010_0001) in the original publication and is not reproduced here.]
The design requirements for primer tags include:
① the primer tag length is 6-8 bp;
② there are at least 2 base differences between each set of primer tags and their reverse complementary sequences;
③ no repeat of 3 consecutive bases occurs;
④ there is no ACGACG3 series of bases;
⑤ the AC content does not exceed 70%;
⑥ a string of key bases, that is, the adaptor, appears;
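The length, pairwise-difference, repeat and AC-content requirements above can be sketched as a simple filter. This is an illustration under stated assumptions, not the screening procedure actually used: the repeat rule is read as "no run of three identical bases", the difference rule is read as a mismatch count between every pair of tags and between each tag and the reverse complement of every other tag, and the two requirements whose wording is unclear in the source (the ACGACG rule and the adaptor-string rule) are deliberately not implemented. All function names are hypothetical.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Mismatch count over the shared length plus any length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def tag_ok(tag, max_ac=0.70):
    """Per-tag checks: length 6-8, no 3-base homopolymer, AC content <= max_ac."""
    if not 6 <= len(tag) <= 8:
        return False
    if any(base * 3 in tag for base in "ACGT"):
        return False
    ac_fraction = (tag.count("A") + tag.count("C")) / len(tag)
    return ac_fraction <= max_ac


def tag_set_ok(tags, min_diff=2):
    """Pairwise check: each pair of tags, and each tag against the reverse
    complement of the other, must differ at >= min_diff positions."""
    for a, b in combinations(tags, 2):
        if mismatches(a, b) < min_diff:
            return False
        if mismatches(a, reverse_complement(b)) < min_diff:
            return False
    return True
```

A candidate list would first be filtered with `tag_ok` and the survivors then checked together with `tag_set_ok`.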
How the primer tags are used:
The different primer tag sequences in Table 3 are joined to the 5' ends of the primer sequences in Table 1 and Table 2 to form 96 sets of tagged primers. In the experiment, primer tags (primer index) are introduced by PCR at both ends of each sample's PCR product, and PCR products carrying different primer tags are mixed together to construct a sequencing library. When several sequencing libraries are needed, each library is labeled by adding a different adaptor tag (adaptor index), giving sequencing libraries that are doubly labeled with primer tags and adaptor tags (Figure 4). After library construction, the libraries carrying different adaptor tags are mixed and sequenced together (libraries labeled with different adaptor tags may reuse the same primer tags). Once the sequencing results are available, the DNA sequence information of each sample is obtained by screening the adaptor tag and primer tag sequences in the sequencing results.
The present invention combines primer tags with a multiplex PCR amplification strategy, amplifying the target gene and the internal reference gene simultaneously in the same reaction. This removes the quantification inaccuracy caused by differences in the starting amount of DNA and also keeps the experimental procedure simple.
To keep the PCR products close to the true content of the gene under test and the internal reference gene in the starting DNA, and to avoid the error of PCR end-point quantification, the present invention limits the number of PCR amplification cycles to the early to middle part of the exponential amplification phase, that is, 22-26 cycles.
To eliminate any effect that different primer tags may have on the PCR, and to ensure the validity and accuracy of the results, each set of tagged primers is run with a normal control and a positive control in every experiment; that is, each set of tagged primers includes one sample with normal copy number and one with abnormal copy number. The normal control is a sample shown by GAP-PCR and quantitative PCR to have a normal α-globin gene copy number, and the positive control is a sample shown by GAP-PCR and quantitative PCR to lack two α-globin genes.
Constructing a sequencing library from the amplification product;
Sequencing the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data;
According to embodiments of the present invention, the method of sequencing the α-globin gene is not particularly limited. Those skilled in the art can choose a library construction method to suit the particular genome sequencing technology used. For details of library construction, see the protocols provided by the sequencing instrument manufacturer, for example Illumina's Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or Paired-End SamplePrep Guide (Part #1005063; Feb 2010), which are incorporated herein by reference in their entirety.
According to one embodiment of the present invention, instruments for sequencing the α-globin gene include, but are not limited to, Hiseq2000, SOLiD, 454 and single-molecule sequencing devices.
Determining the sequencing data in the sequencing result that come from the α-globin gene; and
Determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data from the α-globin gene.
After the α-globin gene has been sequenced, the sequencing result contains a plurality of sequencing data. According to embodiments of the present invention, the sequencing data that come from the α-globin gene are identified by aligning the sequencing result with a reference sequence. Those skilled in the art will understand that any known method may be used to align the sequencing result with the reference sequence.
According to a specific example of the present invention, determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data from the α-globin gene further includes: counting the sequencing data from the α-globin gene in the sequencing result to obtain a value H; counting the sequencing data from the internal reference gene in the sequencing result to obtain a value C; calculating the ratio of H to C to obtain a first parameter H/C, and comparing the first parameter with a first reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value.
The first reference value here is the first parameter obtained in a parallel experiment on a nucleic acid sample from an individual with a known α-globin gene copy number, in particular on a nucleic acid sample from a normal individual.
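The comparison of the first parameter with the first reference value can be sketched as follows. The sketch assumes the reference comes from a normal individual carrying four α-globin genes in total (two HBA1 and two HBA2) and that copy number scales linearly with the ratio; the function name and the `normal_copies` parameter are illustrative assumptions.

```python
def estimate_total_alpha_copies(h_reads, c_reads, h_ref, c_ref, normal_copies=4):
    """Estimate total alpha-globin copies from read counts.

    h_reads, c_reads: alpha-globin (H) and internal reference (C) read counts
                      of the sample under test.
    h_ref, c_ref:     the same counts for the normal reference sample.
    normal_copies:    assumed alpha-globin copies in the reference (2 HBA1 + 2 HBA2).
    """
    if c_reads <= 0 or c_ref <= 0 or h_ref <= 0:
        raise ValueError("reference and control read counts must be positive")
    first_parameter = h_reads / c_reads   # H/C of the sample under test
    first_reference = h_ref / c_ref       # H/C of the normal reference
    return normal_copies * first_parameter / first_reference
```

Dividing by the internal reference count is what cancels differences in starting DNA amount between samples.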
According to a specific example of the present invention, the α-globin genes are HBA1 and HBA2, the number of sequencing data from HBA1 in the sequencing result is H1, and the number of sequencing data from HBA2 in the sequencing result is H2. Determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data from the α-globin gene then further includes: calculating the ratio of H2 to H1 to obtain a second parameter H2/H1, and comparing the second parameter with a second reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
The second reference value here is the second parameter obtained in a parallel experiment on a nucleic acid sample from an individual with a known α-globin gene copy number.
After the samples have been sequenced, the sequencing data are assigned to each locus of the corresponding sample according to the library tag (adaptor), the primer tag (index) and the primer sequence (primer) for analysis. The results are judged by multiple discriminant analysis and multiple Bayesian analysis. Following the principle of relative quantification, the results are distance-transformed using the relative ratio (target gene / internal reference gene of the sample under test), and the ratio of the globin gene (HBA1 and HBA2) copy number of the sample under test to that of the normal control is calculated to give the globin gene copy number of the sample under test. The ratio of HBA2 to HBA1 copy number within the globin genes (HBA2/HBA1) is used as a quality control to recheck the result, and the copy numbers of the HBA1 and HBA2 genes in the sample under test are finally determined.
According to a specific example of the present invention, the detailed data analysis steps are as follows:
1. Classification of sequencing data
The sequencing data are assigned to each locus of the corresponding sample according to the library tag, the primer tag and the primer sequence. Figure 5 shows the sequencing data classification flow chart, in which:
① Library assignment: sequencing reads are assigned to each library according to the library tag sequence;
② Sample assignment: within a library, sequencing reads are assigned to each sample according to the different primer tags (Table 3);
③ Locus assignment: reads within a sample are assigned to the globin gene or the internal reference gene according to the primer sequences (Table 1 and Table 2);
④ HBA1 versus HBA2: HBA reads are split into HBA1 and HBA2 according to the sequence differences within HBA1 and HBA2 (Figure 2).
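Steps ② to ④ of this classification can be sketched as below, assuming step ① (splitting by library tag) has already been done by the sequencer's own demultiplexing. Every sequence passed in is a placeholder: the real tags, primers and HBA1/HBA2 difference bases are in Tables 1-3 and Figure 2, which are not reproduced in this text. Exact prefix matching and the function names are simplifying assumptions; a real pipeline would tolerate sequencing errors and handle tags that are prefixes of one another.

```python
from collections import Counter


def classify_read(read, tags, primers, hba_markers):
    """Assign one read to (sample, locus), or None if it cannot be placed.

    tags:        {primer_tag_sequence: sample_name}
    primers:     {"HBA": forward_primer, "Control": forward_primer}
    hba_markers: {"HBA1": diagnostic_substring, "HBA2": diagnostic_substring}
    """
    # Step 2: sample assignment by the primer tag at the 5' end of the read.
    for tag, sample in tags.items():
        if read.startswith(tag):
            body = read[len(tag):]
            break
    else:
        return None
    # Step 3: locus assignment by the primer sequence that follows the tag.
    for locus, primer in primers.items():
        if body.startswith(primer):
            break
    else:
        return None
    if locus != "HBA":
        return sample, locus
    # Step 4: split HBA reads into HBA1 / HBA2 by their differing bases.
    for gene, marker in hba_markers.items():
        if marker in body:
            return sample, gene
    return None


def count_reads(reads, tags, primers, hba_markers):
    """Count classified reads per (sample, locus) pair."""
    counts = Counter()
    for read in reads:
        label = classify_read(read, tags, primers, hba_markers)
        if label is not None:
            counts[label] += 1
    return counts
```

The resulting per-sample counts for HBA1, HBA2 and Control are the inputs to the copy number analysis that follows.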
2. α-globin gene copy number analysis
Because the normal control has 2 HBA1 genes, 2 HBA2 genes and 2 internal reference genes, and each sample under test also has 2 internal reference genes, the analysis procedure (Figure 6) is divided into the following steps:
① Calculate, for the sample under test and for the normal control, the ratio of target gene reads to internal reference gene reads. For the sample under test these are T1 and T2, and for the normal control N1 and N2:

$$T_1 = \frac{\mathrm{HBA1}_{\text{test}}}{\mathrm{Control}_{\text{test}}}, \qquad T_2 = \frac{\mathrm{HBA2}_{\text{test}}}{\mathrm{Control}_{\text{test}}}, \qquad N_1 = \frac{\mathrm{HBA1}_{\text{normal}}}{\mathrm{Control}_{\text{normal}}}, \qquad N_2 = \frac{\mathrm{HBA2}_{\text{normal}}}{\mathrm{Control}_{\text{normal}}}$$

② Normalize the results with N1 and N2 to obtain the ratios of the sample under test relative to the normal control, R1 and R2:

$$R_1 = \frac{T_1}{N_1}, \qquad R_2 = \frac{T_2}{N_2}$$

③ Calculate, for the sample under test and for the normal control, the ratio of HBA2 reads to HBA1 reads, T3 and N3 (N3 is set to 1):

$$T_3 = \frac{\mathrm{HBA2}_{\text{test}}}{\mathrm{HBA1}_{\text{test}}}, \qquad N_3 = \frac{\mathrm{HBA2}_{\text{normal}}}{\mathrm{HBA1}_{\text{normal}}}$$

This removes the effect of any difference in amplification efficiency between the target gene and the internal reference gene during PCR;
④ Normalize the result with N3 to obtain the ratio of the sample under test relative to the normal control, R3:

$$R_3 = \frac{T_3}{N_3}$$
In the above formulas, the symbols have the following meanings:
T1: ratio of HBA1 reads to Control reads in the sample under test;
T2: ratio of HBA2 reads to Control reads in the sample under test;
T3: ratio of HBA2 reads to HBA1 reads in the sample under test;
N1: ratio of HBA1 reads to Control reads in the normal sample;
N2: ratio of HBA2 reads to Control reads in the normal sample;
N3: ratio of HBA2 reads to HBA1 reads in the normal sample;
R1: HBA1 gene copy number of the sample under test as a multiple of the HBA1 copy number of the normal control, used to determine the copy number of HBA1;
R2: HBA2 gene copy number of the sample under test as a multiple of the HBA2 copy number of the normal control, used to determine the copy number of HBA2;
R3: HBA2/HBA1 of the sample under test as a multiple of HBA2/HBA1 of the normal sample, used to recheck the accuracy of the HBA1 and HBA2 copy numbers.
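The ratios defined above can be computed directly from per-sample read counts. The sketch below is illustrative: the dictionary keys and the function name are assumptions, N3 is computed from the normal control's counts rather than fixed at 1, and R3 is returned as `None` when the sample under test has no HBA1 reads, since T3 is then undefined.

```python
def relative_ratios(test, normal):
    """Compute R1, R2, R3 from read counts.

    test, normal: dicts with integer read counts under the keys
                  "HBA1", "HBA2" and "Control".
    """
    t1 = test["HBA1"] / test["Control"]
    t2 = test["HBA2"] / test["Control"]
    n1 = normal["HBA1"] / normal["Control"]
    n2 = normal["HBA2"] / normal["Control"]
    n3 = normal["HBA2"] / normal["HBA1"]
    # T3 is undefined when the sample under test has no HBA1 reads.
    t3 = test["HBA2"] / test["HBA1"] if test["HBA1"] else None
    return {
        "R1": t1 / n1,
        "R2": t2 / n2,
        "R3": None if t3 is None else t3 / n3,
    }
```

Because the normal control carries two copies of each gene, 2 × R1 and 2 × R2 give rough estimates of the HBA1 and HBA2 copy numbers before any statistical classification.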
Because the copy number of the internal reference in T1 and T2, and the copy numbers of HBA1, HBA2 and the internal reference in N1 and N2, are all known, the numbers of HBA1 and HBA2 can be determined from R1, R2 and R3. For the copy number variations of the HBA1 and HBA2 genes reported to date, a table of theoretical values of R1, R2 and R3 for each copy number variation is established (Table 4), and each value is converted to a Mahalanobis distance to form a Mahalanobis distance set. By Mahalanobis distance transformation, the direct distance between R1, R2, R3 of the sample under test and each value in the Mahalanobis distance set is calculated, and the shortest distance is selected by multiple discriminant analysis. The copy numbers of HBA1 and HBA2 are determined from the shortest distances for R1, R2 and R3. When the result given by one of R1, R2 and R3 does not agree with the result given by the other two, a Bayesian prior is used for adjustment: the P value corresponding to R1, R2 or R3 is modified, the distance is recalculated, and the result is then finalized.
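[Table 4. Theoretical values of R1, R2 and R3 for each copy number variation; the table is an image (imgf000014_0001) in the original publication and is not reproduced here.]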
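The nearest-distance idea can be sketched in simplified form. This is not the classifier described above: Table 4 is not available in this text, so the theoretical values are derived here from the definitions (R1 = HBA1 copies / 2, R2 = HBA2 copies / 2, R3 = HBA2 copies / HBA1 copies), the Mahalanobis distance is reduced to its diagonal-covariance form with hypothetical variances, and the Bayesian adjustment step is omitted.

```python
import math


def theoretical_ratios(hba1_copies, hba2_copies):
    """Expected (R1, R2, R3) for a genotype, relative to a 2 + 2 copy normal."""
    r1 = hba1_copies / 2
    r2 = hba2_copies / 2
    r3 = hba2_copies / hba1_copies if hba1_copies else None  # undefined without HBA1
    return r1, r2, r3


def scaled_distance(observed, expected, variances):
    """Mahalanobis distance for a diagonal covariance; undefined terms are skipped."""
    total = 0.0
    for obs, exp, var in zip(observed, expected, variances):
        if obs is None or exp is None:
            continue
        total += (obs - exp) ** 2 / var
    return math.sqrt(total)


def call_copy_number(r1, r2, r3, max_copies=3, variances=(0.01, 0.01, 0.04)):
    """Return the (HBA1, HBA2) copy numbers whose theoretical ratios are nearest.

    max_copies and variances are hypothetical values chosen for illustration.
    """
    candidates = [(a, b) for a in range(max_copies + 1) for b in range(max_copies + 1)]
    return min(
        candidates,
        key=lambda g: scaled_distance((r1, r2, r3), theoretical_ratios(*g), variances),
    )
```

In practice the variances would be estimated from replicate controls, and disagreement between the call implied by R3 and the calls implied by R1 and R2 would trigger the prior-based adjustment described in the text.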
The present invention thus provides a method for detecting α-globin gene copy number on a next-generation sequencing platform that can simultaneously detect the types of thalassemia caused by the various α-globin gene copy number variations. It is low in cost, high in throughput, high in accuracy, and easy to automate. The method can be used for population screening for thalassemia, for example in premarital and prenatal examinations.
In addition, the method used in the present invention is equally applicable to detecting the various β-globin gene deletions and other gene copy number variations of a similar pattern.
System for determining the copy number of the α-globin gene in a nucleic acid sample
According to a further aspect of the present invention, a system for determining the copy number of the α-globin gene in a nucleic acid sample is also provided. According to embodiments of the present invention, the system includes: an amplification device for amplifying the nucleic acid sample to obtain an amplification product; a library construction device, connected to the amplification device and adapted to construct a sequencing library from the amplification product; a sequencing device, connected to the library construction device and adapted to sequence the sequencing library to obtain a sequencing result consisting of a plurality of sequencing data; and an analysis device, connected to the sequencing device and adapted to determine the sequencing data in the sequencing result that come from the α-globin gene, and to determine the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data from the α-globin gene.
According to a specific example of the present invention, the system for determining the α-globin gene copy number further comprises a nucleic acid sample isolation device adapted to isolate the nucleic acid sample from at least one of plasma, serum, whole blood, and exfoliated oral cells of a subject. The α-globin gene is at least one selected from the HBA1 gene and the HBA2 gene.
According to a specific example of the present invention, the amplification device is further provided with a specific primer set comprising a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2.
According to a specific example of the present invention, the 5' end of at least one of the first primer and the second primer further comprises a tag sequence, the tag sequence being a nucleotide sequence shown in at least one selected from SEQ ID NOs: 5-100.
According to a specific example of the present invention, the sequencing device is at least one selected from Hiseq2000, SOLID, 454, and a single-molecule sequencing device.
According to a specific example of the present invention, the specific primer set further comprises a third primer and a fourth primer, wherein the third primer and the fourth primer are specific for an internal reference gene, and the analysis device is adapted to determine the sequencing data in the sequencing result that originate from the internal reference gene.
According to a specific example of the present invention, the internal reference gene is FLNB, the third primer has the nucleotide sequence shown in SEQ ID NO: 3, and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4.
According to a specific example of the present invention, the 5' end of at least one of the third primer and the fourth primer further comprises a tag sequence, the tag sequence being a nucleotide sequence shown in at least one selected from SEQ ID NOs: 5-100.
According to a specific example of the present invention, the analysis device is adapted to determine the sequencing data that originate from the α-globin gene by aligning the sequencing result with a reference sequence.
According to a specific example of the present invention, the analysis device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by the following steps: counting the sequencing data in the sequencing result that originate from the α-globin gene, to obtain a value H; counting the sequencing data in the sequencing result that originate from the internal reference gene, to obtain a value C; calculating the ratio of H to C to obtain a first parameter H/C, and comparing the first parameter with a first reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value.
According to a specific example of the present invention, the α-globin genes are HBA1 and HBA2, the number of sequencing data in the sequencing result that originate from HBA1 is H1, and the number that originate from HBA2 is H2, wherein the analysis device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by the following steps: calculating the ratio of H2 to H1 to obtain a second parameter H2/H1, and comparing the second parameter with a second reference value; and determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
PCR amplification proceeds through three phases: an exponential phase, a linear phase, and a plateau phase. In the exponential phase, the amount of PCR product is linearly related to the amount of starting template. A real-time fluorescence quantitative PCR instrument is used to measure, in real time during the exponential phase, the ratio of PCR product amounts (target gene/internal reference gene) in a sample whose target gene copy number is unknown (hereinafter the unknown sample). This ratio is compared with the corresponding exponential-phase ratio in a known sample (a sample whose target gene copy number is known), which gives the target gene content of each test sample relative to the known sample. This is the principle of relative quantification by quantitative PCR.
Fluorescence quantitative PCR uses the change in fluorescence signal intensity accumulated in real time during the PCR to reflect the change in the amount of PCR product; that is, the fluorescence signal intensity is linearly related to the amount of PCR product. Experimental studies have shown that once the sequencing reads reach a certain depth, the amount of starting template in next-generation sequencing, as represented by Hiseq, is directly proportional to the number of sequencing reads finally obtained.
Based on this proportionality between starting template amount and number of sequencing reads, together with the principle of relative quantification by quantitative PCR, the present invention uses the Hiseq-2000 next-generation sequencing platform to achieve high-throughput, low-cost, and accurate detection of the copy number of the α-thalassemia genes (HBA1, HBA2).
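The read-count version of relative quantification described above can be sketched as follows. This is a minimal illustration rather than the patent's analysis pipeline: the function names and the example read counts are hypothetical, and the sketch assumes the reference sample is a normal control carrying 2 copies each of HBA1 and HBA2.

```python
def read_ratio(target_reads: int, denominator_reads: int) -> float:
    """Ratio of two read counts, e.g. H/C (alpha-globin / internal reference)
    or H2/H1 (HBA2 / HBA1)."""
    if denominator_reads <= 0:
        raise ValueError("denominator read count must be positive")
    return target_reads / denominator_reads


def relative_to_reference(sample_param: float, reference_param: float) -> float:
    """Ratio of a sample's parameter to the same parameter in the reference sample."""
    if reference_param <= 0:
        raise ValueError("reference parameter must be positive")
    return sample_param / reference_param


def estimate_copies(relative: float, reference_copies: int = 2) -> int:
    """Nearest whole copy number.

    Assumption for this sketch: the reference sample carries
    `reference_copies` copies of the gene (2 for a normal control).
    """
    return int(relative * reference_copies + 0.5)


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    sample = {"HBA1": 2000, "HBA2": 1000, "FLNB": 4000}
    normal = {"HBA1": 2100, "HBA2": 2100, "FLNB": 4200}
    for gene in ("HBA1", "HBA2"):
        rel = relative_to_reference(
            read_ratio(sample[gene], sample["FLNB"]),
            read_ratio(normal[gene], normal["FLNB"]),
        )
        print(gene, rel, estimate_copies(rel))
```

Dividing by the internal reference count first cancels differences in total read yield between samples, so only the gene-specific ratio is compared with the reference value.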
The system of the present invention for determining the α-globin gene copy number in a nucleic acid sample can effectively carry out the method described above for determining the α-globin gene copy number in a nucleic acid sample. The features and advantages described for the method apply equally to the system and are not repeated here.

The present invention is described below by way of specific examples. These examples are for illustrative purposes only and are not to be construed as limiting the invention in any way.

Example 1:
Using the technical scheme and detection workflow of the present invention, 950 samples whose results were already known from Gap-PCR testing (including samples with normal and abnormal copy numbers) were tested. The results for 922 samples agreed with the known results, a concordance rate of 97.1%. For the 28 discordant samples, the HBA1 and HBA2 genes were quantified by quantitative PCR, and specific-primer PCR was used to test for Anti-3.7 and Anti-4.2, the two multi-copy α-globin gene types common in the Chinese population. The results of both verification methods agreed with the results of the present invention, showing that the technique can accurately detect HBA1 and HBA2 copy numbers in test samples and offers high throughput, low cost, and accuracy. The procedure was carried out as follows:
1. Sample extraction
DNA was extracted automatically from peripheral blood by the magnetic bead method. Each batch comprised 94 samples and 2 negative controls. The DNA was required to have a concentration above 30 ng/μl, a volume of 100 μl, and an A260/A280 ratio of 1.8-2.0.
DNA was extracted from 952 blood samples (including 1 normal control and 1 positive control) using a KingFisher automated extraction instrument. The main steps were as follows: take 3 deep-well plates and 1 shallow-well plate supplied for the KingFisher instrument; add the specified amounts of the accompanying reagents to each plate according to the instructions and label the plates; place all reagent-loaded plates in their designated positions; select the program "Bioeasy_200ul BloodDNA_KF.msz"; and press "start" to run the program and extract the nucleic acid. After the program finished, about 100 μl of eluate was collected from the Elution plate. This eluate was the extracted DNA and served as the template for the subsequent PCR.
2. PCR amplification
The 96 sets of tag primers were assigned to the wells of a 96-well PCR plate. In each batch, the test samples, the normal control, and the positive control were tested in parallel, so each set of tag primers was used in at least 3 PCR amplifications per batch: one or more test samples, one normal control, and one positive control.
The 952 DNA samples obtained in the extraction step were numbered 1-952 (951 being the normal control and 952 the positive control) and amplified separately with the 96 sets of HBA and Control tag primers (Tables 1, 2, and 3), the 96th set of tag primers serving as a no-template negative control. The PCR reactions were run in 96-well plates. One plate was set up for the normal control (sample 951, 96 reactions from the one sample) and one for the positive control (sample 952, 96 reactions from the one sample), giving 12 plates in total: Q1 to Q10 for the test samples, N for the normal control, and P for the positive control. Q1 held templates 1-95, Q2 held templates 96-190, and so on in order, and each plate included one no-template negative control. The primer tag number corresponding to each sample was recorded during the experiment.
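The plate layout described above can be expressed as a small lookup. This is a sketch under stated assumptions: the text specifies which sample numbers go on which plate and that tag set 96 is the no-template control, but the assumption that samples take tag sets 1-95 in numerical order within a plate, and the function name itself, are hypothetical. The control plates N and P are not covered.

```python
from typing import Tuple

SAMPLES_PER_PLATE = 95   # tag set 96 on each plate is the no-template negative control
NUM_TEST_SAMPLES = 950   # samples 951 and 952 are the normal and positive controls


def plate_and_tag(sample_no: int) -> Tuple[str, int]:
    """Map a test sample number (1-950) to its plate name and tag primer set.

    Assumption: within a plate, samples are assigned tag sets 1-95
    in ascending order of sample number.
    """
    if not 1 <= sample_no <= NUM_TEST_SAMPLES:
        raise ValueError("only test samples 1-950 are mapped to plates Q1-Q10")
    plate_index, offset = divmod(sample_no - 1, SAMPLES_PER_PLATE)
    return f"Q{plate_index + 1}", offset + 1


if __name__ == "__main__":
    for n in (1, 95, 96, 190, 950):
        print(n, plate_and_tag(n))
```

Because every plate reuses the same tag sets, the pair of plate (later, library tag) and primer tag is what identifies a sample after pooling.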
The PCR reaction system was as follows:
Figure imgf000017_0001
The PCR program was as follows:
95 °C, 10 min
95 °C for 30 s → 60 °C for 1 min (24 cycles)
15 °C, ∞
The PCR was run on a PTC-200 thermal cycler (Bio-Rad). After the PCR, 2 μl of PCR product was checked by electrophoresis on a 2.0% agarose gel (see Fig. 7).
3. PCR product pooling and purification
The products amplified with the 96 sets of tag primers were pooled: the PCR products of each sample group (one 96-well PCR plate), each carrying a different primer tag, were combined in a single EP tube and purified. From the remaining PCR product of each of the 12 plates (Q1-Q10, N, and P), 15 μl was taken from every well and combined in a correspondingly labeled 2 ml EP tube (the pooling step), then mixed by vortexing. From each pool, 1250 μl was purified on a column using the Qiagen DNA Purification Kit (see the kit instructions for the detailed procedure), giving 37.5 μl of purified product. The concentration of the purified product from each of the 12 plates was measured on a Nanodrop 8000 (Thermo Fisher Scientific); the values are shown in Table 5.
Table 5. OD values of the PCR pooling products after purification
Figure imgf000018_0001
4. Illumina Hiseq library construction
Libraries were constructed from the purified products following the library preparation workflow for next-generation sequencing. The Raw-cluster density of the libraries was determined so that the average sequencing depth of the internal reference gene reached at least 1000×, and the libraries were then sequenced.
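The depth requirement above can be checked once read counts are available. The following is a minimal sketch, assuming per-sample read counts for the internal reference gene have already been tallied and treating the read count of the amplicon as its sequencing depth; the function names and the threshold parameter name are hypothetical.

```python
from typing import Sequence


def mean_reference_depth(reference_read_counts: Sequence[int]) -> float:
    """Average number of internal-reference reads across the samples in a run."""
    if not reference_read_counts:
        raise ValueError("at least one sample is required")
    return sum(reference_read_counts) / len(reference_read_counts)


def depth_is_sufficient(reference_read_counts: Sequence[int],
                        min_mean_depth: float = 1000.0) -> bool:
    """True if the average internal-reference depth reaches the required minimum."""
    return mean_reference_depth(reference_read_counts) >= min_mean_depth
```

A run that fails this check would, under the proportionality assumption stated earlier, give less reliable read-count ratios.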
4.1 End-repair reaction
2 μg of purified product was diluted to a final volume of 37.5 μL and subjected to an end-repair reaction in the following system (all reagents purchased from Enzymatics):
Figure imgf000018_0002
Total volume: 50 μL
Reaction conditions: incubation at 20 °C for 30 min in a Thermomixer.
The reaction product was recovered and purified with the Qiagen DNA Purification Kit and dissolved in 32 μl of EB.
4.2 3'-end A-tailing reaction
Figure imgf000019_0001
Reaction conditions: incubation at 37 °C for 30 min in a Thermomixer.
The reaction product was recovered and purified with the Qiagen DNA Purification Kit (QIAGEN) and dissolved in 38 μl of EB.
4.3 Ligation of Illumina Hiseq adaptors
A different library tag was added to each of the 12 tubes of DNA (12 library tags in total), and the correspondence between library tags and libraries was recorded. The reaction system was as follows (all reagents purchased from Illumina):
Figure imgf000019_0002
Reaction conditions: incubation at 16 °C for 16 h in a Thermomixer.
The reaction product was purified with 60 μl of Ampure Beads (Beckman Coulter Genomics) and dissolved in 30 μL of deionized water. The library concentrations measured by fluorescence quantitative PCR (QPCR) are shown in Table 6:
Table 6. Relative library concentrations measured by QPCR
Figure imgf000019_0003
Figure imgf000020_0001
5. Hiseq2000 sequencing
Based on the concentrations measured by QPCR, 10 nmol of each of the 12 libraries was mixed, diluted to 5 pmol, and sequenced with the Hiseq 2000 SE-50 program at a Raw-cluster density of 2.5 million. See the Hiseq2000 operating manual for the detailed procedure.
6. Results analysis
The sequencing output was analyzed for copy number according to the α-globin gene copy number analysis workflow described above (Fig. 6); the entire process was performed automatically by computer. The results agreed with the known results in 97.3% of cases. Samples with discordant results were verified by quantitative PCR and by specific-primer PCR for the multi-copy α-globin gene types (Anti-3.7 and Anti-4.2); the verification results are shown in Fig. 8 and Fig. 9. Fig. 8 shows the HBA1 and HBA2 quantitative PCR results for the discordant samples. Normal is the normal control, which contains 2 HBA1 and 2 HBA2 genes, and the bar height gives the copy number ratio of the test sample relative to Normal: 0.5 means half the copy number of Normal, 1 means the same copy number as Normal, and 1.5 means 1.5 times the copy number of Normal. Fig. 9 shows the results of Anti-3.7 and Anti-4.2 specific-primer PCR for the discordant samples, with the Anti-3.7 results on the left and the Anti-4.2 results on the right. Anti-3.7 and Anti-4.2 indicate that 2 HBA1 or 2 HBA2 genes are present on one chromosome, and a band indicates that the multi-copy variant is present. These verification results (Fig. 8, Fig. 9) agree with the results of the present method, showing that the method has greater advantages than the conventional detection method. The results for the first 60 samples are as follows:
Figure imgf000020_0002
Figure imgf000021_0001
S44  2  1  -α3.7/αα  1
S45  2  1  -α3.7/αα  1
S46  2  1  -α3.7/αα  1
S47  2  1  -α3.7/αα  1
S48  2  -α3.7/αα  -α3.7/αααanti3.7
S49  2  1  -α3.7/αα  1
S50  2  1  -α3.7/αα  1
S51  2  1  -α3.7/αα  1
S52  2  1  -α3.7/αα  1
S53  2  1  -α3.7/αα  1
S54  2  1  -α3.7/αα  1
S55  2  1  -α3.7/αα  1
S56  2  1  -α4.2/αα  1
S57  2  1  -α4.2/αα  1
S58  2  1  -α4.2/αα  1
S59  2  1  -α4.2/αα  1
S60  2  1  -α4.2/αα  1
Note: The 4 samples shown in bold in the table are those whose results were inconsistent with the Gap-PCR results.

Industrial applicability
The method and system of the present invention for detecting the α-globin gene copy number can be effectively applied to detecting the α-globin gene copy number in nucleic acid samples, and the detection results are obtained with high throughput and high accuracy.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions can be made to those details, and all such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
In this specification, a description that refers to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example", or "some examples" means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Claims

1. A method for determining the copy number of the α-globin gene in a nucleic acid sample, characterized by comprising:
amplifying the nucleic acid sample to obtain an amplification product;
constructing a sequencing library from the amplification product;
sequencing the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data;
determining the sequencing data in the sequencing result that originate from the α-globin gene; and
determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data from the α-globin gene.
2. The method according to claim 1, characterized in that the nucleic acid sample is isolated from at least one of plasma, serum, whole blood, and exfoliated oral cells of a subject,
optionally, the subject is a mammal, preferably a human,
optionally, the α-globin gene is at least one selected from the HBA1 gene and the HBA2 gene,
optionally, the nucleic acid sample is amplified using a specific primer set, wherein the specific primer set comprises a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2,
optionally, the 5' end of at least one of the first primer and the second primer further comprises a tag sequence, the tag sequence being a nucleotide sequence shown in at least one selected from SEQ ID NOs: 5-100,
optionally, the sequencing is performed using at least one selected from Hiseq2000, SOLID, 454, and a single-molecule sequencing device.
3. The method according to claim 2, characterized in that the specific primer set further comprises a third primer and a fourth primer,
wherein the third primer and the fourth primer are specific for an internal reference gene,
and the method further comprises: determining the sequencing data in the sequencing result that originate from the internal reference gene,
optionally, the internal reference gene is FLNB, the third primer has the nucleotide sequence shown in SEQ ID NO: 3, and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4,
optionally, the 5' end of at least one of the third primer and the fourth primer further comprises a tag sequence, the tag sequence being a nucleotide sequence shown in at least one selected from SEQ ID NOs: 5-100,
optionally, the sequencing data in the sequencing result that originate from the α-globin gene are determined by aligning the sequencing result with a reference sequence.
4. The method according to claim 3, characterized in that determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data from the α-globin gene further comprises:
counting the sequencing data in the sequencing result that originate from the α-globin gene, to obtain a value H;
counting the sequencing data in the sequencing result that originate from the internal reference gene, to obtain a value C;
calculating the ratio of H to C to obtain a first parameter H/C, and comparing the first parameter with a first reference value; and
determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value,
optionally, the first reference value is the first parameter obtained by a parallel experiment on a nucleic acid sample from an individual with a known α-globin gene copy number,
optionally, the first reference value is the first parameter obtained by a parallel experiment on a nucleic acid sample from a normal individual,
optionally, the α-globin genes are HBA1 and HBA2, the number of sequencing data in the sequencing result that originate from HBA1 is H1, and the number that originate from HBA2 is H2,
wherein determining the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data from the α-globin gene further comprises:
calculating the ratio of H2 to H1 to obtain a second parameter H2/H1, and comparing the second parameter with a second reference value; and
determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value,
optionally, the second reference value is the second parameter obtained by a parallel experiment on a nucleic acid sample from an individual with a known α-globin gene copy number.
5. A primer composition, characterized by comprising a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2,
optionally, the 5' end of at least one of the first primer and the second primer further comprises a tag sequence, the tag sequence being a nucleotide sequence shown in at least one selected from SEQ ID NOs: 5-100,
optionally, the primer composition further comprises a third primer and a fourth primer,
wherein the third primer has the nucleotide sequence shown in SEQ ID NO: 3 and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4,
optionally, the 5' end of at least one of the third primer and the fourth primer further comprises a tag sequence, the tag sequence being a nucleotide sequence shown in at least one selected from SEQ ID NOs: 5-100.
6. A tag composition, characterized by consisting of the tags shown in SEQ ID NOs: 5-100.
7. Use of the primer composition of claim 5 in determining the copy number of the α-globin gene in a nucleic acid sample.
8. A system for determining the copy number of the α-globin gene in a nucleic acid sample, characterized by comprising:
an amplification device for amplifying the nucleic acid sample to obtain an amplification product;
a library construction device, connected to the amplification device and adapted to construct a sequencing library from the amplification product;
a sequencing device, connected to the library construction device and adapted to sequence the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data; and
an analysis device, connected to the sequencing device and adapted to:
determine the sequencing data in the sequencing result that originate from the α-globin gene; and
determine the copy number of the α-globin gene in the nucleic acid sample based on the number of sequencing data from the α-globin gene.
9. The system according to claim 8, characterized by further comprising a nucleic acid sample isolation device adapted to isolate the nucleic acid sample from at least one of plasma, serum, whole blood, and exfoliated oral cells of a subject,
optionally, the α-globin gene is at least one selected from the HBA1 gene and the HBA2 gene,
optionally, the amplification device is provided with a specific primer set,
wherein the specific primer set comprises a first primer and a second primer, the first primer having the nucleotide sequence shown in SEQ ID NO: 1 and the second primer having the nucleotide sequence shown in SEQ ID NO: 2,
optionally, the 5' end of at least one of the first primer and the second primer further comprises a tag sequence, the tag sequence being a nucleotide sequence shown in at least one selected from SEQ ID NOs: 5-100,
optionally, the sequencing device is at least one selected from Hiseq2000, SOLID, 454, and a single-molecule sequencing device,
optionally, the specific primer set further comprises a third primer and a fourth primer,
wherein the third primer and the fourth primer are specific for an internal reference gene,
and the analysis device is adapted to determine the sequencing data in the sequencing result that originate from the internal reference gene,
optionally, the internal reference gene is FLNB, the third primer has the nucleotide sequence shown in SEQ ID NO: 3, and the fourth primer has the nucleotide sequence shown in SEQ ID NO: 4,
optionally, the 5' end of at least one of the third primer and the fourth primer further comprises a tag sequence, the tag sequence being a nucleotide sequence shown in at least one selected from SEQ ID NOs: 5-100.
10. The system according to claim 8, characterized in that the analysis device is adapted to identify the sequencing data derived from the α-globin gene in the sequencing result by comparing the sequencing result with a reference sequence,
optionally, the analysis device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by the following steps: counting the sequencing data derived from the α-globin gene in the sequencing result to obtain a value H;
counting the sequencing data derived from the internal reference gene in the sequencing result to obtain a value C;
calculating the ratio of the values H and C to obtain a first parameter H/C, and comparing the first parameter with a first reference value; and
determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the first parameter to the first reference value,
optionally, the α-globin genes are HBA1 and HBA2, the number of sequencing data derived from HBA1 in the sequencing result is H1, and the number of sequencing data derived from HBA2 in the sequencing result is H2,
wherein
the analysis device is adapted to determine the copy number of the α-globin gene in the nucleic acid sample by the following steps: calculating the ratio of the values H2 and H1 to obtain a second parameter H2/H1, and comparing the second parameter with a second reference value; and
determining the copy number of the α-globin gene in the nucleic acid sample based on the ratio of the second parameter to the second reference value.
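The counting and ratio steps recited in claim 10 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the default of four α-globin copies for the reference sample, and the linear scaling from ratio to copy number are all assumptions made here for illustration; the claim only states that the copy number is determined from the ratio of each parameter to its reference value.

```python
def estimate_copy_number(h, c, first_reference, reference_copies=4):
    """Estimate the alpha-globin copy number from read counts.

    h: reads assigned to the alpha-globin gene (value H in the claim)
    c: reads assigned to the internal reference gene (value C)
    first_reference: H/C observed in a sample of known copy number
    reference_copies: copy number of that sample (assumed 4 here)

    Assumption: the copy number scales linearly with (H/C) / reference.
    """
    if c <= 0 or first_reference <= 0:
        raise ValueError("C and the first reference value must be positive")
    first_parameter = h / c
    return reference_copies * first_parameter / first_reference


def relative_hba2_dosage(h2, h1, second_reference):
    """Return (H2/H1) divided by the second reference value.

    h2, h1: reads assigned to HBA2 and HBA1
    second_reference: H2/H1 observed in a reference sample

    A result near 1 suggests the same HBA2:HBA1 balance as the reference;
    how the result maps to a copy number is left open, as in the claim.
    """
    if h1 <= 0 or second_reference <= 0:
        raise ValueError("H1 and the second reference value must be positive")
    second_parameter = h2 / h1
    return second_parameter / second_reference


if __name__ == "__main__":
    # Hypothetical read counts, not data from the application.
    print(estimate_copy_number(h=1000, c=2000, first_reference=1.0))
    print(relative_hba2_dosage(h2=300, h1=600, second_reference=1.0))
```

In practice the reference values would be measured from control samples run with the same primers and sequencing device, since amplification efficiency differs between the α-globin and FLNB amplicons.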
PCT/CN2013/080223 2012-08-06 2013-07-26 METHOD AND SYSTEM FOR DETECTING α-GLOBIN GENE COPY NUMBER WO2014023167A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210277141.9A CN102952877B (en) 2012-08-06 2012-08-06 Method and system for detecting alpha-globin gene copy number
CN201210277141.9 2012-08-06

Publications (1)

Publication Number Publication Date
WO2014023167A1 (en) 2014-02-13

Family

ID=47762258

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/080223 WO2014023167A1 (en) 2012-08-06 2013-07-26 METHOD AND SYSTEM FOR DETECTING α-GLOBIN GENE COPY NUMBER

Country Status (2)

Country Link
CN (1) CN102952877B (en)
WO (1) WO2014023167A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102952877B (en) * 2012-08-06 2014-09-24 深圳华大基因研究院 Method and system for detecting alpha-globin gene copy number
CN104560697A (en) * 2015-01-26 2015-04-29 上海美吉生物医药科技有限公司 Detection device for instability of genome copy number
CN104694384B (en) * 2015-03-20 2017-02-08 上海美吉生物医药科技有限公司 Mitochondrial DNA copy index variability detecting device
CN110268044B (en) * 2017-03-07 2022-08-02 深圳华大生命科学研究院 Method and device for detecting chromosome variation
US20190287646A1 (en) * 2018-03-13 2019-09-19 Grail, Inc. Identifying copy number aberrations
CN108715891B (en) * 2018-05-31 2021-09-24 福建农林大学 Expression quantification method and system for transcriptome data
CN108796054A (en) * 2018-09-14 2018-11-13 华大生物科技(武汉)有限公司 Kit and its application for detecting thalassemia genic mutation type and deletion form simultaneously
CN109584957B (en) * 2019-01-21 2020-04-17 明码(上海)生物科技有限公司 Detection kit for capturing α thalassemia related gene copy number
CN110511988A (en) * 2019-07-20 2019-11-29 河南科技学院 The identification method of LMW-GS gene copy number in wheat Plant Genome based on PacBio sequencing
CN110791554A (en) * 2019-11-07 2020-02-14 上海韦翰斯生物医药科技有限公司 Quantitative method of trace DNA
CN110724731A (en) * 2019-11-22 2020-01-24 上海冰缘医疗科技有限公司 Method for adding internal reference quantity of nucleic acid copy number in multiplex PCR system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102409088A (en) * 2011-09-22 2012-04-11 郭奇伟 Method for detecting gene copy number variation
CN102605088A (en) * 2012-03-30 2012-07-25 南方医科大学 Method for rapidly detecting copy number variation of alpha-globin gene cluster
CN102952877A (en) * 2012-08-06 2013-03-06 深圳华大基因研究院 Method and system for detecting alpha-globin gene copy number

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102220411A (en) * 2010-04-16 2011-10-19 中山大学达安基因股份有限公司 Kit for integrated detection of alpha and beta mutant type thalassemias
CN102409045B (en) * 2010-09-21 2013-09-18 深圳华大基因科技服务有限公司 Tag library constructing method based on DNA (deoxyribonucleic acid) adapter connection as well as used tag and tag adapter
CN102409049B (en) * 2010-09-21 2013-10-23 深圳华大基因科技服务有限公司 DNA(deoxyribonucleic acid) index library building method based on PCR (polymerase chain reaction)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3289502A4 (en) * 2014-12-29 2018-09-12 Counsyl, Inc. Method for determining genotypes in regions of high homology
CN111051529A (en) * 2017-08-04 2020-04-21 十亿至一公司 Homologous genomic regions for characterization related to biological targets
EP3658688A4 (en) * 2017-08-04 2021-06-30 BillionToOne, Inc. Homologous genomic regions for characterization associated with biological targets
US11519024B2 (en) 2017-08-04 2022-12-06 Billiontoone, Inc. Homologous genomic regions for characterization associated with biological targets
US11646100B2 (en) 2017-08-04 2023-05-09 Billiontoone, Inc. Target-associated molecules for characterization associated with biological targets

Also Published As

Publication number Publication date
CN102952877B (en) 2014-09-24
CN102952877A (en) 2013-03-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13828215

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 26/06/2015)

122 Ep: pct application non-entry in european phase

Ref document number: 13828215

Country of ref document: EP

Kind code of ref document: A1