CN117925808A

CN117925808A - Method, kit and system for determining gene copy number

Info

Publication number: CN117925808A
Application number: CN202211318904.XA
Authority: CN
Inventors: 孙田依; 何逖
Original assignee: Shanghai Jikai Medical Laboratory Co ltd
Current assignee: Shanghai Jikai Medical Laboratory Co ltd
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2024-04-26

Abstract

The invention relates to a method for determining gene copy number, a kit, and applications thereof. The method comprises: (1) amplifying the target gene, the reference genes and the competing substrates of the target gene and the reference genes with amplification primers to obtain an amplification product; (2) performing an extension reaction on the amplification product of step (1) with extension primers to obtain an extension product; and (3) detecting the extension product of step (2) with a mass spectrometer by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and calculating the copy number of the target gene from the peak heights of all detection sites; wherein the reference genes comprise at least three genes without copy number variation (CNV).

Description

Method, kit and system for determining gene copy number

Technical Field

The invention relates to a method for detecting genes by time-of-flight mass spectrometry nucleic acid analysis. It can be used to diagnose and screen rare genetic diseases such as spinal muscular atrophy and Duchenne muscular dystrophy, and to assess individual differences in drug metabolism, by detecting copy number variation of genes such as SMN1, SMN2, CYP2D6 and DMD.

Background

The MassARRAY system is based on matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and offers very high specificity and sensitivity. MALDI-TOF MS detects DNA molecules accurately: genetic variants are distinguished by the mass of each analyte, so no fluorescent or other labels are needed.

CYP2D6 is an enzyme encoded by the human CYP2D6 gene. It is mainly expressed in the liver, is an important member of the cytochrome P450 (CYP) enzyme family, and is one of the enzymes most closely involved in drug metabolism: although it accounts for only 2%-9% of total hepatic CYP enzymes, it participates in the metabolism of 20%-25% of drugs, including antidepressants, antiarrhythmics and antipsychotics. CYP2D6 is highly polymorphic; many allelic variants that increase or decrease enzyme activity have been identified, and metabolizer status is divided into four types: ultrarapid metabolizers (UMs), extensive metabolizers (EMs), intermediate metabolizers (IMs) and poor metabolizers (PMs). These widespread genetic polymorphisms underlie inter-individual differences in drug metabolism.

Duchenne muscular dystrophy (DMD), also known as pseudohypertrophic muscular dystrophy, is caused by mutations in the DMD gene, located at Xp21.2-3, which is the largest known human gene. DMD encodes dystrophin, a cytoskeletal protein expressed mainly on the inner surface of the sarcolemma of skeletal and cardiac muscle, whose main role is to stabilize and protect muscle fibers. DMD is inherited in an X-linked recessive pattern: because the causative gene is on the X chromosome and males have only one X chromosome, a single mutation is sufficient to cause disease, whereas females have two X chromosomes and are affected only when both copies are mutated, which is rare. Males are therefore far more likely than females to develop DMD. In China, about 400-500 infants with DMD are born every year and about 7,000 people have been diagnosed with DMD, making China one of the countries with the largest number of DMD patients in the world.

Spinal muscular atrophy (SMA) is an autosomal recessive, progressive motor neuron disease characterized mainly by progressive degeneration of the anterior horn cells of the spinal cord and of the motor nuclei of the brainstem. Clinically it presents as progressive, symmetric muscle weakness and atrophy that is more severe proximally than distally and in the lower limbs than in the upper limbs. The incidence is about 1 in 10,000 newborns, and the carrier frequency is about 1 in 50. According to age of onset and clinical manifestations, SMA is classified into four types: type I (SMA1, also known as Werdnig-Hoffmann disease), with onset before 6 months; type II (SMA2), with onset at 6-18 months; type III (SMA3, also known as Kugelberg-Welander disease), with onset in childhood or adolescence; and type IV (SMA4), with onset in adulthood. All four subtypes are caused by the same gene, survival motor neuron 1 (SMN1) [OMIM 600354]. The SMN1 gene is located on chromosome 5, spans about 20 kb and contains 9 exons. The neighboring SMN2 gene is highly homologous to SMN1, differing by only 5 nucleotides. SMN2 is a disease modifier: its copy number is inversely correlated with the severity of SMA.

CN110468192A provides a time-of-flight mass spectrometry nucleic acid analysis method for detecting SMA gene mutations: it quantitatively detects sequences of the SMN1, SMN2, NAIP, H4F5 and GTF2H2 genes and calculates relative gene copy number from the peak area values of two internal reference genes and the target genes in the test sample and in a control sample. CN111020023A provides a method that combines MassARRAY detection with single-base extension and competitive PCR (real-competitive PCR) and accurately quantifies gene copy number through subsequent multi-step correction. CN114107451A provides a method for simultaneously amplifying a gene of interest and its competing substrate using a pair of amplification primers modified with locked nucleic acids.

The present invention detects the SMN1 and SMN2 genes directly, without needing to detect changes in the NAIP, H4F5 and GTF2H2 genes. It calculates copy number from peak heights, introduces 3 internal reference genes and 1 external reference material, uses locked nucleic acid (LNA) modified amplification primers, and adopts a UNG-dUTP anti-contamination system to improve the accuracy of the results. The external reference material is commercially purchased normal human gDNA with a confirmed SMN1 copy number of 2; it reduces the influence of factors such as differing degradation rates of samples during extraction and loading, and pipetting volume errors. The reference genes are usually housekeeping genes that are stable in the human genome, and calculating the ratio of SMN1 to the reference genes minimizes differences in absolute template amount caused by sample degradation or loading errors.

At present, SMA can be partially diagnosed and screened using MLPA (multiplex ligation-dependent probe amplification), RT-PCR (reverse transcription-polymerase chain reaction) and NGS (next-generation sequencing), but each of these has limitations:

1. Although MLPA can also detect the patient's SMN2 copy number, it has high requirements for template quality. Its amplified probes are fluorescent and the signal intensity is obtained from fluorescence, which is costly and prone to signal bias; adjacent fragments must be taken into account when interpreting CNV; and throughput is limited and the workflow is complex.

2. RT-PCR has low equipment requirements and a simple, fast workflow, but it cannot detect SMN1 point mutations, and detecting the patient's SMN2 copy number requires an additional assay design. Results are interpreted from fluorescence signal intensity, and interference from similar sites must be suppressed with methods such as competitive probes; such interference is hard to eliminate effectively, which causes data bias and makes interpretation difficult. Furthermore, each RT-PCR reaction tube detects only one fragment, so multiplex detection requires multiple tubes, which easily introduces well-to-well variation; although the cost is low, throughput is limited.

3. NGS can target SMA while also covering other myopathies with related phenotypes, but it serves only as a primary screen: suspected positive results must be verified by MLPA or other methods, and for SMN2 copy number analysis NGS has low accuracy and reliability and a higher cost.

Disclosure of Invention

In view of the above, the invention provides a method and a kit for determining gene copy number using MassARRAY, covering genes such as SMN1, SMN2, CYP2D6 and DMD. In particular, for SMN1 and SMN2, the copy numbers of both genes can be detected accurately and simultaneously, while the acceptable range of template DNA input is broadened and operation is simplified. The method supports multiplex detection with high throughput, making it convenient to meet large-scale commercial testing demands.

Specifically, the key point of the invention is the design of multiple amplification primers and extension primers for amplifying and extending the target genes, three or more reference genes, and the competing substrates of the target genes and reference genes. The resulting extension products are detected by mass spectrometry to obtain the peak heights of the detection sites, from which the copy numbers of SMN1 and SMN2 are calculated.

Conventional methods determine the absolute copy number of a gene using a single reference gene or two reference genes. When the single-reference result is inconclusive, the copy number of the gene cannot be determined accurately; when the results of two references disagree, there is no way to decide which reference to trust. The method of the invention therefore uses 3 internal reference genes without CNV and calculates the absolute copy number interval value of the gene from their average. This minimizes the risk of inconclusive or erroneous results caused by relying on one or two internal references and significantly narrows the gray zone of data interpretation, making results more accurate. At the same time, the acceptable template DNA input range is broadened: accurate measurement is achieved with DNA extracted from whole blood at inputs of about 10-30 ng (measured with a NanoDrop 2000), which makes the method easy to use.
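The gray-zone claim can be framed as a simple check on validation data: if the lowest IV among 2+-copy samples exceeds the highest IV among 1-copy samples, the two groups do not cross. The Python sketch below is a minimal illustration under that assumption; the function names and the midpoint cutoff rule are hypothetical and not taken from this application, and the values passed to them should come from one's own validation runs.

```python
from typing import Optional, Sequence


def iv_gap(iv_one_copy: Sequence[float], iv_two_plus: Sequence[float]) -> float:
    """Minimum 2+-copy IV minus maximum 1-copy IV.

    A positive value means the two groups do not cross (no gray zone);
    zero or a negative value means at least one sample falls in an overlap.
    """
    if not iv_one_copy or not iv_two_plus:
        raise ValueError("both groups need at least one IV value")
    return min(iv_two_plus) - max(iv_one_copy)


def midpoint_cutoff(iv_one_copy: Sequence[float], iv_two_plus: Sequence[float]) -> Optional[float]:
    """Hypothetical rule: place the cutoff halfway across the gap, or None if the groups overlap."""
    gap = iv_gap(iv_one_copy, iv_two_plus)
    if gap <= 0:
        return None
    return max(iv_one_copy) + gap / 2
```

Running `iv_gap` separately on IVs computed with each single reference and with the three-reference mean, pooled across the 10-30 ng input range, would show which calculation leaves a positive gap.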

Furthermore, the method adds locked nucleic acid modifications to the PCR primers and introduces differential bases into the competing substrates, which lowers the amplification efficiency of the competing substrates so that they can be added and detected within a stable concentration range. A UNG-dUTP anti-contamination system is also used to eliminate contamination sources and avoid false negative results. Compared with conventional MassARRAY, the method adds a competing substrate for each CNV detection site at the amplification step; by measuring the peak height ratio against the competing substrate, it accurately distinguishes SMN1 = 0, SMN1 = 1 and SMN1 ≥ 2 copies, which refines and improves the accuracy of the results and makes large-scale commercial testing convenient.
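The three-way SMN1 call described above can be sketched as a threshold rule on the absolute copy number interval value (IV). This sketch assumes IV is scaled so that the 2-copy gDNA control reads 2.0; the cutoffs of 0.5 and 1.5 are hypothetical placeholders, because this text does not state the thresholds actually used.

```python
from enum import Enum


class SMN1Call(Enum):
    ZERO = "SMN1 = 0 (patient)"
    ONE = "SMN1 = 1 (carrier)"
    TWO_PLUS = "SMN1 >= 2 (normal)"


# Hypothetical cutoffs on an IV scale where the 2-copy gDNA control reads 2.0.
# The text does not give the thresholds used; replace these with validated values.
ZERO_ONE_CUTOFF = 0.5
ONE_TWO_CUTOFF = 1.5


def call_smn1(iv: float) -> SMN1Call:
    """Map an absolute copy number interval value (IV) to one of three SMN1 categories."""
    if iv < 0:
        raise ValueError("IV cannot be negative")
    if iv < ZERO_ONE_CUTOFF:
        return SMN1Call.ZERO
    if iv < ONE_TWO_CUTOFF:
        return SMN1Call.ONE
    return SMN1Call.TWO_PLUS
```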

The term "CNV" as used in the present invention refers to copy number variation.

The term "QC" as used in the present invention refers to a competing substrate.

The term "IV" as used in the present invention refers to the absolute copy number interval value.

Specifically, according to a first aspect of the present invention, there is provided a method for determining gene copy number, comprising: 1) amplifying a target gene, reference genes and the competing substrates of the target gene and the reference genes with amplification primers to obtain an amplification product; 2) performing an extension reaction on the amplification product of step 1) with extension primers to obtain an extension product; 3) detecting the extension product of step 2) with a mass spectrometer by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, and calculating the copy number of the target gene from the peak height of each detection site; wherein the reference genes comprise at least three genes without CNV.

In one embodiment, the gene of interest includes SMN1, SMN2, CYP2D6 and DMD.

In one embodiment, the reference genes include RPP40, ATP7A, and COL4A5.

In one embodiment, the amplification primer is a locked nucleic acid modified amplification primer.

In one embodiment, the amplification reaction of step 1) employs a UNG-dUTP anti-contamination system.

In one embodiment, the amplification primers for the gene of interest are the amplification primers for exon 7 of SMN1 and SMN2, as set forth in SEQ ID NOs: 1-2.

In one embodiment, the extension primer for the gene of interest is the extension primer for exon 7 of SMN1 and SMN2, as set forth in SEQ ID NO: 9.

In one embodiment, the amplification primers for the reference genes are selected from: i) the amplification primers for intron 6 of RPP40, as set forth in SEQ ID NOs: 3-4; ii) the amplification primers for exon 1 of ATP7A, as set forth in SEQ ID NOs: 5-6; iii) the amplification primers for exon 41 of COL4A5, as set forth in SEQ ID NOs: 7-8.

In one embodiment, the extension primers for the reference genes are selected from: i) the extension primer for intron 6 of RPP40, as set forth in SEQ ID NO: 10; ii) the extension primer for exon 1 of ATP7A, as set forth in SEQ ID NO: 11; iii) the extension primer for exon 41 of COL4A5, as set forth in SEQ ID NO: 12.

In one embodiment, the competing substrate is selected from: i) the competing substrate for SMN1, as set forth in SEQ ID NO: 13; ii) the competing substrate for RPP40, as set forth in SEQ ID NO: 14; iii) the competing substrate for ATP7A, as set forth in SEQ ID NO: 15; iv) the competing substrate for COL4A5, as set forth in SEQ ID NO: 16.

In one embodiment, the formula for calculating SMN1 and SMN2 copy numbers is:

Wherein:
H_{N-SMN} is the peak height of the SMN1-2E7 site in the test sample;
H_{N_QC-SMN} is the peak height of the SMN1-2E7 competing substrate (QC) in the test sample;
H_{N-RPP40} is the peak height of the RPP40 site in the test sample;
H_{N_QC-RPP40} is the peak height of the RPP40 QC in the test sample;
H_{N-ATP7A} is the peak height of the ATP7A site in the test sample;
H_{N_QC-ATP7A} is the peak height of the ATP7A QC in the test sample;
H_{N-COL4A5} is the peak height of the COL4A5 site in the test sample;
H_{N_QC-COL4A5} is the peak height of the COL4A5 QC in the test sample;
H_{gDNA-SMN} is the peak height of the SMN1-2E7 site in the gDNA control;
H_{gDNA_QC-SMN} is the peak height of the SMN1-2E7 QC in the gDNA control;
H_{gDNA-RPP40} is the peak height of the RPP40 site in the gDNA control;
H_{gDNA_QC-RPP40} is the peak height of the RPP40 QC in the gDNA control;
H_{gDNA-ATP7A} is the peak height of the ATP7A site in the gDNA control;
H_{gDNA_QC-ATP7A} is the peak height of the ATP7A QC in the gDNA control;
H_{gDNA-COL4A5} is the peak height of the COL4A5 site in the gDNA control;
H_{gDNA_QC-COL4A5} is the peak height of the COL4A5 QC in the gDNA control.
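The formula itself is not reproduced in this text. As a hedged illustration only, the Python sketch below shows one plausible way to combine the peak heights defined above: each site's peak height is divided by its own competing-substrate (QC) peak from the same spectrum, the SMN1-2E7 ratio is divided by the mean of the three reference-gene ratios, and the result is scaled against the 2-copy gDNA control. The class and function names, and this exact arrangement of terms, are assumptions rather than the application's formula.

```python
from dataclasses import dataclass
from statistics import mean

# Reference genes named in the text; weighting them equally is an assumption.
REFERENCE_GENES = ("RPP40", "ATP7A", "COL4A5")


@dataclass
class WellPeaks:
    """Peak heights from one well (test sample or gDNA control).

    analyte[site] corresponds to H_{N-site} (or H_{gDNA-site}); qc[site] is the
    matching competing-substrate peak H_{N_QC-site} (or H_{gDNA_QC-site}).
    Site keys are "SMN" for SMN1-2E7 plus the three reference genes.
    """
    analyte: dict[str, float]
    qc: dict[str, float]

    def qc_ratio(self, site: str) -> float:
        # Analyte peak divided by its competing-substrate peak from the same spectrum.
        return self.analyte[site] / self.qc[site]

    def normalized_smn(self) -> float:
        # SMN1-2E7 QC ratio divided by the mean QC ratio of the three reference genes.
        ref_mean = mean(self.qc_ratio(gene) for gene in REFERENCE_GENES)
        return self.qc_ratio("SMN") / ref_mean


def interval_value(sample: WellPeaks, gdna: WellPeaks, control_copies: int = 2) -> float:
    """Hypothetical IV: the sample's normalized SMN signal relative to the 2-copy gDNA control."""
    return control_copies * sample.normalized_smn() / gdna.normalized_smn()
```

Because every analyte peak is divided by a QC peak from the same well and then by the reference mean, a uniform change in template amount cancels out in this sketch, which is the effect the description attributes to the three-reference design.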

In one embodiment, the final concentration of the competing substrate of RPP40 is 0.5-8 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of RPP40 is 0.6-7.5 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of RPP40 is 0.7-7 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of RPP40 is 0.8-6.5 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of RPP40 is 0.81-6.06 ng/μL.

In one embodiment, the final concentration of the competing substrate of ATP7A is 1.5-18 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of ATP7A is 1.5-17.5 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of ATP7A is 1.55-17 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of ATP7A is 1.6-16.5 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of ATP7A is 1.63-16.26 ng/μL.

In one embodiment, the final concentration of the competing substrate of COL4A5 is 1.5-10 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of COL4A5 is 1.55-9.5 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of COL4A5 is 1.6-9 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of COL4A5 is 1.65-8.5 ng/μL.

According to a preferred embodiment, the final concentration of the competing substrate of COL4A5 is 1.73-8.49 ng/μL.
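The nested concentration ranges above can be encoded as data, for example to check which embodiment a planned QC MIX dilution falls under (the figures use 0.5, 4.0 and 8.0 ng/μL for RPP40). The ranges are copied from the text; the dictionary layout and the `narrowest_tier` helper are illustrative assumptions.

```python
from typing import Optional

# Final concentrations (ng/μL) of each competing substrate, listed from the
# broadest embodiment to the narrowest preferred embodiment given above.
QC_RANGES: dict[str, list[tuple[float, float]]] = {
    "RPP40": [(0.5, 8.0), (0.6, 7.5), (0.7, 7.0), (0.8, 6.5), (0.81, 6.06)],
    "ATP7A": [(1.5, 18.0), (1.5, 17.5), (1.55, 17.0), (1.6, 16.5), (1.63, 16.26)],
    "COL4A5": [(1.5, 10.0), (1.55, 9.5), (1.6, 9.0), (1.65, 8.5), (1.73, 8.49)],
}


def narrowest_tier(gene: str, conc_ng_per_ul: float) -> Optional[int]:
    """Index of the narrowest listed range containing the concentration (0 = broadest), or None."""
    tier = None
    for index, (low, high) in enumerate(QC_RANGES[gene]):
        if low <= conc_ng_per_ul <= high:
            tier = index  # ranges are nested, so the last match is the narrowest
    return tier
```

With this layout, the 0.5 and 8.0 ng/μL RPP40 points used in the figures fall only within the broadest range (tier 0), while 4.0 ng/μL lies inside the narrowest preferred range (tier 4).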

According to a second aspect of the present invention, there is provided a kit for determining gene copy number, comprising: amplification primers and an extension primer for a gene of interest, and a competing substrate for the gene of interest; and amplification primers and extension primers for reference genes, and competing substrates for the reference genes, wherein the reference genes comprise at least three genes without CNV.

In one embodiment, the gene of interest includes SMN1, SMN2, CYP2D6 and DMD.

In one embodiment, the reference genes include RPP40, ATP7A, and COL4A5.

In one embodiment, the kit is for detecting spinal muscular atrophy in a subject.

In one embodiment, the kit is for detecting Duchenne muscular dystrophy in a subject.

According to a third aspect of the present invention, there is provided a system for determining gene copy number, comprising: 1) an amplification module, which amplifies the target gene, the reference genes and the competing substrates of the target gene and the reference genes with amplification primers to obtain an amplification product; 2) an extension module, which performs an extension reaction on the amplification product obtained by module 1) with extension primers to obtain an extension product; 3) a mass spectrometry module, which detects the extension product of module 2) with a mass spectrometer by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry and calculates the copy number of the target gene from the peak height of each detection site; wherein the reference genes comprise at least three genes without CNV.

In one embodiment, the gene of interest includes SMN1, SMN2, CYP2D6 and DMD.

In one embodiment, the reference genes include RPP40, ATP7A, and COL4A5.

In one embodiment, the formula for calculating SMN1 and SMN2 copy numbers is:

Wherein:
H_{N-SMN} is the peak height of the SMN1-2E7 site in the test sample;
H_{N_QC-SMN} is the peak height of the SMN1-2E7 competing substrate (QC) in the test sample;
H_{N-RPP40} is the peak height of the RPP40 site in the test sample;
H_{N_QC-RPP40} is the peak height of the RPP40 QC in the test sample;
H_{N-ATP7A} is the peak height of the ATP7A site in the test sample;
H_{N_QC-ATP7A} is the peak height of the ATP7A QC in the test sample;
H_{N-COL4A5} is the peak height of the COL4A5 site in the test sample;
H_{N_QC-COL4A5} is the peak height of the COL4A5 QC in the test sample;
H_{gDNA-SMN} is the peak height of the SMN1-2E7 site in the gDNA control;
H_{gDNA_QC-SMN} is the peak height of the SMN1-2E7 QC in the gDNA control;
H_{gDNA-RPP40} is the peak height of the RPP40 site in the gDNA control;
H_{gDNA_QC-RPP40} is the peak height of the RPP40 QC in the gDNA control;
H_{gDNA-ATP7A} is the peak height of the ATP7A site in the gDNA control;
H_{gDNA_QC-ATP7A} is the peak height of the ATP7A QC in the gDNA control;
H_{gDNA-COL4A5} is the peak height of the COL4A5 site in the gDNA control;
H_{gDNA_QC-COL4A5} is the peak height of the COL4A5 QC in the gDNA control.

According to a fourth aspect of the present invention there is provided the use of the system in the preparation of a kit for detecting rare diseases.

In one embodiment, the rare disease is spinal muscular atrophy or Duchenne muscular dystrophy.

The main technical advantages of the method, system and kit of the invention are as follows:

1. The absolute copy number of the gene is calculated from the average of 3 reference genes without CNV, which minimizes inconclusive or erroneous results caused by relying on one or two reference genes and makes the results more accurate. It also removes the influence of factors other than the sample's copy number, such as differences in absolute template amount caused by sample degradation or loading errors. Compared with a single reference, calculating with the average of 3 references significantly narrows the gray zone of data interpretation. In addition, the acceptable template DNA input range is broadened. In a single-reference system, the IV values of DNA samples at different concentrations overlap and fall into gray zones; for example, 1-copy (carrier) and 2+-copy (normal) samples cannot be distinguished across DNA samples at 10 ng/μL and 30 ng/μL. With the 3-reference calculation, there is no overlap between the maximum IV of 1-copy samples and the minimum IV of 2+-copy samples at any DNA concentration, so whole-blood DNA inputs of about 10-30 ng (measured with a NanoDrop 2000) can be used.

2. Because the competing substrates are synthetic, using them to assess the subject's reference genes does not depend on human factors in the extraction process, so batch-to-batch variation is very small. Moreover, the sequence of each competing substrate is almost identical to that of the gene being measured, which makes the results easy to interpret. An internal standard approach is used: the competing substrate is amplified in the same reaction as the specimen, so its signal peak and the analyte's signal peak appear in the same spectrum window and need not be looked up in other windows. Because both signal values are obtained simultaneously, their conversion to CNV values is more reliable.

3. A UNG-dUTP anti-contamination system is used in the amplification reaction: UNG enzyme is added to the PCR system, and dUTP and dTTP are added together in a set ratio, which prevents aerosol contamination by PCR products. Because PCR products are present at very high copy numbers, even trace contamination with PCR product can cause false positive results. This approach eliminates contamination sources, and because the UNG treatment is performed in the same tube as the PCR, operation is simple.

4. The synthetic competing substrate is in the same reaction well as the genomic DNA but, unlike the genomic template, is not subject to steric hindrance during amplification, so it would otherwise amplify more efficiently. Introducing locked nucleic acid modifications into the amplification primers lowers the amplification efficiency of the synthetic sequence and improves the stability of the competing substrate when it is adjusted to the same copy number as the genome.

5. For whole-blood specimens with DNA concentrations above 10 ng/μL, SMA can be reliably detected with MassARRAY, making the method suitable for population screening. Single-base extension distinguishes detection sites by the molecular weight differences between bases, so the base at each detection site is determined directly and accurately, with high specificity. No fluorescent probes are used, which avoids fluorescent interference from similar sites, giving accurate results at low cost.
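Item 5 relies on the mass difference contributed by the single extended base. The Python sketch below estimates, for a hypothetical primer, the mass of each possible single-base extension product, assuming unmodified dideoxynucleotide terminators and the average residue masses used in the common anhydrous oligonucleotide molecular-weight formula. Commercial extension chemistries may use mass-modified terminators, so real peak positions will differ; the sketch only shows why the A/T pair is the hardest to resolve (about 9 Da with unmodified terminators).

```python
from itertools import combinations

# Average masses (Da) of deoxynucleotide residues in a DNA strand, as used in the
# common formula MW = sum(residue masses) - 61.96 for an unmodified oligonucleotide.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
OXYGEN_MASS = 15.999  # a dideoxy terminator lacks the 3'-OH oxygen


def oligo_mass(seq: str) -> float:
    """Approximate anhydrous molecular weight of an unmodified oligonucleotide."""
    return sum(RESIDUE_MASS[base] for base in seq.upper()) - 61.96


def extension_products(primer: str) -> dict[str, float]:
    """Mass of the primer after adding one unmodified ddNTP of each base (assumption)."""
    primer_mass = oligo_mass(primer)
    return {base: primer_mass + RESIDUE_MASS[base] - OXYGEN_MASS for base in "ACGT"}


def closest_pair(primer: str) -> tuple[str, str, float]:
    """The two possible extension products that are closest in mass."""
    products = extension_products(primer)
    pairs = [(a, b, abs(products[a] - products[b])) for a, b in combinations("ACGT", 2)]
    return min(pairs, key=lambda pair: pair[2])
```

The primer sequence cancels out of every pairwise difference, so the closest pair is A/T for any primer under these assumptions.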

Brief description of the drawings

FIGS. 1A-C: Mass spectra of the detection site on SMN1-2 exon 7. FIG. 1A shows the MassARRAY profile of a 0-copy sample (patient; SMN1 = 0, SMN2 = 2); FIG. 1B shows the MassARRAY profile of a 1-copy sample (carrier; SMN1 = 1, SMN2 = 2); and FIG. 1C shows the MassARRAY profile of a 2+-copy sample (normal; SMN1 = 2, SMN2 = 2).

Fig. 2A-C: Mass spectra of the detection site on the RPP40 gene at 3 different RPP40 competing substrate concentrations (0.5 ng/μL, 4.0 ng/μL and 8.0 ng/μL). Fig. 2A shows the MassARRAY profiles of a 2+-copy sample (normal; SMN1 = 2, SMN2 = 2); Fig. 2B shows those of a 1-copy sample (carrier; SMN1 = 1, SMN2 = 2); and Fig. 2C shows those of a 0-copy sample (patient; SMN1 = 0, SMN2 = 2).

Fig. 3A-C: Mass spectra of the detection site on the ATP7A gene at 3 different ATP7A competing substrate concentrations (1.5 ng/μL, 9.0 ng/μL and 18.0 ng/μL). Fig. 3A shows the MassARRAY profiles of a 2+-copy sample (normal; SMN1 = 2, SMN2 = 2); Fig. 3B shows those of a 1-copy sample (carrier; SMN1 = 1, SMN2 = 2); and Fig. 3C shows those of a 0-copy sample (patient; SMN1 = 0, SMN2 = 2).

Fig. 4A-C: Mass spectra of the detection site on the COL4A5 gene at 3 different COL4A5 competing substrate concentrations (1.5 ng/μL, 5.0 ng/μL and 10.0 ng/μL). Fig. 4A shows the MassARRAY profiles of a 2+-copy sample (normal; SMN1 = 2, SMN2 = 2); Fig. 4B shows those of a 1-copy sample (carrier; SMN1 = 1, SMN2 = 2); and Fig. 4C shows those of a 0-copy sample (patient; SMN1 = 0, SMN2 = 2).

Fig. 5: Compared with a single internal reference, calculating with the average of 3 internal references significantly narrows the gray zone of data interpretation.

Fig. 6: ROC curve of the MassARRAY method with MLPA as the gold standard, showing that the MassARRAY method achieves high sensitivity, specificity and positive/negative concordance.

Detailed Description

The following detailed description of preferred embodiments of the application, taken in conjunction with the accompanying drawings, is given by way of illustration and not limitation; any other similar variations are intended to fall within the scope of the application.

Example 1: design of primers and competitor substrates

In this example, based on the pathological characteristics of SMA and the sequence information of the SMN1 and SMN2 genes, amplification primers, competing substrates and extension primers were designed for the corresponding fragment of exon 7 of SMN1 and SMN2, and for the internal reference genes RPP40, ATP7A and COL4A5.

Specifically, the amplification primers designed and used are listed in Table 1, where [X] denotes a locked nucleic acid (LNA) modified base and X may be A, T, C or G. The purpose of the PCR is to obtain the target DNA.
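The [X] notation in Table 1 can be parsed mechanically, for example to check LNA placement before ordering primers. A minimal Python sketch follows; the example sequence in the usage note is hypothetical, because the actual primer sequences (SEQ ID NOs: 1-8) are not reproduced in this text.

```python
import re

# One token is either a bracketed LNA base "[X]" or a plain base.
_TOKEN = re.compile(r"\[([ACGT])\]|([ACGT])")


def parse_lna(primer: str) -> tuple[str, list[int]]:
    """Split a primer written in [X] notation into its plain sequence and 0-based LNA positions."""
    text = primer.replace(" ", "").upper()
    seq: list[str] = []
    lna_positions: list[int] = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at index {pos}: {text[pos]!r}")
        if match.group(1):  # bracketed base carries the LNA modification
            lna_positions.append(len(seq))
            seq.append(match.group(1))
        else:
            seq.append(match.group(2))
        pos = match.end()
    return "".join(seq), lna_positions
```

For the hypothetical primer `ACG[T]CA[G]T`, `parse_lna` returns `("ACGTCAGT", [3, 6])`.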

Table 1: amplification primer information table

The extension primers designed and used are listed in Table 2 below. Each extension primer is taken from part of the PCR-amplified sequence; the purpose of the MassARRAY step is to detect SMA-related copy number variation.

Table 2: extension primer information table

The information on the competing substrates designed and used is shown in table 3 below.

Table 3: competitive substrate information table

Here, a lowercase base (for example, t) indicates a position at which the competing substrate sequence differs from the target gene sequence; the introduced base is a genotype that does not occur in the human gene.
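A quick consistency check for Table 3 is to align each competing substrate with its target sequence and list the positions where they differ; by design these should be only the intentionally introduced bases. The Python sketch below assumes both sequences are already aligned and of equal length, and the sequences in the usage note are hypothetical. Confirming that an introduced base does not occur in human populations requires external variant data and is not covered here.

```python
def differing_positions(target: str, competitor: str) -> list[tuple[int, str, str]]:
    """0-based positions where an aligned competing substrate differs from its target.

    Case is ignored, so a lowercase marked base such as "t" compares as "T".
    """
    if len(target) != len(competitor):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, t, c)
        for i, (t, c) in enumerate(zip(target.upper(), competitor.upper()))
        if t != c
    ]
```

For a hypothetical target `ACGTAC` and competing substrate `ACGaAC`, `differing_positions` returns `[(3, "T", "A")]`.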

The sequence information of the amplified products of the target gene and the competing substrate is shown in Table 4 below:

Table 4: amplification product sequence information table

In Table 4, bold type marks the PCR amplification primer sequences, italics mark the extension primer sequences, bold underlined type marks the detection site or extended base, and bold type also marks the extension products of the sample DNA and of the competing substrate, respectively.

Example 2: determination of copy number of SMN1 and SMN2 Using MassArray

This example uses the primers and competing substrates described in Example 1 to determine the copy numbers of SMN1 and SMN2 by MassARRAY.

2.1 Preparation of reaction mixtures

The PCR primer mixture, QC MIX reaction mixture, PCR reaction mixture, SAP reaction mixture, extension primer mixture and extension reaction mixture used in this example were prepared as shown in Tables 5 to 10 below.

Table 5: PCR primer mixture preparation

Table 6: preparing QC MIX reaction mixture, and respectively preparing 3 concentrations of competing substrates

Table 6-1:

table 6-2:

table 6-3:

Table 7: PCR reaction mixture preparation

Table 8: SAP reaction mixture configuration

Component	Volume per reaction
ddH2O	1.53 μL
SAP Buffer	0.17 μL
SAP Enzyme	0.3 μL
Total	2 μL
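Scaling the per-reaction volumes in Table 8 up to a batch is simple arithmetic. The minimal Python sketch below shows it; the function name `master_mix` and the default 10% pipetting overage are illustrative assumptions, not values from the patent.

```python
# Per-reaction SAP mixture volumes from Table 8 (μL).
SAP_PER_REACTION_UL = {"ddH2O": 1.53, "SAP Buffer": 0.17, "SAP Enzyme": 0.3}


def master_mix(per_reaction, n_reactions, overage=0.10):
    """Return component volumes (μL) for n_reactions plus a fractional overage.

    The default 10% overage is a hypothetical pipetting allowance,
    not a value stated in the source.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1 + overage)
    # Round to 0.01 μL, which is finer than any pipette can dispense.
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


if __name__ == "__main__":
    # Sanity check: the components add up to the 2 μL total in Table 8.
    assert abs(sum(SAP_PER_REACTION_UL.values()) - 2.0) < 1e-9
    for name, vol in master_mix(SAP_PER_REACTION_UL, 384).items():
        print(f"{name}\t{vol} μL")
```

The same helper applies unchanged to the other per-reaction recipes in Tables 5 to 10 once their volumes are entered as a dictionary.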

Table 9: Extension primer mixture (iPLEX Primer MIX) preparation

Table 10: Extension reaction mixture preparation

2.2 Genomic DNA acquisition

Genomic DNA was extracted from fresh or frozen anticoagulated blood samples using a CRICHO magnetic-bead blood genomic DNA extraction kit.

The 2-copy control gDNA was Human Genomic DNA purchased from Promega (Cat. No. G1471).

In this control, SMN1, SMN2, RPP40, ATP7A and COL4A5 are all present at 2 copies.

2.3 Multiplex amplification steps

2.3.1 PCR amplification

Mix the PCR reaction mixture prepared in 2.1 and dispense 4 μL into each assigned well of a 384-well plate, then add 1 μL of the sample to be tested. Each run must include the 2-copy control gDNA and a blank (water) control. Seal the plate with film, centrifuge briefly, place it on a thermal cycler, and amplify using the following PCR program:
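The requirement that every run include the 2-copy gDNA control and a water blank can be enforced when the 384-well plate map is built. The sketch below assumes row-major filling of wells A1 to P24 with the two controls placed after the test samples; the function name, well order and control labels are hypothetical.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:16]  # rows A-P of a 384-well plate
COLS = range(1, 25)          # columns 1-24
# Row-major well order: A1, A2, ..., A24, B1, ..., P24
WELLS = [f"{r}{c}" for r in ROWS for c in COLS]

# Hypothetical labels for the two mandatory controls in every run.
DEFAULT_CONTROLS = ("gDNA 2-copy control", "Blank (water)")


def build_plate_map(sample_ids, controls=DEFAULT_CONTROLS):
    """Assign test samples, then the required controls, to consecutive wells."""
    entries = list(sample_ids) + list(controls)
    if len(entries) > len(WELLS):
        raise ValueError(f"{len(entries)} reactions exceed {len(WELLS)} wells")
    return dict(zip(WELLS, entries))


if __name__ == "__main__":
    for well, content in build_plate_map(["S001", "S002", "S003"]).items():
        print(well, content)
```

A laboratory might instead reserve fixed positions for the controls; appending them after the samples simply guarantees they are never omitted.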

2.3.2 SAP purification

Add 2 μL of the SAP reaction mixture (prepared in 2.1) to each well of PCR product from step 2.3.1, seal with film, centrifuge briefly, and purify on a thermal cycler using the following SAP program:

37 ℃	40 min
85 ℃	5 min
8 ℃	Hold

2.3.3 Extension reaction

Add 2 μL of the extension reaction mixture (prepared in 2.1) to each well of SAP-purified product from step 2.3.2, seal with film, centrifuge briefly, place the plate on a thermal cycler, and run the following extension program:

2.4 Desalting and mass spectrometry

After the extension program is complete, centrifuge the plate briefly. Add 16 μL of sterile double-distilled water to each well, seal with film and centrifuge briefly. Then spot the samples onto the chip, load the 384-well plate and chip into the instrument, and perform genotyping.
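Tracking the cumulative volume in each well from 2.3.1 through 2.4 provides a quick check that every addition was made. The per-well volumes below come from the text (4 μL PCR reaction mixture plus 1 μL sample, 2 μL SAP mixture, 2 μL extension mixture, 16 μL water); the step labels are illustrative.

```python
from itertools import accumulate

# (step, volume added per well in μL), taken from sections 2.3.1 to 2.4
STEPS = [
    ("PCR reaction mixture", 4.0),
    ("Sample DNA", 1.0),
    ("SAP reaction mixture", 2.0),
    ("Extension reaction mixture", 2.0),
    ("Sterile ddH2O (desalting step)", 16.0),
]


def cumulative_volumes(steps):
    """Return (step, running total in μL) after each addition."""
    totals = accumulate(vol for _, vol in steps)
    return [(name, total) for (name, _), total in zip(steps, totals)]


for name, total in cumulative_volumes(STEPS):
    print(f"{name:32s}{total:5.1f} μL")
```

The final entry gives the volume each well should hold before spotting, which can be compared against the plate's well capacity.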

2.5 Data analysis and detection result interpretation

2.5.1 Genotype results at SNP detection sites

Use the original .xml file exported by the instrument's companion software Typer 4.0.0 (path: View-PLATE DATA PANE); the Call column gives the genotype result for each SNP detection site.

2.5.2 Analysis of copy number detection results

Use the original .xml file exported by the instrument's companion software Typer 4.0 (path: View-PLATE DATA PANE).

Copy number is calculated from the peak height (HEIGHT) of each detection site.

The calculation formula is as follows:

Wherein:

H_{N-SMN} is the peak height of SMN1-2 E7 in the test sample;

H_{N_QC-SMN} is the peak height of the SMN1-2 E7 QC (competing substrate) in the test sample;

H_{N-RPP40} is the peak height of RPP40 in the test sample;

H_{N_QC-RPP40} is the peak height of the RPP40 QC in the test sample;

H_{N-ATP7A} is the peak height of ATP7A in the test sample;

H_{N_QC-ATP7A} is the peak height of the ATP7A QC in the test sample;

H_{N-COL4A5} is the peak height of COL4A5 in the test sample;

H_{N_QC-COL4A5} is the peak height of the COL4A5 QC in the test sample;

H_{gDNA-SMN} is the peak height of SMN1-2 E7 in the gDNA control;

H_{gDNA_QC-SMN} is the peak height of the SMN1-2 E7 QC in the gDNA control;

H_{gDNA-RPP40} is the peak height of RPP40 in the gDNA control;

H_{gDNA_QC-RPP40} is the peak height of the RPP40 QC in the gDNA control;

H_{gDNA-ATP7A} is the peak height of ATP7A in the gDNA control;

H_{gDNA_QC-ATP7A} is the peak height of the ATP7A QC in the gDNA control;

H_{gDNA-COL4A5} is the peak height of COL4A5 in the gDNA control;

H_{gDNA_QC-COL4A5} is the peak height of the COL4A5 QC in the gDNA control.
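The calculation formula itself is not reproduced in this text. Purely as an illustration, the sketch below assumes one plausible form consistent with the variables listed above: each gene's analyte peak is divided by its QC (competing-substrate) peak, the SMN ratio is normalized by the mean of the three reference ratios, and the test sample's normalized ratio is divided by that of the 2-copy gDNA control. Averaging over three references follows Example 3; everything else, including the function names and the "SMN" key for the SMN1-2 E7 site, is an assumption, so the result is a relative IV value and not necessarily the patent's exact formula.

```python
from statistics import mean

# Internal reference genes used in this example.
REFERENCE_GENES = ("RPP40", "ATP7A", "COL4A5")


def qc_ratio(peaks, gene):
    """Analyte peak height divided by its QC (competing-substrate) peak height.

    `peaks` maps a gene key to a tuple (analyte_height, qc_height).
    """
    analyte, qc = peaks[gene]
    if qc <= 0:
        raise ValueError(f"QC peak height for {gene} must be positive")
    return analyte / qc


def normalized_ratio(peaks, target="SMN", refs=REFERENCE_GENES):
    """Target QC ratio divided by the mean QC ratio of the reference genes."""
    return qc_ratio(peaks, target) / mean(qc_ratio(peaks, g) for g in refs)


def iv_value(sample_peaks, gdna_peaks, target="SMN", refs=REFERENCE_GENES):
    """ASSUMED form: sample's normalized ratio relative to the 2-copy gDNA control."""
    return normalized_ratio(sample_peaks, target, refs) / normalized_ratio(
        gdna_peaks, target, refs
    )


if __name__ == "__main__":
    # Hypothetical peak heights (analyte, QC); not data from the patent.
    gdna = {"SMN": (1000.0, 1000.0), "RPP40": (800.0, 800.0),
            "ATP7A": (900.0, 900.0), "COL4A5": (700.0, 700.0)}
    sample = dict(gdna, SMN=(480.0, 1000.0))
    print(f"IV (three references): {iv_value(sample, gdna):.3f}")
    print(f"IV (RPP40 only):       {iv_value(sample, gdna, refs=('RPP40',)):.3f}")
```

Under this idealized model, halving every analyte peak (as a lower DNA input would) leaves the IV unchanged, because the reference ratios fall by the same factor. Passing `refs=("RPP40",)` gives a single-reference variant analogous to the comparison in Example 3.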

The numerical interpretation ranges are as follows:

Results: The mass spectra of the detection sites on SMN1-2 exon 7, RPP40, ATP7A and COL4A5 are shown in Figures 1-4. As the figures show, reaction systems with the 3 competing-substrate concentrations effectively distinguish 0 copies (patient), 1 copy (carrier) and 2+ copies (normal). SMN1 and SMN2 copy numbers of 98 samples were then determined by the MassArray method and compared with the gold-standard MLPA method, as shown in Table 11 below.
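Assigning an IV value to the 0-copy (patient), 1-copy (carrier) or 2+-copy (normal) class amounts to a lookup against interpretation ranges. The patent's numerical ranges are not reproduced in this text, so the boundaries below are hypothetical placeholders that would have to be replaced with validated cut-offs; the function and constant names are also illustrative.

```python
from bisect import bisect_right

COPY_CLASSES = ("0 copies (patient)", "1 copy (carrier)", "2+ copies (normal)")

# HYPOTHETICAL IV cut-offs for illustration only; the patent's interpretation
# ranges are not reproduced here and must be substituted after validation.
EXAMPLE_BOUNDARIES = (0.25, 0.75)


def classify_copy_number(iv, boundaries=EXAMPLE_BOUNDARIES, labels=COPY_CLASSES):
    """Return the class whose IV interval contains `iv`.

    `boundaries` must be ascending; a value equal to a boundary is assigned
    to the higher class (bisect_right semantics).
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")
    if list(boundaries) != sorted(boundaries):
        raise ValueError("boundaries must be in ascending order")
    return labels[bisect_right(boundaries, iv)]


if __name__ == "__main__":
    for iv in (0.03, 0.48, 1.02):
        print(iv, classify_copy_number(iv))
```

A production version would likely also need an indeterminate band between classes, corresponding to the "gray zone" discussed in Example 3.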

Table 11: Comparison of detection results between the MassArray method and the gold-standard MLPA method

To evaluate the sensitivity, specificity and positive/negative judgment rates of the MassArray method of the invention against the gold-standard MLPA, SMN1 and SMN2 copy number results were obtained for a further 53 samples, as shown in Table 12. Analysis of all 151 samples (the 98 in Table 11 plus the 53 in Table 12) gave the sensitivity, specificity and positive/negative judgment rates shown in Table 13, and the ROC curve comparing the MassArray method with MLPA is shown in Figure 6. As Table 13 and Figure 6 show, the MassArray method agrees closely with the gold-standard MLPA, with a sensitivity of 96%, a specificity of 100%, and positive and negative judgment rates of 100% and 98%, respectively.

Table 12: Comparison of detection results between the MassArray method and the gold-standard MLPA method

Table 13: Sensitivity, specificity and positive/negative judgment rates of the MassArray method

Index	Value
Sensitivity	96%
Specificity	100%
Positive judgment rate	100%
Negative judgment rate	98%
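The indices in Table 13 follow from a 2x2 comparison of MassArray calls against MLPA. The sketch below assumes that the positive and negative judgment rates correspond to the positive and negative predictive values (PPV, NPV); this reading is consistent with 100% specificity yielding a 100% positive judgment rate, but it is an interpretation. The class name is illustrative, and the counts used in testing it would be hypothetical, not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 agreement counts, with MLPA taken as the reference method."""
    tp: int  # positive by both MassArray and MLPA
    fn: int  # MLPA positive, MassArray negative
    tn: int  # negative by both methods
    fp: int  # MLPA negative, MassArray positive

    def metrics(self):
        # Each denominator must be non-zero; a real tool would guard this.
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),  # assumed "positive judgment rate"
            "npv": self.tn / (self.tn + self.fn),  # assumed "negative judgment rate"
        }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in Confusion(tp=45, fn=5, tn=90, fp=10).metrics().items():
        print(f"{name}: {value:.1%}")
```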

Example 3: Comparison of a single internal reference (RPP40) with multiple internal references

In this example, the experimental data for the samples in Table 11 were used to calculate IV values with RPP40 as the single internal reference, from which copy number was derived.

The IV value is calculated from the peak height (HEIGHT) of each detection site.

The calculation formula is as follows:

Wherein:

H_{N-SMN} is the peak height of SMN1-2 E7 in the test sample;

H_{N_QC-SMN} is the peak height of the SMN1-2 E7 QC (competing substrate) in the test sample;

H_{N-RPP40} is the peak height of RPP40 in the test sample;

H_{N_QC-RPP40} is the peak height of the RPP40 QC in the test sample;

H_{gDNA-SMN} is the peak height of SMN1-2 E7 in the gDNA control;

H_{gDNA_QC-SMN} is the peak height of the SMN1-2 E7 QC in the gDNA control;

H_{gDNA-RPP40} is the peak height of RPP40 in the gDNA control;

H_{gDNA_QC-RPP40} is the peak height of the RPP40 QC in the gDNA control.

Results: In Figure 5, dotted lines mark the maximum IV value of the positive samples and the minimum IV value of the negative samples. The figure shows that, with RPP40 as the single internal reference, the IV values of DNA samples at different concentrations cross over and leave a gray zone. For example, with 1 μL of DNA input, samples at 10 ng/μL and at 30 ng/μL cannot be reliably classified as 1 copy (carrier) versus 2+ copies (normal). Calculating against three internal references removes this crossing: the maximum IV value of the 1-copy samples no longer overlaps the minimum IV value of the 2+-copy samples, and samples can be tested across a total input of about 10-30 ng of DNA extracted from whole blood (measured with a NanoDrop 2000). These results show that averaging over three internal references, compared with a single internal reference, markedly narrows the gray zone for result interpretation, resolves the IV-value crossing seen across DNA concentrations in the single-reference system, and widens the usable range of template DNA input.
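The "crossing" described above can be checked directly: the 1-copy and 2+-copy groups separate cleanly only if the maximum IV of the 1-copy samples lies below the minimum IV of the 2+-copy samples. A minimal sketch follows; the function name is illustrative, and any IV values passed to it in testing are invented, not the data behind Figure 5.

```python
def separation(one_copy_ivs, two_plus_ivs):
    """Compare the IV ranges of 1-copy and 2+-copy samples.

    A positive gap means clean separation; a gap of zero or less means the
    groups cross (touching ranges are treated as ambiguous, hence crossing).
    """
    max_one = max(one_copy_ivs)
    min_two = min(two_plus_ivs)
    gap = min_two - max_one
    return {
        "max_1_copy": max_one,
        "min_2plus_copy": min_two,
        "gap": gap,
        "crossing": gap <= 0,
    }


if __name__ == "__main__":
    # Invented IV values, for illustration only.
    print(separation([0.42, 0.55], [0.80, 1.10]))
    print(separation([0.45, 0.90], [0.80, 1.05]))
```

Applied to IV values computed with one reference and then with three, the sign and size of `gap` quantify the gray-zone reduction that this example describes.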

Therefore, by introducing 3 internal reference genes as references, the method of the invention minimizes differences in absolute template amount caused by sample degradation or pipetting error, widens the acceptable template DNA input range, and is easy to perform. In addition, locked nucleic acid modifications are added to the PCR primers and differential bases are placed on the competing substrates, which reduces the amplification efficiency of the competing substrates so that they can be added and detected within a stable concentration range. Competitive PCR can eliminate environmental differences such as temperature and pressure between reaction wells, making the results more accurate, and MassArray allows multiple PCR products to be detected in a single well with simple operation and lower cost. Combining the two allows different panels to be run in different PCR wells at the same time, giving high throughput suited to large-scale commercial testing. Compared with conventional MassArray, the method adds competing substrates for the CNV detection sites during amplification and measures the peak-height ratio against them, so that SMN1 = 0, SMN1 = 1 and SMN1 ≥ 2, and SMN2 = 0, SMN2 = 1 and SMN2 ≥ 2 copies can be accurately distinguished, giving finer and more accurate results.
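The statement that competitive PCR cancels well-to-well differences rests on the target and its competing substrate sharing one tube: whatever amplification efficiency a given well achieves applies to both templates, so their ratio is preserved. The toy model below assumes ideal exponential amplification with equal efficiency for target and competitor; it deliberately ignores plateau effects and the reduced competitor efficiency that the LNA primer design introduces. The function names are illustrative.

```python
def amplify(initial_copies, efficiency, cycles):
    """Ideal exponential PCR: copies * (1 + efficiency) ** cycles."""
    return initial_copies * (1 + efficiency) ** cycles


def product_ratio(target_copies, competitor_copies, efficiency, cycles):
    """Target:competitor product ratio when both share the well's efficiency."""
    return amplify(target_copies, efficiency, cycles) / amplify(
        competitor_copies, efficiency, cycles
    )


if __name__ == "__main__":
    # Two hypothetical wells whose efficiencies differ, e.g. through a
    # temperature difference across the block.
    for eff in (0.95, 0.80):
        total = amplify(600, eff, 30)
        ratio = product_ratio(600, 1200, eff, 30)
        print(f"efficiency {eff}: target product {total:.3g}, ratio {ratio:.3f}")
```

Absolute product yield differs by orders of magnitude between the two wells, while the target:competitor ratio stays at the starting ratio, which is why the peak-height ratio rather than an absolute peak height is used.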

It should be understood that, while the invention has been described by way of example in terms of preferred embodiments, it is not limited to the above embodiments, and numerous modifications and variations can be made by those skilled in the art. The primers, reaction conditions and other parameters used in the amplification and extension reactions may be adjusted according to particular needs. Those skilled in the art will thus be able to devise numerous alternative arrangements which, although not explicitly described herein, embody the principles of the invention and fall within its spirit and scope.

Claims

1. A method of determining gene copy number comprising:

1) Amplifying the target gene, the reference gene and the competing substrates of the target gene and the reference gene by using an amplification primer to obtain an amplification product;

2) Carrying out an extension reaction on the amplification product obtained in the step 1) by using an extension primer to obtain an extension product;

3) Mass spectrum detection is carried out on the extension product in the step 2) by using a mass spectrometer through a matrix-assisted laser desorption ionization time-of-flight mass spectrometry technology, and the copy number of the target gene is calculated through the peak height of each detection site;

wherein the reference genes comprise at least three genes without CNV variation.

2. The method of claim 1, wherein the gene of interest comprises SMN1, SMN2, CYP2D6 and DMD.

3. The method of claim 1 or 2, wherein the reference genes comprise RPP40, ATP7A and COL4A5.

4. The method of any one of claims 1-3, wherein the amplification primer is a locked nucleic acid modified amplification primer.

5. The method of any one of claims 1-4, wherein the amplification reaction of step 1) employs a UNG-dUTP anti-contamination system.

6. The method of any one of claims 1-5, wherein the amplification primers for the gene of interest are the amplification primers for exon 7 of SMN1 and SMN2, as set forth in SEQ ID NOs: 1-2.

7. The method of any one of claims 1-6, wherein the extension primer for the gene of interest is the extension primer for exon 7 of SMN1 and SMN2, as set forth in SEQ ID NO: 9.

8. The method of any one of claims 1-7, wherein the amplification primers of the reference gene are selected from the group consisting of:

i) amplification primers for intron 6 of RPP40, as set forth in SEQ ID NOs: 3-4;

ii) amplification primers for exon 1 of ATP7A, as set forth in SEQ ID NOs: 5-6;

iii) amplification primers for exon 41 of COL4A5, as set forth in SEQ ID NOs: 7-8.

9. The method of any one of claims 1-8, wherein the extension primer of the reference gene is selected from the group consisting of:

i) an extension primer for intron 6 of RPP40, as set forth in SEQ ID NO: 10;

ii) an extension primer for exon 1 of ATP7A, as set forth in SEQ ID NO: 11;

iii) an extension primer for exon 41 of COL4A5, as set forth in SEQ ID NO: 12.

10. The method of any one of claims 1-9, wherein the competing substrate is selected from the group consisting of:

i) a competing substrate for SMN1 and SMN2, as set forth in SEQ ID NO: 13;

ii) a competing substrate for RPP40, as set forth in SEQ ID NO: 14;

iii) a competing substrate for ATP7A, as set forth in SEQ ID NO: 15;

iv) a competing substrate for COL4A5, as set forth in SEQ ID NO: 16.

11. The method of any one of claims 3-10, wherein the formula for calculating SMN1 and SMN2 copy numbers is:

Wherein:

H_{N-SMN} is the peak height of SMN1-2 E7 in the test sample;

H_{N_QC-SMN} is the peak height of the SMN1-2 E7 QC (competing substrate) in the test sample;

H_{N-RPP40} is the peak height of RPP40 in the test sample;

H_{N_QC-RPP40} is the peak height of the RPP40 QC in the test sample;

H_{N-ATP7A} is the peak height of ATP7A in the test sample;

H_{N_QC-ATP7A} is the peak height of the ATP7A QC in the test sample;

H_{N-COL4A5} is the peak height of COL4A5 in the test sample;

H_{N_QC-COL4A5} is the peak height of the COL4A5 QC in the test sample;

H_{gDNA-SMN} is the peak height of SMN1-2 E7 in the gDNA control;

H_{gDNA_QC-SMN} is the peak height of the SMN1-2 E7 QC in the gDNA control;

H_{gDNA-RPP40} is the peak height of RPP40 in the gDNA control;

H_{gDNA_QC-RPP40} is the peak height of the RPP40 QC in the gDNA control;

H_{gDNA-ATP7A} is the peak height of ATP7A in the gDNA control;

H_{gDNA_QC-ATP7A} is the peak height of the ATP7A QC in the gDNA control;

H_{gDNA-COL4A5} is the peak height of COL4A5 in the gDNA control;

H_{gDNA_QC-COL4A5} is the peak height of the COL4A5 QC in the gDNA control.

12. The method of any one of claims 3-11, wherein the final concentration of the competing substrate for RPP40 is 0.5-8 ng/μL.

13. The method of any one of claims 3-12, wherein the final concentration of the competing substrate for ATP7A is 1.5-18 ng/μL.

14. The method of any one of claims 3-13, wherein the final concentration of the competing substrate for COL4A5 is 1.5-10 ng/μL.

15. A kit for determining the copy number of a gene comprising:

Amplification primers and extension primers for a gene of interest, and competing substrates for the gene of interest; and amplification primers and extension primers for the reference gene, and a competing substrate for the reference gene, wherein the reference gene comprises at least three genes without CNV variation.

16. The kit of claim 15, wherein the gene of interest comprises SMN1, SMN2, CYP2D6 and DMD.

17. The kit of claim 15 or 16, wherein the reference genes comprise RPP40, ATP7A and COL4A5.

18. The kit of any one of claims 15-17, wherein the amplification primer is a locked nucleic acid modified amplification primer.

19. The kit of any one of claims 15-18, wherein the amplification primers for the gene of interest are the amplification primers for exon 7 of SMN1 and SMN2, as set forth in SEQ ID NOs: 1-2.

20. The kit of any one of claims 15-19, wherein the extension primer for the gene of interest is the extension primer for exon 7 of SMN1 and SMN2, as set forth in SEQ ID NO: 9.

21. The kit of any one of claims 15-20, wherein the amplification primers for the reference genes are selected from the group consisting of:

i) amplification primers for intron 6 of RPP40, as set forth in SEQ ID NOs: 3-4;

ii) amplification primers for exon 1 of ATP7A, as set forth in SEQ ID NOs: 5-6;

iii) amplification primers for exon 41 of COL4A5, as set forth in SEQ ID NOs: 7-8.

22. The kit of any one of claims 15-21, wherein the extension primer for the reference genes is selected from the group consisting of:

i) an extension primer for intron 6 of RPP40, as set forth in SEQ ID NO: 10;

ii) an extension primer for exon 1 of ATP7A, as set forth in SEQ ID NO: 11;

iii) an extension primer for exon 41 of COL4A5, as set forth in SEQ ID NO: 12.

23. The kit of any one of claims 15-22, wherein the competing substrate is selected from the group consisting of:

i) a competing substrate for SMN1 and SMN2, as set forth in SEQ ID NO: 13;

ii) a competing substrate for RPP40, as set forth in SEQ ID NO: 14;

iii) a competing substrate for ATP7A, as set forth in SEQ ID NO: 15;

iv) a competing substrate for COL4A5, as set forth in SEQ ID NO: 16.

24. The kit of any one of claims 15-23 for use in detecting spinal muscular atrophy in a subject.

25. The kit of any one of claims 15-23 for use in detecting Duchenne muscular dystrophy in a subject.

26. A system for determining gene copy number, comprising:

1) Amplification module: amplifying the target gene, the reference gene and the competing substrates of the target gene and the reference gene by using an amplification primer to obtain an amplification product;

2) Extension module: performing an extension reaction on the amplification product obtained in the step 1) by using an extension primer to obtain an extension product;

3) Mass spectrometry module: mass spectrum detection is carried out on the extension product of step 2) using a mass spectrometer by matrix-assisted laser desorption ionization time-of-flight mass spectrometry, and the copy number of the target gene is calculated from the peak height of each detection site.

27. The system of claim 26, wherein the gene of interest comprises SMN1, SMN2, CYP2D6 and DMD.

28. The system of claim 26 or 27, wherein the reference genes comprise RPP40, ATP7A and COL4A5.

29. The system of any one of claims 26-28, wherein the amplification primer is a locked nucleic acid modified amplification primer.

30. The system of any one of claims 26-29, wherein the amplification reaction of step 1) employs a UNG-dUTP anti-contamination system.

31. The system of any one of claims 26-30, wherein the amplification primers for the gene of interest are the amplification primers for exon 7 of SMN1 and SMN2, as set forth in SEQ ID NOs: 1-2.

32. The system of any one of claims 26-31, wherein the extension primer for the gene of interest is the extension primer for exon 7 of SMN1 and SMN2, as set forth in SEQ ID NO: 9.

33. The system of any one of claims 26-32, wherein the amplification primers of the reference gene are selected from the group consisting of:

i) amplification primers for intron 6 of RPP40, as set forth in SEQ ID NOs: 3-4;

ii) amplification primers for exon 1 of ATP7A, as set forth in SEQ ID NOs: 5-6;

iii) amplification primers for exon 41 of COL4A5, as set forth in SEQ ID NOs: 7-8.

34. The system of any one of claims 26-33, wherein the extension primer of the reference gene is selected from the group consisting of:

i) an extension primer for intron 6 of RPP40, as set forth in SEQ ID NO: 10;

ii) an extension primer for exon 1 of ATP7A, as set forth in SEQ ID NO: 11;

iii) an extension primer for exon 41 of COL4A5, as set forth in SEQ ID NO: 12.

35. The system of any one of claims 26-34, wherein the competing substrate is selected from the group consisting of:

i) a competing substrate for SMN1 and SMN2, as set forth in SEQ ID NO: 13;

ii) a competing substrate for RPP40, as set forth in SEQ ID NO: 14;

iii) a competing substrate for ATP7A, as set forth in SEQ ID NO: 15;

iv) a competing substrate for COL4A5, as set forth in SEQ ID NO: 16.

36. The system of any one of claims 28-35, wherein the final concentration of the competing substrate for RPP40 is 0.5-8 ng/μL.

37. The system of any one of claims 28-36, wherein the final concentration of the competing substrate for ATP7A is 1.5-18 ng/μL.

38. The system of any one of claims 28-37, wherein the final concentration of the competing substrate for COL4A5 is 1.5-10 ng/μL.

39. Use of the system of any one of claims 26-38 for the preparation of a kit for detecting rare diseases.

40. The use of claim 39, wherein the rare disease is spinal muscular atrophy or Duchenne muscular dystrophy.