CN110628890A - Sequencing quality control standard product and application and product thereof - Google Patents

Publication number
CN110628890A
Authority
CN
China
Prior art keywords
quality control
control standard
sequencing quality
sequencing
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911084920.5A
Other languages
Chinese (zh)
Other versions
CN110628890B (en)
Inventor
倪铭
刘红洁
李鹏
林彦锋
宋宏彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Pla Center For Disease Control & Prevention
Institute of Pharmacology and Toxicology of AMMS
Academy of Military Medical Sciences AMMS of PLA
Original Assignee
Chinese Pla Center For Disease Control & Prevention
Institute of Pharmacology and Toxicology of AMMS
Application filed by Chinese Pla Center For Disease Control & Prevention and Institute of Pharmacology and Toxicology of AMMS
Priority to CN201911084920.5A
Publication of CN110628890A
Application granted
Publication of CN110628890B
Legal status: Active

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/166 Oligonucleotides used as internal standards, controls or normalisation probes

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the field of quality control for biological sequencing, and provides a sequencing quality control standard, its application, and products containing it. The sequencing quality control standard comprises, in order, a primer region 1, a specific region 1, a homologous region, a specific region 2 and a primer region 2. The standard is 1000-10000 bp long, has 0 homology with natural sequences, has a GC content of 40-60%, contains no single-base repeat element of 5 bp or longer, and contains no run of 4 or more GC or CG double-base repeats. Primer region 1 and primer region 2 are each 30-300 bp long, and the homologous region makes up 30-40% of the combined length of specific region 1, the homologous region and specific region 2. The standard is broadly applicable, can be spiked directly into a sample to be sequenced without affecting downstream analysis, and supports sample tracking and cross-contamination assessment.

Description

Sequencing quality control standard product and application and product thereof
Technical Field
The invention relates to the field of biological sequencing quality control, in particular to a sequencing quality control standard product and application and a product thereof.
Background
High-throughput sequencing and single-molecule sequencing of DNA/RNA play an important role in scientific research and practical applications. However, sample mix-up or cross-contamination may occur during sequencing and cause losses, mainly for the following three reasons:
(1) the sample to be sequenced must undergo complex DNA/RNA extraction and library construction, which involve a large number of molecular biology operations and carry risks of sample mix-up (such as mislabeling an intermediate tube) or cross-contamination (such as contaminated pipette tips or aerosol contamination);
(2) sequencing instruments now have very high throughput, so multiple samples are pooled on one sequencing chip and the data are demultiplexed bioinformatically, and errors in filling in sample information can also cause the demultiplexed data to be mixed up;
(3) sequencers and reagents have their own limitations; for example, some sequencing platforms from Illumina suffer from high cross-contamination during sequencing.
Currently, one approach to sample tracking is to sequence forensic human identification genomic sites at the same time (e.g., with a commercial kit, xGen Human ID Research Panel v1.0, http://www.nanodigmbio.com/product-item-17.html), so that sequencing data can be linked to individuals; the approach can be generalized to other species. However, it has a major limitation: it cannot detect a sample mix-up by itself. A possible mix-up is suggested only by other evidence (such as results that do not fit the clinical picture), after which the original sample must be re-tested for confirmation.
Another, more efficient approach is to spike a small amount of standard into the sample to be sequenced. The spiked-in standards are a series of standard DNAs with fixed identifiers. During sample preparation, a small amount of standard DNA is added to the extracted sample nucleic acid. After library construction and sequencing, data analysis checks whether the standard sequence found in each sample matches the one expected. If it does not, a mix-up occurred during the experiment or during data demultiplexing, and the experiment must be repeated.
In gene microarray chips, spiking in sample-tracking standards is a common protocol. Examples include the CytoChip™ Oligo spike-in controls that Illumina provides for its gene chips, and the chip detection standards that Agilent provides in its application note (Sampth & Kishawi, Use of Spike-ins for Sample Tracking in Agilent Array CGH, Agilent Technologies, Inc., 2016, published in the USA, March 15, 2016, 5991-). However, these standards are short (about 400 bp) and carry fluorophores, so they cannot be used for sequencing experiments. Chen et al. (Chen K, Hu Z, Xia Z, Zhao D, Li W, Tyler JK. 2016. The overlooked fact: fundamental need for spike-in control for virtually all genome-wide analyses. Mol Cell Biol 36:662-667. doi:10.1128/MCB.00970-14) discussed the necessity of spike-in standards in sequencing and other genome-related experiments, along with principles for designing them, but mainly for high-throughput sequencing experiments.
On the one hand, read lengths in single-molecule sequencing are far longer than in high-throughput sequencing, shorter sequences are difficult to sequence, and the sequencing error rate is high (it can exceed 10%). How to ensure that a spiked-in standard is effectively sequenced and identified, unaffected by sequencing errors, so that sample tracking and cross-contamination assessment can be carried out, is therefore a problem to be solved. In addition, spike-in standards in the prior art have narrow application scenarios and poor universality: different standards must be designed for different sequencing schemes, and no spike-in standard exists that can be used for both high-throughput sequencing and single-molecule sequencing.
On the other hand, besides sample tracking and cross-contamination assessment, assessing the accuracy of mutation detection is also an important aspect of quality control. At present this is done by running parallel experiments on independent standard samples, such as the many independent standards carrying various mutations available from Horizon Discovery, and the independent standards for genetic testing (e.g., BRCA gene mutation) available from the Chinese food and drug testing institute. Because independent standards are generally expensive and each one supports only a small number of experiments, they are mostly used to assess the mutation detection capability (sensitivity and specificity) of a laboratory or testing institution, and it is difficult to use them for quality control of every tested sample.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
The present invention aims to solve at least one of the above technical problems.
The first purpose of the invention is to provide a sequencing quality control standard, so as to solve the problems in the prior art that standards have narrow application scenarios and poor universality, and that different standards must be designed for different sequencing schemes.
The second purpose of the invention is to provide the application of the sequencing quality control standard in single molecule sequencing and/or high-throughput sequencing.
The third objective of the present invention is to provide a single-molecule sequencing quality control kit or a high-throughput sequencing quality control kit, so as to alleviate the technical problem of the lack of effective quality control products in the prior art.
In order to achieve the above purpose of the present invention, the following technical solutions are adopted:
a sequencing quality control standard comprises a primer region 1, a specific region 1, a homologous region, a specific region 2 and a primer region 2 in sequence;
the length of the sequencing quality control standard is 1000-10000 bp;
the homology of the sequencing quality control standard substance and a natural sequence is 0;
the GC content of the sequencing quality control standard is 40-60%;
the sequencing quality control standard does not contain a single base repetitive element of more than or equal to 5 bp;
the sequencing quality control standard does not contain GC or CG double-base repeats of more than or equal to 4;
the length of the primer region 1 is 30-300 bp;
the length of the primer region 2 is 30-300 bp;
the length of the homologous region is 30-40% of the sum of the lengths of the specific region 1, the homologous region and the specific region 2.
Further, 1-4 mutation sites exist in each 100bp of the homologous region.
Further, the length of the sequencing quality control standard is 1000-5000bp, preferably 1000-3000 bp.
Further, the homology detection of the sequencing quality control standard and the natural sequence adopts MEGABLAST, and the result is 0;
preferably, the databases for homology testing include the nucleic acid database of NCBI (nt bank), the human genome bank of NCBI, and the mouse genome bank of NCBI.
Further, the GC content of the sequencing quality control standard is 45% -55%, and preferably 50%.
Further, the sequencing quality control standard comprises at least one of SEQ ID NO.1, SEQ ID NO.2, SEQ ID NO.3, SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO. 6;
preferably, the PCR primers for sequencing the quality control standard comprise:
F:5’-GTGTGCAACCTATGGCGACAG-3’(SEQ ID NO.7)
R:5’-CACATAGCTCTCAGAGTCGCGG-3’(SEQ ID NO.8).
The sequencing quality control standard is used in single-molecule sequencing and/or high-throughput sequencing.
Further, the addition amount of the sequencing quality control standard is 0.5-10 w/w% of the sample to be detected.
A single-molecule sequencing quality control kit comprises the sequencing quality control standard substance.
A high-throughput sequencing quality control kit comprises the sequencing quality control standard substance.
Compared with the prior art, the invention has the beneficial effects that:
Through the specific limits above, the sequencing quality control standard meets the requirements of both single-molecule sequencing and high-throughput sequencing, and has strong universality. Because the sequence of the standard has no homology with natural sequences, the standard can be spiked directly into a sample to be sequenced without affecting downstream analysis. A sequencing quality control standard with a specific sequence also fully meets the requirements of sample tracking and cross-contamination assessment, gives accurate and reliable results, is low in cost and can be widely applied. Finally, because the lengths of the primer regions, the specific regions and the homologous region are reasonably limited, the standard can be prepared rapidly while meeting the requirements of universality, sample tracking, cross-contamination assessment and mutation assessment.
The single-molecule sequencing quality control kit or the high-throughput sequencing quality control kit provided by the invention has the advantages due to the application of the sequencing quality control standard substance provided by the application, can realize batch production, is low in cost and can be widely applied.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic structural diagram of a sequencing quality control standard provided by the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only illustrative of the present invention and should not be construed as limiting the scope of the present invention. The examples, in which specific conditions are not specified, were conducted under conventional conditions or conditions recommended by the manufacturer.
Unless otherwise defined, technical and scientific terms used herein have the same meaning as is familiar to those skilled in the art. In addition, any methods or materials similar or equivalent to those described herein can also be used in the present invention.
A sequencing quality control standard comprises a primer region 1, a specific region 1, a homologous region, a specific region 2 and a primer region 2 in sequence;
wherein, the length of the sequencing quality control standard is 1000-10000bp, the homology with a natural sequence is 0, the GC content is 40-60%, a single base repetitive element which is not less than 5bp is not contained, and GC or CG double base repetition which is not less than 4 is not contained;
furthermore, primer region 1 is 30-300bp in length, primer region 2 is 30-300bp in length, and the length of the homologous region is 30-40% of the sum of the lengths of specific region 1, the homologous region and specific region 2.
The sequencing quality control standard can meet the requirements of single-molecule sequencing and high-throughput sequencing at the same time through specific limitation, and has strong universality. Moreover, since the sequence of the standard has no homology with the natural sequence, the standard can be directly incorporated into a sample to be sequenced without affecting the downstream analysis operation thereof. In addition, the sequencing quality control standard with a specific sequence can completely meet the requirements of sample tracking and cross contamination assessment, the result is accurate and reliable, and meanwhile, the sequencing quality control standard is low in cost and can be widely applied. The sequencing quality control standard provided by the invention needs to be incorporated into a sample to be detected before fragmentation.
Limiting the length of the sequencing quality control standard improves its universality. Having 0 homology with natural sequences ensures that every spiked-in sequence in the test sample can be identified without affecting the analysis the sample itself requires. Keeping the GC content within a reasonable range avoids sequencing bias. Excluding single-base repeat elements of 5 bp or longer and runs of 4 or more GC or CG double-base repeats prevents the higher sequencing error rate at such sites from affecting the accuracy of the quality control analysis. In addition, because the lengths of the primer regions, the specific regions and the homologous region are reasonably limited, the standard can be prepared rapidly while meeting the requirements of universality, sample tracking, cross-contamination assessment and mutation assessment.
It should be noted that the sequencing quality control standard can also meet the requirement of mutation assessment. For example, but without limitation, the homologous region of the standard can carry 1-4 mutation sites per 100 bp. Two or more standard sequences can be designed on this principle; they are highly homologous to one another, and when they are mixed and then spiked into a sample to be sequenced together, they provide quality control for mutation detection in addition to sample tracking and cross-contamination assessment.
It is understood that the length of the sequencing quality control standard can be, but is not limited to, 1000 bp, 2000 bp, 3000 bp, 4000 bp, 5000 bp, 6000 bp, 7000 bp, 8000 bp, 9000 bp or 10000 bp; the GC content of the sequencing quality control standard can be, but is not limited to, 40%, 42%, 44%, 46%, 48%, 50%, 52%, 54%, 56%, 58% or 60%; a natural sequence refers to a nucleic acid sequence that an organism possesses under natural conditions; "does not contain a single base repetitive element of 5 bp or more" means that the sequence of the standard can contain, for example, A, AA, AAA or AAAA, but cannot contain a run of 5 or more identical bases such as AAAAA or AAAAAA; "does not contain GC or CG double-base repeats of 4 or more" means that the sequence of the standard can contain GC, GCGC, GCGCGC, CG, CGCG or CGCGCG, but cannot contain 4 or more consecutive GC or CG double-base repeats such as GCGCGCGC or CGCGCGCG; the length of primer region 1 can be, but is not limited to, 30 bp, 50 bp, 100 bp, 150 bp, 200 bp, 250 bp or 300 bp; the length of primer region 2 can be, but is not limited to, 30 bp, 50 bp, 100 bp, 150 bp, 200 bp, 250 bp or 300 bp; the length of the homologous region can be, but is not limited to, 30%, 32%, 34%, 36%, 38% or 40% of the sum of the lengths of specific region 1, the homologous region and specific region 2.
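The sequence-level constraints described above (length, GC content, no single-base run of 5 or more, no run of 4 or more GC or CG units) can be checked mechanically. The following is a minimal sketch only; the function name `check_standard` and the problem labels are hypothetical and are not part of the patent, and the homology check against natural sequences is not covered here.

```python
import re

# A run of 5 or more identical bases, e.g. AAAAA.
HOMOPOLYMER = re.compile(r"A{5,}|C{5,}|G{5,}|T{5,}")
# 4 or more consecutive GC units or CG units, e.g. GCGCGCGC.
DINUC_REPEAT = re.compile(r"(?:GC){4,}|(?:CG){4,}")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_standard(seq: str, min_len: int = 1000, max_len: int = 10000) -> list:
    """Return a list of violated constraints (empty if none are violated)."""
    seq = seq.upper()
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append("length")
    if not 0.40 <= gc_content(seq) <= 0.60:
        problems.append("gc_content")
    if HOMOPOLYMER.search(seq):
        problems.append("homopolymer")
    if DINUC_REPEAT.search(seq):
        problems.append("gc_cg_repeat")
    return problems
```

For short toy sequences, `min_len` can be lowered, for example `check_standard("ATATGCGCGCGCATAT", min_len=1)`, which flags the run of four GC units.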
In a preferred embodiment, there are 1-4 mutation sites per 100bp in the homologous region.
In a preferred embodiment, the length of the sequencing quality control standard is 1000-. If the standard is too short, it cannot meet the requirements of multiple application scenarios or multiple quality control criteria; if it is too long, production cost rises unnecessarily.
In a preferred embodiment, the homology test of the sequencing quality control standard against natural sequences uses MEGABLAST, and the result is 0. Specifically, the megablast program was selected in the BLAST online alignment tool (website address https://blast.ncbi.nlm.nih.gov/blast.cgi), and no alignment results were obtained against the databases Human genomic + transcript, Mouse genomic + transcript, and Nucleotide collection (nr/nt). All standard sequences have no homology to the natural sequences in these databases, so all spiked-in sequences can be identified without affecting the analysis the sample itself requires.
In a preferred embodiment, the database of homology tests comprises the nucleic acid database of NCBI (nt bank), the human genome bank of NCBI, and the mouse genome bank of NCBI.
In a preferred embodiment, the GC content of the sequencing quality control standard is 45% to 55%, preferably 50%. The standard sequence has a well-balanced GC content: the GC content of the entire sequence and of each portion of the sequence is at or near 50%, which avoids sequencing bias.
In a preferred embodiment, the sequencing quality control standard comprises at least one of SEQ ID No.1, SEQ ID No.2, SEQ ID No.3, SEQ ID No.4, SEQ ID No.5 and SEQ ID No. 6.
The structure (shown in figure 1) of the 6 specific sequencing quality control standard sequences provided by the invention is divided into three parts, namely a consensus primer region, a specific region and a homologous region with a specific mutation site. Specifically, in each standard sequence, a consensus primer region of 400bp (200bp primer region 1, 200bp primer region 2), a specific region of 1200bp (600bp specific region 1, 600bp specific region 2), and a homologous region with a specific mutation site of 600bp are provided. The primer region is common to all sequences, and can be conveniently prepared by using the same primer. The sequencing quality control standard has no homology in all natural sequences, can be used for high-confidence identification and is used for sample tracking and cross contamination assessment. In addition, the homologous region of 600bp is highly homologous between the sequences of the 6 standards, but different mutation sites are designed, at least 2 of the 6 standards can be mixed to produce mutation sites with target frequency, and then the mixed standards are mixed into a sample for quality control of mutation site detection.
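As a worked check of the region lengths given above, the sketch below adds up the five regions and tests the 30-40% homologous-region rule. It is an illustration only; the `Layout` class and its field names are hypothetical and not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Region lengths in bp, in the order they appear in the standard."""
    primer1: int
    specific1: int
    homologous: int
    specific2: int
    primer2: int

    @property
    def total(self) -> int:
        return (self.primer1 + self.specific1 + self.homologous
                + self.specific2 + self.primer2)

    @property
    def homologous_fraction(self) -> float:
        # Share of the homologous region within the three inner regions.
        inner = self.specific1 + self.homologous + self.specific2
        return self.homologous / inner

    def is_valid(self) -> bool:
        return (1000 <= self.total <= 10000
                and 30 <= self.primer1 <= 300
                and 30 <= self.primer2 <= 300
                and 0.30 <= self.homologous_fraction <= 0.40)


# Lengths of the 6 specific standards described in the text.
layout = Layout(primer1=200, specific1=600, homologous=600,
                specific2=600, primer2=200)
print(layout.total, round(layout.homologous_fraction, 3), layout.is_valid())
```

With these lengths the five regions sum to 2200 bp and the homologous region is 600/1800, about 33%, of the three inner regions, which lies inside the 30-40% window.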
In a preferred embodiment, the PCR primers for sequencing the quality control standards comprise:
F:5’-GTGTGCAACCTATGGCGACAG-3’(SEQ ID NO.7)
R:5’-CACATAGCTCTCAGAGTCGCGG-3’(SEQ ID NO.8).
The sequencing quality control standard is used in single-molecule sequencing and/or high-throughput sequencing.
In a preferred embodiment, the sequencing quality control standard is added in an amount of 0.5-10 w/w% of the sample to be tested. Wherein "w/w%" means mass percentage; the addition amount of the sequencing quality control standard is typically, but not limited to, 0.5 w/w%, 2 w/w%, 4 w/w%, 6 w/w%, 8 w/w% or 10 w/w% of the sample to be detected.
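The addition amount translates directly into a mass of standard per sample. The sketch below assumes that the percentage is taken relative to the mass of the sample nucleic acid (rather than relative to the final mixture), and that masses are given in nanograms; the function name `spike_in_mass_ng` is hypothetical.

```python
def spike_in_mass_ng(sample_mass_ng: float, percent_w_w: float) -> float:
    """Mass of standard (ng) to add for a given w/w% of the sample mass.

    Assumption: the percentage is relative to the sample nucleic acid mass.
    """
    if not 0.5 <= percent_w_w <= 10:
        raise ValueError("addition amount should be within 0.5-10 w/w%")
    return sample_mass_ng * percent_w_w / 100


# 1000 ng of sample DNA at 4 w/w% calls for 40 ng of standard.
print(spike_in_mass_ng(1000, 4))
```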
A single molecule sequencing quality control kit comprises the sequencing quality control standard substance.
A high-throughput sequencing quality control kit comprises the sequencing quality control standard.
The single-molecule sequencing quality control kit or the high-throughput sequencing quality control kit provided by the invention has the advantages due to the application of the sequencing quality control standard substance provided by the application, can realize batch production, is low in cost and can be widely applied.
The invention is further illustrated by the following specific examples, which, however, are to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way whatsoever.
Example 1
The sequence design takes into account the three requirements of sample tracking, cross-contamination assessment and mutation assessment, and meets a series of technical criteria that adapt it to the sequencing technology and the data analysis, specifically as follows:
(1) The structure of the sequence (see FIG. 1) is divided into three parts: a consensus primer region, a specific region, and a homologous region with specific mutation sites. The consensus primer region is 400 bp, the specific region is 1200 bp, and the homologous region with specific mutation sites is 600 bp. The primer region is common to all sequences, so all of them can conveniently be prepared with the same primers. The sequencing quality control standard has no homology with any natural sequence, can be identified with high confidence, and is used for sample tracking and cross-contamination assessment. In the 600 bp homologous region, the different spike-in sequences are highly homologous but carry different designed mutation sites; different sequences can be mixed to produce mutation sites at target frequencies, and the mixed standard is then spiked into a sample for quality control of mutation site detection.
(2) All sequences have no homology with the natural sequences in the databases. Specifically, alignment with MEGABLAST against the NCBI nucleic acid database (nt bank) and against the human and mouse genomes found no homology. In this way, all spiked-in sequences can be identified without affecting the analysis the sample itself requires.
(3) The sequence has a well-balanced GC content. The GC content of the entire sequence and of each portion of the sequence is at or near 50%, which avoids sequencing bias.
(4) The sequence contains no long single-base repeat element (5 bp or more) and no run of 4 or more GC or CG double-base repeats, so that the higher sequencing error rate at such sites does not affect the quality control analysis.
Example 2
After the sequencing quality control standard sequences were designed by software, 6 standards with the sequences SEQ ID NO.1, SEQ ID NO.2, SEQ ID NO.3, SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO.6 were obtained. The 6 sequences were each synthesized de novo and ligated into pUC57 plasmids to obtain 6 recombinant plasmids. The 6 recombinant plasmids were then each transformed into the Escherichia coli TOP10 strain, where they were amplified as the strain propagated, and each recombinant plasmid was obtained in large quantity after DNA extraction. Next, the 6 sequences (SEQ ID NO.1-SEQ ID NO.6) embedded in the recombinant plasmids were each amplified by PCR. The spike-in sequences were separated from other impurities by agarose gel electrophoresis, the band matching the size of the spike-in sequence was excised, the DNA was recovered with an agarose gel DNA recovery kit, and finally the DNA was eluted with Nuclease-Free Water. Quality was checked by Sanger sequencing to obtain the 6 target sequencing quality control standards, which were quantified and stored in a refrigerator at -20 ℃ for later use.
The primers for PCR were as follows:
F:5’-GTGTGCAACCTATGGCGACAG-3’(SEQ ID NO.7)
R:5’-CACATAGCTCTCAGAGTCGCGG-3’(SEQ ID NO.8).
The PCR reaction conditions were: denaturation (94 ℃, 30 s), annealing (60 ℃, 30 s), extension (72 ℃, 30 s); number of cycles: 30.
Example 3
The 6 prepared sequencing quality control standards were added to 6 DNA samples to be sequenced, one standard per sample, at a spike-in ratio of 4 w/w%, and the assignment was recorded. After library construction, the 6 samples were sequenced on a MinION sequencer from Oxford Nanopore Technologies. The 6 samples were pooled and sequenced on one chip.
After sequencing was completed, the data were demultiplexed with the software recommended by the sequencer manufacturer, and the sequencing quality control standard information in each sample's data was then analyzed. The average proportion of standard sequences found in the 6 samples was 3.7%, with a variance of 0.9%. In 5 of the samples, over 99% of the standard sequences came from a single standard; in the remaining sample, 97.6% came from a single standard. These results match the standards that were expected to have been added, which shows that the sequencing quality control standard of the invention fulfils its sample-tracking role and that no sample mix-up occurred in the experiment. The results also directly yield the cross-contamination rate: 5 of the samples had < 1% cross-contamination, while the other sample had a cross-contamination rate of 2.4%. In addition, because only a small amount of sequencing quality control standard is added and it has no homology with natural sequences, the normal analysis of these samples (genome assembly) is not affected.
These results show that the standard of the invention meets the requirements and provides good quality control for sequenced samples.
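The sample-tracking and cross-contamination readout in this example can be sketched as follows. This is an illustration under assumptions, not the analysis pipeline used in the example: it assumes that reads in each demultiplexed sample have already been assigned to standards, and the function name, the standard labels and the read counts are all hypothetical.

```python
from typing import Dict, Mapping


def track_sample(read_counts: Mapping[str, int], expected: str) -> Dict[str, object]:
    """Summarise standard reads found in one demultiplexed sample.

    read_counts maps a standard label to the number of reads assigned to it.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any standard")
    # The standard contributing the most reads is taken as the sample's tag.
    dominant = max(read_counts, key=lambda name: read_counts[name])
    foreign = total - read_counts[dominant]
    return {
        "dominant": dominant,
        "matches_expected": dominant == expected,  # False suggests a mix-up
        "cross_contamination": foreign / total,    # share of foreign standard reads
    }


# Hypothetical counts: 97.6% of standard reads from the expected standard.
print(track_sample({"STD1": 976, "STD2": 24}, expected="STD1"))
```

A sample whose dominant standard differs from the recorded one would be flagged as a possible mix-up, while the share of reads from other standards gives the cross-contamination rate (2.4% for the hypothetical counts shown).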
Example 4
Three spike-in standards with the sequences SEQ ID NO.1, SEQ ID NO.3 and SEQ ID NO.2 were mixed at a ratio of 1:1:2 to form a first mixed standard containing 18 mutations in total, with two classes of expected frequency. In addition, three spike-in standards with the sequences SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO.6 were mixed at a ratio of 50:45:5 to form a second mixed standard, also containing 18 mutations in total, with three classes of expected frequency. The two mixed standards were then added to sample 1 and sample 2, respectively, at an expected spike-in proportion of 5 w/w%. After each of the two samples was prepared separately, they were sequenced together on a single chip using a MinION sequencer from Oxford Nanopore Technologies.
After sequencing was completed, the data were split by sample using the software recommended by the sequencer manufacturer, and the composition and proportion of the sequencing quality control standards in the two samples were then analyzed. The proportions of mixed standard found in the two samples were 4.6% and 5.07%, respectively, and the composition and ratio of the individual sequencing quality control standards in both samples agreed with expectation.
Mutations in the homologous regions of all sequencing quality control standards, and their frequencies, were then analyzed in both samples. The results show that mutation sites are detected with high accuracy when the lower detection limit for mutation frequency is set at 0.2: the detection rate for mutations with a frequency of at least 0.2 was 94.4% (17/18) in sample 1 and 100% (12/12) in sample 2, with one false-positive site in sample 1 and none in sample 2. When the detection limit for mutation frequency was set below 0.2, a large number of false-positive sites appeared, indicating that mutation analysis below this detection limit is unreliable.
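The expected frequency classes follow from the mixing ratios. As a hedged illustration, the Python sketch below assumes that each mutation in the homologous region is carried by exactly one standard of the mix, so that its expected frequency equals that standard's share of the mix; this assumption and the function name are not stated in the source.

```python
from fractions import Fraction


def expected_frequencies(ratio: dict[str, int]) -> dict[str, Fraction]:
    """Share of each standard in a mix, which is also the expected frequency
    of a mutation carried only by that standard (assumption, see lead-in)."""
    total = sum(ratio.values())
    return {name: Fraction(parts, total) for name, parts in ratio.items()}


mix1 = expected_frequencies({"SEQ ID NO.1": 1, "SEQ ID NO.3": 1, "SEQ ID NO.2": 2})
mix2 = expected_frequencies({"SEQ ID NO.4": 50, "SEQ ID NO.5": 45, "SEQ ID NO.6": 5})

for label, mix in (("mix 1", mix1), ("mix 2", mix2)):
    classes = sorted(set(mix.values()))
    print(label, [float(c) for c in classes])
```

Under this assumption the first mix yields two frequency classes and the second yields three, which matches the counts given in the example.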
The result shows that the standard substance of the invention can be used for evaluating mutation detection, including detection rate, false positive sites and noise distribution, and provides quality control standard reference for detection and analysis of sequencing sample mutation.
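The detection rate and false-positive count in Example 4 can be computed by comparing called variants against the known mutations of the standards at a chosen frequency cutoff. The Python sketch below shows one way to do this. The function name, site labels and frequencies are hypothetical and were chosen only to reproduce a 17/18 outcome; they are not the experimental data.

```python
def evaluate_calls(expected: dict[str, float],
                   called: dict[str, float],
                   min_freq: float = 0.2) -> tuple[float, list[str]]:
    """Return (detection rate, false-positive sites) at a frequency cutoff.

    expected: known mutation site -> expected frequency in the mixed standard
    called:   observed variant site -> measured frequency
    """
    truth = {site for site, freq in expected.items() if freq >= min_freq}
    calls = {site for site, freq in called.items() if freq >= min_freq}
    detected = truth & calls
    false_positives = sorted(calls - set(expected))
    rate = len(detected) / len(truth) if truth else float("nan")
    return rate, false_positives


# Hypothetical data: 18 known sites, one missed, one false positive above
# the cutoff and one noise call below it.
expected = {f"m{i}": (0.5 if i < 6 else 0.25) for i in range(18)}
called = {site: freq for site, freq in expected.items() if site != "m17"}
called["x1"] = 0.22
called["x2"] = 0.05

rate, false_positives = evaluate_calls(expected, called)
print(f"{rate:.1%}", false_positives)
```

Lowering `min_freq` admits low-frequency noise calls such as `x2` into the false-positive list, which mirrors the behaviour described for detection limits below 0.2.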
While particular embodiments of the present invention have been illustrated and described, it would be obvious that various other changes and modifications can be made without departing from the spirit and scope of the invention. It is therefore intended to cover in the appended claims all such changes and modifications that are within the scope of this invention.
SEQUENCE LISTING
<110> Academy of Military Medical Sciences, Academy of Military Sciences of the Chinese People's Liberation Army
Chinese PLA Center for Disease Control and Prevention
<120> sequencing quality control standard substance and application and product thereof
<160> 8
<170> PatentIn version 3.5
<210> 1
<211> 2200
<212> DNA
<213> Artificial sequence
<400> 1
gtgtgcaacc tatggcgaca gtctgtcgaa tccgccgttg tatgcactat attcttgctg 60
aagacgattg cattagtcat agagatccgc ctacagtcgc ttgccatact gtacggcata 120
cgagatacct tgtcgatact catgctggaa ctgagatgag cgcggtatac gtgagagtat 180
gcgtggcgca gtcagcgtaa tctatgacgg tgataggcac tggcgcaaga tacggtcgca 240
atgtgacttc atgactctgg cgcgacacta tgattgcagt tgatactgcg agcgttatta 300
gatcatatat gcatgctcca atcagtatgc cgctagacgc ctatcgactg tatcggaccg 360
cgccgtgcca tgtcgagtaa tcgaatagag tctggcatct ccgagcggat caacgtacat 420
tctgcacgca taacgacgat atacgtgact atacaatagt ccattgttga atgatcgtct 480
tcagtgcgta acaacgagat gctcgatcgg tcctaataac gccagccgct ctagatgtgg 540
cgcataatac acactacgtt cgacgccgaa gctaatccgc tcggcggtgc ctaggagatg 600
tactattgtt ctactcatac taacaatgcg caccgtgccg ctagacaatg gcgaggatcg 660
agtcagagac ctgtgcatag ctacggcgaa taacgcttac tccatagcgg taacctgcgg 720
aacttatacc agattatgaa gcggcaatac ttagagccac tagttgtgcc agcgagtata 780
ttcggaccat ccaccaggag gtagctacga acgctagcgc tgagactgat acgcgtccta 840
cacgatactg gtccgcgagt caatccagta acgtacgaga ttgtacactg ctagcatcgg 900
ttagctacca ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960
cgtacttatg tgagttcgcc actaagtctc tcttaggata gactgaatgc taccatacag 1020
gcgcaataga tggtgacata gttcgtcgcc aaccgttggt atctgttgat cgcagttagc 1080
gtgttggtac cacaggacga cacgagcagg attcttgacg ctgcgatgcg ttcgcttgta 1140
gacgcctgtt caccgctaat accataggac ggctatcgta tcgctcctca atcctgcaac 1200
gatcaacaag agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260
gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta accgaacaat cgcgttgtat 1320
gaacgtattc gacggacaat agcggcaagc gaatcacagg tatgcttagc cactgttgat 1380
gacagtgagt cgtgctatac cagtggcata ctgtgcgcgt tatcgtgact gtagagctat 1440
cgagactgga acacggtaga gtatatccag ccactaatct cggtgcagcc gcggattctc 1500
atcttagcct gcgacctcta gctataatcc ttacttgagt ggtatacgtc atacgagtta 1560
gacaagtatc acgcgaatag catactcgaa taccgcggac acgcctcgct acatatatca 1620
gtgtatggct agctaggttg tagaacgcgc ctgtccgctg tagatgacag cctcgtgctc 1680
atggaagatc caaggcgaag gcttagcacg tgcactacaa ccgcatccgt acctatccga 1740
ttagataact agtccgcttg gtcctattgc taaggagtag ttggagtact ggttcaatag 1800
cgaaccgcta tcctcagcta ctcagtacgc aagcctgccg ttacgtgtcg acgtcatgtg 1860
tgctatgcgt catgaataag cattgaactg aagataatca gttagcgcat tgagctctaa 1920
tggaaccact ggtacgtctc catacttatt cgtgatgata gcatgccagc aggcgccatg 1980
actcgacagc cagacacgtt aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040
tgtctgtgag atatatcacg gaccgaacgt aggtccagaa ccagcagtaa gatgcgagcg 2100
tggttagatg cggtatagct ccgtacagga cacagtgcag tacaaggatc actccagtct 2160
ctaagagcgc aagcctatcc gcgactctga gagctatgtg 2200
<210> 2
<211> 2200
<212> DNA
<213> Artificial sequence
<400> 2
gtgtgcaacc tatggcgaca gtctgtcgaa tccgccgttg tatgcactat attcttgctg 60
aagacgattg cattagtcat agagatccgc ctacagtcgc ttgccatact gtacggcata 120
cgagatacct tgtcgatact catgctggaa ctgagatgag cgcggtatac gtgagagtat 180
gcgtggcgca gtcagcgtaa ggtaagcagt ctctgtggtg gtgtatacgc tgcatgacat 240
cgtactgcac ctattgacgt gttctccgtg aactgcagta tatacgcggt aacggacaag 300
cgctctatcg catgttgcgg tagacgccac tatattatgg caacgccatg tattggcata 360
gcgataccag tacagcttct ccgactgtac actatccgcc ggcagaatca tatattatct 420
gcgaagtact tgtcgctagt catcgcctcg gaatcgtagt gtagcctggt tggttcgtac 480
tatccgtgac ctatgaccta ttggcgaggc ggtagcaaca cgaacggtat gcagcaatgc 540
acgcttatct gcacaagcct attatgagat ctggtcgtta tcgacttcgc gacgcgacga 600
gctggtgtta ggctggacag ccgacctacc tgatggagtc agcggacgct tggtgagcta 660
ttcgtacact gcttgatatt atgcgatatt acattatgct cctcgtaaca cacgcatacc 720
ttccaggatc acgtgtagcg tcgaggacgg agttctatac ctcaagatag cgtcagcgac 780
agaatcattg gtgaacatac gaacctacga acgctagcgc tgagactgat acgcgtccta 840
cacgatactg gtccgcgagt caatccagta acgtacgaga ttgtacactg ctagcatcgg 900
attgctacca ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960
cgtacttatg tgagttcgcc actaagtctc tcttaggata cagtgaatgc taccatacag 1020
gcgcaataga tggtgacata gttcgtcgcc aaccgttggt atctgttgat cgcagttagc 1080
gtgttggtac cacaggacgt ctcgagcagg attcttgacg ctgcgatgcg ttcgcttgta 1140
gacgcctgtt caccgctaat accataggac ggctatcgta tcgctcctca atcctgcaac 1200
caacaacaag agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260
gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tcggaacaat cgcgttgtat 1320
gaacgtattc gacggacaat agcggcaagc gaatcacagg tatgcttagc cactgttgat 1380
gacagtgagt cgtgctatac ggacaacgtg tgtggtattg gtgagacaag tattactcgc 1440
gcttgaggac ggcgcagata ctgcaatcaa gtgcagcagc gcgtacggtt gcgatgaact 1500
tccgtgcctg atcctgacga tgtcgttata tccgaagaca cacttatcgg tcaacagttc 1560
gacttgtcac tgtcgtcgca caggactatc atgaatgcaa cgtcaatgcg gattcctcgc 1620
acggcataat ccataatgta gctcatggcg gtgcggctag gctagtaagt cgcatcgcct 1680
gttatatcct tggcggtcat gattgtatcg tacaataaga ggtggttaga gcgcgagcac 1740
attctgctat ggctgatcct taccttctaa gtcctctgcg gctgaagtta gactgcggca 1800
acgcttgatg ataaccgcct acgagatact cctgaacggt gtataggctc ataatcctcg 1860
atggctcgag ctcgttcggc ggatacgaag ccattatcgt gcatagcgtc ctctatggtg 1920
cgatagagca cttatccaga ctcagcgaac aatggttcgt gacgagatac cagtgaacag 1980
atcgccatcg gacactctac aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040
tgtctgtgag atatatcacg gaccgaacgt aggtccagaa ccagcagtaa gatgcgagcg 2100
tggttagatg cggtatagct ccgtacagga cacagtgcag tacaaggatc actccagtct 2160
ctaagagcgc aagcctatcc gcgactctga gagctatgtg 2200
<210> 3
<211> 2200
<212> DNA
<213> Artificial sequence
<400> 3
gtgtgcaacc tatggcgaca gtctgtcgaa tccgccgttg tatgcactat attcttgctg 60
aagacgattg cattagtcat agagatccgc ctacagtcgc ttgccatact gtacggcata 120
cgagatacct tgtcgatact catgctggaa ctgagatgag cgcggtatac gtgagagtat 180
gcgtggcgca gtcagcgtaa ctgataatcc atggcgtgcc gacgaagtat ggtacagtgc 240
agcttattat accgactgag ctaaggactg gaggataggt tgtgtgcaga aggacaagga 300
atagacgccg catcgccgcc gtcatacctc agtatcttga agatagccgt gctcaacgca 360
ataatctgga gcaatctagt cgtatctcca gttatggtca gttgcgatca gctcaggact 420
cggactgcta tctatggaag agctacctgc gctcttagct attgaacaat cactaacact 480
cctcaccaca aggatacggt atcggagcga tggaccgcac tatattactt ccaactatgc 540
ggctacggaa ggctctattg cgacatgcgg atacttcgct caggttcgcc gatacacatt 600
ccaataacta atacaaggtg gtcgatactg tgcgagcgag gacacttatc atggctcgaa 660
taccgcggct cattcggctt gctgtcagtg gtcgtcgtcc tatcgagaag cgacaggagc 720
aacactgtat tcgagtatac ctctgtctgc cacctatcca ggtggaatat agccatatgt 780
gcgagaactt cgaggataag gaagcaacga acgctagcgc tgagactgat acgcgtccta 840
cacgatactg gtccgcgagt caatccagta acgtacgaga ttgtacactg ctagcatcgg 900
ataggtacca ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960
cgtacttatg tgagttcgcc actaagtctc tcttaggata cactcaatgc taccatacag 1020
gcgcaataga tggtgacata gttcgtcgcc aaccgttggt atctgttgat cgcagttagc 1080
gtgttggtac cacaggacgt caccagcagg attcttgacg ctgcgatgcg ttcgcttgta 1140
gacgcctgtt caccgctaat accataggac ggctatcgta tcgctcctca atcctgcaac 1200
catctacaag agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260
gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tccgtacaat cgcgttgtat 1320
gaacgtattc gacggacaat agcggcaagc gaatcacagg tatgcttagc cactgttgat 1380
gacagtgagt cgtgctatac gattgagtag cctcgcgctc aagagagact agagtaagac 1440
ttccatcacg agcgatctct tactggacgc cgtattgaca cctgcatatg gaatcacatc 1500
gccgttggat agtgcagtaa tatcactgcg tgcaacttgt gcacagagcc gcgtactatc 1560
gtgtctatga gaccttacgt ccgacgctct acggtccata tatcgtatcg tatatcgcct 1620
ctcacgatac ataagttctc tctatcgcac actggtactc gaccgtctcc gtgcgtataa 1680
gcgagtactc ctaaccaagt atattcgctc gcaacgcgcc tggacatcgc gatcgttatc 1740
tggagcgctc ggagtgcgca tgcaagatta caaccgcatt ggatagactc cattgtgtcc 1800
gtcggtgcgc agtgcgctac tcttgctagc gctaagacca gagacacgaa ggctatagta 1860
atagtggacg cctatcaact caacatgcga agaagagcag tggtatactg ttctcgtgta 1920
ggtacgcaat cgataccgta gttctgcgct gttgtaccga tacacaccta cataggcgct 1980
tgccaatacg atggttggtc aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040
tgtctgtgag atatatcacg gaccgaacgt aggtccagaa ccagcagtaa gatgcgagcg 2100
tggttagatg cggtatagct ccgtacagga cacagtgcag tacaaggatc actccagtct 2160
ctaagagcgc aagcctatcc gcgactctga gagctatgtg 2200
<210> 4
<211> 2200
<212> DNA
<213> Artificial sequence
<400> 4
gtgtgcaacc tatggcgaca gtctgtcgaa tccgccgttg tatgcactat attcttgctg 60
aagacgattg cattagtcat agagatccgc ctacagtcgc ttgccatact gtacggcata 120
cgagatacct tgtcgatact catgctggaa ctgagatgag cgcggtatac gtgagagtat 180
gcgtggcgca gtcagcgtaa cgtgcggtgc acgcgtcatg atagacgata taacggccgc 240
gtctcaggtc tcaagtaaga ttgcgttggt cagacaatac gctcgaaggc gcagtcatat 300
accattaata gcacgtgtag agcgcactac tatgaggtat caggtgagag tatgatatca 360
tagagtcctt gagtgcgtct tatacgcgtt cctagattga gcgtgtatcg cacaagacgc 420
tatatatgaa tacatgcgtc tcgagattgt ataactcgtc agctagccgt catatgcctt 480
ctcaagtgcg ttatgtcgca caacgtagac tgtgagtgac gcgtgctgtg aggtctatat 540
aagtcatcac gcacaacgcc tatcaagcca cttgtggacg ctagcgtgct gcacagcgag 600
tagctcgcgg cagagacaca tcgagtatac ctaggatagt cttgatactc cacgtggtat 660
gcggcactat cttacacata tcaggcgtcc tggaagcgct accaattagc gtcgctgcgt 720
tactgcaagc agcgaccagg caactcatat gccggcacgc gctatcgcgt aaggcggtaa 780
cgctaacata ttgatattat gaagctagga acgctagcgc tgagactgat acgcgtccta 840
cacgatactg gtccgcgagt caatccagta acgtacgaga ttgtacactg ctagcatcgg 900
atagcttcca ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960
cgtacttatg tgagttcgcc actaagtctc tcttaggata cactgattgc taccatacag 1020
gcgcaataga tggtgacata gttcgtcgcc aaccgttggt atctgttgat cgcagttagc 1080
gtgttggtac cacaggacgt cacgaccagg attcttgacg ctgcgatgcg ttcgcttgta 1140
gacgcctgtt caccgctaat accataggac ggctatcgta tcgctcctca atcctgcaac 1200
catcaagaag agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260
gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tccgaagaat cgcgttgtat 1320
gaacgtattc gacggacaat agcggcaagc gaatcacagg tatgcttagc cactgttgat 1380
gacagtgagt cgtgctatac aactgcgtat actagagatc cgcgtctcat gctatctcgg 1440
cgccttcgcg cagtgctacg taacggcgct atgccatgct aacatagttg cgtatcctat 1500
gatcctgcat aacgtcacgc gtgacctccg ttctacttcg cgatgcgtat cgctatatcg 1560
tgaagtctat atggaatata ggaacagcat tagcgcagcg gaggtaatca catacagtat 1620
atcgtgcggc atacgtcata ttgcactcag tcgccgcata tatcggtaga aggcagtacc 1680
gtgcgcatat tgctgtgctg cagttataca gacgagtact gtcgaggtat ggcgcagtcg 1740
ctataattca atccgtatat atgatgccat atgcgccgac agctactcgc catctgtgtg 1800
gtaggtggcg gtgagttgca ttcatccaga gtgcggaatt catgatatag cgtcgtagat 1860
ctgacgcacc accgaaccac attgagacgc caactgtgcg catcatatca ctgcatatat 1920
tacctctagg actgctccag aacgcgtatg tcattggagc ctgtcggcca attcacaccg 1980
agccatacac gcgcttgatt aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040
tgtctgtgag atatatcacg gaccgaacgt aggtccagaa ccagcagtaa gatgcgagcg 2100
tggttagatg cggtatagct ccgtacagga cacagtgcag tacaaggatc actccagtct 2160
ctaagagcgc aagcctatcc gcgactctga gagctatgtg 2200
<210> 5
<211> 2200
<212> DNA
<213> Artificial sequence
<400> 5
gtgtgcaacc tatggcgaca gtctgtcgaa tccgccgttg tatgcactat attcttgctg 60
aagacgattg cattagtcat agagatccgc ctacagtcgc ttgccatact gtacggcata 120
cgagatacct tgtcgatact catgctggaa ctgagatgag cgcggtatac gtgagagtat 180
gcgtggcgca gtcagcgtaa catgtaatgt atcgtaacta gacatgattc tgcatatcgc 240
tactcgtggt cgcttgctcc agcgcttcat ctctggagca tagtcttgac tagtatatac 300
gagactgatc tcagtccgta tggccgatca cactgccagc ggagacgagc acaacgacag 360
cgtcgcggac tcgcatatct cagactatat tcattccgta tgtatatctc cgacaaggag 420
ctgaaggatc atgttctcac tcaccattac tgctgacaat aggcgcacat accagtatgc 480
gcggccggca cttctacaca cattgctgct aacatatgta gtcgaaccta tcttcaagca 540
tctcgctgta gcgaacgcgt cgcacgtagc gagctaatac gcgtccagcg cgaattgtat 600
actattatat attgcgctga gcgccagccg acgcgctctg cttatattat aatattgatg 660
gtcggtgctc aagcgtgcac agtgaagttc cttcataccg tgatgcgcgg cgcgtacgtc 720
gacgcctata tgttagaagg ccaatgtcgc attgttatct tccagcttgg taagatcctt 780
ggcagcgtca tatgaactcg gaagctacgt acgctagcgc tgagactgat acgcgtccta 840
cacgatactg gtccgcgagt caatccagta acgtacgaga ttgtacactg ctagcatcgg 900
atagctacga ttgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960
cgtacttatg tgagttcgcc actaagtctc tcttaggata cactgaatcc taccatacag 1020
gcgcaataga tggtgacata gttcgtcgcc aaccgttggt atctgttgat cgcagttagc 1080
gtgttggtac cacaggacgt cacgagctgg attcttgacg ctgcgatgcg ttcgcttgta 1140
gacgcctgtt caccgctaat accataggac ggctatcgta tcgctcctca atcctgcaac 1200
catcaacatg agccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260
gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tccgaacatt cgcgttgtat 1320
gaacgtattc gacggacaat agcggcaagc gaatcacagg tatgcttagc cactgttgat 1380
gacagtgagt cgtgctatac tcagttagcg caattaatgt gacctaatca caccagtcag 1440
gcacatatga ctataagcgc atgctgcgaa gtctagacat cctgacaact cgtacgcagg 1500
cgtgtatata ccgtatataa gaatcttcgg acgcatagcg actgcaacct acagcatcat 1560
gcagcctcga ggcgtgcagc gcacatatat ccgcggatat gcaataagca gcgtgccgtc 1620
ctggtggtgg ctgctggtat acagcattct tatattcaat gacgtcagcc ttcctcgccg 1680
cgtgaattag agacggtcct tgcttaggtc ctcctggttg acggtcatag taactataag 1740
gtgacagcgc ggttcagaag cgcgactata tccgacgaga tatattaacg cctatcaaca 1800
tagaatgcaa gaggtacagg tccatggtcg cgtacgacga taagcgtgcg agaacgtgcc 1860
gtcatatacc gaggatatac tcgcagctgg cggcgaccag gtatatcgtc ttatctgata 1920
tcatggactt acatactata tcagcgtgtt atggcgcgag cacgacagct gtatactgag 1980
gaggcgcaat gccgtataac aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040
tgtctgtgag atatatcacg gaccgaacgt aggtccagaa ccagcagtaa gatgcgagcg 2100
tggttagatg cggtatagct ccgtacagga cacagtgcag tacaaggatc actccagtct 2160
ctaagagcgc aagcctatcc gcgactctga gagctatgtg 2200
<210> 6
<211> 2200
<212> DNA
<213> Artificial sequence
<400> 6
gtgtgcaacc tatggcgaca gtctgtcgaa tccgccgttg tatgcactat attcttgctg 60
aagacgattg cattagtcat agagatccgc ctacagtcgc ttgccatact gtacggcata 120
cgagatacct tgtcgatact catgctggaa ctgagatgag cgcggtatac gtgagagtat 180
gcgtggcgca gtcagcgtaa gtgtatcaac ctaagtacgc tataccacaa tattctgagc 240
gttgcgacgg tctagccgtg cactatcacc tacagtgccg tccgaaggcg gaagcgacca 300
tcagcttgtc aatacagaac ctaagacgct gcatgccgat cgatacgacc atgcgaataa 360
ggtagcgaca tactgtcgca actgatatgt ggtataagtg cgcagtctaa cgcatatgtc 420
ttcgtacaca ccttaggagt accgccagcg tgctatgcac gcgagagcgc acaggtataa 480
tatagatcgc gcacattcaa gaagtcagcg cgtatagccg agtatatgta tatctcaccg 540
ctactgcaag ttgcgtgcgc ttacggtacg ctatatgagc ttggtatatt gatggacgct 600
tggacgatgc atatgagacc gcgacctgtt cgctctgcca agatgaagga gctcctcact 660
tcaagtcgac cggacgcctc gcgctgtcct tgtcaacaac atagtatcac tcgcaggtca 720
tataagcgtg ccttctagcc tcaagagcca tatgacctga ctctagctct tatatacgca 780
gatgcttaac taggtacatt gaagctacga aggctagcgc tgagactgat acgcgtccta 840
cacgatactg gtccgcgagt caatccagta acgtacgaga ttgtacactg ctagcatcgg 900
atagctacca atgcacgtac gtcggcacgc catgcgtgtc tagcgtcaac cttcagtatt 960
cgtacttatg tgagttcgcc actaagtctc tcttaggata cactgaatgc aaccatacag 1020
gcgcaataga tggtgacata gttcgtcgcc aaccgttggt atctgttgat cgcagttagc 1080
gtgttggtac cacaggacgt cacgagcagc attcttgacg ctgcgatgcg ttcgcttgta 1140
gacgcctgtt caccgctaat accataggac ggctatcgta tcgctcctca atcctgcaac 1200
catcaacaag tgccaatgta gctggcgcaa tctccgtcgt aggtacgtcc atataacagc 1260
gtgcaacgcc gatctgcgaa ggtagtgtcc tgatctgcta tccgaacaat ggcgttgtat 1320
gaacgtattc gacggacaat agcggcaagc gaatcacagg tatgcttagc cactgttgat 1380
gacagtgagt cgtgctatac aacgaagtaa gcggagctat ccttatcacc acgatcacta 1440
gtcgcgcaag cacctcaatc tattcgatcg acgcgcatta gcgcacgctc ttcgcaggca 1500
tgattgcata cctagttcct gcgacatgat atatgaacat caggtcaaca gtgaaggata 1560
tgtgcggacg cacacgcgtt gtagcgcggc acatattgag tatgaacttc catgatagca 1620
acgcgtgtgt gtgacaccgc tacaatccac atgacagtat gctcggcctt gtggctggtc 1680
ttgtggacgc atatatcgac tgtgccatat agtcatatca gagtgcgcat ccgataatcc 1740
ttgctgacac cacacgcgcc ttagcgctat tagatgccgt tacctaggca tgtaatggtc 1800
tctacgtcaa gttgataggc ttgacgccga atccgaatta attgcagacg attgcgtcct 1860
cgcgtatatg gattgactac gcggccacct atgtgtcata gcagacacgc atagtacgcc 1920
tggaggcgag ttaccgcgtc atagacgtct gttaatcaga tatgctcctc attatgcaca 1980
tcagcacttg cgcttcgtgg aagtagtata gacgagaata tcgtgacaga agcgcatgaa 2040
tgtctgtgag atatatcacg gaccgaacgt aggtccagaa ccagcagtaa gatgcgagcg 2100
tggttagatg cggtatagct ccgtacagga cacagtgcag tacaaggatc actccagtct 2160
ctaagagcgc aagcctatcc gcgactctga gagctatgtg 2200
<210> 7
<211> 21
<212> DNA
<213> Artificial sequence
<400> 7
gtgtgcaacc tatggcgaca g 21
<210> 8
<211> 22
<212> DNA
<213> Artificial sequence
<400> 8
cacatagctc tcagagtcgc gg 22

Claims (10)

1. A sequencing quality control standard, characterized in that it comprises, in order, a primer region 1, a specific region 1, a homologous region, a specific region 2 and a primer region 2;
the length of the sequencing quality control standard is 1000-10000 bp;
the homology of the sequencing quality control standard with natural sequences is 0;
the GC content of the sequencing quality control standard is 40-60%;
the sequencing quality control standard does not contain single-base repeats of 5 bp or more;
the sequencing quality control standard does not contain GC or CG dinucleotide repeats of 4 or more;
the length of the primer region 1 is 30-300 bp;
the length of the primer region 2 is 30-300 bp;
the length of the homologous region is 30-40% of the sum of the lengths of the specific region 1, the homologous region and the specific region 2.
2. The sequencing quality control standard according to claim 1, wherein 1-4 mutation sites exist in each 100bp of the homologous region.
3. The sequencing quality control standard according to claim 1, wherein the length of the sequencing quality control standard is 1000-5000bp, preferably 1000-3000 bp.
4. The sequencing quality control standard according to claim 1, wherein the homology between the sequencing quality control standard and natural sequences is tested with MEGABLAST, and the result is 0;
preferably, the databases used for homology testing include the NCBI nucleic acid database (nt), the NCBI human genome database, and the NCBI mouse genome database.
5. The sequencing quality control standard according to claim 1, wherein the GC content of the sequencing quality control standard is 45-55%, preferably 50%.
6. The sequencing quality control standard according to any one of claims 1-5, wherein the sequencing quality control standard comprises at least one of SEQ ID NO.1, SEQ ID NO.2, SEQ ID NO.3, SEQ ID NO.4, SEQ ID NO.5, and SEQ ID NO.6;
preferably, the PCR primers for the sequencing quality control standard comprise:
F: 5'-GTGTGCAACCTATGGCGACAG-3' (SEQ ID NO.7)
R: 5'-CACATAGCTCTCAGAGTCGCGG-3' (SEQ ID NO.8).
7. Use of the sequencing quality control standard of any one of claims 1-6 in single molecule sequencing and/or high throughput sequencing.
8. The use of claim 7, wherein the sequencing quality control standard is added in an amount of 0.5-10 w/w% of the sample to be tested.
9. A single-molecule sequencing quality control kit, which is characterized by comprising the sequencing quality control standard substance of any one of claims 1 to 6.
10. A high-throughput sequencing quality control kit, which is characterized by comprising the sequencing quality control standard substance of any one of claims 1 to 6.
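The sequence constraints recited in claim 1 (overall length, GC content, no long single-base runs, no GC or CG dinucleotide repeats) lend themselves to an automated check on a candidate sequence. The following Python sketch is one possible screen under stated assumptions: the function name is hypothetical, "repeats of 4 or more" is read as four or more consecutive GC or CG units, and the homology and region-length conditions are not covered.

```python
import re


def check_design(seq: str) -> dict[str, bool]:
    """Screen a candidate standard against the sequence-level limits of claim 1.

    Assumption: a 'GC or CG repeat of 4 or more' means >= 4 consecutive units.
    """
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    gc_fraction = (s.count("G") + s.count("C")) / len(s)
    return {
        "length_1000_to_10000_bp": 1000 <= len(s) <= 10000,
        "gc_content_40_to_60_percent": 0.40 <= gc_fraction <= 0.60,
        "no_single_base_run_of_5_or_more": re.search(r"([ACGT])\1{4}", s) is None,
        "no_gc_or_cg_repeat_of_4_or_more": re.search(r"(?:GC){4,}|(?:CG){4,}", s) is None,
    }


# Synthetic toy input, not one of the claimed sequences.
candidate = "ACGT" * 300
print(check_design(candidate))
```

A sequence passes the screen only if every value in the returned dictionary is true; homology to natural sequences would still need a separate search such as the MEGABLAST test named in claim 4.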

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911084920.5A CN110628890B (en) 2019-11-07 2019-11-07 Sequencing quality control standard product and application and product thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911084920.5A CN110628890B (en) 2019-11-07 2019-11-07 Sequencing quality control standard product and application and product thereof

Publications (2)

Publication Number Publication Date
CN110628890A (en) 2019-12-31
CN110628890B (en) 2020-11-13

Family

ID=68979192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911084920.5A Active CN110628890B (en) 2019-11-07 2019-11-07 Sequencing quality control standard product and application and product thereof

Country Status (1)

Country Link
CN (1) CN110628890B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111500691A (en) * 2020-04-24 2020-08-07 中国食品药品检定研究院 Quality control standard substance and quality control method for microbial high-throughput DNA sequencing data
CN111850016A (en) * 2020-07-06 2020-10-30 深圳泛因医学有限公司 Immune repertoire standard substance sequence and design method and application thereof
CN112680534A (en) * 2021-01-21 2021-04-20 哈尔滨医科大学 Mycobacterium tuberculosis sRNA fluorescent quantitative PCR standard substance for identifying false positive reaction and application thereof
CN112853001A (en) * 2021-02-06 2021-05-28 浙江树人学院(浙江树人大学) Quality control product for detecting RNA virus by metagenome sequencing and application thereof
CN113897354A (en) * 2021-08-27 2022-01-07 海宁麦凯医学检验有限公司 Internal standard for sequencing correction and application thereof
CN117867086A (en) * 2024-03-12 2024-04-12 北京雅康博生物科技有限公司 Standard substance for quantitative high-throughput sequencing library and preparation method and application thereof
CN117887812A (en) * 2024-03-14 2024-04-16 北京雅康博生物科技有限公司 Library for high-throughput sequencing quality control, and preparation method and application thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573125A (en) * 2018-04-19 2018-09-25 上海亿康医学检验所有限公司 Method for detecting genome copy number variation and device comprising same
CN109504770A (en) * 2018-10-17 2019-03-22 艾普拜生物科技(苏州)有限公司 Kit and method for detecting loss of heterozygosity based on amplicon sequencing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EVARIST PLANET ET AL.: "High-throughput sequencing quality control,processing and visualization in R.", 《BIOINFORMATICS》 *
Meng Zhen et al.: "A quality control scheme for gene sequence sequencing data", 《科研信息化技术与应用》 *

Also Published As

Publication number Publication date
CN110628890B (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN110628890B (en) Sequencing quality control standard product and application and product thereof
CN110628880B (en) Method for detecting gene variation by synchronously using messenger RNA and genome DNA template
EP1546345B1 (en) Genome partitioning
KR101095220B1 (en) DNA marker associated with resistance of cabbage clubroot disease and uses thereof
WO2018147438A1 (en) Pcr primer set for hla gene, and sequencing method using same
CN113811618B (en) Sequencing library construction based on methylated DNA target region, system and application
CN111334573A (en) Gene detection kit for hypertension medication and use method
CN111826426A (en) Method for detecting molecular marker based on KASP technology
CN110923325B (en) Primer Blocker group, kit and method for detecting EGFR gene mutation
CN106995845B (en) Method for mining allelic variation of gene in polyploid by using third generation sequencing platform (PacBio RS II)
CN112442530B (en) Method for detecting CAH related true and false gene
CN105695581B (en) Medium-flux gene expression analysis method based on second-generation test platform
CN114790484B (en) MNP (MNP) marking site of xanthomonas oryzae, primer composition, kit and application of MNP marking site
CN109706247B (en) Method for monitoring genetic quality of inbred mouse by using microsatellite technology
CN115961007A (en) Method for developing whole genome molecular marker by using degenerate primer amplification
CN102943109B (en) Method for detecting copy number variation based on multiple internal controls in series
CN108642190B (en) Forensic medicine composite detection kit based on 14 autosomal SNP genetic markers
CN113025702A (en) Early screening method and kit for ankylosing spondylitis susceptibility genes
CN108707684B (en) SNP (Single nucleotide polymorphism) marker related to millet flag leaf length and detection primer and application thereof
CN114807302B (en) Amplicon library construction method and kit for thalassemia mutant and deletion type gene detection
CN112280884B (en) InDel marker suitable for corn genotyping and application thereof
CN111235293B (en) Rye 6RL chromosome arm specific KASP molecular marker and application thereof
WO2024106109A1 (en) Gene detection using modified substrate that modifies mobility of electrophoresis
CN108728566B (en) SNP (Single nucleotide polymorphism) marker related to thousand grain weight traits of millet as well as detection primer and application thereof
CN112080806B (en) Plasma free DNA library construction method of capillary 96-well plate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant