CN113969311B - Method for detecting mutation after gene editing
- Publication number
- CN113969311B; CN202111220895.6A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
Abstract
The invention discloses a method for detecting mutations after gene editing. It provides a long-fragment analysis workflow that combines long-fragment PCR, nanopore sequencing, and Python scripts for analyzing the sequencing data. The workflow automatically splits (demultiplexes) the sequencing data according to barcode information, extracts the corresponding long-fragment reads, automatically aligns the split files with minimap2, and analyzes the HDR insertion rate and the large-fragment deletion rate of the specified files. It thereby compensates for the inability of second-generation sequencing to analyze large fragments, and it also outperforms methods based on second-generation sequencing in short-fragment detection.
Description
Technical Field
The invention belongs to the field of biotechnology and relates to a method for detecting mutations after gene editing.
Background
The CRISPR-Cas9 system is an endonuclease system that is now widely used in the field of gene editing (see: Jinek M, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 2012. 337(6096): p. 816-21.). Typically, a cell repairs the double-strand break generated by CRISPR-Cas9 cleavage through either the NHEJ or the HDR pathway. However, various mutations, such as point mutations and deletions or insertions of large gene fragments, have been found to arise at the target site during repair. Therefore, the results of gene editing must be detected accurately before the CRISPR-Cas9 system formally enters clinical application.
Current detection technology is mainly based on second-generation sequencing and detects small-fragment mutations mediated by the two repair modes, NHEJ and HDR. For example, CRISPResso2 can detect HDR insertion at the target site (see: Clement K, et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat Biotechnol, 2019. 37(3): p. 224-226.). However, the maximum read length of second-generation sequencing is typically limited to 150 bp at each end, so it cannot be used to detect large-fragment insertions or deletions.
The advent of more advanced third-generation sequencing technologies, such as nanopore sequencing, broke the read-length limit and opened the era of long-fragment sequencing. Third-generation sequencing reads single DNA molecules directly, can be performed in real time, and produces reads that can reach tens of thousands of bases. One reported third-generation approach uses molecular tags to label the original DNA molecules of a target gene in a sample and then amplifies the labeled molecules by PCR for sequencing. The accompanying data analysis algorithm, VAULT (variant analysis with unique molecular identifier for long-read technology), finds and extracts the molecular tags from the sequencing results and uses them to group the sequencing data (reads), each group representing one DNA molecule in the original sample. Sequencing errors can then be corrected using all the data within a group, giving highly accurate mutation results and enabling quantitative detection of rare mutations in samples, including single-base mutations and a wide range of complex structural variations (see: Yuan B, et al. Long-read Individual-molecule Sequencing Reveals CRISPR-induced Genetic Heterogeneity in Human ESCs. 2020.). However, the method is complex to operate, technically difficult, and costly.
In summary, how to provide a method for precisely detecting mutation after gene editing, which is simple to operate and low in cost, is a problem to be solved in the field of gene editing.
Disclosure of Invention
Aiming at the defects and actual demands of the prior art, the invention provides a method for detecting mutation after gene editing, which can simultaneously and accurately analyze short fragment mutation and long fragment mutation, has simple operation and low cost, and has important significance for the field of gene editing.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the invention provides a method for detecting mutation after gene editing, which comprises the following steps:
(1) Performing PCR on the cell gDNA after gene editing, introducing a barcode sequence through a PCR primer, and performing nanopore sequencing on a PCR product to obtain a sequencing file;
(2) Removing the adapter sequences in the sequencing file using Porechop (v.0.2.4; https://github.com/rrwick/Porechop), reverse-complementing the sequences in the file, and merging the reverse-complemented sequences and the forward sequences into a new file using a Python script;
(3) Writing the full-length sequence in the new file into a reference.txt file using seqkit (v.0.11.0; https://bioinf.shenwei.me/seqkit/), generating a left-end GREPseq file and a right-end GREPseq file, extracting the PCR product sequences of the same site, making a BC-primer-seq file, and extracting the sequencing file of each single PCR product to obtain the split files;
(4) Aligning the split files to a reference sequence using minimap2 (v.2.17; https://github.com/lh3/minimap2) and samtools (v.1.10; https://github.com/samtools/samtools), and analyzing the alignments.
The invention designs a long-fragment analysis workflow (steps (2) to (4)), whose flow chart is shown in FIG. 1. Sequencing data are analyzed using long-fragment PCR, nanopore sequencing, and Python scripts. GREPore-seq automatically splits the sequencing data according to barcode information, extracts the corresponding long-fragment data, automatically aligns the split files using minimap2, and can also analyze the HDR insertion rate and the large-fragment deletion rate of the files.
In the method, data from the same site are distinguished by different barcodes, so multiple samples can be pooled in a single nanopore sequencing run, which greatly reduces the sequencing cost per sample. In addition to using barcodes to distinguish different experiments at the same site, a sequence of about 70 bp from each site is used as GrepL and GrepR to distinguish data from different sites. Thus, in a single nanopore sequencing run, multiple samples from the same site and samples from different sites can be pooled and still be extracted efficiently, which greatly reduces the sequencing cost per experiment.
In the method, when data from the same site are extracted, the forward and reverse-complemented sequencing data are first merged and then extracted with the left and right Grep files, so that as many correct reads as possible are recovered. Conventional software analysis workflows omit the merging step; merging first and extracting afterwards recovers as much of the required data as possible and improves efficiency and sensitivity.
In the method, when the data of a single site are split, the sequencing data of each single PCR product are extracted by first making a BC-primer-seq file. Typical workflows split sequencing data using the barcode sequence alone. Here, multiple partially overlapping k-mer sequences are generated for each barcode sequence and primer sequence, and these k-mers are used together to split out the reads carrying the same barcode, which greatly increases the amount of data recovered while maintaining accuracy.
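The merge-before-extract idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `_rc` name suffix, and the in-memory `(name, sequence, quality)` record layout are hypothetical and are not taken from the patent's own script.

```python
# Illustrative sketch: reverse-complement each read and keep both orientations,
# so that a later pattern search only has to look for forward-strand patterns.
# All names here are hypothetical, not the patent's script.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_forward_and_reverse(records):
    """Return the input reads followed pairwise by their reverse complements.

    records: iterable of (name, sequence, quality) tuples, as parsed from FASTQ.
    The quality string is reversed so it stays aligned with the bases.
    """
    merged = []
    for name, seq, qual in records:
        merged.append((name, seq, qual))
        merged.append((name + "_rc", reverse_complement(seq), qual[::-1]))
    return merged
```

Doubling the data this way trades file size for a simpler downstream search, which matches the rationale given above for merging before extraction.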
Preferably, the Porechop options for removing the adapter sequences from the sequencing file in step (2) are --adapter_threshold 85 --extra_end_trim 0.
Preferably, the instruction for processing the reference.txt file in step (3) is seqkit fx2tab reference.txt | seqkit tab2fx | seqkit seq -u | seqkit replace -p "|-" -s > reference.fasta.
Preferably, the instructions for generating the left-end GREPseq file in step (3) are seqkit subseq -r 20:90 reference.fasta | seqkit sliding -W 15 -s 5 | seqkit fx2tab >> reference-left.tmp, followed by for /F "tokens=2" %A in (reference-left.tmp) do echo %A >> Grepseq-left.txt.
Preferably, the instructions for generating the right-end GREPseq file in step (3) are seqkit subseq -r -90:-20 reference.fasta | seqkit sliding -W 15 -s 5 | seqkit fx2tab >> reference-right.tmp, followed by for /F "tokens=2" %A in (reference-right.tmp) do echo %A >> Grepseq-right.txt.
Preferably, the instruction for extracting the PCR product sequences of the same site in step (3) is seqkit grep -s -f15_5F-4 Grepseq-left.txt -FR.fastq.gz | seqkit grep -s -f Grepseq-right.txt -o target.fastq.gz.
Preferably, the instructions for extracting the sequencing file of a single PCR product in step (3) are:
(1’) seqkit fx2tab BC-Primer-seq.fa.txt | seqkit tab2fx | seqkit seq -u | seqkit replace -p "|-" -s > BC-Primer-seq.fasta;
(2’) seqkit subseq -r 1:13 BC-Primer-seq.fasta | seqkit sliding -W 9 -s 1 | seqkit fx2tab >> BC-Primer-seq.temp;
(3’) for /F "tokens=2" %A in (BC-Primer-seq.temp) do echo %A >> BC-Primer-seq.txt;
(4’) seqkit grep -s -R 1:20 -i -r -f BC-Primer-seq.txt target.fastq.gz -o amplicon.fastq.gz.
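The sliding-window step used to build the pattern files can be illustrated with a short Python sketch. It mirrors the window and step values given in the instructions (9 nt windows with step 1 over the first 13 nt of the barcode-plus-primer sequence; 15 nt windows with step 5 for the GREPseq files), but the function names and the example sequence are hypothetical, and the sketch is not a replacement for seqkit.

```python
# Illustrative sketch of sliding-window k-mer generation; names are hypothetical.
def sliding_kmers(seq, window, step):
    """Return every full-length window of `seq`, moving `step` bases at a time."""
    seq = seq.upper()
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]


def barcode_primer_kmers(bc_primer_seq, region_len=13, window=9, step=1):
    """Cut overlapping k-mers from the first `region_len` bases of a
    barcode-plus-primer sequence (defaults follow the values in the text)."""
    return sliding_kmers(bc_primer_seq[:region_len], window, step)


if __name__ == "__main__":
    # Hypothetical barcode-plus-primer sequence, for demonstration only.
    for kmer in barcode_primer_kmers("AACCGGTTAACCGGTT"):
        print(kmer)
```

With a 13 nt region, a 9 nt window, and a step of 1, each barcode-plus-primer sequence yields five overlapping patterns, so a read with a sequencing error in one part of the barcode can still match another pattern.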
Preferably, the instructions for aligning the split files to the reference sequence in step (4) are as follows:
(1”) samtools view -bS amplicon.sam > amplicon.bam;
(2”) samtools sort -O bam -o amplicon.sorted.bam -T temp amplicon.bam;
(3”) samtools index amplicon.sorted.bam.
preferably, the analysis of step (4) comprises any one or a combination of at least two of a dsODN insertion analysis, a DNA fragment insertion or deletion analysis, an HDR efficiency analysis, or a plasmid backbone insertion analysis.
Preferably, the system of gene editing comprises a CRISPR-Cas system.
In the invention, a double-stranded oligonucleotide (dsODN) is introduced during gene editing with the CRISPR-Cas system; when a cell repairs the DNA double-strand break through the NHEJ pathway, the double-stranded DNA is inserted into the cleavage site with high probability. Although GREPore-seq is an analysis method based on long-fragment sequencing data, it can still detect short double-stranded DNA insertions, and with higher accuracy than CRISPResso2, a software tool based on second-generation sequencing data. First, an extraction file is made for the dsODN sequence to be detected: GREPore-seq automatically generates a DSgrep file from the sequence. The file contains multiple k-mers, each a segment cut from the dsODN sequence, with the step set to 1 so that every two adjacent k-mers overlap. Using this DSgrep file, the file to be analyzed is extracted with seqkit using the instruction: seqkit grep -s -f DSgrep-seq amplicon.fastq.gz -o output. Finally, the number of extracted reads is compared with the number of reads in the analyzed file to obtain the dsODN insertion rate.
In the invention, GREPore-seq calculates the HDR-mediated gene insertion efficiency by counting soft-clip records in the CIGAR strings of the bam file. The alignment result file (bam file) output by the alignment software (minimap2 or bwa) records how each read aligns to the reference sequence, and this information is written in the CIGAR string. A CIGAR string is generally a series of numbers, each immediately followed by a character: the character denotes the type of alignment event, and the preceding number gives the length of sequence covered by that event. Deletions shorter than about 50 bp are generally recorded as CIGAR-D. Reads that cannot be matched perfectly to the reference sequence because of a fragment deletion, where the deleted fragment is generally longer than 1 Kbp, are generally recorded by the alignment software as CIGAR-S. The sequencing data after gene editing are split and then aligned, using the alignment software, to a template containing the HDR insertion sequence, and the proportion of CIGAR-S1000 in the result file (bam) is counted. The proportion in the edited data is then compared with the proportion in the wild-type (WT) data, and the HDR-mediated large-fragment insertion efficiency is obtained by calculation.
In the invention, the GREPore-seq has high sensitivity, and can detect not only HDR mediated insertion, but also NHEJ mediated large fragment insertion with extremely low incidence.
Preferably, the PCR of step (1) is a long fragment PCR.
Preferably, the long-fragment PCR amplifies long fragments of cellular DNA using a long-fragment PCR kit, and a different barcode is added to each PCR product during amplification.
Preferably, the length of the barcode sequence in step (1) is 8-20 nt, including but not limited to 9nt, 10nt, 11nt, 12nt, 13nt, 15nt, 16nt, 18nt, 19nt, etc.
Based on the sequence information of the gene editing site, a barcode of 8-20 nt is added to the 5' end of each primer used for long-fragment PCR, so that a specific barcode sequence is introduced in front of the product during PCR amplification; Nanopore sequencing is then performed. A flow chart of cell editing, long-fragment PCR, and Nanopore sequencing is shown in FIG. 2.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a method for detecting mutations after gene editing based on Nanopore sequencing. It provides a long-fragment analysis workflow that uses long-fragment PCR, nanopore sequencing, and Python scripts to analyze the sequencing data. The workflow automatically splits the sequencing data according to barcode information, extracts the corresponding long-fragment data, automatically aligns the split files using minimap2, and analyzes the HDR insertion rate and the large-fragment deletion rate of the specified files. It thereby compensates for the inability of second-generation sequencing to analyze large fragments, and its results for short-fragment detection are also better than those of methods based on second-generation sequencing.
Drawings
FIG. 1 is a block diagram of a GREPore-seq analysis;
FIG. 2 is a schematic diagram of a GREPore-seq workflow;
FIG. 3 is a gel electrophoresis diagram of a long fragment PCR product;
FIG. 4 is a schematic diagram of dsODN insertion detection by GREPore-seq;
FIG. 5 is a graph of dsODN insertion rate results;
FIG. 6 is a schematic diagram of the HDR insertion of PGK1 site;
FIG. 7A is a graph showing the results of flow cytometry detection of vector insertion efficiency;
FIG. 7B is a graph showing the results of GREPore-seq detection of vector insertion efficiency;
FIG. 8A is a schematic diagram of the forward and reverse insertion of plasmid backbone sequences in iPSC cells;
FIG. 8B is an IGV visualization of plasmid backbone insertion in iPSC cells;
FIG. 8C is a graph showing the results of sequencing the sequence containing the backbone insert after GREPore-seq extraction;
FIG. 8D is a graph showing the frequency of forward and reverse insertion of plasmid backbone in iPSC cells.
Detailed Description
The technical means adopted by the invention and the effects thereof are further described below with reference to the examples and the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.
Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature of the field or the product specifications. Reagents or instruments for which no manufacturer is indicated are conventional products commercially available through regular channels.
Example 1
In this example, long-fragment PCR was performed at different gene editing sites, and cellular DNA was amplified using long-fragment PCR kits.
Several different sites (AAVS1, B2M, BCL11A-1, BCL11A-2, EEF2, TRAC and TRBC) were amplified. The experimental procedure was as follows. Cells (K562, HSPC, iPSC) were cultured, and gRNAs were designed for the different sites using the CHOPCHOP tool (http://chopchop.cbu.uib.no). Following the recommended procedure, each designed gRNA was combined with SpCas9 Nuclease V3 protein to form an RNP complex (ribonucleoprotein complex), and the RNP complex was introduced into the cells by electroporation to edit the genes in the cells. Taking K562 cells as an example, the RNP complex was introduced by electroporation using a Lonza 2B electroporator with program T-016. Three days after electroporation, genomic DNA was extracted using the QIAamp DNA Mini Kit (Qiagen) and amplified with each of three commercially available long-fragment PCR kits, KAPA HiFi, NileHiFi and PrimeSTAR GXL, according to the instructions, and the amplified products were analyzed by gel electrophoresis. As shown in FIG. 3, KAPA HiFi and NileHiFi gave similar amplification results, with 2 of the 14 PCR products failing to amplify. In contrast, PrimeSTAR GXL successfully amplified all sites, indicating that the PrimeSTAR GXL kit gave the best long-fragment amplification. The 10 μL long-fragment PCR reaction with the PrimeSTAR GXL kit contained 100 ng of genomic DNA, 2x premix, and 0.3 μL of primer (10 μM). The long-fragment PCR program was 30 cycles of 98 °C for 10 seconds, 60 °C for 15 seconds, and 68 °C for 1 minute/kb.
Example 2
This example performs dsODN insertion analysis.
During gene editing with the CRISPR-Cas system, a 29 bp double-stranded oligonucleotide (dsODN) is introduced; when a cell repairs the DNA double-strand break through the NHEJ pathway, the double-stranded DNA is inserted into the cleavage site. Although the GREPore-seq provided by the invention is an analysis method based on long-fragment sequencing data, it can still detect short double-stranded DNA insertions, and with higher accuracy than CRISPResso2, a software tool based on second-generation sequencing data. Taking the EEF2 locus as an example, human primary T cells were gene-edited, and long-fragment PCR was followed by Nanopore sequencing. First, an extraction file was made for the dsODN sequence to be detected: GREPore-seq automatically generated a DSgrep file from the sequence. The file contains multiple k-mers of 13 nt, each a 13 nt segment cut from the dsODN sequence with a step of 1, so that every two adjacent k-mers overlap by 12 nt (as shown in FIG. 4). Using this DSgrep file, the file to be analyzed was extracted with seqkit using the instruction: seqkit grep -s -f DSgrep-seq amplicon.fastq.gz -o output. Finally, the number of extracted reads was compared with the number of reads in the analyzed file to obtain the dsODN insertion rate. The second-generation sequencing data were analyzed by the same method (grepgs). In FIG. 5, data1 to data10 represent 10 representative data sets, and Negative control represents the negative control (without gene editing). The dsODN insertion rate obtained by the conventional CRISPResso2 analysis was significantly lower than that obtained by grepgs (1.5 times that of CRISPResso2) and by the GREPore-seq of the invention (1.4 times that of CRISPResso2), probably because CRISPResso2 cannot effectively recognize the insertion of truncated dsODN. The GREPore-seq results were similar to the grepgs results, which verifies the effectiveness of the GREPore-seq of the invention.
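The k-mer based dsODN counting described in this example can be sketched in Python. This is a simplified illustration, not the GREPore-seq implementation: the function names are hypothetical, reads are held in memory as plain strings, only the forward orientation of the dsODN is searched (on the assumption that reads were already merged with their reverse complements), and the rate is simply the matching read count divided by the total read count.

```python
# Illustrative sketch: fraction of reads containing at least one 13-nt k-mer
# of the dsODN (step 1). Names are hypothetical.
def dsodn_kmers(dsodn, k=13, step=1):
    """Cut overlapping k-mers from the dsODN sequence."""
    dsodn = dsodn.upper()
    return [dsodn[i:i + k] for i in range(0, len(dsodn) - k + 1, step)]


def dsodn_insertion_rate(reads, dsodn, k=13):
    """Return (reads with any dsODN k-mer) / (all reads), or 0.0 if no reads."""
    reads = list(reads)
    if not reads:
        return 0.0
    patterns = set(dsodn_kmers(dsodn, k))
    hits = sum(1 for read in reads if any(p in read.upper() for p in patterns))
    return hits / len(reads)
```

Because any single k-mer is enough for a match, a read carrying only part of the dsODN (at least 13 nt of it) is still counted, which is consistent with the explanation above that truncated insertions are recognized by this approach.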
Example 3
This example performed an HDR-mediated gene insertion efficiency analysis.
The GREPore-seq of the invention calculates the HDR-mediated gene insertion efficiency by counting soft-clip records in the CIGAR strings of the bam file. The alignment result file (bam file) output by the alignment software (minimap2) records how each read aligns to the reference sequence, and this information is written in the CIGAR string. A CIGAR string is generally a series of numbers, each immediately followed by a character: the character denotes the type of alignment event, and the preceding number gives the length of sequence covered by that event. Deletions shorter than about 50 bp are generally recorded as CIGAR-D. For reads that contain a large deletion and cannot be matched perfectly to the reference sequence, the deleted fragment is generally longer than 1 Kbp, and the alignment software generally records them as CIGAR-S.
In the invention, an mNeonGreen green fluorescent protein gene was knocked in at the PGK1 site of iPSC cells using a double-cut plasmid donor template (an insertion schematic is shown in FIG. 6). The insertion efficiency was first analyzed by flow cytometry as a control, with the result shown in FIG. 7A; Nanopore sequencing was then performed, and the insertion efficiency was analyzed using the GREPore-seq of the invention. Specifically, the reads were aligned, using the alignment software, to a template containing the HDR insertion sequence, and the proportion of CIGAR-S1000 in the result file (bam) was counted. The CIGAR-S1000 proportion of the edited sample was then compared with that of the unedited sample to obtain the HDR-mediated large-fragment (about 1.5 Kbp) insertion efficiency. As shown in FIG. 7B, the HDR insertion efficiency obtained by GREPore-seq was positively correlated with the flow cytometry result, indicating that GREPore-seq detects HDR insertion efficiency with high accuracy.
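The CIGAR counting step can be illustrated with a small Python sketch that works on CIGAR strings taken from the alignment file. It assumes, as a reading of "CIGAR-S1000", that a read qualifies when it has a soft-clip (S) operation of at least 1000 bases; that threshold interpretation, the function names, and the use of plain strings rather than a bam parser are all assumptions made for illustration. The sketch stops at the per-sample proportion, because the text does not spell out how the edited and WT proportions are combined into the final efficiency.

```python
# Illustrative sketch: proportion of reads whose CIGAR string contains a long
# soft clip. Names and the >= 1000 threshold are assumptions, not the patent's code.
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '1200S3000M' into (length, operation) pairs."""
    return [(int(length), op) for length, op in _CIGAR_OP.findall(cigar)]


def has_long_soft_clip(cigar, min_len=1000):
    """True if any soft-clip operation in the CIGAR is at least `min_len` long."""
    return any(op == "S" and length >= min_len for length, op in parse_cigar(cigar))


def soft_clip_fraction(cigars, min_len=1000):
    """Fraction of reads with a long soft clip; 0.0 for an empty input."""
    cigars = list(cigars)
    if not cigars:
        return 0.0
    return sum(has_long_soft_clip(c, min_len) for c in cigars) / len(cigars)
```

In practice the CIGAR strings would come from the sorted bam file, for example from the sixth column of the text output of samtools view.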
Example 4
GREPore-seq of the present invention has high sensitivity and can detect not only HDR-mediated insertions but also insertions of the plasmid vector backbone, which occur at extremely low frequency. This example takes the insertion of the double-cut plasmid donor template backbone (BB) during gene editing at the PGK1 locus of the iPSC cells in Example 3 as an example. GREPore-seq first prepares a BBgrep-seq file for the plasmid backbone: along the full-length backbone sequence, a 15-nt fragment is taken as a k-mer at every 100-nt interval, and these 15-nt fragments make up the BBgrep-seq file. A BBgrep-seq file is prepared separately for the forward sequence and for the reverse-complement sequence of the plasmid backbone, corresponding to the two cases of forward and reverse backbone insertion. GREPore-seq then uses these two BBgrep-seq files to extract reads from the files to be analyzed, with the extraction set to require that two or more k-mers be present in a read at the same time. Comparing the number of extracted reads with the original number of reads gives the proportion of plasmid backbone insertions. FIGS. 8A-8C show plasmid backbone insertion in HDR-edited iPSC cells. FIG. 8A is a schematic diagram of forward and reverse insertion of the backbone sequence. FIG. 8B is an IGV visualization in which almost no backbone insertion is visible. FIG. 8C shows the reads containing backbone insertions after GREPore-seq extraction, in which the backbone insertions are detected much more clearly. As shown in FIG. 8D, which compares the sequencing data of the non-gene-edited WT control with those of the gene-edited iPSC cells (Edited iPSC), GREPore-seq detected NHEJ-mediated insertions at frequencies as low as 0.03%.
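The backbone screening described in this example can be sketched in Python as follows. This is a simplified illustration rather than the patented script. It assumes that "every 100 nt" means k-mer start positions 0, 100, 200, and so on; the function names and the randomly generated toy backbone are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def spaced_kmers(seq, k=15, step=100):
    """Take one k-mer of length k starting at every `step` nt along seq."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


def has_backbone(read, kmers, min_hits=2):
    """True if at least min_hits distinct k-mers occur in the read."""
    read = read.upper()
    hits = sum(1 for kmer in set(kmers) if kmer.upper() in read)
    return hits >= min_hits


def backbone_fraction(reads, kmers, min_hits=2):
    """Fraction of reads that pass the k-mer screen."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(has_backbone(r, kmers, min_hits) for r in reads) / len(reads)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 300-nt "backbone"; a real run would use the plasmid backbone sequence.
    backbone = "".join(rng.choice("ACGT") for _ in range(300))
    forward_kmers = spaced_kmers(backbone)
    reverse_kmers = spaced_kmers(reverse_complement(backbone))

    flank = "ACGT" * 25
    reads = [
        flank + backbone + flank,                      # forward insertion
        flank + reverse_complement(backbone) + flank,  # reverse insertion
        flank * 4,                                     # no backbone
    ]
    print("forward screen:", backbone_fraction(reads, forward_kmers))
    print("reverse screen:", backbone_fraction(reads, reverse_kmers))
```

Requiring two distinct k-mers per read, as the text specifies, reduces the chance that a single k-mer occurring by coincidence flags a read.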
In summary, the invention creatively designs a long-fragment analysis workflow. GREPore-seq uses long-fragment PCR, nanopore sequencing, and Python scripts to analyze sequencing data. GREPore-seq automatically splits the sequencing data according to barcode information to extract the corresponding long-fragment data, then automatically aligns the split files using minimap2, and can also analyze the HDR insertion rate and the large-fragment deletion rate of the files.
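The barcode-based splitting can be sketched in Python, loosely following the seqkit commands listed in claim 1: the first 13 nt of each barcode-primer sequence are cut into 9-nt sliding windows with a step of 1, and a read is assigned to a sample when any window occurs in its first 20 nt, ignoring case. The function names, sample names, and sequences below are hypothetical. Handling several samples in one pass is an assumption of this sketch, not something the text states.

```python
def sliding_windows(seq, width, step=1):
    """All windows of the given width, moving `step` nt at a time."""
    return [seq[i:i + width] for i in range(0, len(seq) - width + 1, step)]


def barcode_patterns(bc_primer, prefix_len=13, width=9):
    """9-nt search patterns taken from the first 13 nt of a barcode primer."""
    return sliding_windows(bc_primer[:prefix_len].upper(), width)


def assign_reads(reads, bc_primers, search_len=20):
    """Group reads by sample: a read joins a sample when any of that
    sample's patterns occurs within the read's first search_len nt."""
    patterns = {name: barcode_patterns(seq) for name, seq in bc_primers.items()}
    bins = {name: [] for name in bc_primers}
    for read in reads:
        head = read[:search_len].upper()
        for name, sample_patterns in patterns.items():
            if any(p in head for p in sample_patterns):
                bins[name].append(read)
    return bins


if __name__ == "__main__":
    # Hypothetical barcode-primer sequences and reads.
    bc_primers = {
        "sample1": "AAGGTTCCAAGGTACGTACGTTT",
        "sample2": "CCTTGGAACCTTGACGTACGTTT",
    }
    reads = [
        "TTAAGGTTCCAAGGTACGTACGTTTGGGCCC",
        "GCCTTGGAACCTTGACGTACGTTTAAACCC",
        "ACGTACGTACGTACGTACGTACGT",
    ]
    for name, members in assign_reads(reads, bc_primers).items():
        print(name, len(members))
```

Searching with overlapping 9-nt windows rather than the full barcode tolerates a few sequencing errors or trimmed bases at the read start, at the cost of a read possibly matching more than one sample; this sketch keeps such a read in every matching bin.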
The applicant states that the detailed method of the present invention is illustrated by the above examples, but the present invention is not limited to the detailed method described above, i.e. it does not mean that the present invention must be practiced in dependence upon the detailed method described above. It should be apparent to those skilled in the art that any modification of the present invention, equivalent substitution of raw materials for the product of the present invention, addition of auxiliary components, selection of specific modes, etc., falls within the scope of the present invention and the scope of disclosure.
Claims (4)
1. A method for detecting a mutation after gene editing, comprising the steps of:
(1) Performing PCR on the cell gDNA after gene editing, introducing a barcode sequence through a PCR primer, and performing nanopore sequencing on a PCR product to obtain a sequencing file;
(2) Removing the adapter sequences in the sequencing file using Porechop, reverse-complementing the sequences in the file, and merging the reverse-complemented sequences and the forward sequences into a new file using a Python script;
the instruction for removing the adapter sequences in the sequencing file is --adapter_threshold 85 --extra_end_trim 0;
(3) Writing the full-length sequence in the new file into a reference.txt file using seqkit, generating a left-end GREPseq file and a right-end GREPseq file, extracting the PCR product sequences of the same site, preparing a BC-Primer-seq file, and extracting the sequencing file of a single PCR product to obtain a split file;
the instruction for extracting the PCR product sequences of the same site in step (3) is seqkit grep -s -f15_5F-4 Grepseq-left.txt-FR.fastq.gz | seqkit grep -s -f Grepseq-right.txt -o target.fastq.gz;
the instruction for extracting the single PCR product sequencing file is as follows:
(1’)seqkit fx2tab BC-Primer-seq.fa.txt | seqkit tab2fx | seqkit seq -u | seqkit replace -p " |-" -s > BC-Primer-seq.fasta;
(2’)seqkit subseq -r 1:13 BC-Primer-seq.fasta | seqkit sliding -W 9 -s 1 | seqkit fx2tab >> BC-Primer-seq.temp;
(3’)For /F "tokens=2" %A in (BC-Primer-seq.temp) do echo %A>> BC-Primer-seq.txt;
(4’)seqkit grep -s -R 1:20 -i -r -f BC-Primer-seq.txt target.fastq.gz -o amplicon.fastq.gz;
the instruction for writing the reference.txt file in step (3) is seqkit fx2tab reference.txt | seqkit tab2fx | seqkit seq -u | seqkit replace -p " |-" -s > reference.fasta;
the instruction for generating the left-end GREPseq file in step (3) is seqkit subseq -r 20:90 reference.fasta | seqkit sliding -W 15 -s 5 | seqkit fx2tab >> reference-left.tmp, For /F "tokens=2" %A in (reference-left.tmp) do echo %A >> Grepseq-left.txt;
the instruction for generating the right-end GREPseq file in step (3) is seqkit subseq -r -90:-20 reference.fasta | seqkit sliding -W 15 -s 5 | seqkit fx2tab >> reference-right.tmp, For /F "tokens=2" %A in (reference-right.tmp) do echo %A >> Grepseq-right.txt;
(4) Aligning the split file with a reference sequence using minimap2 and samtools, and analyzing the result; in step (4), the instructions for aligning the split file with the reference sequence are as follows:
(1’’)samtools view -bS amplicon.sam > amplicon.bam;
(2’’)samtools sort -O bam -o amplicon.sorted.bam -T temp amplicon.bam;
(3’’)samtools index amplicon.sorted.bam.
2. The method of claim 1, wherein the analysis of step (4) comprises any one or a combination of at least two of a dsODN insertion analysis, an HDR efficiency analysis, or a plasmid backbone insertion analysis.
3. The method of claim 1, wherein the PCR of step (1) is long fragment PCR.
4. The method of claim 1, wherein the length of the barcode sequence of step (1) is 8-20 nt.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111220895.6A CN113969311B (en) | 2021-10-20 | 2021-10-20 | Method for detecting mutation after gene editing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113969311A CN113969311A (en) | 2022-01-25 |
CN113969311B true CN113969311B (en) | 2024-02-23 |
Family
ID=79588116
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111220895.6A Active CN113969311B (en) | 2021-10-20 | 2021-10-20 | Method for detecting mutation after gene editing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113969311B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112322714A (en) * | 2020-11-04 | 2021-02-05 | Shanghai Jiao Tong University | Method for detecting gene editing efficiency and gene editing mode and application thereof |
CN113416730A (en) * | 2021-07-07 | 2021-09-21 | Blood Diseases Hospital, Chinese Academy of Medical Sciences (Institute of Hematology, Chinese Academy of Medical Sciences) | Method for reducing large fragment deletion mutation of cell generated by gene editing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021076876A1 (en) * | 2019-10-18 | 2021-04-22 | Zymergen Inc. | Genotyping edited microbial strains |
Non-Patent Citations (1)
Title |
---|
Effective control of large deletions after double-strand breaks by homology-directed repair and dsODN insertion; Wen W et al.; Genome Biol; Vol. 22 (No. 1); p. 17, last paragraph; p. 23, Additional file 1, Fig. S1b *
Also Published As
Publication number | Publication date |
---|---|
CN113969311A (en) | 2022-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xie et al. | CRISPR-GE: a convenient software toolkit for CRISPR-based genome editing | |
KR101858344B1 (en) | Method of next generation sequencing using adapter comprising barcode sequence | |
JP6314091B2 (en) | DNA sequence data analysis | |
TWI385253B (en) | Methods for accurate sequence data and modified base position determination | |
CN110520542A (en) | Method for targeting nucleic acid sequence enrichment and the application in the nucleic acid sequencing of error correcting | |
CN106566876B (en) | Oligonucleotide probe and obtaining method thereof | |
CN110628880A (en) | Method for detecting gene variation by synchronously using messenger RNA and genome DNA template | |
WO2021047363A1 (en) | Method for using whole genome re-sequencing data to quickly identify transgenic or gene editing material and insertion sites thereof | |
WO2015144045A1 (en) | Plasmid library comprising two random markers and use thereof in high throughput sequencing | |
CN108642201B (en) | SNP (Single nucleotide polymorphism) marker related to millet plant height character as well as detection primer and application thereof | |
CN113969311B (en) | Method for detecting mutation after gene editing | |
CN105528532B (en) | A kind of characteristic analysis method in rna editing site | |
KR20210110790A (en) | Synthesis method of single-stranded DNA | |
CN113564266B (en) | SNP typing genetic marker combination, detection kit and application | |
US20220364080A1 (en) | Methods for dna library generation to facilitate the detection and reporting of low frequency variants | |
CN112322714A (en) | Method for detecting gene editing efficiency and gene editing mode and application thereof | |
CN108707685B (en) | SNP (Single nucleotide polymorphism) marker related to tillering number character of millet as well as detection primer and application thereof | |
CN108642203B (en) | SNP (Single nucleotide polymorphism) marker related to millet stem thickness character as well as detection primer and application thereof | |
CN106636362B (en) | Soybean microsatellite marker locus development method and microsatellite marker length detection method in microsatellite marker locus | |
CN106520959B (en) | Development method of orchid microsatellite marker locus and method for detecting length of microsatellite marker in microsatellite marker locus | |
CN106520955B (en) | Development method of rice microsatellite marker locus and length detection method of microsatellite marker in microsatellite marker locus | |
US20130143746A1 (en) | Method for detecting gene region features based on inter-alu polymerase chain reaction | |
CN108642197B (en) | SNP (Single nucleotide polymorphism) marker related to millet code number character as well as detection primer and application thereof | |
CN117844782B (en) | Gene editing nuclease with wide targeting range and application thereof in nucleic acid detection | |
CN106566890B (en) | Method for developing rape microsatellite marker locus and method for detecting length of microsatellite marker in microsatellite marker locus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||