CN113969311B

CN113969311B - Method for detecting mutation after gene editing

Info

Publication number: CN113969311B
Application number: CN202111220895.6A
Authority: CN
Inventors: 张孝兵; 李斯昂; 权子莙; 温伟; 程涛; 张健萍
Original assignee: Institute of Hematology and Blood Diseases Hospital of CAMS and PUMC
Current assignee: Institute of Hematology and Blood Diseases Hospital of CAMS and PUMC
Priority date: 2021-10-20
Filing date: 2021-10-20
Publication date: 2024-02-23
Anticipated expiration: 2041-10-20
Also published as: CN113969311A

Abstract

The invention discloses a method for detecting mutations after gene editing. The invention develops a long-fragment analysis workflow that uses long-fragment PCR, nanopore sequencing and a Python script to analyze the sequencing data. The workflow automatically splits the sequencing data according to barcode information, extracts the corresponding long-fragment data, aligns the split files using minimap2, and analyzes the HDR insertion rate and the large-fragment deletion rate of the specified files. It thereby not only makes up for the inability of second-generation sequencing to analyze large fragments, but also outperforms methods based on second-generation sequencing in short-fragment detection.

Description

Method for detecting mutation after gene editing

Technical Field

The invention belongs to the technical field of biology, and relates to a method for detecting mutation after gene editing.

Background

The CRISPR-Cas9 system is an endonuclease system that is now widely used in the field of gene editing (see: Jinek M, et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 2012. 337(6096): p. 816-21.). Typically, a cell repairs the double-strand break generated by CRISPR-Cas9 cleavage via either the NHEJ or the HDR pathway. However, it has been found that various mutations, such as point mutations and deletions or insertions of large gene fragments, are generated at the target site during repair. Therefore, accurate detection of gene editing outcomes is required before the CRISPR-Cas9 system formally enters clinical application.

Current detection technology is mainly based on second-generation sequencing and detects small-fragment mutations mediated by the two repair modes, NHEJ and HDR. For example, CRISPResso2 can detect HDR insertion at the target site (see: Clement K, et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat Biotechnol, 2019. 37(3): p. 224-226.). However, the maximum read length of second-generation sequencing is typically limited to 150 bp at each end, which makes it unusable for detecting large-fragment insertions or deletions.

The advent of more advanced third-generation sequencing technologies such as nanopore sequencing breaks the read-length limit and opens the era of long-fragment sequencing. Third-generation sequencing can directly read a single DNA molecule and sequence it in real time, and each read can reach tens of thousands of bases. A third-generation sequencing technique has been reported that uses molecular tags to label the original DNA molecules of a target gene in a sample and then amplifies the labeled molecules by PCR for sequencing. The data analysis algorithm paired with this technique, VAULT (variant analysis with unique molecular identifier for long-read technology), finds and extracts the molecular tags from the sequencing results and then uses them to group the sequencing data (reads), each group representing one DNA molecule in the original sample. Sequencing errors can thus be corrected using all the data within a group, giving highly accurate mutation results and enabling quantitative detection of rare mutations in samples, including single-base mutations and a wide range of complex structural variations (see: Yuan B, et al. Long-read Individual-molecule Sequencing Reveals CRISPR-induced Genetic Heterogeneity in Human ESCs. 2020.). However, the method is complex to operate, technically difficult and costly.

In summary, how to provide a method for precisely detecting mutation after gene editing, which is simple to operate and low in cost, is a problem to be solved in the field of gene editing.

Disclosure of Invention

Aiming at the defects and actual demands of the prior art, the invention provides a method for detecting mutation after gene editing, which can simultaneously and accurately analyze short fragment mutation and long fragment mutation, has simple operation and low cost, and has important significance for the field of gene editing.

In order to achieve the above purpose, the invention adopts the following technical scheme:

The invention provides a method for detecting mutations after gene editing, which comprises the following steps:

(1) Performing PCR on the cell gDNA after gene editing, introducing a barcode sequence through a PCR primer, and performing nanopore sequencing on a PCR product to obtain a sequencing file;

(2) Removing the adapter sequences from the sequencing file using Porechop (v.0.2.4; https://github.com/rrwick/Porechop), reverse-complementing the sequences in the file, and merging the reverse-complemented sequences and the forward sequences into a new file using a Python script;

(3) Writing the full-length sequence in the new file into a reference.txt file using seqkit (v.0.11.0; https://bioinf.shenwei.me/seqkit/), generating a left-end GREPseq file and a right-end GREPseq file, extracting the PCR product sequences of the same site, preparing a BC-primer-seq file, and extracting the sequencing file of each single PCR product to obtain the split files;

(4) Aligning the split files to a reference sequence using minimap2 (v.2.17; https://github.com/lh3/minimap2) and samtools (v.1.10; https://github.com/samtools/samtools), and analyzing the result.

The invention creatively designs a long-fragment analysis workflow, GREPore-seq (steps (2) to (4); the flow chart is shown in figure 1). Sequencing data are analyzed using long-fragment PCR, nanopore sequencing and a Python script: GREPore-seq automatically splits the sequencing data according to the barcode information, extracts the corresponding long-fragment data, automatically aligns the split files using minimap2, and can also analyze the HDR insertion rate and the large-fragment deletion rate of the files.

In the method, data from the same site are distinguished by different barcodes, so that multiple samples can be pooled in a single nanopore sequencing run, which greatly reduces the sequencing cost per sample. In addition to using barcodes to distinguish different experiments at the same site, a sequence of about 70 bp at each site is used as GrepL and GrepR to distinguish data from different sites. In a single nanopore sequencing run, multiple samples of the same site as well as samples of different sites can therefore be pooled and still be extracted efficiently, which greatly reduces the sequencing cost per experiment.

In the method, when data of the same site are extracted, the sequencing data are first merged in the forward and reverse orientations and then extracted using the left and right Grep files, so that as much correct sequencing data as possible is recovered. Conventional software analysis workflows omit the step of merging the forward and reverse orientations; merging first and extracting afterwards recovers as much of the required data as possible and improves efficiency and sensitivity.
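The forward and reverse merging described above can be illustrated with a minimal Python sketch. It is not the script used in the invention: the function names, the (name, sequence, quality) record layout and the "_rc" name suffix are assumptions made for illustration only.

```python
# Complement table for the four bases plus N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_forward_and_reverse(records):
    """Yield every read followed by its reverse-complemented copy.

    `records` is an iterable of (name, sequence, quality) tuples.
    The "_rc" suffix is a hypothetical naming convention.
    """
    for name, seq, qual in records:
        yield name, seq, qual
        # Qualities are reversed so they stay aligned with the bases.
        yield name + "_rc", reverse_complement(seq), qual[::-1]


# Hypothetical single-read input.
reads = [("read1", "ACGTTN", "IIIHG#")]
merged = list(merge_forward_and_reverse(reads))
```

After merging, every amplicon is present in the orientation of the reference regardless of which strand was sequenced, so a single set of left and right Grep sequences can be applied to all reads.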

In the method, when the data of a single site are split, the sequencing data of each single PCR product are extracted by first preparing a BC-primer-seq file. General workflows usually split sequencing data using a single barcode sequence. Here, several partially overlapping k-mer sequences are prepared for each barcode sequence and each primer sequence, and the reads carrying the same barcode are split using these k-mers, which greatly increases the amount of data recovered while maintaining accuracy.

Preferably, the instruction for removing the adapter sequences in the sequencing file in step (2) is --adapter_threshold 85 --extra_end_trim 0.

Preferably, the instruction for writing the reference.txt file in step (3) is seqkit fx2tab reference.txt | seqkit tab2fx | seqkit seq -u | seqkit replace -p " |-" -s > reference.fasta.

Preferably, the instruction for generating the left-end GREPseq file in step (3) is seqkit subseq -r 20:90 reference.fasta | seqkit sliding -W 15 -s 5 | seqkit fx2tab >> reference-left.tmp; For /F "tokens=2" %A in (reference-left.tmp) do echo %A >> Grepseq-left.txt.

Preferably, the instruction for generating the right-end GREPseq file in step (3) is seqkit subseq -r -90:-20 reference.fasta | seqkit sliding -W 15 -s 5 | seqkit fx2tab >> reference-right.tmp; For /F "tokens=2" %A in (reference-right.tmp) do echo %A >> Grepseq-right.txt.

Preferably, the instruction for extracting the PCR product sequences of the same site in step (3) is seqkit grep -s -f15_5F-4 Grepseq-left.txt -FR.fastq.gz | seqkit grep -s -f Grepseq-right.txt -o target.fastq.gz.
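The left-end and right-end GREPseq generation and the subsequent site extraction can be sketched in Python as follows. This is an illustrative re-expression, not the seqkit pipeline itself: the function names are hypothetical, the reference is a random placeholder sequence, and treating a read as a match when it contains at least one left k-mer and at least one right k-mer as an exact substring is an assumption about how the two chained grep steps behave.

```python
import random


def sliding_kmers(seq, window, step):
    """Return all full-length windows of `seq`, moving `step` bases each time."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, step)]


def grep_kmers(reference, window=15, step=5):
    """Build left-end and right-end k-mer lists from a reference amplicon.

    Mirrors the 1-based, inclusive ranges 20:90 and -90:-20 given above.
    """
    left_region = reference[19:90]     # bases 20..90 from the 5' end
    right_region = reference[-90:-19]  # 90th-from-last to 20th-from-last base
    return (sliding_kmers(left_region, window, step),
            sliding_kmers(right_region, window, step))


def extract_site_reads(reads, left_kmers, right_kmers):
    """Keep reads containing at least one left k-mer and one right k-mer."""
    return [read for read in reads
            if any(k in read for k in left_kmers)
            and any(k in read for k in right_kmers)]


# Placeholder reference amplicon (random sequence, fixed seed).
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(300))
left, right = grep_kmers(reference)

# A full-length read, a read truncated before the right end, and an unrelated read.
reads = [reference, reference[:150], "A" * 200]
kept = extract_site_reads(reads, left, right)
```

Each 71-base region yields twelve 15-nt windows at a step of 5, so a read is still recovered when sequencing errors disrupt some of the windows. Requiring both ends favors reads that span the whole amplicon.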

Preferably, the instructions for extracting the single PCR product sequencing file in step (3) are:

(1') seqkit fx2tab BC-Primer-seq.fa.txt | seqkit tab2fx | seqkit seq -u | seqkit replace -p " |-" -s > BC-Primer-seq.fasta;

(2') seqkit subseq -r 1:13 BC-Primer-seq.fasta | seqkit sliding -W 9 -s 1 | seqkit fx2tab >> BC-Primer-seq.temp;

(3') For /F "tokens=2" %A in (BC-Primer-seq.temp) do echo %A >> BC-Primer-seq.txt;

(4') seqkit grep -s -R 1:20 -i -r -f BC-Primer-seq.txt target.fastq.gz -o amplicon.fastq.gz.
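The barcode splitting performed by instructions (1') to (4') can be pictured with the Python sketch below. It is an illustration under stated assumptions rather than the invention's script: the function names and the barcode-plus-primer sequence are hypothetical, and matching is done by exact, case-insensitive substring search within the first 20 bases of each read.

```python
def bc_primer_kmers(bc_primer_seq, prefix_len=13, window=9):
    """Return overlapping k-mers (step 1) from the first `prefix_len` bases."""
    prefix = bc_primer_seq[:prefix_len].upper()
    return [prefix[i:i + window] for i in range(len(prefix) - window + 1)]


def demultiplex(reads, kmers, search_len=20):
    """Keep reads whose first `search_len` bases contain any of the k-mers."""
    return [read for read in reads
            if any(k in read[:search_len].upper() for k in kmers)]


# Hypothetical barcode + primer sequence, for illustration only.
bc_primer = "ACGTACGTTGCAGGCTAGCTAGGATC"
kmers = bc_primer_kmers(bc_primer)

reads = [
    "ttacgtacgttgcaggctagc" + "N" * 30,  # barcode intact, lower case
    "TACGTTGCAGGCTAGCTAGGATC" + "N" * 30,  # first three barcode bases missing
    "G" * 24 + bc_primer,                  # barcode too far from the read start
]
selected = demultiplex(reads, kmers)
```

With a 13-base prefix and 9-nt windows there are five overlapping k-mers, so a read whose start is slightly truncated can still be assigned, which is consistent with the increased data recovery described above.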

Preferably, the instructions for aligning the split files to the reference sequence in step (4) are as follows:

(1'') samtools view -bS amplicon.sam > amplicon.bam;

(2'') samtools sort -O bam -o amplicon.sorted.bam -T temp amplicon.bam;

(3'') samtools index amplicon.sorted.bam.

Preferably, the analysis of step (4) comprises any one or a combination of at least two of a dsODN insertion analysis, a DNA fragment insertion or deletion analysis, an HDR efficiency analysis, or a plasmid backbone insertion analysis.

Preferably, the system of gene editing comprises a CRISPR-Cas system.

In the invention, a double-stranded oligonucleotide fragment (dsODN) is introduced during gene editing with the CRISPR-Cas system. When a cell repairs a DNA double-strand break through the NHEJ pathway, the double-stranded DNA is inserted at the cleavage site with high probability. Although GREPore-seq is an analysis method based on long-fragment sequencing data, it can still detect short double-stranded DNA insertions, with higher accuracy than CRISPResso2, a software tool based on second-generation sequencing data. First, an extraction file is prepared for the dsODN sequence to be detected: GREPore-seq automatically generates a DSgrep file from the sequence. The file contains multiple k-mers, each a segment cut from the dsODN sequence, with the step set to 1 so that every two adjacent k-mers overlap. Using this DSgrep file, the file to be analyzed is extracted with seqkit using the instruction seqkit grep -s -f DSgrep-seq amplicon.fastq.gz -o output. Finally, the number of reads obtained is compared with the number of reads in the analyzed file to obtain the dsODN insertion rate.

In the invention, GREPore-seq calculates the HDR-mediated gene insertion efficiency by counting soft-clip records in the CIGAR strings of a bam file. The alignment result file (bam file) output by the alignment software (minimap2 or bwa) describes how each read aligns to the reference sequence, and this information is written in the CIGAR string. A CIGAR string generally consists of numbers each immediately followed by a character: the character denotes the type of alignment event and the preceding number gives its length. Deletions of less than 50 bp are generally recorded as CIGAR-D. For reads that cannot be perfectly matched to the reference sequence because of a fragment deletion, the deleted fragment is generally longer than 1 kbp, and the alignment software generally records it as CIGAR-S. The gene-edited sequencing data are split and then aligned, using the alignment software, to a template containing the HDR insertion sequence, and the proportion of CIGAR-S1000 in the result file (bam) is counted. The proportion in the edited data is then compared with the proportion in the wild-type (WT) data, and the HDR-mediated large-fragment insertion efficiency is obtained by calculation.
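As an illustration of the counting step, the Python sketch below parses CIGAR strings with a regular expression and reports the proportion of reads carrying a long soft clip. Reading "CIGAR-S1000" as "an S operation of at least 1000 bases" is an assumption, as are the function names and the example CIGAR strings. The final efficiency formula is not spelled out in the text and is therefore not reproduced here.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM format.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in _CIGAR_ELEMENT.findall(cigar)]


def has_long_soft_clip(cigar, min_len=1000):
    """True if any soft-clip (S) element is at least `min_len` bases long."""
    return any(op == "S" and length >= min_len
               for length, op in parse_cigar(cigar))


def soft_clip_ratio(cigars, min_len=1000):
    """Proportion of reads whose CIGAR contains a long soft clip."""
    cigars = list(cigars)
    if not cigars:
        return 0.0
    return sum(has_long_soft_clip(c, min_len) for c in cigars) / len(cigars)


# Hypothetical CIGAR strings for two samples.
sample_a = ["1500S800M", "2300M", "20M3D2280M", "900M1200S"]
sample_b = ["2300M", "2300M", "500S1800M", "2300M"]
ratio_a = soft_clip_ratio(sample_a)
ratio_b = soft_clip_ratio(sample_b)
```

In practice the CIGAR strings would be read from the sorted bam file, for example from the sixth column of the text output of samtools view.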

In the invention, the GREPore-seq has high sensitivity, and can detect not only HDR mediated insertion, but also NHEJ mediated large fragment insertion with extremely low incidence.

Preferably, the PCR of step (1) is a long fragment PCR.

Preferably, the long fragment PCR is a long fragment amplification of cellular DNA using a long fragment PCR kit, and a different barcode is added to each PCR product during the amplification process.

Preferably, the length of the barcode sequence in step (1) is 8-20 nt, including but not limited to 9nt, 10nt, 11nt, 12nt, 13nt, 15nt, 16nt, 18nt, 19nt, etc.

According to the sequence information of the gene editing site, a barcode of 8-20 nt is added to the 5' end of the primers used for long-fragment PCR, so that a specific barcode sequence is introduced upstream of the product during PCR amplification. Nanopore sequencing is then carried out. A flow chart of cell editing, long-fragment PCR and Nanopore sequencing is shown in figure 2.

Compared with the prior art, the invention has the following beneficial effects:

The invention provides a method for detecting mutations after gene editing based on Nanopore sequencing. It develops a long-fragment analysis workflow that uses long-fragment PCR, nanopore sequencing and a Python script to analyze the sequencing data. The workflow automatically splits the sequencing data according to the barcode information, extracts the corresponding long-fragment data, automatically aligns the split files using minimap2, and analyzes the HDR insertion rate and the large-fragment deletion rate of the specified files. It thereby not only makes up for the inability of second-generation sequencing to analyze large fragments, but also gives detection results superior to methods based on second-generation sequencing in short-fragment detection.

Drawings

FIG. 1 is a block diagram of a GREPore-seq analysis;

FIG. 2 is a schematic diagram of a GREPore-seq workflow;

FIG. 3 is a gel electrophoresis diagram of a long fragment PCR product;

FIG. 4 is a schematic diagram of GREPore-seq detection of dsODN insertion;

FIG. 5 is a graph of dsODN insertion rate results;

FIG. 6 is a schematic diagram of the HDR insertion of PGK1 site;

FIG. 7A is a graph showing the results of flow cytometry detection of vector insertion efficiency;

FIG. 7B is a graph showing the results of GREPore-seq detection of vector insertion efficiency;

FIG. 8A is a schematic diagram of the forward and reverse insertion of plasmid backbone sequences in iPSC cells;

FIG. 8B is an IGV visualization of plasmid backbone insertion in iPSC cells;

FIG. 8C is a graph showing the results of sequencing the sequence containing the backbone insert after GREPore-seq extraction;

FIG. 8D is a graph showing the frequency of forward and reverse insertion of plasmid backbone in iPSC cells.

Detailed Description

The technical means adopted by the invention and the effects thereof are further described below with reference to the examples and the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.

Where specific techniques or conditions are not identified in the examples, they follow the techniques or conditions described in the literature in this field or the product specifications. Reagents or apparatus for which no manufacturer is noted are conventional products commercially available through regular channels.

Example 1

In this example, long fragment PCR was performed at different gene editing sites, and cell DNA was amplified using a long fragment PCR kit.

A number of different sites (AAVS1, B2M, BCL11A-1, BCL11A-2, EEF2, TRAC and TRBC) were amplified. The specific experimental procedure was as follows. Cells (K562, HSPC, iPSC) were cultured, and gRNAs were designed for the different sites using the CHOPCHOP tool (http://chopchop.cbu.uib.no). Following the recommended procedure, each designed gRNA was combined with SpCas9 Nuclease V3 protein to form an RNP complex (ribonucleoprotein complex), and the RNP complex was introduced into cells by electroporation to edit genes in the cells. Taking K562 cells as an example, the RNP complex was introduced into the cells by electroporation using a Lonza 2B electroporator with the T-016 program. Three days after electroporation of the RNP complex, genomic DNA was extracted using the QIAamp DNA Mini Kit (Qiagen) and amplified according to the instructions using three commercially available long-fragment PCR kits, KAPA HiFi, nileHiFi and PrimeSTAR GXL, and the amplified products were subjected to gel electrophoresis. As shown in fig. 3, KAPA HiFi showed amplification results similar to nileHiFi, with 2 out of 14 PCR products failing to amplify; in contrast, PrimeSTAR GXL successfully amplified all sites, indicating that the PrimeSTAR GXL kit gave the best long-fragment amplification. The 10 μL long-fragment PCR reaction system of the PrimeSTAR GXL kit contained 100 ng of genomic DNA, 2x premix and 0.3 μL of primer (10 μM). The long-fragment PCR program was 30 cycles of 98 °C for 10 seconds, 60 °C for 15 seconds and 68 °C for 1 minute/kb.

Example 2

This example performs dsODN insertion analysis.

In the gene editing process of the CRISPR-Cas system, a 29 bp double-stranded oligonucleotide fragment (dsODN) is introduced. When a cell repairs a DNA double-strand break through the NHEJ pathway, the double-stranded DNA is inserted at the cleavage site. Although the GREPore-seq provided by the invention is an analysis method based on long-fragment sequencing data, it can still detect short double-stranded DNA insertions, with higher detection accuracy than CRISPResso2, a software tool based on second-generation sequencing data. Taking the EEF2 locus as an example, gene editing was performed on human primary T cells, followed by long-fragment PCR and Nanopore sequencing. First, an extraction file was prepared for the dsODN sequence to be detected: GREPore-seq automatically generated a DSgrep file from the sequence. The file contains multiple k-mers 13 nt in length, each cut from the dsODN sequence with a step of 1, so that every two adjacent k-mers overlap by 12 nt (as shown in figure 4). Using this DSgrep file, the file to be analyzed was extracted with seqkit using the instruction seqkit grep -s -f DSgrep-seq amplicon.fastq.gz -o output, and the number of reads obtained was compared with the number of reads in the analyzed file to obtain the dsODN insertion rate. The second-generation sequencing data were analyzed by the same method (grepgs). As shown in fig. 5, data1 to data10 are 10 representative data sets and Negative control denotes the negative control (without gene editing). The dsODN insertion rate given by the conventional CRISPResso2 analysis was significantly lower than that given by grepgs (1.5 times that of CRISPResso2) and by the GREPore-seq of the invention (1.4 times that of CRISPResso2), probably because CRISPResso2 cannot effectively recognize insertions of truncated dsODN. The GREPore-seq results were similar to the grepgs results, which verifies the effectiveness of the GREPore-seq of the invention.
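The k-mer based dsODN detection in this example can be sketched in Python as follows. The 29-nt sequence, the flanking sequences and the function names are hypothetical placeholders, and exact substring matching on one strand is an assumption; the sketch only illustrates why overlapping 13-nt k-mers can also pick up a truncated insertion.

```python
def dsodn_kmers(dsodn, k=13, step=1):
    """Cut overlapping k-mers from the dsODN sequence."""
    return [dsodn[i:i + k] for i in range(0, len(dsodn) - k + 1, step)]


def insertion_rate(reads, kmers):
    """Fraction of reads containing at least one dsODN k-mer."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if any(k in read for k in kmers))
    return hits / len(reads)


# Hypothetical 29-nt dsODN and flanking genomic sequence.
dsodn = "ACGTTGCAAGCTTGGATCCATGCATGCTA"
left_flank, right_flank = "T" * 20, "G" * 20
kmers = dsodn_kmers(dsodn)

reads = [
    left_flank + dsodn + right_flank,       # full-length insertion
    left_flank + dsodn[:15] + right_flank,  # truncated insertion
    left_flank + right_flank,               # no insertion
]
rate = insertion_rate(reads, kmers)
```

A 29-nt dsODN yields 17 k-mers of 13 nt at a step of 1, and the read carrying only the first 15 bases of the dsODN still contains three of them, so it is counted as an insertion.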

Example 3

This example performed an HDR-mediated gene insertion efficiency analysis.

The GREPore-seq of the present invention calculates HDR-mediated gene insertion efficiency by counting soft-clip records in the CIGAR strings of bam files. The alignment result file (bam file) output by the alignment software (minimap2) describes how each read aligns to the reference sequence, and this information is written in the CIGAR string. A CIGAR string generally consists of numbers each followed by a character: the character denotes the type of alignment event and the preceding number gives its length. Deletions of typically less than 50 bp are recorded as CIGAR-D. For reads that contain large deletions and cannot be perfectly matched to the reference sequence, the deleted fragment is typically longer than 1 kbp, and the alignment software typically records it as CIGAR-S.

The invention knocks in an mNeonGreen green fluorescent protein gene at the PGK1 site of iPSC cells using a double-cut plasmid donor template (an insertion schematic is shown in figure 6). First, the insertion efficiency was analyzed by flow cytometry as a control, with the result shown in figure 7A. Nanopore sequencing was then performed and the insertion efficiency was analyzed using the GREPore-seq of the invention. Specifically, after the reads were aligned with the alignment software to a template containing the HDR insertion sequence, the proportion of CIGAR-S1000 in the result file (bam) was counted, and the CIGAR-S1000 proportion of the edited sample was compared with that of the unedited sample to obtain the HDR-mediated large-fragment (about 1.5 kbp) insertion efficiency. As shown in fig. 7B, the HDR insertion efficiency obtained by GREPore-seq is positively correlated with the flow cytometry result, indicating that GREPore-seq detects HDR insertion efficiency with high accuracy.

Example 4

The GREPore-seq of the present invention has high sensitivity and can detect not only HDR-mediated insertion but also insertion of the plasmid vector backbone, which occurs at extremely low frequency. This example takes the backbone (BB) of the double-cut plasmid donor template inserted during PGK1 locus editing in the iPSC cells of example 3. GREPore-seq first prepares a BBgrep-seq file for the plasmid backbone: along the full-length backbone sequence, a 15 nt fragment is taken as a k-mer every 100 nt, and these 15 nt fragments make up the BBgrep-seq file. A BBgrep-seq file is prepared for the forward sequence and for the reverse-complement sequence of the backbone, corresponding to forward and reverse insertion of the plasmid backbone. GREPore-seq then extracts reads from the files to be analyzed using these two BBgrep-seq files, with the extraction set to require that two or more k-mers be present in a read at the same time. Comparing the number of extracted reads with the original number of reads gives the proportion of plasmid backbone insertions. FIGS. 8A-8D show the insertion of the plasmid backbone after HDR editing in iPSC cells. FIG. 8A is a schematic diagram of forward and reverse insertion of the backbone sequence. FIG. 8B is the IGV visualization result, in which almost no backbone insertion is visible. FIG. 8C shows the sequencing reads containing the backbone insertion after GREPore-seq extraction, in which the backbone insertion is detected much more clearly. As shown in FIG. 8D, which compares the sequencing data of the non-edited WT control with those of the gene-edited iPSC cells (Edited iPSC), GREPore-seq detected NHEJ-mediated insertion at a frequency as low as 0.03%.
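The backbone screening in this example can be sketched in Python as follows. The backbone and flanking sequences are random placeholders, the function names are hypothetical, and interpreting "every 100 nt" as a start-to-start spacing of 100 nt is an assumption.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spaced_kmers(seq, k=15, spacing=100):
    """Take one k-mer of length `k` every `spacing` bases along `seq`."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, spacing)]


def backbone_read_ratio(reads, kmers, min_hits=2):
    """Fraction of reads containing at least `min_hits` distinct k-mers."""
    hits = sum(1 for read in reads
               if sum(kmer in read for kmer in kmers) >= min_hits)
    return hits / len(reads)


rng = random.Random(1)


def random_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


backbone = random_dna(1000)  # placeholder plasmid backbone
forward_kmers = spaced_kmers(backbone)
reverse_kmers = spaced_kmers(reverse_complement(backbone))

reads = [
    random_dna(200) + backbone[:300] + random_dna(200),  # forward insertion
    random_dna(200) + reverse_complement(backbone[400:700]) + random_dna(200),
    random_dna(200) + backbone[:50] + random_dna(200),   # a single k-mer only
    random_dna(400),                                     # no backbone
]
forward_ratio = backbone_read_ratio(reads, forward_kmers)
reverse_ratio = backbone_read_ratio(reads, reverse_kmers)
```

Requiring two or more k-mers per read means that a read sharing only one 15-nt stretch with the backbone, as in the third placeholder read, is not counted as a backbone insertion.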

In summary, the invention creatively designs a long-fragment analysis workflow, GREPore-seq, which uses long-fragment PCR, nanopore sequencing and a Python script to analyze sequencing data. GREPore-seq automatically splits the sequencing data according to barcode information, extracts the corresponding long-fragment data, automatically aligns the split files using minimap2, and can also analyze the HDR insertion rate and the large-fragment deletion rate of the files.

The applicant states that the detailed method of the present invention is illustrated by the above examples, but the present invention is not limited to the detailed method described above, i.e. it does not mean that the present invention must be practiced in dependence upon the detailed method described above. It should be apparent to those skilled in the art that any modification of the present invention, equivalent substitution of raw materials for the product of the present invention, addition of auxiliary components, selection of specific modes, etc., falls within the scope of the present invention and the scope of disclosure.

Claims

1. A method for detecting a mutation after gene editing, comprising the steps of:

(2) Removing the adapter sequences from the sequencing file using Porechop, reverse-complementing the sequences in the file, and merging the reverse-complemented sequences and the forward sequences into a new file using a Python script;

the instruction for removing the adapter sequences in the sequencing file is --adapter_threshold 85 --extra_end_trim 0;

(3) Writing the full-length sequence in the new file into a reference.txt file using seqkit, generating a left-end GREPseq file and a right-end GREPseq file, extracting the PCR product sequences of the same site, preparing a BC-primer-seq file, and extracting the sequencing file of each single PCR product to obtain the split files;

the instruction for extracting the PCR product sequences of the same site in step (3) is seqkit grep -s -f15_5F-4 Grepseq-left.txt -FR.fastq.gz | seqkit grep -s -f Grepseq-right.txt -o target.fastq.gz;

the instruction for extracting the single PCR product sequencing file is as follows:

（1’）seqkit fx2tab BC-Primer-seq.fa.txt | seqkit tab2fx | seqkit seq -u | seqkit replace -p " |-" -s > BC-Primer-seq.fasta；

（2’）Seqkit subseq -r 1:13 BC-Primer-seq.fasta | seqkit sliding -W 9 -s 1 | Seqkit fx2tab >> BC-Primer-seq.temp；

（3’）For /F "tokens=2" %A in (BC-Primer-seq.temp) do echo %A>> BC-Primer-seq.txt；

（4’）seqkit grep -s -R 1:20 -i -r -f BC-Primer-seq.txt target.fastq.gz -o amplicon.fastq.gz；

the instruction for writing the reference.txt file in step (3) is seqkit fx2tab reference.txt | seqkit tab2fx | seqkit seq -u | seqkit replace -p " |-" -s > reference.fasta;

the instruction for generating the left-end GREPseq file in step (3) is seqkit subseq -r 20:90 reference.fasta | seqkit sliding -W 15 -s 5 | seqkit fx2tab >> reference-left.tmp; For /F "tokens=2" %A in (reference-left.tmp) do echo %A >> Grepseq-left.txt;

the instruction for generating the right-end GREPseq file in step (3) is seqkit subseq -r -90:-20 reference.fasta | seqkit sliding -W 15 -s 5 | seqkit fx2tab >> reference-right.tmp; For /F "tokens=2" %A in (reference-right.tmp) do echo %A >> Grepseq-right.txt;

(4) Aligning the split files to a reference sequence using minimap2 and samtools and analyzing the result; in step (4), the instructions for aligning the split files to the reference sequence are as follows:

（1’’）samtools view -bS amplicon.sam > amplicon.bam；

（2’’）samtools sort -O bam -o amplicon.sorted.bam -T temp amplicon.bam；

（3’’）samtools index amplicon.sorted.bam。

2. The method of claim 1, wherein the analysis of step (4) comprises any one or a combination of at least two of a dsODN insertion analysis, an HDR efficiency analysis, or a plasmid backbone insertion analysis.

3. The method of claim 1, wherein the PCR of step (1) is long fragment PCR.

4. The method of claim 1, wherein the length of the barcode sequence of step (1) is 8-20 nt.