CN111440846B - Position anchoring bar code system for nanopore sequencing library building - Google Patents

Info

Publication number
CN111440846B
Authority
CN
China
Prior art keywords
barcode
sequence
anchor
sequencing
anchored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010276679.2A
Other languages
Chinese (zh)
Other versions
CN111440846A (en)
Inventor
戴岩
胡龙
张烨
肖念清
任用
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiansheng Medical Examination Laboratory Co ltd
Jiangsu Xiansheng Medical Diagnosis Co ltd
Original Assignee
Beijing Xiansheng Medical Examination Laboratory Co ltd
Jiangsu Xiansheng Medical Diagnosis Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiansheng Medical Examination Laboratory Co ltd, Jiangsu Xiansheng Medical Diagnosis Co ltd
Priority to CN202010276679.2A
Priority to PCT/CN2020/085645 (WO2021203461A1)
Publication of CN111440846A
Application granted
Publication of CN111440846B
Active (current legal status)
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B70/00 Tags or labels specially adapted for combinatorial chemistry or libraries, e.g. fluorescent tags or bar codes
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B80/00 Linkers or spacers specially adapted for combinatorial chemistry or libraries, e.g. traceless linkers or safety-catch linkers

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Plant Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application relates to a position anchoring bar code system for nanopore sequencing library building, a preparation method and application thereof. The position anchoring bar code system has higher resolution and higher classification accuracy, can obviously reduce the false positive rate of identification, improves the nanopore sequencing precision on the whole, and reduces the sequencing cost.

Description

Position anchoring bar code system for nanopore sequencing library building
Technical Field
The invention relates to the field of gene sequencing, in particular to a position anchoring bar code system for nanopore sequencing library building.
Background
At present, patients with clinical infections are numerous worldwide and the sources of infection are diverse; in China, infectious diseases account for as much as 49 percent of all disease. Conventional clinical diagnosis determines the source of infection from the physician's empirical judgment together with microscopic examination, biochemical analysis and the like, but limitations arising from human factors, turnaround time and detection range easily lead to false detection and missed diagnosis, which is particularly unfavorable for the diagnosis and treatment of acute infections. With the rapid development of high-throughput sequencing and genomics, metagenomic sequencing can rapidly, comprehensively and objectively identify the microbial composition of a sample, and it is increasingly applied to the detection of infectious pathogenic microorganisms, thereby providing a more accurate diagnostic basis for clinical decision-making and subsequent medication.
Illumina second-generation sequencing is well established in China, but it has the following problems when applied to microbial detection. First, the read length of second-generation sequencing is no more than a few hundred bp, and highly homologous sequences exist among different microbial species, so the accuracy of metagenomic species analysis is poor, irrelevant microorganisms appear in the data report, and the physician's diagnosis is seriously disturbed. Second, identifying pathogenicity genes and drug-resistance genes in greater depth requires assembly of the sequencing reads, so additional time and cost must be spent on complex analysis to compensate for the short read length of second-generation data. In addition, second-generation sequencing instruments are expensive and complex to operate, require a high initial investment and have a long total sequencing time, so they can hardly meet the needs of acute infection. The third-generation sequencing technology PacBio greatly improves read length and can detect long fragments of 8-12 kb, or even 40-70 kb, but its library construction process is complex. Moreover, like second-generation sequencing, it has a long sequencing cycle: after a round of sequencing is finished, dozens of hours are needed before the data are available, and with the subsequent analysis time added it can hardly meet the need for rapid identification of pathogenic microorganisms.
Nanopore sequencing makes up for exactly these disadvantages of the other sequencing platforms: the sequenced fragments have long read lengths, and both library construction and sequencing are fast. In addition, the equipment is small and portable, and data generation and bioinformatic analysis can be carried out in real time, which removes the constraints on sequencing site and the delay in report feedback. The technology is therefore well suited to the analysis and identification of clinical infectious microbial pathogens. However, the flow cell (chip) used for nanopore sequencing is very expensive. Using barcode sequence (barcode) information to distinguish multiple samples is a common cost-saving strategy in high-throughput DNA sequencing: a unique barcode sequence is introduced into each DNA sample during library construction, the barcoded DNA samples are sequenced together on the same flow cell, and the reads are then assigned to the different samples according to their barcode sequences. Because the chips used in nanopore sequencing are expensive, running multiple samples together has an obvious economic advantage, allowing users to amortize the fixed cost of one flow cell. A series of kits released by Oxford Nanopore provides 12 different barcodes 24 bp in length, which are ligated to both ends of the sample DNA during library construction before sequencing, so that one chip can yield sequence information for 12 different samples simultaneously.
However, when samples are subsequently distinguished by barcode sequence, the 24 bp barcodes supplied with the library construction kit are seriously confused with one another. The reason is that the single-base error rate in reads can reach 10-15% during conversion of the current signal into bases (basecalling), so when reads are classified by barcode sequence in downstream data analysis, barcode misidentification causes cross-contamination of data between samples, which leads to false-positive identification of microorganisms and greatly complicates clinical decisions.
The present invention has been made based on this.
Disclosure of Invention
The invention aims to solve the technical problem of improving the accuracy of the existing nanopore sequencing data sample bar code comparison process.
Errors frequently arise when sample barcodes from a nanopore sequencing platform are aligned, and these errors strongly affect the subsequent data-processing workflow. By mining a large amount of data, the invention classifies and statistically analyzes the errors that occur during sequence alignment on the nanopore sequencing platform and quantifies the influence of the different error types on sequence identification. It was surprisingly found that insertion-deletion (indel) sequencing errors greatly increase the error rate of sequence identification, whereas base-mismatch errors have comparatively little influence. Consequently, in sample barcode design, increasing the barcode length yields only a limited gain in accuracy, whereas adding a position-anchoring sequence to a barcode of the same length yields a larger gain. Based on this discovery, the invention constructs a position-anchored barcode system containing a position-anchoring sequence and verifies it with the SQK-PBK004 library construction kit from Oxford Nanopore: libraries of 10 pure bacterial cultures were constructed and sequenced, and the resulting data were classified with the original barcode system and with the position-anchored barcode system respectively. The results show that the position-anchored barcode system gives better sample classification accuracy, an improvement of more than 3 orders of magnitude over the original barcode system.
Therefore, the first objective of the present invention is to provide a position-anchored barcode system for improving the accuracy of sample nanopore sequencing resolution.
The second purpose of the invention is to provide a preparation method and application of the position anchoring bar code system.
In order to achieve the purpose, the invention provides the following technical scheme:
the invention provides a position anchoring bar code system for nanopore sequencing library building, which is characterized by comprising the following structures:
[BARCODE-ANCHOR]n-BARCODEn+1
wherein n is more than or equal to 1,
the BARCODE is a bar code sequence,
the ANCHOR is an ANCHOR sequence.
In some embodiments, the system includes the following structure: FLANK1-[BARCODE-ANCHOR]n-BARCODEn+1-FLANK2,
wherein the FLANK is a flanking sequence.
in some embodiments, 1 ≦ n ≦ 10; preferably, n is 1, 2, 3.
In some embodiments, the BARCODE sequences are the same or different; preferably, the BARCODE sequences are different;
in some embodiments, the ANCHOR sequences are the same or different; preferably, the ANCHOR sequences are different.
In some embodiments, the ANCHOR sequence is 5-50bp in length; preferably, the length of the ANCHOR sequence is 10-35 bp;
in some embodiments, the ANCHOR sequence has < 70% homology to the BARCODE sequence; preferably < 50%.
In some embodiments, the FLANK length is 10-30 bp; preferably, the length of the FLANK sequence is 15-25 bp;
in some embodiments, the position-anchored barcode system for nanopore sequencing pooling comprises any one of the following structures:
FLANK1-BARCODE1-ANCHOR1-BARCODE2-FLANK2;
FLANK1-BARCODE1-ANCHOR1-BARCODE2-ANCHOR2-BARCODE3-FLANK2;
FLANK1-BARCODE1-ANCHOR1-BARCODE2-ANCHOR2-BARCODE3-ANCHOR3-BARCODE4-FLANK2;
in some embodiments, the ANCHOR sequences are different or the same, preferably the ANCHOR sequences are different;
in some embodiments, the BARCODE sequences are different or the same, preferably the BARCODE sequences are different.
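The general structure FLANK1-[BARCODE-ANCHOR]n-BARCODEn+1-FLANK2 can be illustrated with a minimal sketch that concatenates the elements in order and checks the constraints stated above (n is at least 1, n anchors pair with n + 1 barcodes, anchor length 5-50 bp). The function name and the short placeholder sequences are hypothetical and are not sequences of the invention.

```python
from typing import Sequence


def build_position_anchored_barcode(
    barcodes: Sequence[str],
    anchors: Sequence[str],
    flank1: str = "",
    flank2: str = "",
) -> str:
    """Concatenate FLANK1-[BARCODE-ANCHOR]n-BARCODE(n+1)-FLANK2 into one string."""
    n = len(anchors)
    if n < 1:
        raise ValueError("at least one ANCHOR is required (n >= 1)")
    if len(barcodes) != n + 1:
        raise ValueError("n anchors require exactly n + 1 barcodes")
    for anchor in anchors:
        if not 5 <= len(anchor) <= 50:
            raise ValueError("ANCHOR length should be 5-50 bp")

    parts = [flank1]
    # zip stops after the n anchors, pairing each with the barcode before it.
    for barcode, anchor in zip(barcodes, anchors):
        parts.extend([barcode, anchor])
    parts.append(barcodes[-1])  # the final BARCODE(n+1) has no trailing anchor
    parts.append(flank2)
    return "".join(parts)


# Placeholder sequences for illustration only (n = 1).
construct = build_position_anchored_barcode(
    barcodes=["AAAA", "CCCC"], anchors=["GGGGG"], flank1="TT", flank2="TT"
)
print(construct)
```

With n = 1 this corresponds to the FLANK1-BARCODE1-ANCHOR1-BARCODE2-FLANK2 structure listed above; passing two or three anchors gives the longer structures.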
The invention also provides a preparation method of the position anchoring bar code system for nanopore sequencing library building, which is characterized by comprising the following steps of: the method comprises directly synthesizing the nucleotide sequence of the position anchoring bar code system, or preparing the position anchoring bar code system by connecting after segmented synthesis.
In some embodiments of the invention, a position-anchored barcode system comprising an existing barcode linker is prepared as follows: starting from the existing nanopore sequencing library construction barcode, a bridging primer is used to connect the existing barcode linker in series with the designed barcode linker; preferably, the bridging primer sequence serves as the ANCHOR in the position-anchored barcode system structure.
In some embodiments of the invention, the existing barcode linker is derived from the original barcode of the SQK-PBK004 kit from ONT corporation.
The invention also provides application of the position anchoring bar code system for nanopore sequencing library building in improving sequencing sample classification accuracy.
The invention also provides application of the position anchoring bar code system for nanopore sequencing library building in reduction of false positive of sequencing sample classification.
The invention also provides application of the position anchoring bar code system for nanopore sequencing library construction in sequencing library construction.
The invention also provides application of the position anchoring bar code system for nanopore sequencing library building in sequencing.
The invention also provides a method for constructing the sequencing library, which is characterized in that the sequencing library is constructed by utilizing the position anchoring bar code system for constructing the nanopore sequencing library.
The invention also provides a sequencing joint, which is characterized in that the sequencing joint sequence comprises the position anchoring bar code system.
The invention also provides a composite, wherein the composite is attached to the position-anchored barcode system described above.
The invention also provides a composition, which is characterized by comprising the position anchoring bar code system.
The invention also provides a kit for nanopore sequencing library building, which is characterized by comprising the position anchoring bar code system or the sequencing joint.
The invention has the beneficial technical effects that:
1) The invention demonstrates for the first time that insertion-deletion errors are the main cause of overall sequence alignment errors, and that base-mismatch errors have comparatively little influence on them. In practice, by introducing an anchoring sequence into the barcode system, the invention prevents indel errors from propagating through the whole alignment result, greatly reduces the drop in alignment score caused by insertions and deletions, screens out interference from remote barcodes and achieves accurate barcode resolution. By comparison, merely increasing the barcode length can moderately reduce sample classification errors caused by base mismatches but improves the accuracy of the overall alignment result only to a very limited extent, whereas the position-anchored barcode system improves the accuracy of the result very markedly.
2) Based on the library construction workflow of the SQK-PBK004 kit for the nanopore platform, the invention uses the kit's own barcodes to connect self-developed barcode sequences and uses the sequence of the connecting portion as the anchoring sequence, thereby designing a position-anchored barcode system of the type FLANK1-BARCODE1-ANCHOR1-BARCODE2-FLANK2, which can increase the classification accuracy from 0.999 to 0.999999 when resolving different samples.
3) In practical application, barcodes of different lengths and different numbers of anchoring sequences can be designed in the position-anchored barcode system according to different requirements, so as to balance classification accuracy against microorganism detection rate as required.
4) The position anchoring bar code system has better resolution, higher accuracy, reduced false positive identification, improved nanopore sequencing precision and reduced sequencing cost, and is suitable for popularization and application.
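The anchoring idea in effect 1) can be illustrated with a toy example. The sketch below is not the alignment procedure of the invention: it uses hypothetical sequences, a simple position-by-position comparison, and an exact-match search for the anchor (a real read would need error-tolerant matching). It only shows why a single deletion in the first barcode shifts every later position, and how locating the anchor restores the frame for the second barcode.

```python
from typing import Optional

# Hypothetical sequences chosen for illustration only.
BARCODE1 = "ACGTTGCAAGTC"
ANCHOR = "GGATCCTTAAGG"
BARCODE2 = "TCAGGATCGTAC"
INSERT = "ACGTACGTAC"  # stands in for the sample DNA after the barcodes


def hamming(a: str, b: str) -> int:
    """Count differing positions, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def barcode2_fixed_offset(read: str) -> str:
    """Extract BARCODE2 assuming no indel occurred upstream."""
    start = len(BARCODE1) + len(ANCHOR)
    return read[start:start + len(BARCODE2)]


def barcode2_after_anchor(read: str) -> Optional[str]:
    """Extract BARCODE2 relative to where the anchor is actually found."""
    pos = read.find(ANCHOR)  # simplification: exact match only
    if pos < 0:
        return None
    start = pos + len(ANCHOR)
    return read[start:start + len(BARCODE2)]


true_read = BARCODE1 + ANCHOR + BARCODE2 + INSERT
# Simulate one deleted base inside BARCODE1 (index 3).
read_with_deletion = true_read[:3] + true_read[4:]

fixed = barcode2_fixed_offset(read_with_deletion)
anchored = barcode2_after_anchor(read_with_deletion)

print("mismatches at fixed offset:", hamming(fixed, BARCODE2))
print("mismatches after anchoring:", hamming(anchored, BARCODE2))
```

In this toy case the deletion damages only the comparison of the first barcode once the anchor is used as a positional reference, whereas without it the frame shift also corrupts the comparison of the second barcode.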
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1. Error rate statistics based on sequencing data of the kit barcode system. Panel A shows the average and median error rates at each site in actual sequencing of the 10 sets of kit barcode linker sequences. Panel B shows the average and median error rates of the three alignment error types (insertion, deletion and mismatch) at each site in actual sequencing of the 10 sets of kit barcode linker sequences. Panel C shows the correspondence between kit barcode linker sites and alignment errors. Panel D shows a summary of the subclasses of alignment error types for the kit barcode linker, that is, the distribution of alignment error types at the different sites: the abscissa is the base position in the barcode sequence, the ordinate is the error type, and the shading of each cell indicates the error rate of that error type at that site, with darker shading indicating a higher error rate; the color blocks in the Block annotation area indicate the different elements of the linker sequence, and the error types are clustered by Euclidean distance.
FIG. 2. Effect of different alignment error types on overall sequence classification accuracy. Panel A shows a total error rate of 8%; Panel B shows a total error rate of 16%.
FIG. 3. Effect of containing no anchor sequence (ANCHOR 0), 1 anchor sequence (ANCHOR 1) or 2 anchor sequences (ANCHOR 2) on the overall accuracy of barcode alignment.
FIG. 4. Schematic diagram of the original and optimized library construction workflows.
FIG. 5. Comparison of sample classification accuracy between the original barcode system and the position-anchored barcode system; the shaded portion indicates correctly classified results and the remainder indicates misclassified results.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Definition of partial terms
Unless defined otherwise below, all technical and scientific terms used in the detailed description of the present invention are intended to have the same meaning as commonly understood by one of ordinary skill in the art. While the following terms are believed to be well understood by those skilled in the art, the following definitions are set forth to better explain the present invention.
As used herein, the terms "comprising," "including," "having," "containing," or "involving" are inclusive or open-ended and do not exclude additional unrecited elements or method steps. The term "consisting of …" is considered to be a preferred embodiment of the term "comprising". If in the following a certain group is defined to comprise at least a certain number of embodiments, this should also be understood as disclosing a group which preferably only consists of these embodiments.
The terms "about" and "substantially" in the present invention denote an interval of accuracy that can be understood by a person skilled in the art, which still guarantees the technical effect of the feature in question. The term generally denotes a deviation of ± 10%, preferably ± 5%, from the indicated value.
Where an indefinite or definite article is used when referring to a singular noun e.g. "a" or "an", "the", this includes a plural of that noun.
Furthermore, the terms first, second, third, (a), (b), (c), and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
The following terms or definitions are provided only to aid in understanding the present invention. These definitions should not be construed to have a scope less than understood by those skilled in the art.
Some technical terms in the present invention are explained as follows:
The 'position-anchored barcode system' refers to a sequencing tag system in which two or more barcode sequences are connected in series and anchored by specific ANCHOR sequences. The system can be applied to the construction of sequencing libraries for nanopore sequencing, where it improves the classification accuracy of sequencing samples and reduces false positives in sample classification. Its structure can be described as [BARCODE-ANCHOR]n-BARCODEn+1, wherein n is more than or equal to 1, the BARCODE is a barcode sequence, and the ANCHOR is an anchoring sequence. It is understood that any composition, composite, system or the like comprising the above structure falls within the scope of the present invention. Although the invention is explained using the barcodes of the prior-art SQK-PBK004 kit as an example, this is only an exemplary illustration and not a limitation of the invention. The invention has been verified both by theoretical analysis and by wet-lab experiments, which show that any barcode system containing the [BARCODE-ANCHOR]n-BARCODEn+1 structure can be used to construct a sequencing library, improve sample classification accuracy and reduce false positives in sample classification. In some preferred embodiments of the invention, the structure of the position-anchored barcode system may be FLANK1-[BARCODE-ANCHOR]n-BARCODEn+1-FLANK2, wherein FLANK is a flanking sequence. The flanking sequence is a conventional module in sequencing library construction, and its addition is understood in the art; the sequences of FLANK1 and FLANK2 may be the same or different according to actual needs.
By mining a large amount of data, the invention classifies and statistically analyzes the errors occurring during sequence alignment on the nanopore sequencing platform and quantifies the influence of the different error types on sequence identification. It is found that insertion-deletion (indel) sequencing errors greatly increase the error rate of sequence identification, while base-mismatch errors have little influence, so that in sample barcode design, increasing the barcode length has only a limited effect on alignment accuracy. The effect of sequence length is also confirmed in some embodiments of the invention; for example, Example 2 states that "when a base-mismatch error of 0.16 is introduced, a barcode length of only 40 bp is needed to achieve an overall alignment accuracy of 99.99%, whereas when an insertion-deletion error of 0.16 is introduced, the barcode length needs to reach 80 bp". Accordingly, the length of the position-anchored barcode system of the invention can be selected appropriately according to practical needs, for example 1 ≦ n ≦ 10 in some embodiments, such as n = 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10; preferably, n is 1, 2 or 3.
It is understood that the sequences of BARCODE as a marker sequence in sequencing may be the same or different in the position-anchored BARCODE system of the present invention; in some preferred embodiments, the BARCODE sequences are different. Also, the ANCHOR sequences serve as ANCHOR components, and the sequences may be the same or different, and in some preferred embodiments, the ANCHOR sequences are different. In addition, the length of the ANCHOR sequence may be known in the art, and may be, for example, 5-50bp, and in some preferred embodiments, the length of the ANCHOR sequence is 10-35 bp.
The ANCHOR sequence, as the anchoring component for the BARCODE, should be distinguishable from the BARCODE sequence but is not otherwise particularly limited. The homology of the ANCHOR sequence to the BARCODE sequence may be < 80%, < 70%, < 60%, < 50%, < 40%, < 30%, < 20% or < 10%; in some preferred embodiments, the homology is < 50%.
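A candidate ANCHOR can be screened against the barcodes with a simple similarity check. The sketch below makes an assumption that is not stated in the text: it treats "homology" as percent identity computed as 1 minus the edit distance divided by the longer sequence length. The function names and the 50% default threshold (taken from the preferred embodiment) are illustrative only.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # base of a left unmatched
                            curr[j - 1] + 1,      # base of b left unmatched
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def percent_identity(a: str, b: str) -> float:
    """Assumed similarity measure: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


def anchor_is_distinct(anchor: str, barcodes: Iterable[str],
                       threshold: float = 50.0) -> bool:
    """True if the anchor is below the identity threshold against every barcode."""
    return all(percent_identity(anchor, bc) < threshold for bc in barcodes)


# Placeholder sequences for illustration only.
print(anchor_is_distinct("AAAAAAAAAA", ["CCCCCCCCCC", "GGGGGGGGGG"]))
print(anchor_is_distinct("ACGTACGTAC", ["ACGTACGTAC"]))
```

Other definitions of homology (for example identity over a local alignment) would give different values, so the threshold should be read together with whichever measure is actually used.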
It will be appreciated that as some exemplary position-anchored barcode systems of the present invention, the structure thereof may be embodied as follows:
FLANK1-BARCODE1-ANCHOR1-BARCODE2-FLANK2;
FLANK1-BARCODE1-ANCHOR1-BARCODE2-ANCHOR2-BARCODE3-FLANK2;
FLANK1-BARCODE1-ANCHOR1-BARCODE2-ANCHOR2-BARCODE3-ANCHOR3-
BARCODE4-FLANK2;
FLANK1-BARCODE1-ANCHOR1-BARCODE2-ANCHOR2-BARCODE3-ANCHOR3-
BARCODE4-ANCHOR4-BARCODE5-FLANK2;
……
the 'bar code joint' of the invention refers to a complete section containing a bar code sequence and flanking sequences at two ends. For example, in the present embodiment, the autonomously designed barcode linker is defined as a BBRCD linker, and the barcode linker in the original kit SQK-PBK004 is defined as an ABRCD linker.
The 'barcode sequence' refers to a specific sequence of a barcode, which is contained in a barcode linker and is a part of the barcode linker. For example, in the embodiment of the present invention, the self-designed barcode sequence is defined as BBRCD, and the barcode sequence in the original kit SQK-PBK004 is defined as ABRCD.
The 'anchor sequence' (ANCHOR) as used herein refers to a nucleotide sequence for anchoring a BARCODE. It may be of any length known in the art, such as 5-50 bp, and in some preferred embodiments 10-35 bp. Its sequence should be distinguishable from the BARCODE sequence but is not otherwise particularly limited; the homology of the ANCHOR sequence to the BARCODE sequence may be < 80%, < 70%, < 60%, < 50%, < 40%, < 30%, < 20% or < 10%, and in some preferred embodiments the homology is < 50%. Exemplary ANCHOR sequences mentioned in the examples of the present invention include SEQ ID NO.50, SEQ ID NO.51 and SEQ ID NO.13.
The 'FLANK' refers to the flanking sequences at the two ends of the barcode system and is a conventional component of a sequencing barcode linker. For example, for a nanopore sequencing platform, FLANK1 in the invention is a Y-shaped sequencing adapter linked to a motor protein, which ensures that the DNA passes through the nanopore and is sequenced normally; FLANK2 is used to ligate the sample sequence to be sequenced. A FLANK may be of any verified length known in the art, such as 10-30 bp, and in some preferred embodiments 15-25 bp. Exemplary FLANK sequences mentioned in the examples of the present invention include SEQ ID NO.16 and SEQ ID NO.26.
The invention is further described by the accompanying drawings and the following examples, which are intended to illustrate specific embodiments of the invention and are not to be construed as limiting the scope of the invention in any way. Unless otherwise indicated, the experimental procedures disclosed in the present invention are performed by conventional techniques in the art, and the reagents and raw materials used in the examples are commercially available.
Example 1 alignment error statistics
According to the invention, the main cause of barcode confusion is the difference between the sequenced barcode and the true preset barcode, so the differences between sequenced and true barcode sequences are first classified and tabulated. To this end, libraries were constructed from sample DNA and sequenced separately with 10 sets of barcodes from the ONT SQK-PBK004 kit; the first 250 bp at the 5' end of each read was extracted to ensure that the barcode region was included, and was then aligned to the corresponding preset barcode linker by global alignment (overlap alignment). Finally, the alignment differences at each position of the barcode linker sequence were summarized from the output multiple alignment files, and the positional distribution and types of sequencing barcode errors were counted. The error types fall into three major groups: insertion (I), deletion (D) and mismatch (X).
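The counting step described above can be sketched as follows. This is a minimal illustration, not the tool used in the example: it assumes an overlap-style alignment in which the whole barcode linker must be aligned while unaligned read bases before and after it are free, and it uses hypothetical scores (match +1, mismatch -1, gap -1). An insertion is a base present in the read but absent from the linker, and a deletion is a linker base missing from the read.

```python
from typing import Dict


def count_alignment_errors(reference: str, read: str,
                           prefix_len: int = 250) -> Dict[str, int]:
    """Align a barcode linker to the 5' prefix of a read and count I, D and X."""
    query = read[:prefix_len]  # keep only the 5' end, as in the example
    m, n = len(reference), len(query)
    match, mismatch, gap = 1, -1, -1  # hypothetical scoring scheme

    # score[i][j]: best score aligning reference[:i] so that it ends at query[:j].
    # Row 0 stays at 0, so read bases before the linker cost nothing.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if reference[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Read bases after the linker are free: start from the best cell in the last row.
    j = max(range(n + 1), key=lambda col: score[m][col])
    i = m
    counts = {"I": 0, "D": 0, "X": 0}
    while i > 0:
        if j > 0:
            sub = match if reference[i - 1] == query[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                if sub == mismatch:
                    counts["X"] += 1
                i, j = i - 1, j - 1
                continue
            if score[i][j] == score[i][j - 1] + gap:
                counts["I"] += 1  # extra base in the read
                j -= 1
                continue
        counts["D"] += 1  # linker base missing from the read
        i -= 1
    return counts


# Hypothetical linker and read with one deleted base, for illustration only.
print(count_alignment_errors("ACGTTGCAAGTC", "GGACGTGCAAGTCAA"))
```

Summing such counts per linker position over all reads would give per-site error rates of the kind reported in FIG. 1; when several alignments tie in score, the counts depend on the tie-breaking order used in the traceback.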
As a result: first, overall, after preliminary filtering of the fastq-format data of the 10 groups, the total number of reads was 28,543,061, but only 24,075,634 reads actually took part in the error statistics; that is, about 15.65% of the reads had no alignment result and were filtered out during the error statistics. The average per-site error rate in actual sequencing of the 10 groups of barcode sequences included in the statistics was 8.01%, and the median error rate was 5.80% (fig. 1A); the average error rate of one group was as high as 10.31%, and even the group with the lowest average error rate reached 6.53%.
Second, broken down by the three alignment error types, the average insertion error rate in actual sequencing of the 10 groups of barcode sequences was 2.28%, the average deletion error rate was 3.58%, and the average base-mismatch error rate was 2.16% (FIG. 1B); the corresponding median error rates were 1.70%, 2.16% and 1.57%, respectively.
As FIG. 1B shows, the average per-site error rate of the barcode adaptor differs markedly from the median, so the invention further summarizes the alignment errors of each error type at each position of the barcode sequence. FIG. 1C shows that, apart from a higher error rate in the first 6 bases at the 5' end caused by initial sequencing instability, there was no significant difference in error rate among the remaining positions.
Further, since position has little influence on error type, the invention summarizes the subclasses of each error type in detail without regard to position (fig. 1D). A mismatch is denoted by the reference base followed by the mutated base of the query sequence, e.g. GA indicates that G is mismatched to A. A deletion at an alignment site is denoted by the deleted base followed by the suffix D, e.g. GD. An insertion at an alignment site is denoted by I followed by the inserted base, e.g. IC indicates that base C is inserted at the site; an insertion of 2 bases is denoted I2, and an insertion of 3 or more bases is denoted II. A mismatch at the site followed by a 1-base insertion is denoted X2, and a mismatch followed by an insertion of 2 or more bases is denoted XX. According to the distances between the error types in FIG. 1D: except for the GA and AG mismatch types, the mismatch probabilities between the other bases are similar, with no obvious bias; the 4 rows with the deepest average color are exactly the deletion probabilities of the 4 bases, with an average error rate of about 0.09 and no obvious bias in base deletion; finally, among the insertion error types, the 5 rows with the most uniform color-block distribution indicate that the probabilities of inserting different single bases are close and fall at the same clustering level as error type I2 (insertion of 2 bases). In summary, the data for the different error subclasses are as follows:
Table 1.
Type of error    Error rate
D                0.03580
I1               0.01575
XN               0.01340
GA               0.00548
AG               0.00267
I2               0.00368
II               0.00130
X2               0.00172
XX               0.00032
In the above table, XN denotes all mismatch types other than the GA and AG mismatches; I1 denotes random insertion of a single base; I2 and II denote random insertion of 2 bases and of 3 to 5 bases, respectively, with insertions of 3, 4 and 5 bases occurring in the ratio 8 : 1.8 : 0.2; X2 and XX denote a randomly introduced mismatched base followed by insertion of 1 base and of 2 or more bases, respectively.
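As a quick cross-check, the subtype rates of Table 1 can be summed and compared with the per-type averages reported earlier in this example (insertion 2.28%, deletion 3.58%, mismatch 2.16%, total 8.01%). The sketch below also shows one way to rescale the profile to another total error rate, as done in Example 2. The grouping of X2 and XX under insertion is an assumption made here because it reproduces the reported averages; the names `GROUPS`, `group_totals` and `rescale` are hypothetical.

```python
# Subtype error rates copied from Table 1.
TABLE1 = {
    "D": 0.03580, "I1": 0.01575, "XN": 0.01340, "GA": 0.00548,
    "AG": 0.00267, "I2": 0.00368, "II": 0.00130, "X2": 0.00172,
    "XX": 0.00032,
}

# Assumed grouping: X2 and XX (mismatch followed by an insertion)
# are counted with the insertions.
GROUPS = {
    "deletion": ["D"],
    "insertion": ["I1", "I2", "II", "X2", "XX"],
    "mismatch": ["XN", "GA", "AG"],
}


def group_totals(table: dict) -> dict:
    """Sum the subtype rates within each major error type."""
    return {g: sum(table[k] for k in keys) for g, keys in GROUPS.items()}


def rescale(table: dict, total_rate: float) -> dict:
    """Scale all subtype rates proportionally to a new total error rate."""
    factor = total_rate / sum(table.values())
    return {k: v * factor for k, v in table.items()}


totals = group_totals(TABLE1)
overall = sum(TABLE1.values())
scaled = rescale(TABLE1, 0.16)

for name, value in totals.items():
    print(f"{name:10s} {value:.5f}")
print(f"{'overall':10s} {overall:.5f}")
```

Under this grouping the totals come to 0.03580 (deletion), 0.02277 (insertion) and 0.02155 (mismatch), 0.08012 overall, which agrees with the rounded percentages above.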
Example 2 Influence of different error types and barcode lengths on the overall accuracy of barcode alignment
Using the error types and corresponding error rates of Example 1, the invention simulates barcode sequences in 6 length groups of 20 bp, 40 bp, 60 bp, 80 bp, 100 bp and 120 bp, each group containing 12 different barcode elements. The simulation procedure is illustrated taking 80 bp as an example. First, 12 ideal barcode sequences are preset; the sequence information is shown in Table 2:
Table 2 (provided as an image in the original publication)
Three different classes of error were then introduced: indels only, base mismatches only, and both indels and base mismatches, with a total error rate of 0.08 per site. The detailed error rates of each error type were kept in the proportions of Table 1 and scaled up or down according to the total error rate. When multiple samples are loaded on one flow cell, the actual sequencing data contain multiple barcode DNAs, and barcodes may become confused between samples. Therefore, for each length group, 100,000 adaptor sequences were simulated for each preset barcode, and all simulated sequences were mixed together to simulate 12 samples being sequenced simultaneously. Each simulated sequence was named by appending a serial number to the name of its preset barcode, the data were classified with the same bioinformatics analysis pipeline, and crosstalk was finally determined by comparing the barcode assigned by classification with the name of the simulated sequence.
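A minimal sketch of how such an error simulation might look is given below. It is not the invention's simulator: it assumes that at most one error event is drawn per site, that GA, AG and XN are all modeled as substitution by a random different base, that XX inserts exactly 2 bases, and that X2 and XX are used only in the combined mode. The 80 bp test sequence is SEQ ID NO.1 from the sequence listing; `event_rates` and `mutate` are hypothetical names.

```python
import random

# Per-site subtype rates copied from Table 1.
TABLE1 = {"D": 0.03580, "I1": 0.01575, "XN": 0.01340, "GA": 0.00548,
          "AG": 0.00267, "I2": 0.00368, "II": 0.00130, "X2": 0.00172,
          "XX": 0.00032}
INDEL = ("D", "I1", "I2", "II")
MISMATCH = ("XN", "GA", "AG")
MIXED = ("X2", "XX")                      # mismatch followed by an insertion
MODES = {"indel": INDEL, "mismatch": MISMATCH,
         "both": INDEL + MISMATCH + MIXED}
INSERT_LEN = {"I1": 1, "I2": 2, "X2": 1, "XX": 2}   # XX fixed at 2 (assumption)
BASES = "ACGT"


def event_rates(mode: str, total_rate: float) -> dict:
    """Table 1 proportions for the chosen mode, scaled to total_rate."""
    keys = MODES[mode]
    scale = total_rate / sum(TABLE1[k] for k in keys)
    return {k: TABLE1[k] * scale for k in keys}


def mutate(seq: str, rates: dict, rng: random.Random) -> str:
    """Apply at most one randomly drawn error event to each site of seq."""
    events = list(rates) + ["none"]
    weights = list(rates.values()) + [1.0 - sum(rates.values())]
    out = []
    for base in seq:
        event = rng.choices(events, weights)[0]
        if event == "D":
            continue                       # base is deleted
        if event in MISMATCH or event in MIXED:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
        n_ins = INSERT_LEN.get(event, 0)
        if event == "II":                  # 3, 4 or 5 bases in ratio 8 : 1.8 : 0.2
            n_ins = rng.choices([3, 4, 5], [8, 1.8, 0.2])[0]
        out.extend(rng.choice(BASES) for _ in range(n_ins))
    return "".join(out)


# SEQ ID NO.1 from the sequence listing (80 bp), upper-cased.
BARCODE = ("AGAACGACTTCCATACTCGTGTGAAACGAGTCTCTTGGGA"
           "CCCATAGAAGGTCTACCTCGCTAACACCACTGCGTCAACT")

rng = random.Random(0)
reads = [mutate(BARCODE, event_rates("both", 0.08), rng) for _ in range(5)]
for read in reads:
    print(len(read), read)
```

Generating many such reads per preset barcode, mixing them, and comparing the assigned barcode with the name of each simulated read would mirror the crosstalk test described above.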
As a result: as shown in fig. 2A, when the total error rate is 0.08, the overall accuracy of barcode alignment increases gradually with barcode length regardless of the error type introduced; for barcodes of the same length, the overall alignment accuracy with indel-type errors is clearly lower than with mismatch-type errors. Both conclusions still hold when the total error rate is raised to 0.16 (fig. 2B). For example, to reach an overall alignment accuracy of 99.99%, a barcode length of only 40 bp is needed when mismatch-type errors are introduced at a rate of 0.16, whereas 80 bp is needed when indel-type errors are introduced at a rate of 0.16. Therefore, indel-type errors affect alignment accuracy more than base mismatches do, and the benefit of increasing barcode length is limited.
Example 3 Effect of inserting anchor sequences on the overall accuracy of barcode alignment
Based on the conclusions of Example 2, the invention modifies the barcode fragment in the following two ways, taking 80 bp as an example:
1) insertion of 1 anchor sequence: the 12 bases in the middle of the barcode sequence (underlined in Table 2) are replaced with one and the same 12 bp anchor sequence, i.e. the original preset barcode fragment is replaced by a fragment of the same length in the form short barcode-anchor sequence-short barcode;
2) insertion of 2 anchor sequences: the sites at 20 bp from each end of the barcode sequence in Table 2 (shown in bold in Table 2) are replaced with the 12 bp anchor sequence 1 and anchor sequence 2, respectively, giving a fragment of the same length in the form short barcode-anchor sequence 1-short barcode-anchor sequence 2-short barcode.
During alignment, the position information obtained from the anchor sequences is used to lock the position range of each short barcode, and the data are finally classified according to the combination of short barcodes.
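The idea of locating the anchor first and then searching for the short barcodes only in the windows next to it can be sketched as follows. This is a hedged illustration, not the invention's software: it uses a standard approximate substring search (semi-global edit distance, often attributed to Sellers), the anchor is SEQ ID NO.50, and the 12 nt short barcodes, the window padding `pad` and the function names are all hypothetical. A practical pipeline would presumably also reject reads in which the anchor is matched poorly.

```python
def best_match(text: str, pattern: str):
    """Best approximate occurrence of pattern inside text.

    Returns (cost, start, end): the edit distance and the slice
    text[start:end] of the first occurrence with minimal cost.
    """
    m = len(pattern)
    prev = [(i, 0) for i in range(m + 1)]      # column for the empty text prefix
    best = (m, 0, 0)
    for j, ch in enumerate(text, start=1):
        cur = [(0, j)]                         # a match may start anywhere for free
        for i in range(1, m + 1):
            sub = (prev[i - 1][0] + (pattern[i - 1] != ch), prev[i - 1][1])
            extra = (prev[i][0] + 1, prev[i][1])        # extra base in text
            missing = (cur[i - 1][0] + 1, cur[i - 1][1])  # pattern base missing
            cur.append(min(sub, extra, missing))
        if cur[m][0] < best[0]:
            best = (cur[m][0], cur[m][1], j)
        prev = cur
    return best


def classify(read, anchor, left_codes, right_codes, pad=4):
    """Lock the barcode windows with the anchor, then call each short barcode."""
    cost, start, end = best_match(read, anchor)

    def call(window, codes):
        return min(codes, key=lambda name: best_match(window, codes[name])[0])

    left_len = max(map(len, left_codes.values())) + pad
    right_len = max(map(len, right_codes.values())) + pad
    left = call(read[max(0, start - left_len):start], left_codes)
    right = call(read[end:end + right_len], right_codes)
    return left, right, cost


ANCHOR = "GGTGCTGTTAAC"                         # SEQ ID NO.50
LEFT = {"L1": "ACCTGAGTCAAG", "L2": "TTGACCGATCCA"}    # hypothetical short barcodes
RIGHT = {"R1": "CATGGTACTTCG", "R2": "GAACTCTGGATA"}   # hypothetical short barcodes

# Toy read: L2 with one deletion, anchor with one mismatch, R1 with one insertion.
read = "TTTT" + "TTGACCGTCCA" + "GGTGCTGATAAC" + "CATGGTTACTTCG" + "AAAA"
result = classify(read, ANCHOR, LEFT, RIGHT)
print(result)
```

Because each short barcode is compared only against a narrow window fixed by the anchor position, an indel in one part of the read does not shift the alignment frame of the other parts, which is the effect the anchor design aims for.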
As a result: the 12 bp underlined in Table 2 were replaced with the specific anchor sequence GGTGCTGTTAAC (SEQ ID NO.50); the error types and error distribution of Example 1 were introduced at each site, with the total error rate randomly distributed within the range of E +0.5E (where E takes the values 0.08, 0.12 and 0.16), and 6000 million barcode sequences were simulated at each E value;
similarly, the two 12 bp bold segments in Table 2 were replaced with the anchor sequence GGTGCTGTTAAC (SEQ ID NO.50) and the anchor sequence GTACGGAAGTCG (SEQ ID NO.51), respectively, for sequence simulation.
Subsequently, the data were classified by using the anchor sequences to lock the barcode position range during alignment, and the classification accuracy was determined and compared for no anchor sequence (ANCHOR = 0 group), 1 anchor sequence (ANCHOR = 1 group) and 2 anchor sequences (ANCHOR = 2 group). As can be seen from fig. 3, classification accuracy improves significantly once anchor sequences are taken into account:
when E is 0.08, the accuracy of the 80 bp ANCHOR = 0 group reaches only 99.9999%, whereas the accuracy of the ANCHOR = 1 and ANCHOR = 2 groups, which contain anchor sequences, rises to 100%;
when E is raised to 0.12, the resolution accuracy ranks as follows: ANCHOR = 2 group > ANCHOR = 1 group > ANCHOR = 0 group;
when E is raised to 0.16, the differences in classification accuracy among the three groups widen, and the classification accuracy of the ANCHOR = 2 group is still 100% on the simulated data generated by the invention.
Examples 2 and 3 show that increasing barcode length can improve sample resolution accuracy to some extent, but the rate of improvement falls off as length increases, and an order-of-magnitude improvement is difficult to achieve. By inserting anchor sequences to anchor the barcode region, the degradation caused by indels during alignment is markedly reduced, 100% accuracy can be reached on the simulated data, and classification accuracy improves markedly as the number of anchor sequences increases. Given the 10-15% single-base error rate of base calling on Oxford Nanopore sequencers and the total number of reads from one complete sequencing run, it is inferred that introducing anchor sequences into the barcode sequence can improve accuracy by at least 3 orders of magnitude on real sequencing data.
Example 4 Preparation of the position-anchored barcode system and library construction
To verify the theory experimentally, this example builds on the ONT nanopore library-construction PCR barcoding kit SQK-PBK004: the kit's existing single-barcode adaptor (sequence structure FLANK1-ABRCD-FLANK3') is joined by a ligation reaction to a self-designed barcode adaptor (sequence structure FLANK5'-BBRCD-FLANK2) to form the position-anchored barcode system of the invention: FLANK1-ABRCD-ANCHOR-BBRCD-FLANK2 (where the ANCHOR sequence is the sequence formed by joining FLANK3' and FLANK5' in the ligation reaction). FLANK2 is then ligated to sample DNA of known origin, so that the final DNA to be sequenced carries the position-anchored barcode system. With this design, the concatenated barcode sequence can be inferred back from the known identity of the sample DNA, which allows the classification accuracy of the barcode system to be quantified.
The specific construction process is as follows:
Based on the barcode sequence information of other ONT kits, 10 barcode sequences BBRCD with excellent discrimination were obtained by combination and comparison. A conserved 13 bp flanking sequence (FLANK5') was then added at the 5' end; this sequence has good PCR-primer characteristics, moderate GC content, and no hairpin or dimer structures, and the Y-shaped structure of the original PCR adaptor is used to prevent multiple barcodes from being joined in tandem during the ligation step. In this test the ANCHOR sequence is also the PCR bridging primer (SEQ ID NO.13): its 3' end sequence is identical to FLANK5' of the self-designed barcode adaptor, and its 5' end sequence is identical to FLANK3' of the kit's original barcode adaptor, so that a single PCR reaction yields sequencing DNA fragments carrying the tandem double-barcode adaptor at both the 5' end and the 3' end.
The PCR bridging primer sequences are as follows:
F1: 5’-TTCTGTTGGTGCTGATATTGCCCGACTTCCGTAC-3’ (SEQ ID NO.13)
F2: 5’-ACTTGCCTGTCGCTCTATCTTCCCGACTTCCGTAC-3’ (SEQ ID NO.14)
Note: the 13 bp sequence at the 3' end of each primer (CCGACTTCCGTAC, SEQ ID NO.15), which corresponds to FLANK5' of the self-designed barcode adaptor, is underlined in the original.
The sequence of the self-designed barcode linker FLANK 5' -BBRCD-FLANK2 is shown in Table 3:
Table 3 (provided as an image in the original publication)
The sequence of the original barcode linker FLANK1-ABRCD-FLANK 3' of the SQK-PBK004 kit is shown in Table 4:
Table 4 (provided as an image in the original publication)
This example uses 10 standard pure strains: Brevibacillus borstelensis, Pseudomonas aeruginosa, Escherichia coli, Salmonella enterica, Klebsiella pneumoniae, Listeria monocytogenes, Staphylococcus aureus, Acinetobacter baumannii, Bacillus subtilis and Stenotrophomonas maltophilia. Using the library construction procedure optimized in FIG. 4, a different position-anchored barcode sequence was introduced into each pure-strain sample, giving 10 sets of position-anchored barcode sequences, as shown in Table 5.
Table 5 (provided as an image in the original publication)
The specific nucleotide sequences of the 10 sets of ANCHOR barcodes are shown in table 6, wherein the underlined portions are ANCHOR sequences.
Table 6 (provided as an image in the original publication)
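The tandem layout can be checked against the sequence listing. The sketch below assumes that SEQ ID NO.40 is one of the 10 anchored barcodes and that SEQ ID NO.30, SEQ ID NO.13 and SEQ ID NO.17 are its upstream barcode, anchor and downstream segment, respectively; that mapping is inferred here from sequence identity and is not stated in the text. All sequences are copied from the sequence listing with the spacing removed.

```python
# Sequences copied from the sequence listing (spaces removed).
SEQ13 = "ttctgttggtgctgatattgcccgacttccgtac"                # 34 nt, primer F1
SEQ15 = "ccgacttccgtac"                                     # 13 nt
SEQ17 = "aacaaccgaacctttgaatcagaagtgcaactttcccacaggtag"     # 45 nt
SEQ30 = "aagaaagttgtcggtgtctttgtg"                          # 24 nt
SEQ40 = ("aagaaagttgtcggtgtctttgtgttctgttggtgctgatattgcccgacttccgtacaa"
         "caaccgaacctttgaatcagaagtgcaactttcccacaggtag")     # 103 nt


def split_at_anchor(full: str, anchor: str):
    """Split a sequence into (upstream, anchor, downstream) at the first
    exact occurrence of the anchor; raises ValueError if it is absent."""
    pos = full.index(anchor)
    return full[:pos], anchor, full[pos + len(anchor):]


upstream, anchor, downstream = split_at_anchor(SEQ40, SEQ13)
print(len(upstream), len(anchor), len(downstream))   # segment lengths
print(upstream == SEQ30, downstream == SEQ17)        # identity with the listing
print(SEQ13.endswith(SEQ15))                         # 13 nt FLANK5' at the 3' end
```

With these inputs the 103 nt sequence splits into 24 + 34 + 45 nt, and the three segments are identical to SEQ ID NO.30, SEQ ID NO.13 and SEQ ID NO.17.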
The specific library preparation steps are as follows:
(I) Annealing
1. Dilute the prepared adaptor lyophilized powder to 100 μM with annealing solution (1 mM EDTA; 50 mM NaCl; 5 mM Tris-HCl pH 7.5);
2. Mix the complementary strands (4 μl each), incubate at 95 ℃ for 5 min, and cool slowly to room temperature (about 25 ℃) on a PCR instrument;
3. Dilute the annealed adaptor to 640mM with annealing solution;
4. Store at 4 ℃ until use.
(II) End repair
1. Take 100 ng of nucleic acid sample into a 0.2 ml PCR tube and make up to 50 μl with water;
2. Add 7 μl of Ultra II End-prep reaction buffer and 3 μl of Ultra II End-prep enzyme mix, mix by rotation at room temperature, incubate at 20 ℃ for 5 min and then at 65 ℃ for 5 min, and leave at room temperature;
3. Add 1× beads (AMPure XP beads), mix and incubate at room temperature for 5 min, spin down briefly, place on a magnetic rack and discard the supernatant;
4. Wash the beads twice with 200 μl of freshly prepared 80% ethanol;
5. Add 17 μl of nuclease-free water, mix by rotation at room temperature and incubate for 2 min, spin down briefly and place on the magnetic rack until the solution is clear;
6. Carefully pipette 15 μl of supernatant into a 0.2 ml PCR tube.
(III) Adaptor ligation
1. To the 0.2 ml PCR tube (15 μl of end-prepped DNA), add 10 μl of diluted adaptor and 25 μl of Blunt/TA Ligase Master Mix, mix by pipetting, and incubate at 21 ℃ for 25-30 min;
2. Add 0.4× beads, mix by rotation at room temperature and incubate for 5 min, place on a magnetic rack and discard the supernatant;
3. Wash the beads twice with 200 μl of freshly prepared 80% ethanol;
4. Add 15 μl of nuclease-free water, mix by rotation at room temperature and incubate for 2 min, spin down briefly and place on the magnetic rack until the solution is clear;
5. Carefully pipette 13.5 μl of the supernatant into a 0.2 ml PCR tube.
(IV) Template amplification
1. Prepare the following reaction mix:
Component                          Volume (μl)
Platinum SuperFi PCR Master Mix    25
SuperFi GC Enhancer                10
DNA                                13.5
RYP-F                              0.25
RYP-R                              0.25
ABRCD primer                       1
2. After preparation, vortex to mix, spin down briefly, place on a PCR instrument and set the program:
(PCR program table provided as an image in the original publication)
3. After the reaction, add 0.6× beads, mix and incubate at room temperature for 5 min, spin down briefly, place on a magnetic rack and discard the supernatant;
4. Wash the beads twice with 200 μl of freshly prepared 80% ethanol;
5. Add 12 μl of eluent (10 mM Tris-HCl, 50 mM NaCl, pH 8.0), mix by rotation at room temperature and incubate for 2 min, spin down briefly and place on the magnetic rack until the solution is clear; carefully pipette 11 μl of supernatant into a 1.5 ml low-binding centrifuge tube;
6. QC: take 1 μl for Qubit measurement;
7. Add 1 μl of RAP to the 10 μl of purified product, react at room temperature for 5 min, and prepare for loading;
8. Following the standard nanopore sequencing procedure, sequence each of the 10 different strains as a single sample.
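For convenience, the bead volumes implied by the three clean-up steps can be worked out from the volumes listed above. The sketch assumes that "n× beads" means a bead volume equal to n times the reaction volume, and that each reaction volume is the sum of the components listed for that step; neither assumption is stated explicitly in the protocol, and the names used are hypothetical.

```python
# Component volumes (μl) and bead ratios taken from steps (II), (III) and (IV).
STEPS = {
    "end repair": {"volumes_ul": [50, 7, 3], "bead_ratio": 1.0},
    "adaptor ligation": {"volumes_ul": [15, 10, 25], "bead_ratio": 0.4},
    "template amplification": {
        "volumes_ul": [25, 10, 13.5, 0.25, 0.25, 1],
        "bead_ratio": 0.6,
    },
}


def bead_volume(volumes_ul, bead_ratio):
    """Bead volume (μl), assuming the ratio is relative to the reaction volume."""
    return round(sum(volumes_ul) * bead_ratio, 2)


plan = {
    name: (sum(step["volumes_ul"]), bead_volume(**step))
    for name, step in STEPS.items()
}
for name, (reaction_ul, beads_ul) in plan.items():
    print(f"{name:24s} reaction {reaction_ul:5.1f} μl  beads {beads_ul:5.1f} μl")
```

Under these assumptions the reaction volumes are 60 μl, 50 μl and 50 μl, and the corresponding bead volumes are 60 μl, 20 μl and 30 μl.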
Example 5 Analysis of actual samples
Based on the experimental procedure of Example 4, single-sample sequencing was performed on the 10 different strains, ensuring that only one barcode was attached to each sample. Because only a single barcode is used in each sequencing run, any read in a sample's data that aligns to a barcode not belonging to that sample is counted as misclassified. In the bioinformatics analysis, the official Oxford Nanopore software guppy was used to evaluate the sample classification accuracy of the original barcode system, and an in-house software pipeline was used to evaluate that of the position-anchored barcode system. The invention considers the resolving power of the 10 sets of position-anchored barcodes, and the final read classification accuracy is calculated within these 10 barcode sets.
As a result: sample classification with the original barcode system shows clear confusion. As shown in fig. 5, classification of barcode06 with guppy reaches only 99.954% accuracy, with 0.036% of reads assigned to barcode07 and about 0.01% to other barcodes; the average accuracy is 99.984%, so the proportion of confused reads is as high as 0.016%. This indicates that, when multiple samples are sequenced together, the low accuracy of single-barcode sample resolution can easily cause reads from a high-abundance microorganism in one sample to leak into other samples, producing a high rate of false-positive detections in those samples and misleading subsequent clinical diagnosis and treatment decisions.
With the position-anchored barcode system, the classification accuracy reaches 99.9999% on average, as shown in fig. 5; reads classified to barcode01, barcode02, barcode05, barcode09 and barcode10 reach 100%, in line with the classification accuracy on the simulated data. Compared with the 99.9% resolution accuracy of guppy, this is an improvement of 3 orders of magnitude: the base for accurately distinguishing samples rises from one in a thousand for a single barcode to one in a million, and the false-positive rate caused by read misclassification is reduced 1000-fold. In summary, the classification accuracy of the position-anchored barcode is significantly better than that of a single barcode.
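The relationship between the accuracies quoted above and the fold reduction in misclassification can be made explicit with a short calculation. The sketch uses exact decimal arithmetic to avoid floating-point noise at these small rates; the function names are hypothetical, and the inputs are the percentages given in this example.

```python
from decimal import Decimal


def misclassification_rate(accuracy_percent: str) -> Decimal:
    """Fraction of reads misclassified, from an accuracy given in percent."""
    return (Decimal(100) - Decimal(accuracy_percent)) / Decimal(100)


def fold_reduction(before_percent: str, after_percent: str) -> Decimal:
    """How many times lower the misclassification rate is after the change."""
    return (misclassification_rate(before_percent)
            / misclassification_rate(after_percent))


# 99.9% (single barcode, rounded figure) versus 99.9999% (position-anchored).
rounded = fold_reduction("99.9", "99.9999")
# Same calculation with the 99.984% average reported for guppy in this example.
averaged = fold_reduction("99.984", "99.9999")
print(rounded, averaged)
```

With the rounded 99.9% figure the reduction is 1000-fold, i.e. 3 orders of magnitude; with the 99.984% average it comes to 160-fold.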
The above description of the specific embodiments of the present application is not intended to limit the present application, and those skilled in the art may make various changes and modifications according to the present application without departing from the spirit of the present application, which is intended to fall within the scope of the appended claims.
Sequence listing
<110> Jiangsu Xiansheng Medical Diagnosis Co., Ltd.; Beijing Xiansheng Medical Examination Laboratory Co., Ltd.
<120> a position anchoring bar code system for nanopore sequencing library building
<130> 2020
<160> 51
<170> SIPOSequenceListing 1.0
<210> 1
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 1
agaacgactt ccatactcgt gtgaaacgag tctcttggga cccatagaag gtctacctcg 60
ctaacaccac tgcgtcaact 80
<210> 2
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 2
ccaaacccaa caacctagat aggcgttcct cgtgcagtgt caagagattt gcgtcctgtt 60
acgagaactc atgagcctct 80
<210> 3
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 3
cttactaccc agtgaacctc ctcggcatag ttctgcatga tgggttaggt aagttgggta 60
tgcaacgcaa tgcatacagc 80
<210> 4
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 4
tgaaacctaa gaaggcaccg tatcctagac accttgggtt gacagacctc agtgaggatc 60
tacttcgacc catgcgtaca 80
<210> 5
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 5
cagacttggt acggttgggt aactggacga agaactcaag tcaaaggcct acttacgaag 60
ctgagggact gcatgtccca 80
<210> 6
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 6
accacaggag gacgatacag agaaccacag tgtcaactag agcctctcta gtttggatga 60
ccaaggatag ccggagttcg 80
<210> 7
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 7
ctttcgttgt tgactcgacg gtagagtaga aagggttcct tcccactcga tccaacagag 60
atgccttcag tggctgtgtt 80
<210> 8
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 8
catctggaac gtggtacacc tgtaactggt gcagctttga acatctagat ggactttggt 60
aacttcctgc gtgttgaatg 80
<210> 9
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 9
agattcagac cgtctcatgc aaagcaagag ctttgactaa ggagcatgtg gaagatgaga 60
ccctgatcta cgtcactact 80
<210> 10
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 10
caggttactc ctccgtgagt ctgatcaatc aagaagggaa agcaaggtca tgttcaacca 60
aggcttctat ggagagggta 80
<210> 11
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 11
ttctgaagtt cctgggtctt gaacgacaga caccgttcat cgactttctt ctcagtcttc 60
ctccagacaa ggccgatcct 80
<210> 12
<211> 80
<212> DNA
<213> Artificial sequence ()
<400> 12
gaatctaagc aaacacgaag gtggtacagt ccgagcctca tgtgatctac cgagatccta 60
cgaatggagt gtcctgggag 80
<210> 13
<211> 34
<212> DNA
<213> Artificial sequence ()
<400> 13
ttctgttggt gctgatattg cccgacttcc gtac 34
<210> 14
<211> 35
<212> DNA
<213> Artificial sequence ()
<400> 14
acttgcctgt cgctctatct tcccgacttc cgtac 35
<210> 15
<211> 13
<212> DNA
<213> Artificial sequence ()
<400> 15
ccgacttccg tac 13
<210> 16
<211> 22
<212> DNA
<213> Artificial sequence ()
<400> 16
cgcatgtgtg tgatgacact gt 22
<210> 17
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 17
aacaaccgaa cctttgaatc agaagtgcaa ctttcccaca ggtag 45
<210> 18
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 18
tggactttgg taacttcctg cgtcggatga acataggata gcgat 45
<210> 19
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 19
caggttactc ctccgtgagt ctgaacggta tgtcgagttc cagga 45
<210> 20
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 20
ctacgtgtaa ggcatacctg ccagctttcg ttgttgactc gacgg 45
<210> 21
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 21
gctgtgttcc acttcattct cctggatcca acagagatgc cttca 45
<210> 22
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 22
tgttaccgtg ggaatgaatc cttagattca gaccgtctca tgcaa 45
<210> 23
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 23
tcctcgtgca gtgtcaagag atccagtaga agtccgacaa cgtca 45
<210> 24
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 24
gttgaatgag cctactgggt cctcgagcct ctcattgtcc gttct 45
<210> 25
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 25
tctcggagat agttctcact gctgtggctt gatctaggta aggtc 45
<210> 26
<211> 45
<212> DNA
<213> Artificial sequence ()
<400> 26
tgagagacaa gattgttcgt ggacaagaaa caggatgaca gaacc 45
<210> 27
<211> 15
<212> DNA
<213> Artificial sequence ()
<400> 27
atcgcctacc gtgac 15
<210> 28
<211> 22
<212> DNA
<213> Artificial sequence ()
<400> 28
tttctgttgg tgctgatatt gc 22
<210> 29
<211> 22
<212> DNA
<213> Artificial sequence ()
<400> 29
acttgcctgt cgctctatct tc 22
<210> 30
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 30
aagaaagttg tcggtgtctt tgtg 24
<210> 31
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 31
gagtcttgtg tcccagttac cagg 24
<210> 32
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 32
gtttcatcta tcggagggaa tgga 24
<210> 33
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 33
ttcggattct atcgtgtttc ccta 24
<210> 34
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 34
cttgtccagg gtttgtgtaa cctt 24
<210> 35
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 35
ttctcgcaaa ggcagaaagt agtc 24
<210> 36
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 36
gtgttaccgt gggaatgaat cctt 24
<210> 37
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 37
ttcagggaac aaaccaagtt acgt 24
<210> 38
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 38
aactaggcac agcgagtctt ggtt 24
<210> 39
<211> 24
<212> DNA
<213> Artificial sequence ()
<400> 39
aagcgttgaa acctttgtcc tctc 24
<210> 40
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 40
aagaaagttg tcggtgtctt tgtgttctgt tggtgctgat attgcccgac ttccgtacaa 60
caaccgaacc tttgaatcag aagtgcaact ttcccacagg tag 103
<210> 41
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 41
gagtcttgtg tcccagttac caggttctgt tggtgctgat attgcccgac ttccgtactg 60
gactttggta acttcctgcg tcggatgaac ataggatagc gat 103
<210> 42
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 42
gtttcatcta tcggagggaa tggattctgt tggtgctgat attgcccgac ttccgtacca 60
ggttactcct ccgtgagtct gaacggtatg tcgagttcca gga 103
<210> 43
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 43
ttcggattct atcgtgtttc cctattctgt tggtgctgat attgcccgac ttccgtacct 60
acgtgtaagg catacctgcc agctttcgtt gttgactcga cgg 103
<210> 44
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 44
cttgtccagg gtttgtgtaa ccttttctgt tggtgctgat attgcccgac ttccgtacgc 60
tgtgttccac ttcattctcc tggatccaac agagatgcct tca 103
<210> 45
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 45
ttctcgcaaa ggcagaaagt agtcttctgt tggtgctgat attgcccgac ttccgtactg 60
ttaccgtggg aatgaatcct tagattcaga ccgtctcatg caa 103
<210> 46
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 46
gtgttaccgt gggaatgaat ccttttctgt tggtgctgat attgcccgac ttccgtactc 60
ctcgtgcagt gtcaagagat ccagtagaag tccgacaacg tca 103
<210> 47
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 47
ttcagggaac aaaccaagtt acgtttctgt tggtgctgat attgcccgac ttccgtacgt 60
tgaatgagcc tactgggtcc tcgagcctct cattgtccgt tct 103
<210> 48
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 48
aactaggcac agcgagtctt ggttttctgt tggtgctgat attgcccgac ttccgtactc 60
tcggagatag ttctcactgc tgtggcttga tctaggtaag gtc 103
<210> 49
<211> 103
<212> DNA
<213> Artificial sequence ()
<400> 49
aagcgttgaa acctttgtcc tctcttctgt tggtgctgat attgcccgac ttccgtactg 60
agagacaaga ttgttcgtgg acaagaaaca ggatgacaga acc 103
<210> 50
<211> 12
<212> DNA
<213> Artificial sequence ()
<400> 50
ggtgctgtta ac 12
<210> 51
<211> 12
<212> DNA
<213> Artificial sequence ()
<400> 51
gtacggaagt cg 12

Claims (17)

1. A position-anchored barcode system for nanopore sequencing library construction, the system comprising the structure:
FLANK1-[BARCODE-ANCHOR]n-BARCODE(n+1)-FLANK2,
wherein BARCODE is a barcode sequence;
ANCHOR is an anchor sequence;
n is 1, 2 or 3; and
FLANK is a flanking sequence.
2. The position-anchored BARCODE system of claim 1, wherein the BARCODE sequences are the same or different.
3. The position-anchored BARCODE system of claim 2, wherein the BARCODE sequences are different.
4. The position-anchored barcode system of any of claims 1-3, wherein the ANCHOR sequences are the same or different.
5. The position-anchored barcode system of claim 4, wherein the ANCHOR sequences are different.
6. The position-anchored barcode system of any of claims 1-5, wherein the ANCHOR sequence is 5-50bp in length.
7. The position-anchored barcode system of claim 6, wherein the ANCHOR sequence is 10-35bp in length.
8. The position-anchored BARCODE system of any of claims 1-7, wherein the ANCHOR sequence has < 70% homology to the BARCODE sequence.
9. The position-anchored BARCODE system of claim 8, wherein the ANCHOR sequence has < 50% homology to the BARCODE sequence.
10. The position-anchored barcode system of any of claims 1-9, wherein the structure is any of:
FLANK1-BARCODE1-ANCHOR1-BARCODE2-FLANK2;
FLANK1-BARCODE1-ANCHOR1-BARCODE2-ANCHOR2-BARCODE3-FLANK2;
or
FLANK1-BARCODE1-ANCHOR1-BARCODE2-ANCHOR2-BARCODE3-ANCHOR3-BARCODE4-FLANK2.
11. A method of making a position-anchored barcode system of any of claims 1-10, wherein: the method comprises directly synthesizing the nucleotide sequence of the position anchoring bar code system, or preparing the position anchoring bar code system by connecting after segmented synthesis.
12. A method of sequencing library construction, wherein a sequencing library is constructed using the position-anchored barcode system of any one of claims 1 to 10.
13. A sequencing adaptor comprising a position-anchored barcode system of any one of claims 1 to 10.
14. A complex comprising the position-anchored barcode system of any one of claims 1 to 10.
15. A composition comprising the position-anchored barcode system of any one of claims 1 to 10.
16. A kit for nanopore sequencing library construction, comprising the position-anchored barcode system of any one of claims 1-10, or the sequencing adaptor of claim 13.
17. Use of the position-anchored bar code system according to any of claims 1 to 10, wherein said use is any of the following:
1) the application in improving the classification accuracy of sequencing samples;
2) use in reducing false positives for sequencing sample classification;
3) the application in the construction of sequencing libraries;
4) application in sequencing.
CN202010276679.2A 2020-04-09 2020-04-09 Position anchoring bar code system for nanopore sequencing library building Active CN111440846B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010276679.2A CN111440846B (en) 2020-04-09 2020-04-09 Position anchoring bar code system for nanopore sequencing library building
PCT/CN2020/085645 WO2021203461A1 (en) 2020-04-09 2020-04-20 Position anchoring bar code system for nanopore sequencing library construction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010276679.2A CN111440846B (en) 2020-04-09 2020-04-09 Position anchoring bar code system for nanopore sequencing library building

Publications (2)

Publication Number Publication Date
CN111440846A CN111440846A (en) 2020-07-24
CN111440846B true CN111440846B (en) 2020-12-18

Family

ID=71651430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010276679.2A Active CN111440846B (en) 2020-04-09 2020-04-09 Position anchoring bar code system for nanopore sequencing library building

Country Status (2)

Country Link
CN (1) CN111440846B (en)
WO (1) WO2021203461A1 (en)

Also Published As

Publication number Publication date
WO2021203461A1 (en) 2021-10-14
CN111440846A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
CN111440846B (en) Position anchoring bar code system for nanopore sequencing library building
CN106048009B (en) Tag adapter for ultralow-frequency gene mutation detection and application thereof
US20210403991A1 (en) Sequencing Process
EP4023795A1 (en) Method for detecting mutation and methylation of tumor specific gene in ctdna
CN111052249B (en) Methods of determining predetermined chromosome conservation regions, methods of determining whether copy number variation exists in a sample genome, systems, and computer readable media
CN111073961A (en) High-throughput detection method for gene rare mutation
EP3819385A1 (en) Construction and sequencing data analysis method for ctdna library for simultaneously detecting various common mutations in liver cancer
CN105986015A (en) Method and kit for detecting one or more target sequence of multiple samples based on high-throughput sequencing
CN106939344A (en) Adapter for second-generation sequencing
CN108624700A (en) Kit for simultaneous detection of 124 microhaplotype loci based on second-generation sequencing technology, and dedicated primer pair combination thereof
CN110603327A (en) PCR primer pair and application thereof
CN113981056A (en) Method for performing high-throughput sequencing based on internal reference of known tag
CN110724731A (en) Method for adding internal reference quantity of nucleic acid copy number in multiplex PCR system
US10179934B2 (en) High-throughput detection method for DNA synthesis product
CN108728515A (en) Library construction and sequencing data analysis method for detecting low-frequency ctDNA mutations using the duplex method
WO2021253372A1 (en) High-compatibility pcr-free library building and sequencing method
CN112301430B (en) Library building method and application
CN114277114A (en) Method for adding unique identifier in amplicon sequencing and application
WO2019010776A1 (en) Combined label, connector and method for determining that low-frequency mutation nucleic acid sequence is comprised
CN113046415A (en) Construction method and application of RNA sequencing library
WO2019010775A1 (en) Molecular tag, joint and method for determining nucleotide sequence containing low-frequency mutation
CN113444769A (en) Construction method and application of DNA tag sequence
CN111926394B (en) Database building method and detection kit based on metagenomics
CN117757979B (en) Primer group, kit and identification method for identifying soybean varieties
WO2024104130A1 (en) Whole genome molecular marker development method utilizing degenerate primer amplification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant