WO2023060419A1 - Mutant of porin monomer, protein pore and application thereof - Google Patents

Mutant of porin monomer, protein pore and application thereof

Info

Publication number
WO2023060419A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
mutant
mutated
protein
porin
Prior art date
Application number
PCT/CN2021/123209
Inventor
刘少伟
谢馥励
李倩雯
何京雄
赵帅
Original Assignee
成都齐碳科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都齐碳科技有限公司
Priority to PCT/CN2021/123209
Publication of WO2023060419A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/28: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Vibrionaceae (F)
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/483: Physical analysis of biological material
    • G01N33/487: Physical analysis of biological material of liquid biological material

Definitions

  • the invention belongs to the technical field of characterization of target analyte characteristics, and particularly relates to a mutant of a porin monomer, a protein pore containing the same and its application for detecting target analytes.
  • nucleic acid sequencing technology continues to develop, becoming the core field of life science research, and playing a huge role in promoting the development of technology in the fields of biology, chemistry, electricity, life science, and medicine.
  • Using nanopores to develop a new type of rapid, accurate, low-cost, high-precision and high-throughput nucleic acid sequencing technology is one of the hot spots of the post-human genome project.
  • Nanopore sequencing technology, also known as fourth-generation sequencing technology, is a gene sequencing technology that takes a single-stranded nucleic acid molecule as the sequencing unit. A nanopore that provides an ion current channel is used, and the single-stranded nucleic acid molecule is driven through this nanopore by electrophoresis. As the nucleic acid passes through the nanopore, the current through the nanopore is reduced, and the sequence information is read in real time from the different signals generated.
  • Nanopore sequencing can not only realize natural DNA and RNA sequencing, but also directly obtain the base modification information of DNA and RNA.
  • the use of bisulfite treatment provides a great impetus for the direct study of epigenetic-related phenomena at the genomic level.
  • nanopore detection technology has the advantages of low cost, high throughput, and label-free.
  • Nanopore analysis technology originated from the invention of the Coulter counter and the recording technology of single-channel current.
  • Neher and Sakmann, winners of the Nobel Prize in Physiology or Medicine, used patch-clamp technology in 1976 to measure membrane potential and to study membrane proteins and ion channels, which promoted the practical application of nanopore sequencing technology.
  • Kasianowicz et al. proposed a new idea of using α-hemolysin to sequence DNA, which was a milestone in single-molecule sequencing with biological nanopores.
  • biological nanopores such as MspA porin and phage Phi29 connector enriched the research on nanopore analysis technology.
  • Li et al. opened a new era of solid-state nanopore research. Limited by the development of the semiconductor and material industries, solid-state nanopore sequencing progresses slowly.
  • nanopore sequencing technology lies in the design of a special biological nanopore.
  • the read head structure formed by the constriction region in the pore can cause the channel current to change when single-stranded nucleic acid (such as ssDNA) molecules pass through the nanopore.
  • the blockage of the nanopore briefly affects the current intensity flowing through the nanopore (the magnitude of the current change affected by each base is different), and finally the highly sensitive electronic device detects these changes to identify the base passing through.
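  • The sensing principle described above, in which a passing molecule briefly reduces the current through the pore, can be illustrated with a minimal threshold-based sketch. The function name, the 0.7 threshold fraction and the toy current values are assumptions chosen for illustration only; they are not taken from this document, and real base identification is considerably more involved.

```python
from typing import List, Tuple


def find_blockade_events(
    trace: List[float], open_pore_current: float, fraction: float = 0.7
) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean_current) for each run of blocked samples.

    A sample counts as blocked when it is below fraction * open_pore_current.
    `end` is exclusive. The default fraction is an illustrative assumption.
    """
    threshold = fraction * open_pore_current
    events = []
    start = None
    for i, value in enumerate(trace):
        blocked = value < threshold
        if blocked and start is None:
            start = i  # a blockade begins
        elif not blocked and start is not None:
            segment = trace[start:i]
            events.append((start, i, sum(segment) / len(segment)))
            start = None  # the pore is open again
    if start is not None:  # the trace ended while still blocked
        segment = trace[start:]
        events.append((start, len(trace), sum(segment) / len(segment)))
    return events


# Hypothetical current samples in pA; not measured data.
toy_trace = [100.0, 101.0, 40.0, 42.0, 99.0, 100.0, 55.0, 57.0, 53.0, 100.0]
events = find_blockade_events(toy_trace, open_pore_current=100.0)
```

  • In this toy trace the two blockades have different mean currents, which mirrors the statement that the magnitude of the current change differs from base to base.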
  • protein pores are used as nanopores for sequencing, and porins are mainly derived from Escherichia coli.
  • In current nanopore sequencing technology the choice of nanopore protein is limited to a single type, so alternative nanopore proteins need to be developed. Porins are also closely related to sequencing accuracy and are involved in changes in the interaction mode with rate-controlling proteins. Further optimizing the stability of the interaction interface between porins and rate-controlling proteins has a positive impact on the consistency and stability of sequencing data. The accuracy of nanopore sequencing technology also needs to be improved. Therefore, it is necessary to develop improved nanopore proteins to further improve the resolution of nanopore sequencing.
  • the purpose of the embodiments of the present invention is to provide a mutant of an alternative porin monomer, a protein pore comprising the same, and an application thereof.
  • An embodiment of the present invention provides a mutant of a porin monomer, wherein the amino acid sequence of the mutant comprises or consists of the sequence shown in SEQ ID NO: 1, or a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60% or 50% identity to it, and includes a mutation at one or more positions corresponding to K67, D71, S72, and Y74 of SEQ ID NO: 1;
  • The one or more positions among K67, D71, S72, and Y74 are specifically: (1) K67; (2) D71; (3) S72; (4) Y74; (5) K67 and D71; (6) K67 and S72; (7) K67 and Y74; (8) D71 and S72; (9) D71 and Y74; (10) S72 and Y74; (11) K67, D71 and S72; (12) K67, D71 and Y74; (13) D71, S72 and Y74; (14) K67, D71, S72 and Y74.
  • The amino acid sequence of the mutant of the porin monomer comprises mutations at one or more positions within 62-209, 62-74, 62-75, 65-79, 67-209, 67-75, or 67-74.
  • The amino acids of the mutant of the porin monomer include:
  • (1) amino acid insertions, deletions and/or substitutions at one or more positions corresponding to Q62, K67, D71, S72, and Y74 of SEQ ID NO: 1; (2) amino acid insertions, deletions and/or substitutions at one or more positions corresponding to Q62, K67, D71, S72, Y74, E110, E119, E126, and K209 of SEQ ID NO: 1; (3) amino acid insertions, deletions and/or substitutions at one or more positions corresponding to K67, D71, S72, Y74, and S75 of SEQ ID NO: 1; or (4) amino acid insertions, deletions and/or substitutions at one or more positions corresponding to K67, T69, A70, D71, S72, S73, and Y74 of SEQ ID NO: 1.
  • The amino acid mutation of the mutant of the porin monomer is selected from the following:
  • Q62 corresponding to SEQ ID NO: 1 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 4 of N, E, D, Q; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 5 of S, C, U, T, M;
  • Q62 corresponding to SEQ ID NO: 1 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 4 of N, E, D, Q; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 3 of F, Y, W; E110 is mutated to 0 to 4 of N, D, E, Q; E119 is mutated to 0 to 4 of N, D, E, Q; E126 is mutated to 0 to 4 of N, D, E, Q; K209 is mutated to 0 to 1 of P;
  • K67 corresponding to SEQ ID NO: 1 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 5 of G, A, V, L, and I; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 3 of F, Y, W; S75 is mutated to 0 to 5 of C, U, S, T, M; and
  • K67 corresponding to SEQ ID NO: 1 is mutated to 0 to 3 of R, H, K; T69 is mutated to 0 to 5 of S, C, T, U, M; A70 is mutated to 0 to 1 of P; D71 is mutated to 0 to 5 of G, A, V, L, I; S72 is mutated to 0 to 3 of F, Y, W; S73 is mutated to 0 to 5 of G, A, V, L, I; Y74 is mutated to 0 to 3 of F, Y, and W.
  • The amino acid mutation of the mutant of the porin monomer is selected from the following:
  • Q62 corresponding to SEQ ID NO: 1 is mutated to G, A, V, L, or I; K67 is mutated to R or H; D71 is mutated to N, E, or Q; S72 is mutated to P; Y74 is mutated to S, C, U, T, or M;
  • Q62 corresponding to SEQ ID NO: 1 is mutated to G, A, V, L, or I; K67 is mutated to R or H; D71 is mutated to N, E, or Q; S72 is mutated to P; Y74 is deleted; E110 is mutated to N, D, or Q; E119 is mutated to N, D, or Q; E126 is mutated to N, D, or Q; K209 is mutated to P;
  • K67 corresponding to SEQ ID NO: 1 is mutated to R or H; D71 is mutated to G, A, V, L, or I; S72 is mutated to P; Y74 is deleted; S75 is deleted; and
  • K67 corresponding to SEQ ID NO: 1 is mutated to R or H; T69 is mutated to S, C, U, or M; A70 is mutated to P; D71 is mutated to G, A, V, L, or I; S72 is mutated to F, Y, or W; S73 is mutated to G, A, V, L, or I; Y74 is deleted.
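  • A combination of substitutions and deletions such as those listed above can be applied to a sequence using 1-based residue numbering, as in the following minimal sketch. SEQ ID NO: 1 is not reproduced in this section, so the 10-residue sequence below is a hypothetical stand-in, and the function name and data layout are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# position (1-based) -> (expected wild-type residue, replacement or None for deletion)
Mutations = Dict[int, Tuple[str, Optional[str]]]


def apply_mutations(sequence: str, mutations: Mutations) -> str:
    """Apply substitutions and deletions given in wild-type numbering."""
    residues = list(sequence)
    for position, (wild_type, replacement) in mutations.items():
        found = residues[position - 1]
        if found != wild_type:
            raise ValueError(
                f"expected {wild_type} at position {position}, found {found!r}"
            )
        # A deletion is stored as an empty string so that the remaining
        # positions keep their wild-type numbering until the final join.
        residues[position - 1] = replacement if replacement is not None else ""
    return "".join(residues)


# Hypothetical 10-residue stand-in; NOT SEQ ID NO: 1.
toy = "MKTADSSYSG"
mutant = apply_mutations(toy, {2: ("K", "R"), 5: ("D", "N"), 8: ("Y", None)})
```

  • Checking the expected wild-type residue before each change guards against numbering errors, which are easy to make when positions are quoted relative to a reference sequence.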
  • amino acid mutation of the mutant of the porin monomer is selected from the following:
  • An embodiment of the present invention provides a mutant of a porin monomer, wherein the amino acid sequence of the mutant comprises the sequence shown in SEQ ID NO: 1 or a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60% or 50% identity to it, and the mutant of the porin monomer includes:
  • Q62 is mutated to 0 to 5 of G, A, V, L, I; K67 is mutated to 0 to 3 of R, H, K; T69 is mutated to 0 to 5 of S, C, T, U, M; A70 is mutated to 0 to 1 of P; D71 is mutated to 0 to 4 of N, E, D, Q, or to 0 to 5 of G, A, V, L, I; S72 is mutated to 0 to 1 of P, or to 0 to 3 of F, Y, W; S73 is mutated to 0 to 5 of G, A, V, L, I; Y74 is mutated to 0 to 5 of S, C, U, T, M, or to 0 to 3 of F, Y, W; S75 is mutated to 0 to 5 of C, U, S, T,
  • "0 to N" kinds means 0, 1, 2, 3, 4, ..., N kinds.
  • For example, "Q62 is mutated to 0 to 5 of G, A, V, L, and I" means that Q62 is mutated to 0, 1, 2, 3, 4, or 5 of the amino acids G, A, V, L, and I.
  • When a position is mutated to one amino acid, the amino acids before and after the mutation are different.
  • For example, for "T69 is mutated to 0 to 5 of S, C, T, U, M": when the number is 1, T69 is not mutated to T and can only be mutated to any one of S, C, U, and M; when the number is 2, T69 can be mutated to any two of S, C, T, U, and M; and so on.
  • When the number is 0, the amino acid at that position is deleted.
  • For example, "Y74 is mutated to 0 of F, Y, W" means that Y74 is deleted.
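  • The "0" and "1" cases of this convention can be sketched as follows: 0 denotes deletion, and a single substitution must differ from the wild-type residue. The function name is an assumption made for illustration, and the sketch deliberately does not model the cases of two or more kinds.

```python
from typing import List, Optional


def single_outcomes(wild_type: str, candidates: List[str]) -> List[Optional[str]]:
    """List the outcomes for one position under the "0 to N" convention.

    None stands for the "0" case (the residue is deleted). Each remaining
    entry is a candidate residue that differs from the wild-type residue.
    """
    substitutions = [aa for aa in candidates if aa != wild_type]
    return [None] + substitutions


t69 = single_outcomes("T", ["S", "C", "T", "U", "M"])  # T itself is excluded
y74 = single_outcomes("Y", ["F", "Y", "W"])            # Y itself is excluded
```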
  • an embodiment of the present invention provides a protein pore, including at least one mutant of a porin monomer.
  • An embodiment of the present invention provides a complex for characterizing a target analyte, comprising the protein pore and a rate-controlling protein used in conjunction with it.
  • embodiments of the present invention provide nucleic acids encoding mutants of porin monomers, protein pores, or complexes.
  • the embodiments of the present invention provide a vector or a genetically engineered host cell comprising the nucleic acid.
  • The embodiments of the present invention provide the use of the mutants of porin monomers, protein pores, complexes, nucleic acids, vectors or host cells in detecting the presence, absence or one or more characteristics of a target analyte, or in preparing a product for detecting the presence, absence or one or more characteristics of a target analyte.
  • the embodiments of the present invention provide a method for producing a protein pore or a polypeptide thereof, comprising transforming the host cell with the vector, and inducing the host cell to express the protein pore or a polypeptide thereof.
  • embodiments of the present invention provide a method for determining the presence, absence or one or more characteristics of a target analyte, comprising:
  • the method comprises: said target analyte interacting with said protein pore present in a membrane such that said target analyte moves relative to said protein pore.
  • the target analyte is a nucleic acid molecule.
  • A method for determining the presence, absence, or one or more characteristics of a target analyte comprises coupling the target analyte to a membrane, and allowing the target analyte to interact with the protein pore present in the membrane so that the target analyte moves relative to the protein pore.
  • An embodiment of the present invention provides a kit for determining the presence, absence or one or more characteristics of a target analyte, including the mutant of the porin monomer, the protein pore, the complex, the nucleic acid, or the vector or host cell, and the components of a membrane.
  • the embodiments of the present invention provide a device for determining the presence, absence or one or more characteristics of a target analyte, including the protein pore or the complex, and the membrane.
  • The target analytes include polysaccharides, metal ions, inorganic salts, polymers, amino acids, peptides, proteins, nucleotides, oligonucleotides, polynucleotides, dyes, drugs, diagnostic agents, explosive substances or environmental pollutants;
  • said target analyte comprises a polynucleotide
  • said polynucleotide comprises DNA or RNA; and/or, said one or more characteristics are selected from (i) the length of said polynucleotide; (ii) the identity of said polynucleotide; (iii) the sequence of the polynucleotide; (iv) the secondary structure of the polynucleotide; and (v) whether the polynucleotide is modified; and/or, the rate-controlling protein in the complex includes a polynucleotide binding protein.
  • Figure 1 illustrates the basic working principle of a nanopore according to one embodiment.
  • Fig. 2 shows a schematic diagram of DNA sequencing according to one embodiment.
  • Figure 3 shows the corresponding pore blocking signal when nucleotides pass through the protein pore according to one embodiment.
  • FIGS. 4A, 4B and 4C show the surface structure and ribbon diagram models of a wild-type protein pore channel according to one embodiment.
  • FIG. 4A is a side view of the surface structure model
  • FIG. 4B is a top view of the surface structure model
  • FIG. 4C is a ribbon structure model.
  • Figure 5 shows the constriction region amino acid residue distribution and constriction region diameter of wild-type channels according to one embodiment.
  • Figure 6A shows a surface potential map of a wild-type channel monomer according to one embodiment
  • Figure 6B shows a ribbon model of the monomer and a stick model of the distribution of amino acid residues in its constriction region.
  • Fig. 7 shows the distribution of the amino acid residues in the constriction region of mutant pore 1 and the diameter of the constriction region according to one embodiment.
  • Fig. 8 shows a cartoon schematic diagram of homology-based modeling of mutant pore 1 according to an embodiment.
  • Fig. 9 shows a negative-stain electron micrograph of mutant pore 1 according to an embodiment; the arrows indicate the target protein particles.
  • Fig. 10 shows the two-dimensional classification results of negative-stain electron microscopy of mutant pore 1 according to an embodiment; the class indicated by the arrow shows that the oligomerization state of mutant pore 1 is a 9-mer.
  • Figure 11 shows the structure of the DNA construct BS7-4C3-PLT according to one embodiment.
  • FIG. 12A shows the open-pore current and gating characteristics of mutant pore 1 at a voltage of ±180 mV according to one embodiment.
  • FIG. 12B shows nucleic acid passing through mutant pore 1 at a voltage of +180 mV according to one embodiment.
  • FIGS. 13A and 13B show exemplary current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 1, according to one embodiment.
  • Fig. 14 is an enlarged view of a single signal region of Fig. 13A.
  • FIG. 16A shows the open-pore current and gating characteristics of mutant pore 2 at a voltage of ±180 mV according to one embodiment.
  • FIG. 16B shows nucleic acid passing through mutant pore 2 at a voltage of +180 mV according to one embodiment.
  • Figures 17A and 17B show example current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 2, according to one embodiment.
  • Fig. 18 is an enlarged view of a single signal region of Fig. 17A.
  • Fig. 19 shows, according to an embodiment, current traces from an on-chip test when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 2 (for both traces, the y-axis is current (pA) and the x-axis is the number of sampling points).
  • FIG. 20A shows the open-pore current and gating characteristics of mutant pore 3 at a voltage of ±180 mV according to one embodiment.
  • FIG. 20B shows nucleic acid passing through mutant pore 3 at a voltage of +180 mV according to one embodiment.
  • FIGS. 21A and 21B show exemplary current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 3, according to one embodiment.
  • Fig. 22 is an enlarged view of a single signal region of Fig. 21A.
  • FIG. 23A shows the open-pore current and gating characteristics of mutant pore 4 at a voltage of ±180 mV according to one embodiment.
  • FIG. 23B shows nucleic acid passing through mutant pore 4 at a voltage of +180 mV according to one embodiment.
  • FIGS. 24A and 24B show exemplary current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 4, according to one embodiment.
  • Fig. 25 is an enlarged view of a single signal region of Fig. 24B.
  • Fig. 26 shows the protein purification results of mutant 1 according to an embodiment; lanes 1-6 show the SDS-PAGE results for the different separated fractions.
  • Fig. 27 shows the results of molecular sieve purification of the protein of mutant 1 according to an embodiment, and the position indicated by the arrow is the peak of the target protein.
  • Reference to "a nucleotide" includes two or more nucleotides
  • reference to "a helicase" includes two or more helicases.
  • nucleotide sequence refers to a polymeric form of nucleotides (ribonucleotides or deoxyribonucleotides) of any length. The term refers only to the primary structure of the molecule. Thus, the term includes double- and single-stranded DNA and RNA.
  • nucleic acid refers to a single- or double-stranded covalently linked sequence of nucleotides wherein the 3' and 5' ends on each nucleotide are linked by a phosphodiester bond.
  • Nucleotides can consist of deoxyribonucleotide bases or ribonucleotide bases.
  • Nucleic acids can include DNA and RNA, and can be prepared synthetically in vitro or isolated from natural sources.
  • The nucleic acid may further comprise modified DNA or RNA, such as methylated DNA or RNA, or RNA that has been post-transcriptionally modified, for example by 5'-capping with 7-methylguanosine, 3'-end processing such as cleavage and polyadenylation, and splicing.
  • Nucleic acids can also include synthetic nucleic acids (XNA), such as hexitol nucleic acid (HNA), cyclohexene nucleic acid (CeNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), locked nucleic acid (LNA) and peptide nucleic acid (PNA).
  • HNA: hexitol nucleic acid
  • CeNA: cyclohexene nucleic acid
  • TNA: threose nucleic acid
  • GNA: glycerol nucleic acid
  • LNA: locked nucleic acid
  • PNA: peptide nucleic acid
  • nucleic acid or polynucleotide
  • bp: base pairs
  • nt: nucleotides
  • kb: kilobase pairs
  • Polynucleotides of less than about 40 nucleotides in length are commonly referred to as “oligonucleotides” and may contain primers used in DNA manipulations, such as by polymerase chain reaction (PCR).
  • Polynucleotides such as nucleic acids, are macromolecules comprising two or more nucleotides.
  • the polynucleotide or nucleic acid may comprise any combination of nucleotides.
  • the nucleotides may be naturally occurring or synthetic.
  • One or more nucleotides in the polynucleotide may be oxidized or methylated.
  • One or more nucleotides in the polynucleotide may be damaged.
  • the polynucleotide may comprise a pyrimidine dimer. This dimer is often associated with damage caused by UV light and is a major cause of skin melanoma.
  • One or more nucleotides in the polynucleotide may be modified, for example with conventional labels or tags.
  • The polynucleotide may comprise one or more nucleotides that are abasic (i.e., lacking a nucleobase) or that lack both a nucleobase and a sugar (i.e., are a C3 spacer).
  • the nucleotides in the polynucleotide may be linked to each other in any manner.
  • the nucleotides are usually linked by their sugar and phosphate groups, as in nucleic acids.
  • The nucleotides may be linked via their nucleobases, as in pyrimidine dimers.
  • a polynucleotide can be single-stranded or double-stranded. At least a portion of the polynucleotide is preferably double-stranded.
  • a polynucleotide may be a nucleic acid, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
  • a polynucleotide may comprise an RNA strand hybridized to a DNA strand.
  • the polynucleotide may be any synthetic nucleic acid known in the art, such as peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA) or other synthetic nucleic acid having nucleotide side chains.
  • PNA: peptide nucleic acid
  • GNA: glycerol nucleic acid
  • TNA: threose nucleic acid
  • LNA: locked nucleic acid
  • the PNA backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds.
  • the GNA backbone is composed of repeating ethylene glycol units linked by phosphodiester bonds.
  • the TNA backbone is composed of repeating threosyl groups linked together by phosphodiester bonds.
  • LNA is formed from ribonucleic acid as described above, with an additional bridge connecting the 2' oxygen and 4' carbon in the ribose moiety.
  • Bridged nucleic acids (BNAs) are modified RNA nucleotides. They can also be called restricted or inaccessible RNA. BNA monomers can contain 5-membered, 6-membered or even 7-membered bridges with a "fixed" C3'-endo sugar puckering.
  • the bridging structure is synthetically introduced into the 2',4'-position of ribose to generate a 2',4'-BNA monomer.
  • the polynucleotide is most preferably ribonucleic acid (RNA) or deoxyribonucleic acid (DNA).
  • a polynucleotide can be of any length.
  • a polynucleotide can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs in length.
  • The polynucleotide may be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs, or 100000 or more nucleotides or nucleotide pairs in length.
  • any number of polynucleotides can be studied.
  • The methods of the embodiments may involve characterizing 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100 or more polynucleotides. If two or more polynucleotides are characterized, they may be different polynucleotides or instances of the same polynucleotide.
  • Polynucleotides can be naturally occurring or synthetic.
  • the methods can be used to verify the sequence of prepared oligonucleotides.
  • the methods are typically performed in vitro.
  • The term "amino acid" is used in its broadest sense and is meant to include organic compounds containing amine (NH2) and carboxyl (COOH) functional groups as well as a side chain (such as an R group) that is unique to each amino acid.
  • amino acid refers to a naturally occurring L α-amino acid or residue.
  • amino acid also includes D-amino acids, retro-inverso amino acids, chemically modified amino acids (such as amino acid analogs), naturally occurring amino acids that are not usually incorporated into proteins (such as norleucine), and chemically synthesized compounds with properties known in the art to be characteristic of amino acids, such as β-amino acids.
  • analogs or mimetics of phenylalanine or proline are included in the definition of amino acid which allow the same conformational constraints on the peptide compound as native Phe or Pro. Such analogs and mimetics are referred to herein as "functional equivalents" of the corresponding amino acids.
  • The terms "protein", "polypeptide" and "peptide" are used interchangeably herein to refer to polymers of amino acid residues as well as variants and synthetic analogs of amino acid residues. Accordingly, these terms apply to amino acid polymers in which one or more amino acid residues are a synthetic non-naturally occurring amino acid, such as a chemical analog of the corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers.
  • Polypeptides may also undergo maturation or post-translational modification processes, which may include, but are not limited to, glycosylation, proteolytic cleavage, lipidation, signal peptide cleavage, propeptide cleavage, phosphorylation, and the like.
  • “Homologues” of a protein encompass peptides, oligopeptides, polypeptides, proteins and enzymes that have amino acid substitutions, deletions and/or insertions relative to the unmodified or wild-type protein in question and that have similar biological and functional activity to the unmodified protein from which they were derived.
  • amino acid identity refers to the degree to which sequences are identical on an amino acid-by-amino acid basis over a comparison window.
  • Percent sequence identity is calculated by comparing two optimally aligned sequences over a comparison window, determining the number of positions at which identical amino acid residues (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys, and Met) occur in both sequences to obtain the number of matching positions, dividing the number of matching positions by the total number of positions in the comparison window (i.e., the window size), and multiplying the result by 100.
  • Sequence identity may also be to fragments or portions of full-length polynucleotides or polypeptides. Thus, a sequence may have only 50% overall sequence identity to the full-length reference sequence, but the sequence of a particular region, domain or subunit may have 80%, 90%, or as much as 99% sequence identity to the reference sequence.
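  • The calculation described above can be written as a short function. This is a minimal sketch that assumes the two sequences have already been optimally aligned to equal length (the alignment step itself is not shown), that "-" marks a gap, and that the comparison window is given as a half-open index range; the function name and the toy sequences are hypothetical.

```python
from typing import Optional


def percent_identity(
    aligned_a: str, aligned_b: str, start: int = 0, end: Optional[int] = None
) -> float:
    """Percent identity of two pre-aligned sequences over the window [start, end)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if end is None:
        end = len(aligned_a)
    window_size = end - start
    if window_size <= 0:
        raise ValueError("comparison window must contain at least one position")
    # A position matches when both sequences carry the same residue (not a gap).
    matches = sum(
        1
        for i in range(start, end)
        if aligned_a[i] == aligned_b[i] and aligned_a[i] != "-"
    )
    return 100.0 * matches / window_size


# Hypothetical aligned toy sequences that differ at two of ten positions.
seq_a = "MKTADSSYSG"
seq_b = "MRTANSSYSG"
full_length = percent_identity(seq_a, seq_b)               # whole alignment
region_only = percent_identity(seq_a, seq_b, start=5)      # last five positions
```

  • Restricting the window to a region illustrates the point made above: a region can have a higher identity to the reference than the full-length sequence does.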
  • wild-type refers to a gene or gene product isolated from a naturally occurring source.
  • A wild-type gene is the most frequently observed gene in a population, and thus is arbitrarily designated the "normal" or "wild-type" form of that gene.
  • modified refers to a gene or gene product that exhibits sequence modifications (for example, substitutions, truncations or insertions), post-translational modifications and/or altered functional properties compared with the wild-type gene or gene product. Note that naturally occurring mutants can be isolated; these mutants are identified by the fact that they have altered characteristics compared to the wild-type gene or gene product.
  • Methionine (M) can be replaced with arginine (R) by replacing the codon for methionine (ATG) with a codon for arginine (CGT) at the relevant position in the polynucleotide encoding the mutated monomer.
  • CGT: codon for arginine
  • ATG: codon for methionine
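  • The codon replacement described above can be sketched on a coding sequence as follows. The function name and the 12-nucleotide toy coding sequence are assumptions made for illustration; the sequence is not taken from this document.

```python
def substitute_codon(
    coding_sequence: str, residue_position: int, expected_codon: str, new_codon: str
) -> str:
    """Replace the codon of the residue at a 1-based position in a coding sequence."""
    if len(coding_sequence) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = 3 * (residue_position - 1)
    found = coding_sequence[start:start + 3]
    if found != expected_codon:
        raise ValueError(
            f"expected {expected_codon} at residue {residue_position}, found {found!r}"
        )
    return coding_sequence[:start] + new_codon + coding_sequence[start + 3:]


# Hypothetical toy coding sequence: ATG GCT ATG AAA (Met-Ala-Met-Lys).
toy_cds = "ATGGCTATGAAA"
# Replace the methionine codon at residue 3 with an arginine codon (CGT).
mutated_cds = substitute_codon(toy_cds, 3, "ATG", "CGT")
```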
  • non-naturally occurring amino acids can be introduced by including a synthetic aminoacyl-tRNA in the IVTT system used to express the mutated monomer.
  • Non-naturally occurring amino acids can be introduced by expressing mutated monomers in Gulbenkiania indica that is auxotrophic for specific amino acids, in the presence of synthetic (i.e. non-naturally occurring) analogs of those specific amino acids.
  • Mutant monomers can also be produced by native ligation if they are produced using partial peptide synthesis. Conservative substitutions replace an amino acid with another amino acid of similar chemical structure, similar chemical properties, or similar side chain volume.
  • the introduced amino acids may have a similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality or charge to the amino acids they replace.
  • a conservative substitution can introduce another aromatic or aliphatic amino acid in place of a pre-existing aromatic or aliphatic amino acid.
  • Conservative amino acid changes are well known in the art and can be selected based on the properties of the 20 major amino acids defined in Table 1 below. In the case of amino acids with similar polarity, this can also be determined by reference to the hydrophilicity scale for amino acid side chains in Table 2.
  • Mutated or modified proteins, monomers or peptides may also be chemically modified at any point in any manner.
  • The mutated or modified monomer or peptide is preferably chemically modified by attachment of a molecule to one or more cysteines (cysteine linkage), attachment of a molecule to one or more lysines, attachment of a molecule to one or more unnatural amino acids, enzymatic modification of an epitope, or modification of a terminus. Suitable methods for making such modifications are well known in the art.
  • Modified mutants of proteins, monomers or peptides can be chemically modified by attachment of any molecule.
  • mutants of modified proteins, monomers, or peptides can be chemically modified by the attachment of dyes or fluorophores.
  • a mutated or modified monomer or peptide is chemically modified with a molecular adapter that facilitates the interaction between a pore comprising the monomer or peptide and a target nucleotide or target polynucleotide sequence.
  • Molecular adapters are preferably cyclic molecules, cyclodextrins, substances capable of hybridization, DNA binding or intercalating agents, peptides or peptide analogs, synthetic polymers, aromatic planar molecules, positively charged small molecules, or small molecules capable of hydrogen bonding.
  • the presence of the adapter improves the host-guest chemistry of the pore and the nucleotide or polynucleotide sequence, thereby improving the sequencing capability of the pore formed from the mutated monomer.
  • the principles of host-guest chemistry are well known in the art.
  • Adapters have an effect on the physical or chemical properties of the pore that improves the interaction of the pore with the nucleotide or polynucleotide sequence.
  • An adapter can alter the charge of the barrel or channel of the pore, or specifically interact or bind to a nucleotide or polynucleotide sequence, thereby facilitating its interaction with the pore.
  • a “protein pore” is a transmembrane protein structure that defines a channel or pore that allows the translocation of molecules and ions from one side of the membrane to the other. The translocation of ionic species through the pore can be driven by a potential difference applied to either side of the pore.
  • a “nanopore” is a protein pore in which the smallest diameter of the pathway through which molecules or ions pass is on the nanometer scale (10⁻⁹ meters).
  • the protein pore may be a transmembrane protein pore.
  • the transmembrane protein structure of the protein pore can be monomeric or oligomeric in nature.
  • a pore comprises multiple polypeptide subunits arranged around a central axis, forming a protein-lined channel extending substantially perpendicular to the membrane in which the nanopore resides.
  • the number of polypeptide subunits is not limited. Typically, the number of subunits is from 5 to 30, suitably from 6 to 10. Alternatively, the number of subunits is not defined, as in the case of perfringolysin or related large membrane pores.
  • the portion of the protein subunit that forms the protein-lined channel within the nanopore typically contains secondary structural motifs that may include one or more transmembrane β-barrel and/or α-helical portions.
  • the protein pore comprises one or more porin monomers.
  • Each porin monomer can be from Gulbenkiania indica.
  • the protein pore comprises a mutant of one or more porin monomers (ie, one or more porin mutated monomers).
  • the porin is from a wild-type protein from the kingdom of Bacteria, a wild-type homologue, or a mutant thereof. Mutants can be modified porins or porin mutants. Modifications in a mutant include, but are not limited to, any one or more modifications or combinations of modifications disclosed herein.
  • the Bacteria wild-type protein is a protein from Gulbenkiania indica.
  • the wild-type protein of the kingdom Bacteria is a protein from Gulbenkiania indica (Gene: Ga0061063_1194).
  • a porin homologue refers to a protein having at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% complete sequence identity at the polypeptide level.
  • a porin homologue refers to a polynucleotide having at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% complete sequence identity to the polynucleotide encoding the protein shown in SEQ ID NO:2.
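  • As an illustration of the identity thresholds above, the following minimal sketch computes percent identity over two sequences that have already been aligned to equal length. The function names, the gap character "-", and the choice to divide by the number of columns in which at least one sequence has a residue are assumptions made for illustration; the text does not specify how identity is calculated, and the short sequences used with it are toy strings, not entries from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which both rows are gaps are ignored; every other column
    counts toward the denominator (one common convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no residue in either sequence
        columns += 1
        if a == b:
            matches += 1  # identical residues (a gap never equals a residue)
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


def is_homologue(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if identity meets the chosen threshold (50% is the lowest value listed)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

  • Under these assumptions, two five-residue toy sequences that differ at one aligned column give 80% identity, which would satisfy any of the listed thresholds from 80% downward.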
  • the polynucleotide sequence may comprise a sequence that differs from SEQ ID NO: 2 based on the degeneracy of the genetic code.
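  • The degeneracy of the genetic code mentioned above can be sketched as follows: two different polynucleotide sequences translate into the same amino acid sequence. The codon table below is a small excerpt of the standard genetic code, and the two DNA strings are toy examples chosen for illustration, not fragments of SEQ ID NO: 2.

```python
# Excerpt of the standard genetic code; only the codons used in the example.
CODON_TABLE = {
    "CTG": "L", "TTA": "L", "CTC": "L",
    "AAA": "K", "AAG": "K",
    "CAG": "Q", "CAA": "Q",
    "GAT": "D", "GAC": "D",
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon using CODON_TABLE."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two toy coding sequences that differ at every codon's third base.
seq_a = "CAGAAAGATCTG"
seq_b = "CAAAAGGACTTA"
same_protein = translate(seq_a) == translate(seq_b)
```

  • Here `seq_a` and `seq_b` differ in four nucleotides yet encode the same four residues, which is the sense in which a polynucleotide may differ from a reference sequence while encoding the same protein.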
  • Polynucleotide sequences can be derived and replicated using standard methods in the art. Chromosomal DNA encoding wild-type porins can be extracted from pore-producing organisms such as Gulbenkiania indica. The gene encoding the pore subunit can be amplified using PCR including specific primers. The amplified sequence can then be subjected to site-directed mutagenesis. Suitable methods of site-directed mutagenesis are known in the art and include, for example, combinatorial chain reactions. The constructed polynucleotides encoding the embodiments can be prepared by techniques known in the art, for example, those described in Sambrook, J. and Russell, D. (2001). Molecular Cloning: A Laboratory Manual, 3rd Edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
  • polynucleotide sequences can then be incorporated into recombinant replicable vectors, such as cloning vectors.
  • the vector can be used to replicate the polynucleotide in a compatible host cell.
  • polynucleotide sequences can be prepared by introducing the polynucleotide into a replicable vector, introducing the vector into a compatible host cell, and growing the host cell under conditions that cause replication of the vector. The vector can be recovered from the host cell.
  • an insulating film 102 with nanoscale pores divides the cavity into two small chambers, as shown in FIG. 1. When a voltage acts on the electrolyte chamber, ions or other small molecules pass through the pores under the action of an electric field, forming a stable and detectable ionic current.
  • Different types of biomolecules can be detected by knowing the size and surface properties of the nanopores, the applied voltage, and the solution conditions.
  • ssDNA single-stranded DNA
  • FIG. 2 shows a schematic diagram 200 of DNA sequencing.
  • the nanopore is the only channel for ions on both sides of the phospholipid membrane to pass through.
  • Rate-controlling proteins such as polynucleotide binding proteins act as motor proteins for nucleic acid molecules such as DNA, pulling DNA strands sequentially through the nanopore/protein pore in steps of single nucleotides.
  • the corresponding pore blocking signal is recorded (Figure 3).
  • porins are screened from different species (mainly bacteria and archaea) in nature by means of bioinformatics and evolution.
  • the porin is from any organism, preferably from Gulbenkiania indica.
  • by sequence analysis, the porin has a complete functional domain.
  • structural biology methods are used to predict and analyze the 3D structure model of the porin and to select a channel protein with a suitable reading head architecture.
  • genetic engineering, protein engineering, protein directed evolution, and computer-aided protein design are used to transform, test, and optimize candidate channel proteins (or porins).
  • multiple homologous protein mutants are obtained, preferably two (with different homologous protein backbones), having different signal characteristics and signal distribution patterns.
  • porins in the examples can be applied to the fourth generation sequencing technology.
  • the porin is a nanoporin.
  • porins can be applied to solid state pores for sequencing.
  • a new protein backbone is used to form a new constriction region (reading head region) structure, thereby providing a new mode of action during the sequencing process.
  • the porins of the examples have good edge-hopping distribution and recombination efficiency with phospholipid membranes.
  • the wild-type porin monomer is genetically mutated to form a mutant of the porin monomer.
  • the amino acid sequence of the mutant of the porin monomer comprises the sequence shown in SEQ ID NO: 1, or a sequence having at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identity thereto, and has mutations at one or more positions corresponding to K67, D71, S72, and Y74 of SEQ ID NO: 1.
  • mutations include insertions, deletions and/or substitutions of amino acids.
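  • The three kinds of mutation named above can be sketched on a plain amino acid string. The helper names and the six-residue toy sequence are hypothetical and serve only to show what a substitution, an insertion and a deletion each do to the sequence and its numbering; they do not represent SEQ ID NO: 1.

```python
def substitute(seq: str, pos: int, new: str) -> str:
    """Replace the residue at 1-based position pos with one new residue."""
    if not 1 <= pos <= len(seq):
        raise ValueError("position out of range")
    if len(new) != 1:
        raise ValueError("a substitution introduces exactly one residue")
    return seq[:pos - 1] + new + seq[pos:]


def insert_after(seq: str, pos: int, residues: str) -> str:
    """Insert residues after 1-based position pos (pos=0 prepends)."""
    if not 0 <= pos <= len(seq):
        raise ValueError("position out of range")
    return seq[:pos] + residues + seq[pos:]


def delete(seq: str, pos: int, count: int = 1) -> str:
    """Delete count residues starting at 1-based position pos."""
    if count < 1 or not 1 <= pos <= len(seq) - count + 1:
        raise ValueError("deletion out of range")
    return seq[:pos - 1] + seq[pos - 1 + count:]


toy = "MKQDSY"                            # hypothetical sequence
substituted = substitute(toy, 3, "L")     # Q at position 3 replaced by L
inserted = insert_after(toy, 3, "GG")     # two residues added after position 3
deleted = delete(toy, 4)                  # residue at position 4 removed
```

  • Note that the insertion and the deletion shift the numbering of every downstream residue, whereas the substitution does not.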
  • the amino acid sequence of the mutant of the porin monomer has mutations at one or more of the positions corresponding to (1) 62-209, (2) 62-74, (3) 62-75, (4) 67-209, (5) 67-75, or (6) 67-74.
  • the amino acid sequence of the mutant of the porin monomer has amino acid insertions, deletions and/or substitutions at one or more of the positions corresponding to (1) 62-209, (2) 62-74, (3) 62-75, (4) 67-209, (5) 67-75, or (6) 67-74.
  • the amino acid sequence of the mutant of the porin monomer has mutations only at positions corresponding to Q62, K67, D71, S72, and Y74 of SEQ ID NO: 1, or has amino acid insertions, deletions and/or substitutions at one or more of these positions.
  • the amino acid sequence of the mutant of the porin monomer has mutations only at positions corresponding to Q62, K67, D71, S72, Y74, E110, E119, E126, and K209 of SEQ ID NO: 1, or has amino acid insertions, deletions and/or substitutions at one or more of these positions.
  • the amino acid sequence of the mutant of the porin monomer has mutations only at positions corresponding to K67, D71, S72, Y74, and S75 of SEQ ID NO: 1, or has amino acid insertions, deletions and/or substitutions at one or more of these positions.
  • the amino acid sequence of the mutant of the porin monomer has mutations only at positions corresponding to K67, T69, A70, D71, S72, S73, and Y74 of SEQ ID NO: 1, or has amino acid insertions, deletions and/or substitutions at one or more of these positions.
  • At one or more positions means 1, 2, 3, 4, 5, 6, 7, 8, 9, 10... or up to all positions. For example, at one or more positions of 5 amino acids is at 1, 2, 3, 4 or 5 positions.
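  • The meaning of "one or more positions" can be made concrete by enumerating every non-empty subset of a position set. The sketch below uses the five positions Q62, K67, D71, S72 and Y74 named in the text; the function name is an assumption for illustration.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def position_subsets(positions: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty combination of the given positions."""
    for size in range(1, len(positions) + 1):
        yield from combinations(positions, size)


subsets = list(position_subsets(["Q62", "K67", "D71", "S72", "Y74"]))
subset_sizes = sorted({len(s) for s in subsets})
```

  • For five positions this gives 2⁵ − 1 = 31 combinations, covering subsets of size 1, 2, 3, 4 and 5, in line with the example of 5 amino acids given above.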
  • the position corresponding to SEQ ID NO: 1 means that, regardless of whether the numbering is changed by amino acid insertion or deletion or by the use of a sequence having identity, the relative position remains unchanged, and the numbering of SEQ ID NO: 1 can still be used.
  • Q62 corresponding to SEQ ID NO: 1 can be mutated to Q62L. Even if the numbering of SEQ ID NO: 1 changes, or a sequence having identity to SEQ ID NO: 1 as defined herein is used, the amino acid Q corresponding to position 62 of SEQ ID NO: 1 (even if it is not at position 62 in the other sequence) can also be mutated into L, and this is still within the protection scope of the present invention.
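  • One way to picture a "corresponding position" is to walk a pairwise alignment and translate a reference position into the index it occupies in another sequence. The sketch below assumes the alignment has already been computed and is supplied as two gapped strings of equal length; the function name, the gap character and the toy sequences are hypothetical.

```python
from typing import Optional


def map_reference_position(aligned_ref: str, aligned_var: str,
                           ref_pos: int, gap: str = "-") -> Optional[int]:
    """Return the 1-based position in the variant aligned to ref_pos.

    ref_pos is 1-based in the ungapped reference. None is returned when
    the variant has a gap (a deletion) at that column.
    """
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    var_count = 0
    for r, v in zip(aligned_ref, aligned_var):
        if r != gap:
            ref_count += 1
        if v != gap:
            var_count += 1
        if r != gap and ref_count == ref_pos:
            return var_count if v != gap else None
    raise ValueError("ref_pos lies beyond the end of the reference")
```

  • For example, if the variant carries one extra residue upstream, reference position 3 maps to variant position 4, so the residue is still addressed by the reference numbering even though its index in the variant has shifted.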
  • the amino acid sequence of the mutant of the porin monomer consists of the sequence shown in SEQ ID NO: 1, or of a sequence having at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identity thereto, and has mutations at one or more positions corresponding to K67, D71, S72, and Y74 of SEQ ID NO: 1.
  • sequence of SEQ ID NO: 1 of the porin monomer is from Gulbenkiania indica.
  • the nucleotide sequence encoding the amino acid of SEQ ID NO:1 is SEQ ID NO:2.
  • Q62 corresponding to SEQ ID NO: 1 is mutated into 0 to 5 of G, A, V, L, and I; K67 is mutated into 0 to 3 of R, H, and K; D71 is mutated into 0 to 4 of N, E, D, Q; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 5 of S, C, U, T, M.
  • Q62 corresponding to SEQ ID NO: 1 is mutated into 0 to 5 of G, A, V, L, and I; K67 is mutated into 0 to 3 of R, H, and K; D71 is mutated into 0 to 4 of N, E, D, and Q; S72 is mutated into 0 to 1 of P; Y74 is mutated into 0 to 3 of F, Y, and W; E110 is mutated into 0 to 4 of N, D, E, and Q; E119 is mutated into 0 to 4 of N, D, E, and Q; E126 is mutated into 0 to 4 of N, D, E, and Q; K209 is mutated into 0 to 1 of P.
  • K67 corresponding to SEQ ID NO: 1 is mutated into 0 to 3 of R, H, and K; D71 is mutated into 0 to 5 of G, A, V, L, and I; S72 is mutated into 0 to 1 of P; Y74 is mutated into 0 to 3 of F, Y, and W; S75 is mutated into 0 to 5 of C, U, S, T, and M.
  • K67 corresponding to SEQ ID NO: 1 is mutated into 0 to 3 of R, H, and K; T69 is mutated into 0 to 5 of S, C, T, U, and M; A70 is mutated into 0 to 1 of P; D71 is mutated into 0 to 5 of G, A, V, L, and I; S72 is mutated into 0 to 3 of F, Y, and W; S73 is mutated into 0 to 5 of G, A, V, L, and I; Y74 is mutated into 0 to 3 of F, Y, and W.
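  • The candidate residue lists above lend themselves to a simple lookup. The following minimal sketch encodes only the first list (Q62, K67, D71, S72, Y74) and checks whether a substitution written in the usual "Q62L" form names a residue from the corresponding list. The dictionary layout, function name and parsing pattern are assumptions for illustration, and the sketch does not interpret the "0 to n of" wording beyond membership in the list.

```python
import re

# Candidate residues per position, taken from the first list in the text.
LISTED_RESIDUES = {
    "Q62": set("GAVLI"),
    "K67": set("RHK"),
    "D71": set("NEDQ"),
    "S72": set("P"),
    "Y74": set("SCUTM"),
}

# Wild-type letter, position number, replacement letter, e.g. "Q62L".
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def is_listed_substitution(mutation: str) -> bool:
    """True if the replacement residue is listed for that position."""
    match = _MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = match.groups()
    return replacement in LISTED_RESIDUES.get(f"{wild_type}{position}", set())
```

  • With this table, "Q62L" and "S72P" are accepted, while "Y74W" is not, because W appears for Y74 only in the other lists.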
  • a mutant of a porin monomer wherein the amino acid mutation is selected from the following:
  • the amino acid sequence of the mutant of the porin monomer comprises, or consists of, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, or SEQ ID NO: 19.
  • the protein pore comprises at least one mutant of a porin monomer (or a mutated monomer of a porin). In one embodiment, the protein pore comprises at least two, three, four, five, six, seven, eight, nine, or ten or more mutants of the porin monomer. In one embodiment, the protein pore comprises mutants of at least two porin monomers, which may be the same or different. In one embodiment, the protein pore comprises mutants of two or more monomers of the porin, preferably the mutants of the two or more monomers are the same. In one embodiment, the protein pore comprises mutants of nine porin monomers. In one embodiment, the diameter of the constricted region of the protein pore is 0.7nm-2.2nm, 0.9nm-1.6nm, 1.4-1.6nm or
  • mutants of porin monomers, or protein pores comprising the same, are used for detecting the presence, absence or one or more characteristics of a target analyte.
  • mutants of porin monomers or protein pores are used to detect the sequence of a nucleic acid molecule, or to characterize a polynucleotide sequence, e.g., to sequence a polynucleotide, because they can distinguish between different nucleotides with high sensitivity.
  • Mutants of porin monomers or protein pores that include them can discriminate between four nucleotides in DNA and RNA, and even between methylated and unmethylated nucleotides, with unexpectedly high resolution.
  • deoxycytidine monophosphate (dCMP)
  • Mutants of porin monomers or protein pores can also discriminate between different nucleotides under a range of conditions.
  • mutants of said porin monomers or protein pores discriminate between nucleotides under conditions favorable for nucleic acid characterization, such as sequencing.
  • the degree to which mutants of porin monomers or protein pores discriminate between different nucleotides can be controlled by varying the applied potential, salt concentration, buffer, temperature and the presence of additives such as urea, betaine and DTT. This allows mutants of porin monomers or the function of the protein pore to be finely tuned, especially when sequenced.
  • Mutants of porin monomers or protein pores can also be used to identify polynucleotide polymers from the interaction with one or more monomers rather than on a nucleotide-by-nucleotide basis.
  • a mutant of a porin monomer or a protein pore may be isolated, substantially isolated, purified or substantially purified. Mutants of the porin monomers or protein pores of the examples are isolated or purified if they are completely free of any other components, such as liposomes or other protein pores/porins. A mutant porin monomer or protein pore is substantially isolated if the mutant porin monomer or protein pore is mixed with a carrier or diluent that does not interfere with its intended use.
  • a mutant of the porin monomer or a protein pore is substantially isolated or substantially purified if it is present in a form that contains less than 10%, less than 5%, less than 2% or less than 1% of other components such as triblock copolymers, liposomes or other protein pores/porins. Alternatively, mutants of porin monomers or protein pores can be present in the membrane.
  • the membrane is preferably an amphiphilic layer.
  • the amphiphilic layer is a layer formed of amphiphilic molecules, for example, phospholipids, which have both hydrophilic and lipophilic properties.
  • Amphiphiles can be synthetic or naturally occurring.
  • the amphiphilic layer can be monolayer or bilayer.
  • the amphiphilic layer is usually planar.
  • the amphiphilic layer may be curved.
  • the amphiphilic layer may be supported.
  • the membrane can be a lipid bilayer.
  • a lipid bilayer is formed by two opposing layers of lipids. The two layers of lipids are aligned such that their hydrophobic tail groups face each other to form a hydrophobic interior.
  • the hydrophilic headgroups of the lipids face outward towards the aqueous environment on each side of the bilayer.
  • the membrane includes a solid state layer. Solid layers can be formed from organic and inorganic materials. If the membrane comprises a solid state layer, the pores typically exist in the amphiphilic membrane or in a layer comprised within the solid state layer, e.g., in holes, wells, gaps, channels, trenches or slits within the solid state layer.
  • Embodiments provide a method of determining the presence, absence, or one or more properties of a target analyte.
  • the method involves contacting the target analyte with a mutant porin monomer or protein pore such that the target analyte moves relative to, e.g., through, the mutant porin monomer or protein pore, and taking one or more measurements as the target analyte moves relative to the mutant porin monomer or protein pore, thereby determining the presence, absence, or one or more properties of the target analyte.
  • the target analyte may also be referred to as a template analyte or an analyte of interest.
  • Target analytes are preferably polysaccharides, metal ions, inorganic salts, polymers, amino acids, peptides, polypeptides, proteins, nucleotides, oligonucleotides, polynucleotides, dyes, drugs, diagnostic agents, explosives or environmental pollutants.
  • the method may involve determining the presence, absence or one or more properties of two or more target analytes of the same class, eg, two or more proteins, two or more nucleotides, or two or more drugs.
  • the method may involve determining the presence, absence or one or more properties of two or more different classes of target analytes, e.g., one or more proteins, one or more nucleotides and one or more drugs.
  • the method comprises contacting the target analyte with a mutant porin monomer or protein pore such that the target analyte moves through the mutant porin monomer or protein pore.
  • the protein pore typically comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9 or at least 10 porin mutant monomers, e.g., 7, 8, 9 or 10 monomers.
  • the protein pores comprise identical monomers or different porin monomers, preferably 8 or 9 identical monomers. One or more of said monomers, eg 2, 3, 4, 5, 6, 7, 8, 9 or 10, are preferably chemically modified as discussed above.
  • the amino acid of each monomer comprises SEQ ID NO: 1 and the above-mentioned mutants thereof.
  • the amino acid of each monomer consists of SEQ ID NO: 1 and its above-mentioned mutants.
  • the methods of the embodiments may measure two, three, four or five or more characteristics of a polynucleotide.
  • the one or more characteristics are preferably selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide, and (v) whether the polynucleotide has been modified. In one embodiment, any combination of (i) to (v) may be measured.
  • for (i), the length of the polynucleotide can be measured, for example, by determining the number of interactions between the polynucleotide and the mutant of the protein monomer/protein pore, or the duration of the interaction between the polynucleotide and the mutant of the protein monomer/protein pore.
  • the identity of the polynucleotide can be measured in a variety of ways, either in conjunction with or without measurement of the polynucleotide sequence.
  • the former is simpler; the polynucleotides are sequenced and then identified.
  • the latter can be done in several different ways. For example, the presence of a particular motif in a polynucleotide can be measured (without measuring the rest of the sequence of the polynucleotide).
  • measurement of specific electrical and/or optical signals in the method can identify the polynucleotide as coming from a specific source.
  • the sequence of the polynucleotide can be determined as previously described. Suitable sequencing methods, in particular those using electrical measurements, are described in Stoddart D et al., Proc Natl Acad Sci, 12; 106(19):7702-7, Lieberman KR et al., J Am Chem Soc. 2010; 132(50):17961-72, and in International Application WO 2000/28312.
  • secondary structure can be measured in a number of ways. For example, if the method involves an electrical measurement method, the secondary structure can be measured using a change in residence time or a change in the current flowing through the pore. This allows distinguishing between regions of single- and double-stranded polynucleotides.
  • the presence or absence of any modification can be measured.
  • the method comprises determining whether the polynucleotide has been modified by methylation, oxidation, or damage, with one or more proteins, or with one or more labels or tags, or whether it is abasic or lacks nucleobases and sugars. Certain modifications will result in specific interactions with the pore, which can be measured using the methods described below. For example, methylcytosine can be distinguished from cytosine based on the current flowing through the pore during its interaction with each nucleotide.
  • the target polynucleotide is contacted with a mutant protein monomer/protein pore, such as a mutant protein monomer/protein pore as in the examples.
  • Mutants/protein pores of said protein monomers are usually present in membranes. Suitable membranes are described above.
  • the method can be performed using any device suitable for studying a membrane/pore system in which the mutant of the protein monomer/protein pore is present in a membrane.
  • the method can be performed using any device suitable for transmembrane pore sensing.
  • the device comprises a chamber containing an aqueous solution and a barrier dividing the chamber into two parts.
  • the barrier typically has an aperture in which a membrane comprising the pore is formed.
  • the barrier forms a membrane in which mutants/protein pores of protein monomers are present.
  • the method can be performed using the apparatus described in International Application No. PCT/GB08/000562 (WO 2008/102120).
  • Electrical measurements include voltage measurement, capacitance measurement, current measurement, impedance measurement, tunneling measurement (Ivanov AP et al., Nano Lett. 2011 Jan 12; 11(1):279-85) and FET measurement (International Application WO 2005/124888).
  • Optical measurements can be combined with electrical measurements (Soni GV et al., Rev Sci Instrum. 2010 Jan; 81(1) 014301).
  • the measurement may be a transmembrane current measurement, eg measurement of ionic current flowing through the pore.
  • the electrical or optical measurements may employ conventional electrical or optical measurements.
  • Electrical measurements can be made using standard single channel recording equipment as described in Stoddart D et al., Proc Natl Acad Sci, 12; 106(19):7702-7, Lieberman KR et al., J Am Chem Soc. 2010; 132(50):17961-72, and International Application WO 2000/28312.
  • electrical measurements can be performed using a multi-channel system, for example as described in International Application WO 2009/077734 and International Application WO 2011/067559.
  • the method is preferably carried out using an electrical potential applied across the membrane.
  • the applied potential may be a voltage potential.
  • the applied potential may be a chemical potential.
  • An example of this is using a salt gradient across a membrane, such as an amphiphilic layer. Salt gradients are disclosed in Holden et al., J Am Chem Soc. 2007 Jul 11;129(27):8650-5.
  • the current flowing through the mutant of the protein monomer/protein pore as the polynucleotide moves relative to the mutant of the protein monomer/protein pore is used to estimate or determine the sequence of the polynucleotide. This is strand sequencing.
  • the method may comprise measuring the current flowing through the pore as the polynucleotide moves relative to the pore.
  • the apparatus used in the method may thus also include circuitry capable of applying an electrical potential and measuring electrical signals across the membrane and pores.
  • the method can be performed using patch clamp or voltage clamp.
  • the method may include measuring the current flowing through the pore as the polynucleotide moves relative to the pore. Suitable conditions for measuring ion flux through transmembrane protein pores are known in the art and disclosed in the Examples.
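  • To illustrate how a current recording taken while a polynucleotide moves relative to the pore might be reduced to discrete levels, the sketch below splits a trace into segments whenever a sample departs from the running mean of the current segment by more than a threshold. This is a deliberately simple, hypothetical segmentation rule, not the signal processing used in the embodiments; the function name and the threshold parameter are assumptions, and any trace values supplied to it in examples are invented.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def segment_levels(trace: Sequence[float],
                   threshold: float) -> List[Tuple[float, int]]:
    """Split a current trace into (mean level, number of samples) segments.

    A new segment starts when a sample differs from the mean of the
    segment collected so far by more than threshold.
    """
    if not trace:
        return []
    segments: List[Tuple[float, int]] = []
    current = [trace[0]]
    for sample in trace[1:]:
        if abs(sample - mean(current)) > threshold:
            segments.append((mean(current), len(current)))
            current = [sample]          # start a new level
        else:
            current.append(sample)      # still within the same level
    segments.append((mean(current), len(current)))
    return segments
```

  • Each returned pair gives a level and its dwell (in samples), which are the kinds of quantities the text refers to when it mentions changes in current and residence time.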
  • the method is generally carried out by applying a voltage across the membrane and the pore.
  • the voltage used is typically from +5V to -5V, eg from +4V to -4V, from +3V to -3V or from +2V to -2V.
  • the voltage used is typically from -600mV to +600mV or -400mV to +400mV.
  • the voltage used is preferably in a range having a lower limit selected from -400mV, -300mV, -200mV, -150mV, -100mV, -50mV, -20mV and 0mV and an upper limit independently selected from +10mV, +20mV, +50mV, +100mV, +150mV, +200mV, +300mV and +400mV.
  • the voltage used is more preferably in the range of 100 mV to 240 mV and most preferably in the range of 120 mV to 220 mV.
  • the process is generally carried out in the presence of any charge carrier, such as a metal salt such as an alkali metal salt, a halide salt such as a chloride salt, such as an alkali metal chloride salt.
  • Charge carriers may include ionic liquids or organic salts such as tetramethylammonium chloride, trimethylphenylammonium chloride, phenyltrimethylammonium chloride or 1-ethyl-3-methylimidazolium chloride.
  • the salt is present in the aqueous solution in the chamber.
  • potassium chloride (KCl), sodium chloride (NaCl), cesium chloride (CsCl) or a mixture of potassium ferrocyanide and potassium ferricyanide is used.
  • KCl, NaCl and mixtures of potassium ferrocyanide and potassium ferricyanide are preferred.
  • Charge carriers may be asymmetric on the membrane. For example, the type and/or concentration of charge carriers may be different on each side of the membrane.
  • the concentration of the salt may be saturated.
  • the concentration of the salt may be 3M or less, and is typically 0.1 to 2.5M, 0.3 to 1.9M, 0.5 to 1.8M, 0.7 to 1.7M, 0.9 to 1.6M or 1M to 1.4M.
  • the concentration of the salt is preferably from 150 mM to 1M.
  • the method is preferably performed using a salt concentration of at least 0.3M, eg at least 0.4M, at least 0.5M, at least 0.6M, at least 0.8M, at least 1.0M, at least 1.5M, at least 2.0M, at least 2.5M or at least 3.0M.
  • a high salt concentration provides a high signal-to-noise ratio and allows currents indicative of the presence of a nucleotide to be identified against the background of normal current fluctuations.
  • the methods are generally performed in the presence of a buffer.
  • the buffer is present in the aqueous solution in the chamber. Any buffer may be used in the methods of the invention.
  • the buffer is a phosphate buffer.
  • Other suitable buffers are HEPES or Tris-HCl buffer.
  • the process is typically performed at a pH of 4.0 to 12.0, 4.5 to 10.0, 5.0 to 9.0, 5.5 to 8.8, 6.0 to 8.7, 7.0 to 8.8, or 7.5 to 8.5.
  • the pH used is preferably about 7.5.
  • the method can be carried out at a temperature of 0°C to 100°C, 15°C to 95°C, 16°C to 90°C, 17°C to 85°C, 18°C to 80°C, 19°C to 70°C or 20°C to 60°C.
  • the method is generally carried out at room temperature.
  • the method is optionally performed at a temperature that supports enzyme function, eg, about 37°C.
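  • The operating ranges given above can be gathered into a small check that flags which parameters of a planned run fall outside the widest "typical" range quoted for each. The sketch below uses the first range stated in the text for voltage (+5V to -5V), salt concentration (0.1 to 2.5M), pH (4.0 to 12.0) and temperature (0°C to 100°C); the class, field and function names are hypothetical, and the text itself also contemplates conditions outside these ranges (e.g. saturated salt).

```python
from dataclasses import dataclass
from typing import List

# Widest "typical" range quoted in the text for each parameter.
TYPICAL_RANGES = {
    "voltage_mV": (-5000.0, 5000.0),
    "salt_M": (0.1, 2.5),
    "pH": (4.0, 12.0),
    "temperature_C": (0.0, 100.0),
}


@dataclass
class RunConditions:
    voltage_mV: float
    salt_M: float
    pH: float
    temperature_C: float


def outside_typical(conditions: RunConditions) -> List[str]:
    """Return the names of parameters lying outside TYPICAL_RANGES."""
    flagged = []
    for name, (low, high) in TYPICAL_RANGES.items():
        value = getattr(conditions, name)
        if not low <= value <= high:
            flagged.append(name)
    return flagged
```

  • A run at 180mV, 1M salt, pH 7.5 and 37°C passes every check, consistent with the preferred values mentioned in the text; narrower preferred ranges could be substituted in the dictionary in the same way.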
  • a method for determining the presence, absence, or one or more characteristics of a target analyte comprises coupling the target analyte to a membrane; and allowing the target analyte to interact with (e.g., contact) the protein pore present in the membrane such that the target analyte moves relative to (e.g., passes through) the protein pore.
  • the current flowing through the protein pore as the target analyte moves relative to the protein pore is measured to determine the presence, absence or one or more characteristics of the target analyte (e.g., the sequence of a polynucleotide).
  • a rate-controlling protein is a protein that can control the speed at which a target analyte (e.g., a polynucleotide) moves relative to a protein pore (e.g., slow it down), so that this speed allows the presence, absence, or one or more characteristics of the target analyte (e.g., a polynucleotide) to be detected (e.g., sequencing of the polynucleotide).
  • Protein pores are used in conjunction with rate-controlling proteins to characterize target analytes.
  • the rate-controlling protein slows down the passage of polynucleotides through the protein pore to enable sequencing.
  • Rate-controlling proteins include the polynucleotide binding proteins described below.
  • the characterization methods of the embodiments preferably comprise contacting the polynucleotide with a polynucleotide binding protein such that the protein controls movement of the polynucleotide relative to the mutant of the protein monomer/protein pore, e.g., through the mutant of the protein monomer/protein pore.
  • the method comprises (a) contacting the polynucleotide with a mutant of a protein monomer/protein pore and a polynucleotide binding protein such that the protein controls the movement of the polynucleotide relative to the mutant of the protein monomer/protein pore, e.g., through the mutant of the protein monomer/protein pore, and (b) obtaining one or more measurements as the polynucleotide moves relative to the mutant of the protein monomer/protein pore, wherein such measurements are indicative of one or more characteristics of the polynucleotide, thereby characterizing the polynucleotide.
  • the method comprises (a) contacting the polynucleotide with a mutant of a protein monomer/protein pore and a polynucleotide binding protein such that the protein controls the movement of the polynucleotide relative to the mutant of the protein monomer/protein pore, e.g., through the mutant of the protein monomer/protein pore, and (b) measuring the current flowing through the mutant of the protein monomer/protein pore as the polynucleotide moves relative to the mutant of the protein monomer/protein pore, wherein the current is indicative of one or more characteristics of the polynucleotide, thereby characterizing the polynucleotide.
  • a polynucleotide binding protein can be any protein capable of binding a polynucleotide and controlling its movement through a pore.
  • a polynucleotide binding protein typically interacts with and modifies at least one property of a polynucleotide.
  • Proteins can modify polynucleotides by cleaving them to form individual nucleotides or short chains of nucleotides (eg, dinucleotides or trinucleotides).
  • a protein can modify a polynucleotide by orienting it or moving it to a specific position, ie, controlling its movement.
  • the polynucleotide binding protein is preferably derived from a polynucleotide handling enzyme.
  • a polynucleotide-handling enzyme is a polypeptide capable of interacting with and modifying at least one property of a polynucleotide.
  • the enzymes can modify polynucleotides by cleaving them to form individual nucleotides or short chains of nucleotides (eg, dinucleotides or trinucleotides).
  • the enzyme can modify the polynucleotide by orienting it or moving it to a specific position.
  • a polynucleotide-handling enzyme need not exhibit enzymatic activity as long as it is capable of binding polynucleotides and controlling their movement through the pore.
  • the enzyme may be modified to remove its enzymatic activity, or may be used under conditions that prevent its use as an enzyme.
  • Polynucleotide-handling enzymes are preferably polymerases, exonucleases, helicases and topoisomerases, e.g., gyrase.
  • the enzyme is preferably a helicase, such as Hel308Mbu, Hel308Csy, Hel308Tga, Hel308Mhu, TraI Eco, XPD Mbu, Dda or variants thereof. Any helicase can be used in the embodiments.
  • any number of helicases can be used. For example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more helicases may be used. In some embodiments, different numbers of helicases can be used.
  • the methods of the embodiments preferably comprise contacting the polynucleotide with two or more helicases.
  • the two or more helicases are typically the same helicase.
  • the two or more helicases may be different helicases.
  • the two or more helicases may be any combination of the aforementioned helicases.
  • the two or more helicases may be two or more Dda helicases.
  • the two or more helicases may be one or more Dda helicases and one or more TrwC helicases.
  • the two or more helicases may be different variants of the same helicase.
  • the two or more helicases are preferably linked to each other.
  • the two or more helicases are more preferably covalently linked to each other.
  • the helicases can be linked in any order and using any method.
  • the invention also provides a kit for characterizing a target analyte (e.g., a target polynucleotide).
  • the kit comprises the pore of the embodiments and the components of a membrane.
  • the membrane is preferably formed from the components. The pore is preferably present in the membrane.
  • the kit may comprise the components of any of the membranes disclosed above, such as amphiphilic or triblock copolymer membranes.
  • the kit may further comprise a polynucleotide binding protein. Any of the polynucleotide binding proteins discussed above can be used.
  • the membrane is an amphiphilic layer, a solid state layer, or a lipid bilayer.
  • the kit may further comprise one or more anchors for coupling the polynucleotide to the membrane.
  • the kit is preferably for characterizing double-stranded polynucleotides, and preferably comprises Y adapters and hairpin loop adapters.
  • Y adapters preferably have one or more helicases attached, and hairpin loop adapters preferably have one or more molecular brakes attached.
  • Y adapters preferably comprise one or more first anchors for coupling the polynucleotide to the membrane,
  • hairpin loop adapters preferably comprise one or more second anchors for coupling the polynucleotide to the membrane, and the strength of coupling of the hairpin loop adapter to the membrane is preferably greater than that of the Y adapter to the membrane.
  • the kit may additionally comprise one or more other reagents or instruments that enable performance of any of the above-mentioned embodiments.
  • reagents or instruments include one or more of the following: a suitable buffer (aqueous solution), a device for obtaining a sample from an individual (such as a container or an instrument containing a needle), a device for amplifying and/or expressing a polynucleotide, or voltage or patch clamp equipment.
  • the reagents may be present in the kit in a dry form such that the fluid sample resuspends the reagents.
  • the kit may also optionally contain instructions enabling use of the kit with the methods of the invention, or details as to which organisms the methods may be used for.
  • the invention also provides an apparatus for characterizing a target analyte (e.g., a target polynucleotide).
  • the device includes single or multiple protein monomer mutants/protein pores, and single or multiple membranes.
  • a mutant/protein pore of said protein monomer is preferably present in said membrane.
  • the number of pores and membranes is preferably equal. Preferably there is a single pore in each membrane.
  • the apparatus preferably also includes instructions for implementing the methods of the embodiments.
  • the device can be any conventional device for analyte analysis, e.g., an array or a chip. Any of the embodiments discussed in connection with the methods of the embodiments applies equally to the device.
  • the device may also include any of the features present in the kits described herein.
  • the equipment used in the embodiment can specifically be the gene sequencer QNome-9604 of Qitan Technology.
  • the wild-type porin is from Gulbenkiania indica
  • the amino acid sequence of the wild-type porin is SEQ ID NO:1
  • the nucleotide sequence encoding the amino acid sequence is shown in SEQ ID NO:2.
  • Mutant 1 of the porin monomer is the wild-type porin carrying multiple mutations at positions corresponding to SEQ ID NO: 1, specifically Q62L, K67R, D71N, S72P, and Y74T.
  • the protein pore comprising mutant 1 of the porin monomer is mutant pore 1.
  • the amino acid sequence of mutant 1 of the porin monomer is shown in SEQ ID NO:16.
  • the wild-type porin is from Gulbenkiania indica
  • the amino acid sequence of the wild-type porin is SEQ ID NO: 1
  • the nucleotide sequence of the sequence encoding this amino acid is shown in SEQ ID NO: 2.
  • Mutant 2 of the porin monomer is the wild-type porin carrying multiple mutations at positions corresponding to SEQ ID NO: 1, specifically Q62L, K67R, D71N, S72P, Y74 deletion, E110N, E119N, E126N, and K209P.
  • the protein pore comprising mutant 2 of the porin monomer is mutant pore 2.
  • the amino acid sequence of mutant 2 of the porin monomer is shown in SEQ ID NO:17.
  • the wild-type porin is from Gulbenkiania indica
  • the amino acid sequence of the wild-type porin is SEQ ID NO: 1
  • the nucleotide sequence of the sequence encoding this amino acid is shown in SEQ ID NO: 2.
  • Mutant 3 of the porin monomer is the wild-type porin carrying multiple mutations at positions corresponding to SEQ ID NO: 1, specifically K67R, D71A, S72P, Y74 deletion, and S75 deletion.
  • the protein pore comprising mutant 3 of the porin monomer is mutant pore 3.
  • the amino acid sequence of mutant 3 of the porin monomer is shown in SEQ ID NO:18.
  • the wild-type porin is from Gulbenkiania indica
  • the amino acid sequence of the wild-type porin is SEQ ID NO: 1
  • the nucleotide sequence of the sequence encoding this amino acid is shown in SEQ ID NO: 2.
  • Mutant 4 of the porin monomer is the wild-type porin carrying multiple mutations at positions corresponding to SEQ ID NO: 1, specifically K67R, T69S, A70P, D71A, S72Y, S73A, and Y74 deletion.
  • the protein pore comprising mutant 4 of the porin monomer is mutant pore 4.
  • the amino acid sequence of mutant 4 of the porin monomer is shown in SEQ ID NO:19.
  • FIG. 4A is a side view 400 of a predicted protein structure model, where a protein monomer 402 is shown in a darker color.
  • FIG. 4B is a top view 404 of the surface structure model, where the darker part shows a protein monomer 406.
  • FIG. 4C is a ribbon structure model diagram 408, where the darker part is the protein monomer 410.
  • Figure 5 shows the distribution of amino acid residues and the diameter of the constriction zone of the wild-type channel.
  • the channel diameter in the constriction zone between the two porin monomers 502 and 504 is given as a maximum value, an intermediate value, and a minimum value [numerical values not recoverable from the source]. Shown in the middle is the amino acid composition of the constriction structure, namely T69, S73 and Y74.
  • Figure 6A shows the surface potential map of the wild-type channel monomer, where the color depth represents the electrical strength.
  • Figure 6B shows the ribbon model of the monomer and a stick model of the distribution of amino acid residues in its constriction region. The amino acid composition and numbering of the loop in the constriction region are enlarged, and part 602 marks the amino acid residues pointing to the central region of the protein pore.
  • Fig. 7 shows the distribution of amino acid residues in the constriction zone of mutant pore 1 and the diameter of the constriction zone.
  • the stick model shows the distribution of key amino acid residues in the narrow region of the mutant pore.
  • the mutant structure reduces the thickness of the constriction region.
  • the amino acid residues pointing to the center of the pore are threonine at position 70, serine at position 74, and threonine at position 75.
  • the hydrogen bond interaction formed by amino acid residues at positions 65-79 may be closely related to the correct assembly of the channel complex.
  • the constriction zone between the two porin monomers 702 and 704 is described by a narrowest-region diameter, a widest-region diameter, and a median diameter [numerical values not recoverable from the source].
  • Figure 8 shows a cartoon schematic diagram of mutant pore 1 based on homology modeling.
  • Region 1 corresponds to the crown formation region
  • region 2 corresponds to the constriction and loops region
  • region 3 corresponds to the transmembrane β-barrel region.
  • Fig. 9 shows a negative-stain electron micrograph of mutant pore 1; the arrow indicates the target protein particles.
  • Fig. 10 shows the two-dimensional classification results from negative-stain electron microscopy of mutant pore 1; the class indicated by the arrow shows that the oligomerization state of mutant pore 1 is a 9-mer.
  • the DNA construct BS7-4C3-PLT was prepared.
  • the structure of BS7-4C3-PLT is shown in Figure 11, and the sequence information is as follows:
  • C3, C18, dSpacer and iSpC3 are marker sequences introduced to indicate the resolution characteristics of pore sequencing.
  • the rate-controlling protein in Figure 11 is the helicase Mph-MP1-E105C/A362C (carrying the mutations E105C/A362C); its amino acid sequence is SEQ ID NO:14 and its nucleic acid sequence is SEQ ID NO:15.
  • Mutant pore 1 was used as the protein pore and tested by single-pore sequencing. After a single pore with the amino acid sequence of mutant 1 was inserted into the phospholipid bilayer, buffer (625 mM KCl, 10 mM HEPES pH 8.0, 50 mM MgCl2) was flowed through the system to remove any excess mutant 1 nanopores. The DNA construct BS7-4C3-PLT (1-2 nM final concentration) was added to the mutant 1 nanopore experimental system; after mixing, buffer (625 mM KCl, 10 mM HEPES pH 8.0, 50 mM MgCl2) was flowed through the system to remove any excess DNA construct BS7-4C3-PLT.
  • helicase (Mph-MP1-E105C/A362C, 15 nM final concentration)
  • fuel (ATP, 3 mM final concentration)
  • Mutant pore 1 opens at a voltage of ±180 mV.
  • Fig. 12A shows the open-pore current and gating characteristics of mutant pore 1 at a voltage of ±180 mV.
  • FIG. 12B shows single-stranded nucleic acid passing through mutant pore 1 at a voltage of +180 mV. Nucleic acids can pass through the pore. After single-stranded nucleic acid is added, the downward lines show the signal of the nucleic acid passing through the pore.
  • the DNA construct BS7-4C3-PLT was sequenced through mutant pore 1 by single-pore sequencing; the sequencing system was added after pore insertion was complete, and the resulting nucleic acid sequencing signal was recorded.
  • Figures 13A and 13B show exemplary current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 1. The signal characteristics indicate that mutant pore 1 has high-resolution potential for nucleic acid sequencing.
  • FIG. 14 is an enlarged view of part of the current trace in FIG. 13A.
  • the portion indicated by the dotted arrow shows the enlarged result of the current trace.
  • Figure 15 shows the chip test current trace when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 1. These results further indicate that mutant pore 1 has high resolution for nucleic acid sequencing.
  • Example 8 uses mutant pore 2 for open-pore testing and translocation testing.
  • FIG. 16A shows the open-pore current and gating characteristics of mutant pore 2 at a voltage of ±180 mV.
  • FIG. 16B shows single-stranded nucleic acid passing through mutant pore 2 at a voltage of +180 mV. Nucleic acids can pass through the pore. After single-stranded nucleic acid is added, the downward lines show the signal of the nucleic acid passing through the pore.
  • Figures 17A and 17B show exemplary current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 2. The signal characteristics indicate that mutant pore 2 can be used for nucleic acid sequencing.
  • Figure 18 shows an enlarged view of part of the current trace.
  • the portion indicated by the dotted arrow shows the enlarged result of the current trace.
  • Figure 19 shows the chip test current trace when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 2. These results indicate that mutant pore 2 can be used for nucleic acid sequencing.
  • Example 9 uses mutant pore 3 for open-pore testing and translocation testing.
  • FIG. 20A shows the open-pore current and gating characteristics of mutant pore 3 at a voltage of ±180 mV.
  • FIG. 20B shows single-stranded nucleic acid passing through mutant pore 3 at a voltage of +180 mV. Nucleic acids can pass through the pore. After single-stranded nucleic acid is added, the downward lines show the signal of the nucleic acid passing through the pore.
  • Figures 21A and 21B show exemplary current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 3. The signal characteristics indicate that mutant pore 3 has high-resolution potential for nucleic acid sequencing.
  • Figure 22 shows an enlarged view of part of the current trace.
  • the portion indicated by the dotted arrow shows the enlarged result of the current trace.
  • the enlarged display of the region of this single signal further shows that mutant pore 3 has high resolution for nucleic acid sequencing.
  • Similar to Example 7, Example 10 uses mutant pore 4 for open-pore testing and translocation testing.
  • FIG. 23A shows the open-pore current and gating characteristics of mutant pore 4 at a voltage of ±180 mV.
  • FIG. 23B shows single-stranded nucleic acid passing through mutant pore 4 at a voltage of +180 mV. Nucleic acids can pass through the pore. After single-stranded nucleic acid is added, the downward lines show the signal of the nucleic acid passing through the pore.
  • Figures 24A and 24B show exemplary current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 4. The signal characteristics indicate that mutant pore 4 can be used for nucleic acid sequencing.
  • Figure 25 shows an enlarged view of part of the current trace.
  • the portion indicated by the dotted arrow shows the enlarged result of the current trace.
  • the enlarged display of the region of this single signal further shows that mutant pore 4 can be used for nucleic acid sequencing.
  • 1% inoculum was transferred to ampicillin-resistant TB liquid medium for expanded culture, cultured at 37°C and 220 rpm, and its OD600 value was continuously measured.
  • IPTG (isopropyl β-D-thiogalactoside)
  • the cells were collected by centrifugation. The cells were resuspended in disruption buffer and lysed under high pressure, the lysate was purified by Ni-NTA affinity chromatography, and the target eluted samples were collected. Mutants 2-4 of the porin monomer were purified in the same way.
  • FIG. 26 shows the protein purification results of mutant 1; lanes 1-6 show the SDS-PAGE results for the different separated fractions.
  • Fig. 27 shows the result of molecular sieve (size-exclusion) purification of the mutant 1 protein; the position indicated by the arrow is the peak of the target protein.

Abstract

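The present invention belongs to the technical field of characterizing target analytes, and specifically provides a mutant of a porin monomer, a protein pore comprising it, and the use thereof for detecting a target analyte, wherein the amino acid sequence of the mutant of the porin monomer comprises the sequence shown in SEQ ID NO:1 or a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60% or 50% identity thereto, and the amino acid sequence of the mutant comprises a mutation at one or more positions corresponding to K67, D71, S72, and Y74 of SEQ ID NO:1.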

Description

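Mutant of porin monomer, protein pore, and use thereof
Technical Field
The present invention belongs to the technical field of characterizing target analytes, and in particular relates to a mutant of a porin monomer, a protein pore comprising it, and the use thereof for detecting a target analyte.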
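Background
As research on the structure and sequence of nucleic acids has advanced, nucleic acid sequencing technology has developed continuously and become a core area of life science research, greatly advancing technology in biology, chemistry, electronics, life sciences, medicine, and other fields. Using nanopores to develop new nucleic acid sequencing technologies that are fast, accurate, low-cost, high-precision, and high-throughput is one of the focal points of the post-Human Genome Project era.
Nanopore sequencing, also known as fourth-generation sequencing, is a gene sequencing technology that uses single-stranded nucleic acid molecules as the sequencing unit. A nanopore that provides an ion current channel allows a single-stranded nucleic acid molecule to pass through it under electrophoretic drive; as the nucleic acid passes through the nanopore it reduces the current through the pore, and sequence information is read in real time from the different signals produced.
The main features of nanopore sequencing are very long read lengths and relatively high accuracy, with most errors occurring in homopolymeric oligonucleotide regions. Nanopore sequencing not only enables sequencing of native DNA and RNA but also directly captures base modification information of DNA and RNA. For example, it can read methylated cytosine directly, without the prior bisulfite treatment of the genome required by second-generation sequencing methods, which greatly facilitates the direct study of epigenetic phenomena at the genome level. As a new platform, nanopore detection offers advantages such as low cost, high throughput, and label-free operation.
Nanopore analysis originated from the invention of the Coulter counter and from single-channel current recording. In 1976, the Nobel laureates in Physiology or Medicine Neher and Sakmann used the patch clamp technique to measure membrane potential and to study membrane proteins and ion channels, advancing the practical application of nanopore sequencing. In 1996, Kasianowicz et al. proposed using α-hemolysin to sequence DNA, a milestone in single-molecule sequencing with biological nanopores. Subsequently, reports on biological nanopores such as the MspA porin and the bacteriophage Phi29 connector enriched nanopore analysis research. In 2001, Li et al. opened a new era of solid-state nanopore research. Constrained by the development of the semiconductor and materials industries, solid-state nanopore sequencing has progressed slowly.
One key to nanopore sequencing is a specially designed biological nanopore. The reader-head structure formed by the constriction zone inside the pore blocks the channel current when a single-stranded nucleic acid (e.g., ssDNA) molecule passes through the nanopore, briefly affecting the current flowing through the pore (each base changes the current by a different amplitude). Highly sensitive electronics then detect these changes and thereby identify the bases passing through. At present, protein pores are used as the nanopores for sequencing, and the porins are mainly derived from Escherichia coli.
Currently available nanopore proteins are limited in variety, and alternative nanopore proteins need to be developed for nanopore sequencing. The porin is also closely related to sequencing accuracy, and it further affects the mode of interaction with the rate-controlling protein; further optimizing the stability of the interface between the porin and the rate-controlling protein has a positive effect on the consistency and stability of sequencing data. The accuracy of nanopore sequencing also remains to be improved. Improved nanopore proteins therefore need to be developed to further increase the resolution of nanopore sequencing.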
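Summary of the Invention
To solve the above problems, an object of the embodiments of the present invention is to provide an alternative mutant of a porin monomer, a protein pore comprising it, and uses thereof.
In a first aspect, an embodiment of the present invention provides a mutant of a porin monomer, wherein the amino acid sequence of the mutant comprises, or consists of, the sequence shown in SEQ ID NO:1 or a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60% or 50% identity thereto, and the amino acid sequence of the mutant comprises a mutation at one or more positions corresponding to K67, D71, S72, and Y74 of SEQ ID NO:1;
the one or more of K67, D71, S72, and Y74 are specifically: (1) K67; (2) D71; (3) S72; (4) Y74; (5) K67 and D71; (6) K67 and S72; (7) K67 and Y74; (8) D71 and S72; (9) D71 and Y74; (10) S72 and Y74; (11) K67, D71 and S72; (12) K67, D71 and Y74; (13) D71, S72 and Y74; (14) K67, D71, S72 and Y74.
Preferably, the amino acid sequence of the mutant of the porin monomer comprises a mutation at one or more positions corresponding to positions 62-209, 62-74, 62-75, 65-79, 67-209, 67-75, or 67-74 of SEQ ID NO:1.
Preferably, the amino acid sequence of the mutant of the porin monomer comprises:
(1) an amino acid insertion, deletion and/or substitution at one or more positions corresponding to Q62, K67, D71, S72, and Y74 of SEQ ID NO:1; (2) an amino acid insertion, deletion and/or substitution at one or more positions corresponding to Q62, K67, D71, S72, Y74, E110, E119, E126, and K209 of SEQ ID NO:1; (3) an amino acid insertion, deletion and/or substitution at one or more positions corresponding to K67, D71, S72, Y74, and S75 of SEQ ID NO:1; or (4) an amino acid insertion, deletion and/or substitution at one or more positions corresponding to K67, T69, A70, D71, S72, S73, and Y74 of SEQ ID NO:1.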
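The phrase "one or more of K67, D71, S72, and Y74" can be enumerated mechanically. The following is a minimal sketch in Python; the names `CORE_POSITIONS` and `nonempty_subsets` are hypothetical and introduced only for illustration. Exhaustive enumeration of four positions yields 2^4 - 1 = 15 subsets, which includes the K67/S72/Y74 triple in addition to the 14 combinations itemised above.

```python
from itertools import combinations

# The four core positions named in the text (SEQ ID NO:1 numbering).
CORE_POSITIONS = ("K67", "D71", "S72", "Y74")


def nonempty_subsets(positions):
    """Return every non-empty combination of the given positions, smallest first."""
    subsets = []
    for size in range(1, len(positions) + 1):
        subsets.extend(combinations(positions, size))
    return subsets


all_subsets = nonempty_subsets(CORE_POSITIONS)

# Print the combinations in the same "(n) ..." style used by the list above.
for index, subset in enumerate(all_subsets, start=1):
    print(f"({index}) " + ", ".join(subset))
```

The same helper could be applied to the longer position lists (for example Q62 through K209) if an exhaustive enumeration is ever needed, although the number of subsets grows exponentially with the number of positions.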
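In one embodiment, the amino acid mutations of the mutant of the porin monomer are selected from the following:
(a) corresponding to SEQ ID NO:1, Q62 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 4 of N, E, D, and Q; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 5 of S, C, U, T, and M;
(b) corresponding to SEQ ID NO:1, Q62 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 4 of N, E, D, and Q; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 3 of F, Y, and W; E110 is mutated to 0 to 4 of N, D, E, and Q; E119 is mutated to 0 to 4 of N, D, E, and Q; E126 is mutated to 0 to 4 of N, D, E, and Q; K209 is mutated to 0 to 1 of P;
(c) corresponding to SEQ ID NO:1, K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 5 of G, A, V, L, and I; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 3 of F, Y, and W; S75 is mutated to 0 to 5 of C, U, S, T, and M; and
(d) corresponding to SEQ ID NO:1, K67 is mutated to 0 to 3 of R, H, and K; T69 is mutated to 0 to 5 of S, C, T, U, and M; A70 is mutated to 0 to 1 of P; D71 is mutated to 0 to 5 of G, A, V, L, and I; S72 is mutated to 0 to 3 of F, Y, and W; S73 is mutated to 0 to 5 of G, A, V, L, and I; Y74 is mutated to 0 to 3 of F, Y, and W.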
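In one embodiment, the amino acid mutations of the mutant of the porin monomer are selected from the following:
(a) corresponding to SEQ ID NO:1, Q62 is mutated to G, A, V, L, or I; K67 is mutated to R or H; D71 is mutated to N, E, or Q; S72 is mutated to P; Y74 is mutated to S, C, U, T, or M;
(b) corresponding to SEQ ID NO:1, Q62 is mutated to G, A, V, L, or I; K67 is mutated to R or H; D71 is mutated to N, E, or Q; S72 is mutated to P; Y74 is deleted; E110 is mutated to N, D, or Q; E119 is mutated to N, D, or Q; E126 is mutated to N, D, or Q; K209 is mutated to P;
(c) corresponding to SEQ ID NO:1, K67 is mutated to R or H; D71 is mutated to G, A, V, L, or I; S72 is mutated to P; Y74 is deleted; S75 is deleted; and
(d) corresponding to SEQ ID NO:1, K67 is mutated to R or H; T69 is mutated to S, C, U, or M; A70 is mutated to P; D71 is mutated to G, A, V, L, or I; S72 is mutated to F, Y, or W; S73 is mutated to G, A, V, L, or I; Y74 is deleted.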
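In one embodiment, the amino acid mutations of the mutant of the porin monomer are selected from the following:
(a) Q62L, K67R, D71N, S72P, and Y74T, corresponding to SEQ ID NO:1;
(b) Q62L, K67R, D71N, S72P, Y74 deletion, E110N, E119N, E126N, and K209P, corresponding to SEQ ID NO:1;
(c) K67R, D71A, S72P, Y74 deletion, and S75 deletion, corresponding to SEQ ID NO:1; and
(d) K67R, T69S, A70P, D71A, S72Y, S73A, and Y74 deletion, corresponding to SEQ ID NO:1.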
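Mutation sets such as (a) to (d) above combine substitutions and deletions, all numbered against the reference sequence. The following Python sketch shows one way such a set could be applied to a sequence. It is only an illustration: the function name `apply_mutations`, the `del` suffix for deletions, and the eight-residue toy sequence are assumptions introduced here, not the SEQ ID NO:1 sequence or any notation from the source.

```python
import re

# Labels such as "Q2L" (substitution) or "Y8del" (deletion).
# The "del" suffix is a notation chosen for this sketch only.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z]|del)$")


def apply_mutations(reference, mutations):
    """Apply substitutions and deletions given in 1-based reference numbering."""
    residues = list(reference)  # list index i holds reference position i + 1
    for label in mutations:
        match = MUTATION_PATTERN.match(label)
        if match is None:
            raise ValueError(f"unrecognised mutation label: {label}")
        wild_type = match.group(1)
        position = int(match.group(2))
        change = match.group(3)
        if not 1 <= position <= len(reference):
            raise ValueError(f"{label}: position outside the reference sequence")
        if reference[position - 1] != wild_type:
            raise ValueError(
                f"{label}: reference has {reference[position - 1]} at position {position}"
            )
        # A deletion becomes an empty string, so the remaining residues
        # keep their reference numbering while other mutations are applied.
        residues[position - 1] = "" if change == "del" else change
    return "".join(residues)


# Hypothetical toy reference: M1 Q2 A3 K4 T5 D6 S7 Y8.
toy_reference = "MQAKTDSY"
toy_mutant = apply_mutations(toy_reference, ["Q2L", "K4R", "D6N", "Y8del"])
print(toy_mutant)
```

Checking the wild-type residue before each change guards against applying a mutation list to the wrong reference or to a sequence whose numbering has already shifted.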
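In a second aspect, an embodiment of the present invention provides a mutant of a porin monomer, wherein the amino acid sequence of the mutant comprises the sequence shown in SEQ ID NO:1 or a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60% or 50% identity thereto, and the mutant of the porin monomer comprises:
(1) a mutation at one or more positions corresponding to Q62, K67, T69, A70, D71, S72, S73, Y74, S75, E110, E119, E126, and K209 of SEQ ID NO:1;
(2) a mutation at one or more positions corresponding to Q62L, K67R, T69S, A70P, D71N/D71A, S72P/S72Y, S73A, Y74T/Y74 deletion, S75 deletion, E110N, E119N, E126N, and K209P of SEQ ID NO:1;
(3) a mutation at K67, D71, S72, and/or Y74 corresponding to SEQ ID NO:1, and additionally a mutation at at least one position among Q62, T69, A70, S73, S75, E110, E119, E126, and K209;
(4) a mutation at K67R, D71N/D71A, S72P/S72Y, and/or Y74T/Y74 deletion corresponding to SEQ ID NO:1; or
(5) a mutation at K67R, D71N/D71A, S72P/S72Y, and/or Y74T/Y74 deletion corresponding to SEQ ID NO:1, and additionally a mutation at at least one position among Q62L, T69S, A70P, S73A, S75 deletion, E110N, E119N, E126N, and K209P.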
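In one embodiment, among the mutations in (1) of the mutant of the porin monomer of the second aspect: Q62 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; T69 is mutated to 0 to 5 of S, C, T, U, and M; A70 is mutated to 0 to 1 of P; D71 is mutated to 0 to 4 of N, E, D, and Q, or to 0 to 5 of G, A, V, L, and I; S72 is mutated to 0 to 1 of P, or to 0 to 3 of F, Y, and W; S73 is mutated to 0 to 5 of G, A, V, L, and I; Y74 is mutated to 0 to 5 of S, C, U, T, and M, or to 0 to 3 of F, Y, and W; S75 is mutated to 0 to 5 of C, U, S, T, and M; E110 is mutated to 0 to 4 of N, D, E, and Q; E119 is mutated to 0 to 4 of N, D, E, and Q; E126 is mutated to 0 to 4 of N, D, E, and Q; K209 is mutated to 0 to 1 of P.
"0 to N" includes 0, 1, 2, 3, 4 ... N. For example, "Q62 is mutated to 0 to 5 of G, A, V, L, and I" means that Q62 is mutated to 0, 1, 2, 3, 4 or 5 of the amino acids G, A, V, L, and I.
In one embodiment, when the mutation is to 1 amino acid, the amino acids before and after the mutation are not the same. For example, for "T69 is mutated to 0 to 5 of S, C, T, U, and M", when the number is 1, T69 is not mutated to T and can only be mutated to any one of S, C, U, and M; when the number is 2, T69 can be mutated to any two of S, C, T, U, and M; and so on.
When the mutation is to 0 amino acids, the amino acid at that position is deleted. For example, when Y74 is mutated to 0 of F, Y, and W, Y74 is deleted.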
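In a third aspect, an embodiment of the present invention provides a protein pore comprising at least one mutant of the porin monomer.
In a fourth aspect, an embodiment of the present invention provides a complex for characterizing a target analyte, characterized by comprising the protein pore and a rate-controlling protein used in conjunction with it.
In a fifth aspect, an embodiment of the present invention provides a nucleic acid encoding the mutant of the porin monomer, the protein pore, or the complex.
In a sixth aspect, an embodiment of the present invention provides a vector or a genetically engineered host cell comprising the nucleic acid.
In a seventh aspect, an embodiment of the present invention provides the use of the mutant of the porin monomer, the protein pore thereof, the complex, the nucleic acid, the vector, or the host cell in detecting the presence, absence, or one or more characteristics of a target analyte, or in preparing a product for detecting the presence, absence, or one or more characteristics of a target analyte.
In an eighth aspect, an embodiment of the present invention provides a method for producing a protein pore or a polypeptide thereof, comprising transforming the host cell with the vector and inducing the host cell to express the protein pore or the polypeptide thereof.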
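In a ninth aspect, an embodiment of the present invention provides a method for determining the presence, absence, or one or more characteristics of a target analyte, comprising:
a. contacting the target analyte with the protein pore, the complex, or the protein pore in the complex, such that the target analyte moves relative to the protein pore; and
b. obtaining one or more measurements as the target analyte moves relative to the protein pore, thereby determining the presence, absence, or one or more characteristics of the target analyte.
In one embodiment, the method comprises: the target analyte interacting with the protein pore present in a membrane such that the target analyte moves relative to the protein pore.
In one embodiment, the target analyte is a nucleic acid molecule.
In one embodiment, the method for determining the presence, absence, or one or more characteristics of a target analyte comprises coupling the target analyte to a membrane; and the target analyte interacting with the protein pore present in the membrane such that the target analyte moves relative to the protein pore.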
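In a tenth aspect, an embodiment of the present invention provides a kit for determining the presence, absence, or one or more characteristics of a target analyte, comprising the mutant of the porin monomer, the protein pore, the complex, the nucleic acid, or the vector or host, and the components of the membrane.
In an eleventh aspect, an embodiment of the present invention provides a device for determining the presence, absence, or one or more characteristics of a target analyte, comprising the protein pore or the complex, and the membrane.
In one embodiment, the target analyte comprises a polysaccharide, metal ion, inorganic salt, polymer, amino acid, peptide, protein, nucleotide, oligonucleotide, polynucleotide, dye, drug, diagnostic agent, explosive, or environmental pollutant;
preferably, the target analyte comprises a polynucleotide,
more preferably, the polynucleotide comprises DNA or RNA; and/or the one or more characteristics are selected from (i) the length of the polynucleotide; (ii) the identity of the polynucleotide; (iii) the sequence of the polynucleotide; (iv) the secondary structure of the polynucleotide; and (v) whether or not the polynucleotide is modified; and/or the rate-controlling protein in the complex comprises a polynucleotide binding protein.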
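Brief Description of the Drawings
The drawings described are only schematic and are non-limiting.
Figure 1 shows the basic working principle of a nanopore according to one embodiment.
Figure 2 shows a schematic diagram of DNA sequencing according to one embodiment.
Figure 3 shows the pore-blocking signal corresponding to a nucleotide passing through the protein pore according to one embodiment.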
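Figures 4A, 4B and 4C show surface structure and ribbon diagram models of the wild-type protein pore channel according to one embodiment. Figure 4A is a side view of the surface structure model, Figure 4B is a top view of the surface structure model, and Figure 4C is the ribbon structure model.
Figure 5 shows the distribution of amino acid residues in the constriction zone of the wild-type channel and the diameter of the constriction zone according to one embodiment.
Figure 6A shows the surface potential map of the wild-type channel monomer according to one embodiment, and Figure 6B shows the ribbon model of the monomer and a stick model of the distribution of amino acid residues in its constriction zone.
Figure 7 shows the distribution of amino acid residues in the constriction zone of mutant pore 1 and the diameter of the constriction zone according to one embodiment.
Figure 8 shows a cartoon schematic diagram of mutant pore 1 based on homology modeling according to one embodiment.
Figure 9 shows a negative-stain electron micrograph of mutant pore 1 according to one embodiment; the arrows indicate target protein particles.
Figure 10 shows the two-dimensional classification results of negative-stain electron microscopy of mutant pore 1 according to one embodiment; the class indicated by the arrow shows that the oligomeric state of mutant pore 1 is a 9-mer.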
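Figure 11 shows the structure of the DNA construct BS7-4C3-PLT according to one embodiment.
Figure 12A shows the open-pore current and gating characteristics of mutant pore 1 at a voltage of ±180 mV according to one embodiment.
Figure 12B shows nucleic acid translocation through mutant pore 1 at a voltage of +180 mV according to one embodiment.
Figures 13A and 13B show example current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 1 according to one embodiment.
Figure 14 is an enlarged view of a region of a single signal from the embodiment of Figure 13A.
Figure 15 shows chip test current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 1 according to one embodiment (for both traces, y-axis = current (pA), x-axis = sampling points).
Figure 16A shows the open-pore current and gating characteristics of mutant pore 2 at a voltage of ±180 mV according to one embodiment.
Figure 16B shows nucleic acid translocation through mutant pore 2 at a voltage of +180 mV according to one embodiment.
Figures 17A and 17B show example current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 2 according to one embodiment.
Figure 18 is an enlarged view of a region of a single signal from the embodiment of Figure 17A.
Figure 19 shows chip test current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 2 according to one embodiment (for both traces, y-axis = current (pA), x-axis = sampling points).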
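Figure 20A shows the open-pore current and gating characteristics of mutant pore 3 at a voltage of ±180 mV according to one embodiment.
Figure 20B shows nucleic acid translocation through mutant pore 3 at a voltage of +180 mV according to one embodiment.
Figures 21A and 21B show example current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 3 according to one embodiment.
Figure 22 is an enlarged view of a region of a single signal from the embodiment of Figure 21A.
Figure 23A shows the open-pore current and gating characteristics of mutant pore 4 at a voltage of ±180 mV according to one embodiment.
Figure 23B shows nucleic acid translocation through mutant pore 4 at a voltage of +180 mV according to one embodiment.
Figures 24A and 24B show example current traces when the helicase Mph-MP1-E105C/A362C controls the translocation of the DNA construct BS7-4C3-PLT through mutant pore 4 according to one embodiment.
Figure 25 is an enlarged view of a region of a single signal from the embodiment of Figure 24B.
Figure 26 shows the protein purification results for mutant 1 according to one embodiment; lanes 1-6 show the SDS-PAGE results for the different separated fractions.
Figure 27 shows the molecular sieve (size-exclusion) purification results for the mutant 1 protein according to one embodiment; the position indicated by the arrow is the target protein peak.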
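Detailed Description
It should be understood that different applications of the disclosed products and methods may be adapted to the specific needs of the art. It should also be understood that the terminology used herein is for the purpose of describing particular embodiments of the invention only and is not intended to be limiting.
In addition, as used in this specification and the claims, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. For example, reference to "a nucleotide" includes two or more nucleotides, and reference to "a helicase" includes two or more helicases.
As used herein, the term "comprising" means that any listed elements must be included and that other elements may optionally be included as well. "Consisting of" means that all unlisted elements are excluded. Embodiments defined by each of these terms are within the scope of the invention.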
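As used herein, "nucleotide sequence", "DNA sequence" or "nucleic acid molecule" refers to a polymeric form of nucleotides (ribonucleotides or deoxyribonucleotides) of any length. The term refers only to the primary structure of the molecule. Thus, the term includes double-stranded and single-stranded DNA and RNA.
The term "nucleic acid" as used herein refers to a single- or double-stranded covalently linked sequence of nucleotides in which the 3' and 5' ends on each nucleotide are joined by phosphodiester bonds. Nucleotides may be composed of deoxyribonucleotide bases or ribonucleotide bases. Nucleic acids may include DNA and RNA and may be prepared synthetically in vitro or isolated from natural sources. Nucleic acids may further include modified DNA or RNA, for example methylated DNA or RNA, or RNA that has undergone post-translational modification, for example 5'-capping with 7-methylguanosine, 3'-processing such as cleavage and polyadenylation, and splicing. Nucleic acids may also include synthetic nucleic acids (XNA), such as hexitol nucleic acid (HNA), cyclohexene nucleic acid (CeNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), locked nucleic acid (LNA) and peptide nucleic acid (PNA). The size of a nucleic acid (or polynucleotide) is typically expressed as the number of base pairs (bp) for double-stranded polynucleotides, or as the number of nucleotides (nt) for single-stranded polynucleotides. One thousand bp or nt equal one kilobase (kb). Polynucleotides less than about 40 nucleotides in length are typically called "oligonucleotides" and may comprise primers for use in DNA manipulation, for example by polymerase chain reaction (PCR).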
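A polynucleotide, such as a nucleic acid, is a macromolecule comprising two or more nucleotides. The polynucleotide or nucleic acid may comprise any combination of any nucleotides. The nucleotides may be naturally occurring or artificial. One or more nucleotides in the polynucleotide may be oxidized or methylated. One or more nucleotides in the polynucleotide may be damaged. For example, the polynucleotide may comprise a pyrimidine dimer. Such dimers are typically associated with damage by ultraviolet light and are the primary cause of skin melanomas. One or more nucleotides in the polynucleotide may be modified, for instance with a conventional label or tag. The polynucleotide may comprise one or more nucleotides that are abasic (i.e., lack a nucleobase) or that lack both a nucleobase and a sugar (i.e., are C3).
The nucleotides in the polynucleotide may be attached to each other in any manner. The nucleotides are typically attached by their sugar and phosphate groups, as in nucleic acids. The nucleotides may be connected via their nucleobases, as in pyrimidine dimers.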
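A polynucleotide may be single-stranded or double-stranded. At least a portion of the polynucleotide is preferably double-stranded. The polynucleotide may be a nucleic acid, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The polynucleotide may comprise one strand of RNA hybridized to one strand of DNA. The polynucleotide may be any synthetic nucleic acid known in the art, such as peptide nucleic acid (PNA), glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA) or other synthetic polymers with nucleotide side chains. The PNA backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. The GNA backbone is composed of repeating glycol units linked by phosphodiester bonds. The TNA backbone is composed of repeating threose sugars linked together by phosphodiester bonds. LNA is formed from the ribonucleic acids described above, with an extra bridge connecting the 2' oxygen and 4' carbon in the ribose moiety. Bridged nucleic acids (BNAs) are modified RNA nucleotides. They may also be referred to as constrained or inaccessible RNA. BNA monomers can contain a 5-membered, 6-membered or even 7-membered bridged structure with a "fixed" C3'-endo sugar puckering. The bridge is synthetically incorporated at the 2',4'-position of the ribose to produce a 2',4'-BNA monomer.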
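The polynucleotide is most preferably ribonucleic acid (RNA) or deoxyribonucleic acid (DNA). The polynucleotide can be any length. For example, the polynucleotide can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs in length. The polynucleotide can be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs, or 100000 or more nucleotides or nucleotide pairs in length.
Any number of polynucleotides can be investigated. For example, the method of the embodiments may concern characterizing 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100 or more polynucleotides. If two or more polynucleotides are characterized, they may be different polynucleotides or instances of the same polynucleotide.
The polynucleotide can be naturally occurring or artificial. For example, the method may be used to verify the sequence of a manufactured oligonucleotide. The method is typically carried out in vitro.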
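In the context of the present disclosure, the term "amino acid" is used in its broadest sense and is meant to include organic compounds containing amine (NH2) and carboxyl (COOH) functional groups, along with a side chain (e.g., an R group) specific to each amino acid. In some embodiments, the amino acids refer to naturally occurring L α-amino acids or residues. The commonly used one- and three-letter abbreviations for naturally occurring amino acids are used herein: A=Ala; C=Cys; D=Asp; E=Glu; F=Phe; G=Gly; H=His; I=Ile; K=Lys; L=Leu; M=Met; N=Asn; P=Pro; Q=Gln; R=Arg; S=Ser; T=Thr; V=Val; W=Trp; and Y=Tyr (Lehninger, A. L. (1975) Biochemistry, 2nd ed., pp. 71-92, Worth Publishers, New York). The general term "amino acid" further includes D-amino acids, retro-inverso amino acids, as well as chemically modified amino acids (such as amino acid analogues), naturally occurring amino acids that are not usually incorporated into proteins (such as norleucine), and chemically synthesized compounds having properties known in the art to be characteristic of an amino acid (such as β-amino acids). For example, analogues or mimetics of phenylalanine or proline, which allow the same conformational restriction of the peptide compounds as do natural Phe or Pro, are included within the definition of amino acid. Such analogues and mimetics are referred to herein as "functional equivalents" of the respective amino acid. Other examples of amino acids are listed by Roberts and Vellaccio, The Peptides: Analysis, Synthesis, Biology, Gross and Meienhofer, eds., Vol. 5, p. 341, Academic Press, Inc., N.Y. 1983, which is incorporated herein by reference.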
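The terms "protein", "polypeptide" and "peptide" are further used interchangeably herein to refer to a polymer of amino acid residues and to variants and synthetic analogues of the same. Thus, these terms apply to amino acid polymers in which one or more amino acid residues is a synthetic non-naturally occurring amino acid, such as a chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. Polypeptides can also undergo maturation or post-translational modification processes that may include, but are not limited to: glycosylation, proteolytic cleavage, lipidation, signal peptide cleavage, propeptide cleavage, phosphorylation, and the like.
"Homologues" of a protein encompass peptides, oligopeptides, polypeptides, proteins and enzymes having amino acid substitutions, deletions and/or insertions relative to the unmodified or wild-type protein in question and having similar biological and functional activity to the unmodified protein from which they are derived. As used herein, the term "amino acid identity" refers to the extent to which sequences are identical on an amino acid-by-amino acid basis over a window of comparison. Thus, a "percentage of sequence identity" is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity.
Sequence identity may also be to a fragment or portion of the full-length polynucleotide or polypeptide. Hence, a sequence may have only 50% overall sequence identity with a full-length reference sequence, but a sequence of a particular region, domain or subunit could share 80%, 90% or as much as 99% sequence identity with the reference sequence.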
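The percentage calculation described above (matched positions divided by window size, times 100) can be sketched in a few lines of Python. This is a minimal illustration that assumes the two sequences have already been optimally aligned and padded with `-` for gaps; the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences over the whole comparison window.

    Both inputs must have the same length; '-' marks an alignment gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A position matches only when both sequences carry the same residue (not a gap).
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    # Divide by the window size, as in the definition in the text.
    return 100.0 * matches / len(aligned_a)


# Toy example: three of four aligned positions carry the same residue.
print(percent_identity("ACDE", "ACDF"))
```

Producing the optimal alignment itself is a separate step that this sketch does not cover.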
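The term "wild-type" refers to a gene or gene product isolated from a naturally occurring source. A wild-type gene is the one most frequently observed in a population and is thus arbitrarily designated the "normal" or "wild-type" form of the gene. In contrast, the terms "modified", "mutant" or "variant" refer to a gene or gene product that displays modifications in sequence (e.g., substitutions, truncations, or insertions), post-translational modifications and/or functional properties (e.g., altered characteristics) when compared to the wild-type gene or gene product. Note that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product. Methods for introducing or substituting naturally occurring amino acids are well known in the art. For instance, methionine (M) may be substituted with arginine (R) by replacing the codon for methionine (ATG) with a codon for arginine (CGT) at the relevant position in a polynucleotide encoding the mutant monomer. Methods for introducing or substituting non-naturally occurring amino acids are also well known in the art. For instance, non-naturally occurring amino acids may be introduced by including synthetic aminoacyl-tRNAs in the IVTT system used to express the mutant monomer. Alternatively, they may be introduced by expressing the mutant monomer in Gulbenkiania indica that is auxotrophic for specific amino acids in the presence of synthetic (i.e., non-naturally occurring) analogues of those specific amino acids. They may also be produced by naked ligation if the mutant monomer is produced using partial peptide synthesis. Conservative substitutions replace amino acids with other amino acids of similar chemical structure, similar chemical properties or similar side-chain volume. The amino acids introduced may have similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality or charge to the amino acids they replace. Alternatively, the conservative substitution may introduce another amino acid that is aromatic or aliphatic in the place of a pre-existing aromatic or aliphatic amino acid. Conservative amino acid changes are well known in the art and may be selected in accordance with the properties of the 20 main amino acids as defined in Table 1 below. Where amino acids have similar polarity, this can also be determined by reference to the hydropathy scale for amino acid side chains in Table 2.
Table 1 - Chemical properties of amino acids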
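The codon replacement mentioned above (ATG for methionine swapped for CGT for arginine) amounts to a three-nucleotide substitution at a known codon position. The following Python sketch illustrates the idea on a toy coding sequence; the function name `replace_codon` and the nine-nucleotide sequence are hypothetical and do not come from SEQ ID NO:2.

```python
def replace_codon(coding_dna, codon_number, expected_codon, new_codon):
    """Swap one codon (1-based codon_number) in an in-frame coding sequence."""
    if len(coding_dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if len(new_codon) != 3:
        raise ValueError("replacement codon must be three nucleotides long")
    if not 1 <= codon_number <= len(coding_dna) // 3:
        raise ValueError("codon number outside the coding sequence")
    start = (codon_number - 1) * 3
    current = coding_dna[start:start + 3]
    # Guard against editing the wrong position.
    if current != expected_codon:
        raise ValueError(f"expected {expected_codon} at codon {codon_number}, found {current}")
    return coding_dna[:start] + new_codon + coding_dna[start + 3:]


# Hypothetical toy gene: GCT ATG AAA encodes Ala-Met-Lys.
toy_gene = "GCTATGAAA"
# Replace the Met codon (ATG) at codon 2 with an Arg codon (CGT).
toy_mutant_gene = replace_codon(toy_gene, 2, "ATG", "CGT")
print(toy_mutant_gene)
```

In practice the change would be introduced by site-directed mutagenesis; the sketch only shows the resulting sequence edit.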
Figure PCTCN2021123209-appb-000001
Table 2 - Hydropathy scale
Side chain	Hydropathy
Ile,I 4.5
Val,V 4.2
Leu,L 3.8
Phe,F 2.8
Cys,C 2.5
Met,M 1.9
Ala,A 1.8
Gly,G -0.4
Thr,T -0.7
Ser,S -0.8
Trp,W -0.9
Tyr,Y -1.3
Pro,P -1.6
His,H -3.2
Glu,E -3.5
Gln,Q -3.5
Asp,D -3.5
Asn,N -3.5
Lys,K -3.9
Arg,R -4.5
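The scale in Table 2 can be used to compare the polarity of a wild-type residue with that of its replacement. The following Python sketch copies the values from Table 2 into a dictionary and reports the difference for a given substitution; the names `HYDROPATHY` and `hydropathy_shift` are hypothetical, and treating a small absolute shift as "similar polarity" is an assumption of this illustration rather than a threshold stated in the text.

```python
# Values copied from Table 2 (one-letter code -> side-chain hydropathy).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_shift(wild_type, replacement):
    """Change in hydropathy when wild_type is replaced (replacement minus wild type)."""
    return round(HYDROPATHY[replacement] - HYDROPATHY[wild_type], 1)


# Substitutions taken from mutant 1: K67R, D71N and Q62L.
for wild_type, replacement in (("K", "R"), ("D", "N"), ("Q", "L")):
    print(f"{wild_type}->{replacement}: {hydropathy_shift(wild_type, replacement):+.1f}")
```

On this scale K to R and D to N change the value very little, whereas Q to L moves the position from the polar end of the scale toward the hydrophobic end.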
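It is well known that conservative substitutions between amino acids with similar properties usually do not affect the activity of a peptide sequence; conservative substitutions are shown in Table 3.
Table 3 - Conservative amino acid substitutions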
Figure PCTCN2021123209-appb-000002
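A mutant or modified protein, monomer or peptide can also be chemically modified in any way and at any site. A mutant or modified monomer or peptide is preferably chemically modified by attachment of a molecule to one or more cysteines (cysteine linkage), attachment of a molecule to one or more lysines, attachment of a molecule to one or more non-natural amino acids, enzyme modification of an epitope or modification of a terminus. Suitable methods for carrying out such modifications are well known in the art. The mutant of the modified protein, monomer or peptide may be chemically modified by the attachment of any molecule. For instance, it may be chemically modified by attachment of a dye or a fluorophore. In some embodiments, the mutant or modified monomer or peptide is chemically modified with a molecular adaptor that facilitates the interaction between a pore comprising the monomer or peptide and a target nucleotide or target polynucleotide sequence. The molecular adaptor is preferably a cyclic molecule, a cyclodextrin, a species that is capable of hybridization, a DNA binder or intercalator, a peptide or peptide analogue, a synthetic polymer, an aromatic planar molecule, a small positively charged molecule or a small molecule capable of hydrogen bonding.
The presence of the adaptor improves the host-guest chemistry of the pore and the nucleotide or polynucleotide sequence and thereby improves the sequencing ability of pores formed from the mutant monomer. The principles of host-guest chemistry are well known in the art. The adaptor has an effect on the physical or chemical properties of the pore that improves its interaction with the nucleotide or polynucleotide sequence. The adaptor may alter the charge of the barrel or channel of the pore, or specifically interact with or bind to the nucleotide or polynucleotide sequence, thereby facilitating its interaction with the pore.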
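A "protein pore" is a transmembrane protein structure defining a channel or hole that allows the translocation of molecules and ions from one side of the membrane to the other. The translocation of ionic species through the pore may be driven by an electrical potential difference applied to either side of the pore. A "nanopore" is a protein pore in which the minimum diameter of the channel through which molecules or ions pass is on the order of nanometers (10^-9 meters). In some embodiments, the protein pore may be a transmembrane protein pore. The transmembrane protein structure of a protein pore may be monomeric or oligomeric in nature. Typically, the pore comprises a plurality of polypeptide subunits arranged around a central axis, thereby forming a protein-lined channel that extends substantially perpendicular to the membrane in which the nanopore resides. The number of polypeptide subunits is not limited. Typically, the number of subunits is from 5 to 30, suitably from 6 to 10. Alternatively, the number of subunits is not defined, as in the case of perfringolysin or related large membrane pores. The portions of the protein subunits within the nanopore that form the protein-lined channel typically comprise secondary structural motifs that may include one or more transmembrane β-barrel and/or α-helix sections.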
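In one embodiment, the protein pore comprises one or more porin monomers. Each porin monomer may be from Gulbenkiania indica. In one embodiment, the protein pore comprises one or more mutants of the porin monomer (i.e., one or more mutated porin monomers).
In one embodiment, the porin is derived from a naturally occurring wild-type protein, a wild-type homologue, or a mutant thereof. The mutant may be referred to as a modified porin or a porin mutant. Modifications in the mutant include, but are not limited to, any one or more of the modifications disclosed herein or combinations of said modifications. In one embodiment, the naturally occurring wild-type protein is a protein from Gulbenkiania indica. In one embodiment, the naturally occurring wild-type protein is a protein from Gulbenkiania indica (Gene: Ga0061063_1194).
In one embodiment, a porin homologue refers to a polypeptide having at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% complete sequence identity to the protein shown in SEQ ID NO:1.
In one embodiment, a porin homologue refers to a polynucleotide having at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% complete sequence identity to the polynucleotide encoding the protein, shown in SEQ ID NO:2. The polynucleotide sequence may comprise a sequence that differs from SEQ ID NO:2 on the basis of the degeneracy of the genetic code.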
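Polynucleotide sequences may be derived and replicated using standard methods in the art. Chromosomal DNA encoding the wild-type porin may be extracted from a pore-producing organism, such as Gulbenkiania indica. The gene encoding the pore subunit may be amplified using PCR involving specific primers. The amplified sequence may then undergo site-directed mutagenesis. Suitable methods of site-directed mutagenesis are known in the art and include, for example, combined chain reaction. Polynucleotides encoding a construct of the embodiments can be made using well-known techniques, such as those described in Sambrook, J. and Russell, D. (2001). Molecular Cloning: A Laboratory Manual, 3rd Edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
The resulting polynucleotide sequence may then be incorporated into a recombinant replicable vector, such as a cloning vector. The vector may be used to replicate the polynucleotide in a compatible host cell. Thus, polynucleotide sequences may be made by introducing a polynucleotide into a replicable vector, introducing the vector into a compatible host cell, and growing the host cell under conditions that bring about replication of the vector. The vector may be recovered from the host cell.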
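Basic working principle of nanopores or protein pores
In one embodiment, within a chamber 100 filled with electrolyte, an insulating membrane 102 with a nanoscale pore divides the chamber into two compartments, as shown in Figure 1. When a voltage is applied across the electrolyte compartments, ions or other small molecules pass through the pore under the electric field force, forming a stable, detectable ionic current. With knowledge of the nanopore's size and surface properties, the applied voltage, and the solution conditions, different types of biomolecules can be detected.
Because the four bases that make up DNA, adenine (A), guanine (G), cytosine (C) and thymine (T), differ in molecular structure and volume, when single-stranded DNA (ssDNA) is driven through the nanoscale pore by the rate-controlling enzyme and the electric field, the differences in the chemical properties of the bases cause current changes of different amplitudes as they traverse the nanopore or protein pore, yielding sequence information for the nucleic acid being measured, e.g., DNA.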
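Figure 2 shows a schematic diagram 200 of DNA sequencing. As shown in Figure 2, in a typical nanopore/protein pore sequencing experiment, the nanopore is the only channel through which ions on the two sides of the phospholipid membrane can pass. A rate-controlling protein, e.g., a polynucleotide binding protein, acts as the motor protein for the nucleic acid molecule, e.g., DNA, pulling the DNA strand through the nanopore/protein pore one nucleotide step at a time. Each time a nucleotide passes through the nanopore/protein pore, the corresponding pore-blocking signal is recorded (Figure 3). By analyzing these sequence-dependent current signals with an appropriate algorithm, the sequence information of the nucleic acid molecule, e.g., DNA, can be inferred.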
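The text does not specify the algorithm used to analyze the current signals. As a rough illustration of the first step such an analysis might take, the Python sketch below splits a current trace into discrete blockade levels by starting a new level whenever a sample departs from the running mean of the current level by more than a threshold. The function name `segment_levels`, the threshold, and the toy trace are all assumptions made for this sketch; real base-calling pipelines are considerably more elaborate.

```python
from statistics import mean


def segment_levels(current_pa, jump_threshold_pa):
    """Split a current trace (in pA) into levels and return the mean of each level."""
    if not current_pa:
        return []
    levels = []
    segment = [current_pa[0]]
    for sample in current_pa[1:]:
        # A large departure from the running level mean starts a new level.
        if abs(sample - mean(segment)) > jump_threshold_pa:
            levels.append(mean(segment))
            segment = [sample]
        else:
            segment.append(sample)
    levels.append(mean(segment))
    return levels


# Hypothetical toy trace with three blockade levels near 100, 60 and 80 pA.
toy_trace = [100.0, 101.0, 99.0, 60.0, 61.0, 59.0, 80.0, 80.0]
toy_levels = segment_levels(toy_trace, jump_threshold_pa=10.0)
print(toy_levels)
```

Each level mean would then be compared against the levels expected for known nucleotides or k-mers, which is the sequence-dependent mapping the paragraph above refers to.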
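In the embodiments, porins were screened from different species in nature (mainly bacteria and archaea) using bioinformatics and from an evolutionary perspective. In one embodiment, the porin is from any organism, preferably from Gulbenkiania indica. Sequence analysis shows that the porin has a complete functional domain. Structural biology methods were used to predict and analyze the 3D structural model of the porin, and channel proteins with a suitable reader-head architecture were selected. Genetic engineering, protein engineering, directed protein evolution, computer-aided protein design and similar means were then used to modify, test and optimize the candidate channel proteins (or porins); after several rounds of iteration, multiple homologous protein mutants were obtained, preferably two (with different homologous protein backbones), which have different signal characteristics and signal distribution patterns.
The porins of the embodiments can be applied to fourth-generation sequencing technology. In one embodiment, the porin is a nanopore protein. In one embodiment, the porin can be applied to solid-state pores for sequencing.
In one embodiment, a new protein backbone is used to form a new constriction zone (reader-head region) structure, thereby providing an entirely new mode of action during sequencing. The porins of the embodiments have a good step-edge (level transition) distribution and good efficiency of reconstitution into phospholipid membranes.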
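In one embodiment, the wild-type porin monomer is genetically mutated to form a mutant of the porin monomer. In one embodiment, the amino acid sequence of the mutant of the porin monomer comprises the sequence shown in SEQ ID NO:1 or a sequence having at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identity thereto, and the amino acid sequence of the mutant has a mutation at one or more positions corresponding to K67, D71, S72, and Y74 of SEQ ID NO:1.
In one embodiment, the mutation comprises an amino acid insertion, deletion and/or substitution. In one embodiment, having a mutation at one or more positions of K67, D71, S72, and Y74 of SEQ ID NO:1 means having an amino acid insertion, deletion and/or substitution at one or more positions of K67, D71, S72, and Y74 of SEQ ID NO:1.
In one embodiment, the amino acid sequence of the mutant of the porin monomer has a mutation at one or more positions corresponding to positions (1) 62-209, (2) 62-74, (3) 62-75, (4) 67-209, (5) 67-75, or (6) 67-74 of SEQ ID NO:1.
In one embodiment, the amino acid sequence of the mutant of the porin monomer has an amino acid insertion, deletion and/or substitution at one or more positions corresponding to positions (1) 62-209, (2) 62-74, (3) 62-75, (4) 67-209, (5) 67-75, or (6) 67-74 of SEQ ID NO:1.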
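In one embodiment, the amino acid sequence of the mutant of the porin monomer has mutations only at positions corresponding to Q62, K67, D71, S72, and Y74 of SEQ ID NO:1, or has amino acid insertions, deletions and/or substitutions at one or more of these positions.
In one embodiment, the amino acid sequence of the mutant of the porin monomer has mutations only at positions corresponding to Q62, K67, D71, S72, Y74, E110, E119, E126, and K209 of SEQ ID NO:1, or has amino acid insertions, deletions and/or substitutions at one or more of these positions.
In one embodiment, the amino acid sequence of the mutant of the porin monomer has mutations only at positions corresponding to K67, D71, S72, Y74, and S75 of SEQ ID NO:1, or has amino acid insertions, deletions and/or substitutions at one or more of these positions.
In one embodiment, the amino acid sequence of the mutant of the porin monomer has mutations only at positions corresponding to K67, T69, A70, D71, S72, S73, and Y74 of SEQ ID NO:1, or has amino acid insertions, deletions and/or substitutions at one or more of these positions.
"One or more positions" means 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ... or up to all of the positions. For example, one or more positions of 5 amino acids means 1, 2, 3, 4 or 5 positions.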
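In one embodiment, a position "corresponding to SEQ ID NO:1" means that, regardless of whether the sequence numbering changes because of an amino acid insertion or deletion or because a sequence with the stated identity is used, the relative position is unchanged and the numbering of SEQ ID NO:1 can still be used. For example, Q62 corresponding to SEQ ID NO:1 may be mutated to Q62L; even if the numbering of SEQ ID NO:1 changes, or a sequence having the identity to SEQ ID NO:1 defined herein is used, the amino acid Q corresponding to position 62 of SEQ ID NO:1 (even if it is not at position 62 in the other sequence) may also be mutated to L and still falls within the scope of protection of the present invention.
In one embodiment, the amino acid sequence of the mutant of the porin monomer consists of the sequence shown in SEQ ID NO:1, or consists of a sequence having at least 99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, or 50% identity thereto, and the amino acid sequence of the mutant has a mutation at one or more positions corresponding to K67, D71, S72, and Y74 of SEQ ID NO:1.
In one embodiment, the SEQ ID NO:1 sequence of the porin monomer is from Gulbenkiania indica. The nucleotide sequence encoding the amino acids of SEQ ID NO:1 is SEQ ID NO:2.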
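The idea of a "corresponding position" can be made concrete with a pairwise alignment: each reference residue is paired with the residue aligned to it in the other sequence, whatever that residue's own index is. The following Python sketch shows the bookkeeping, assuming the alignment has already been computed and uses `-` for gaps; the function name `map_reference_positions` and the toy alignment are hypothetical.

```python
def map_reference_positions(aligned_reference, aligned_query):
    """Map 1-based reference positions to 1-based query positions via a gapped alignment.

    A value of None marks a reference residue that has no counterpart in the query.
    """
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    ref_pos = 0
    query_pos = 0
    for ref_char, query_char in zip(aligned_reference, aligned_query):
        if ref_char != "-":
            ref_pos += 1
        if query_char != "-":
            query_pos += 1
        if ref_char != "-":
            mapping[ref_pos] = query_pos if query_char != "-" else None
    return mapping


# Toy alignment: the query has one inserted residue (A) and one deleted residue (Q).
toy_mapping = map_reference_positions("MK-TQY", "MKAT-Y")
print(toy_mapping)
```

In the toy alignment, reference position 3 corresponds to query position 4 because of the insertion, which mirrors the statement that the residue corresponding to position 62 of SEQ ID NO:1 need not sit at index 62 in another sequence.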
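In one embodiment, corresponding to SEQ ID NO:1, Q62 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 4 of N, E, D, and Q; S72 is mutated to 0 to 1 of P; and Y74 is mutated to 0 to 5 of S, C, U, T, and M.
In one embodiment, corresponding to SEQ ID NO:1, Q62 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 4 of N, E, D, and Q; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 3 of F, Y, and W; E110 is mutated to 0 to 4 of N, D, E, and Q; E119 is mutated to 0 to 4 of N, D, E, and Q; E126 is mutated to 0 to 4 of N, D, E, and Q; and K209 is mutated to 0 to 1 of P.
In one embodiment, corresponding to SEQ ID NO:1, K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 5 of G, A, V, L, and I; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 3 of F, Y, and W; and S75 is mutated to 0 to 5 of C, U, S, T, and M.
In one embodiment, corresponding to SEQ ID NO:1, K67 is mutated to 0 to 3 of R, H, and K; T69 is mutated to 0 to 5 of S, C, T, U, and M; A70 is mutated to 0 to 1 of P; D71 is mutated to 0 to 5 of G, A, V, L, and I; S72 is mutated to 0 to 3 of F, Y, and W; S73 is mutated to 0 to 5 of G, A, V, L, and I; and Y74 is mutated to 0 to 3 of F, Y, and W.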
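In one embodiment, in the mutant of the porin monomer, the amino acid mutations are selected from the following:
(a) Q62L, K67R, D71N, S72P, and Y74T, corresponding to SEQ ID NO:1;
(b) Q62L, K67R, D71N, S72P, Y74 deletion, E110N, E119N, E126N, and K209P, corresponding to SEQ ID NO:1;
(c) K67R, D71A, S72P, Y74 deletion, and S75 deletion, corresponding to SEQ ID NO:1; and
(d) K67R, T69S, A70P, D71A, S72Y, S73A, and Y74 deletion, corresponding to SEQ ID NO:1.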
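Mutation sets like (a) to (d) can be applied to a sequence programmatically. The sketch below parses substitutions written as `Q62L` and deletions written with a `del` suffix (for example `Y74del`, a notation assumed here) and applies them using the reference numbering. Because the sequence of SEQ ID NO:1 is not reproduced in this section, the example uses a made-up 10-residue sequence and made-up mutations.

```python
import re

# One-letter wild-type residue, 1-based position, then a new residue or "del".
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z]|del)$")


def apply_mutations(sequence, mutations):
    """Apply substitutions and deletions named in reference numbering.

    All positions refer to the original sequence, so deletions do not shift
    the numbering of later mutations. Raises ValueError if a mutation is
    malformed or its wild-type residue does not match the sequence.
    """
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError("cannot parse mutation: " + mutation)
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if position < 1 or position > len(sequence):
            raise ValueError("position out of range: " + mutation)
        if sequence[position - 1] != wild_type:
            raise ValueError("wild-type residue mismatch: " + mutation)
        residues[position - 1] = "" if new == "del" else new
    return "".join(residues)


# Hypothetical toy sequence and mutations, for illustration only.
toy_mutant = apply_mutations("ACDEFGHIKL", ["C2A", "E4del", "L10P"])
print(toy_mutant)
```

Checking the wild-type residue before editing catches numbering mistakes early, which matters when the same mutation names are reused across homologous backbones.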
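In one embodiment, the amino acid sequence of the mutant of the porin monomer comprises or consists of SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, or SEQ ID NO:19.
In one embodiment, the protein pore comprises at least one mutant of the porin monomer (or mutated porin monomer). In one embodiment, the protein pore comprises at least two, three, four, five, six, seven, eight, nine, or ten or more mutants of the porin monomer. In one embodiment, the protein pore comprises at least two mutants of the porin monomer, which may be the same or different. In one embodiment, the protein pore comprises two or more mutants of the porin monomer, and preferably the two or more mutants are the same. In one embodiment, the protein pore comprises nine mutants of the porin monomer. In one embodiment, the constriction zone channel diameter of the protein pore is 0.7 nm-2.2 nm, 0.9 nm-1.6 nm, 1.4 nm-1.6 nm, or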
Figure PCTCN2021123209-appb-000003
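The mutant of the porin monomer, or a protein pore comprising it, is used to detect the presence, absence, or one or more characteristics of a target analyte. In one embodiment, the mutant of the porin monomer or the protein pore is used to detect the sequence of a nucleic acid molecule, or to characterize a polynucleotide sequence, for example to sequence a polynucleotide, because it can discriminate between different nucleotides with high sensitivity. The mutant of the porin monomer, or a protein pore comprising it, can discriminate between the four nucleotides in DNA and RNA, and can even distinguish methylated from unmethylated nucleotides, with unexpectedly high resolution. The mutant of the porin monomer or the protein pore shows almost complete separation of all four DNA/RNA nucleotides. Deoxycytidine monophosphate (dCMP) and methyl-dCMP are further distinguished on the basis of the dwell time in the protein pore and the current flowing through the protein pore.
The mutant of the porin monomer or the protein pore can also discriminate between different nucleotides under a range of conditions. In particular, the mutant of the porin monomer or the protein pore discriminates between nucleotides under conditions that are favorable for nucleic acid characterization such as sequencing. The extent to which the mutant of the porin monomer or the protein pore discriminates between different nucleotides can be controlled by altering the applied potential, the salt concentration, the buffer, the temperature, and the presence of additives such as urea, betaine, and DTT. This allows the function of the mutant of the porin monomer or the protein pore to be fine-tuned, particularly for sequencing. The mutant of the porin monomer or the protein pore may also be used to identify polynucleotide polymers from the interaction with one or more monomers rather than on a nucleotide-by-nucleotide basis.
The mutant of the porin monomer or the protein pore may be isolated, substantially isolated, purified, or substantially purified. The mutant of the porin monomer or the protein pore of the embodiments is isolated or purified if it is completely free of any other components, such as liposomes or other protein pores/porins. The mutant of the porin monomer or the protein pore is substantially isolated if it is mixed with carriers or diluents that will not interfere with its intended use. For example, the mutant of the porin monomer or the protein pore is substantially isolated or substantially purified if it is present in a form that comprises less than 10%, less than 5%, less than 2%, or less than 1% of other components such as triblock copolymers, liposomes, or other protein pores/porins. Alternatively, the mutant of the porin monomer or the protein pore may be present in a membrane.
For example, the membrane is preferably an amphiphilic layer. An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties. The amphiphilic molecules may be synthetic or naturally occurring. The amphiphilic layer may be a monolayer or a bilayer. The amphiphilic layer is typically planar. The amphiphilic layer may be curved. The amphiphilic layer may be supported. The membrane may be a lipid bilayer. A lipid bilayer is formed from two opposing layers of lipids. The two layers of lipids are arranged so that their hydrophobic tail groups face each other to form a hydrophobic interior. The hydrophilic head groups of the lipids face outward toward the aqueous environment on each side of the bilayer. The membrane may comprise a solid-state layer. The solid-state layer may be formed from organic and inorganic materials. If the membrane comprises a solid-state layer, the pore is typically present in an amphiphilic membrane or in a layer contained within the solid-state layer, for example within a hole, well, gap, channel, trench, or slit in the solid-state layer.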
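Characterization of analytes
The embodiments provide a method of determining the presence, absence, or one or more characteristics of a target analyte. The method involves contacting the target analyte with a mutant of the porin monomer or a protein pore so that the target analyte moves with respect to, for example through, the mutant of the porin monomer or the protein pore, and taking one or more measurements as the target analyte moves with respect to the mutant of the porin monomer or the protein pore, thereby determining the presence, absence, or one or more characteristics of the target analyte. The target analyte may also be referred to as a template analyte or an analyte of interest.
The target analyte is preferably a polysaccharide, a metal ion, an inorganic salt, a polymer, an amino acid, a peptide, a polypeptide, a protein, a nucleotide, an oligonucleotide, a polynucleotide, a dye, a drug, a diagnostic agent, an explosive, or an environmental pollutant. The method may involve determining the presence, absence, or one or more characteristics of two or more target analytes of the same class, for example two or more proteins, two or more nucleotides, or two or more drugs. Alternatively, the method may involve determining the presence, absence, or one or more characteristics of two or more target analytes of different classes, for example one or more proteins, one or more nucleotides, and one or more drugs.
The method comprises contacting the target analyte with a mutant of the porin monomer or a protein pore so that the target analyte moves through the mutant of the porin monomer or the protein pore. The protein pore typically comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 mutated porin monomers, for example 7, 8, 9, or 10 monomers. The protein pore comprises identical monomers or different porin monomers, and preferably comprises 8 or 9 identical monomers. One or more of the monomers, for example 2, 3, 4, 5, 6, 7, 8, 9, or 10, are preferably chemically modified as discussed above. In one embodiment, the amino acid sequence of each monomer comprises SEQ ID NO:1 and the above mutants thereof. In one embodiment, the amino acid sequence of each monomer consists of SEQ ID NO:1 and the above mutants thereof.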
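The method of the embodiments may measure two, three, four, or five or more characteristics of a polynucleotide. The one or more characteristics are preferably selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide, and (v) whether or not the polynucleotide is modified. In one embodiment, any combination of (i) to (v) may be measured.
For (i), the length of the polynucleotide may be measured, for example, by determining the number of interactions between the polynucleotide and the mutant of the protein monomer/protein pore, or the duration of the interaction between the polynucleotide and the mutant of the protein monomer/protein pore.
For (ii), the identity of the polynucleotide may be measured in a number of ways, either in conjunction with or without measurement of the polynucleotide sequence. The former is straightforward: the polynucleotide is sequenced and thereby identified. The latter may be done in several ways. For example, the presence of a particular motif in the polynucleotide may be measured (without measuring the remaining sequence of the polynucleotide). Alternatively, the measurement of a particular electrical and/or optical signal in the method may identify the polynucleotide as coming from a particular source.
For (iii), the sequence of the polynucleotide may be determined as described previously. Suitable sequencing methods, particularly those using electrical measurements, are described in Stoddart D et al., Proc Natl Acad Sci, 12;106(19):7702-7, Lieberman KR et al., J Am Chem Soc. 2010;132(50):17961-72, and International Application WO 2000/28312.
For (iv), the secondary structure may be measured in a variety of ways. For example, if the method involves an electrical measurement, the secondary structure may be measured using a change in dwell time or a change in the current flowing through the pore. This allows regions of single-stranded and double-stranded polynucleotide to be distinguished.
For (v), the presence or absence of any modification may be measured. The method preferably comprises determining whether or not the polynucleotide is modified by methylation, by oxidation, by damage, with one or more proteins, or with one or more labels or tags, or whether it is abasic or lacks a nucleobase and sugar. Specific modifications will result in specific interactions with the pore, which can be measured using the methods described below. For example, methylcytosine may be distinguished from cytosine on the basis of the current flowing through the pore during its interaction with each nucleotide.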
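The target polynucleotide is contacted with a mutant of the protein monomer/protein pore, for example the mutant of the protein monomer/protein pore of the embodiments. The mutant of the protein monomer/protein pore is typically present in a membrane. Suitable membranes are described above. The method may be carried out using any apparatus that is suitable for investigating a membrane/protein pore or porin-monomer-mutant system in which the mutant of the protein monomer/protein pore is present in a membrane. The method may be carried out using any apparatus that is suitable for transmembrane pore sensing. For example, the apparatus comprises a chamber containing an aqueous solution and a barrier that separates the chamber into two sections. The barrier typically has an aperture in which the membrane containing the pore is formed. Alternatively, the barrier forms a membrane in which the mutant of the protein monomer/protein pore is present. The method may be carried out using the apparatus described in International Application No. PCT/GB08/000562 (WO 2008/102120).
A variety of different types of measurement may be made. These include, without limitation, electrical measurements and optical measurements. Electrical measurements include voltage measurements, capacitance measurements, current measurements, impedance measurements, tunnelling measurements (Ivanov AP et al., Nano Lett. 2011 Jan 12;11(1):279-85), and FET measurements (International Application WO 2005/124888). Optical measurements may be combined with electrical measurements (Soni GV et al., Rev Sci Instrum. 2010 Jan;81(1):014301). The measurement may be a transmembrane current measurement, such as measurement of the ionic current flowing through the pore. In one embodiment, conventional electrical or optical measurements may be used.
Electrical measurements may be made using standard single-channel recording equipment as described in Stoddart D et al., Proc Natl Acad Sci, 12;106(19):7702-7, Lieberman KR et al., J Am Chem Soc. 2010;132(50):17961-72, and International Application WO 2000/28312. Alternatively, electrical measurements may be made using a multi-channel system, for example as described in International Application WO 2009/077734 and International Application WO 2011/067559.
The method is preferably carried out with a potential applied across the membrane. The applied potential may be a voltage potential. Alternatively, the applied potential may be a chemical potential. An example of this is using a salt gradient across a membrane, such as an amphiphilic layer. A salt gradient is disclosed in Holden et al., J Am Chem Soc. 2007 Jul 11;129(27):8650-5. In some instances, the current flowing through the mutant of the protein monomer/protein pore as the polynucleotide moves with respect to the mutant of the protein monomer/protein pore is used to estimate or determine the sequence of the polynucleotide. This is strand sequencing.
The method may comprise measuring the current flowing through the pore as the polynucleotide moves with respect to the pore. The apparatus used in the method may therefore also comprise an electrical circuit capable of applying a potential and measuring an electrical signal across the membrane and pore. The method may be carried out using a patch clamp or a voltage clamp.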
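The method may comprise measuring the current flowing through the pore as the polynucleotide moves with respect to the pore. Suitable conditions for measuring ionic currents through transmembrane protein pores are known in the art and are disclosed in the Examples. The method is typically carried out with a voltage applied across the membrane and the pore. The voltage used is typically from +5 V to -5 V, such as from +4 V to -4 V, from +3 V to -3 V, or from +2 V to -2 V. The voltage used is typically from -600 mV to +600 mV or from -400 mV to +400 mV. The voltage used is preferably in a range having a lower limit selected from -400 mV, -300 mV, -200 mV, -150 mV, -100 mV, -50 mV, -20 mV, and 0 mV and an upper limit independently selected from +10 mV, +20 mV, +50 mV, +100 mV, +150 mV, +200 mV, +300 mV, and +400 mV. The voltage used is more preferably in the range 100 mV to 240 mV and most preferably in the range 120 mV to 220 mV. Discrimination between different nucleotides by the pore can be increased by using an increased applied potential.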
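The method is typically carried out in the presence of any charge carriers, such as metal salts, for example alkali metal salts, or halide salts, for example chloride salts, such as alkali metal chloride salts. Charge carriers may include ionic liquids or organic salts, for example tetramethylammonium chloride, trimethylphenylammonium chloride, phenyltrimethylammonium chloride, or 1-ethyl-3-methylimidazolium chloride. In the exemplary apparatus discussed above, the salt is present in the aqueous solution in the chamber. Potassium chloride (KCl), sodium chloride (NaCl), caesium chloride (CsCl), or a mixture of potassium ferrocyanide and potassium ferricyanide is typically used. KCl, NaCl, and a mixture of potassium ferrocyanide and potassium ferricyanide are preferred. The charge carriers may be asymmetric across the membrane. For instance, the type and/or concentration of the charge carriers may be different on each side of the membrane.
The salt concentration may be at saturation. The salt concentration may be 3 M or lower and is typically from 0.1 to 2.5 M, from 0.3 to 1.9 M, from 0.5 to 1.8 M, from 0.7 to 1.7 M, from 0.9 to 1.6 M, or from 1 M to 1.4 M. The salt concentration is preferably from 150 mM to 1 M. The method is preferably carried out using a salt concentration of at least 0.3 M, such as at least 0.4 M, at least 0.5 M, at least 0.6 M, at least 0.8 M, at least 1.0 M, at least 1.5 M, at least 2.0 M, at least 2.5 M, or at least 3.0 M. High salt concentrations provide a high signal-to-noise ratio and allow currents indicative of the presence of a nucleotide to be identified against the background of normal current fluctuations.
The method is typically carried out in the presence of a buffer. In the exemplary apparatus discussed above, the buffer is present in the aqueous solution in the chamber. Any buffer may be used in the method of the invention. Typically, the buffer is a phosphate buffer. Other suitable buffers are HEPES and Tris-HCl buffer. The method is typically carried out at a pH of from 4.0 to 12.0, from 4.5 to 10.0, from 5.0 to 9.0, from 5.5 to 8.8, from 6.0 to 8.7, from 7.0 to 8.8, or from 7.5 to 8.5. The pH used is preferably about 7.5.
The method may be carried out at from 0°C to 100°C, from 15°C to 95°C, from 16°C to 90°C, from 17°C to 85°C, from 18°C to 80°C, from 19°C to 70°C, or from 20°C to 60°C. The method is typically carried out at room temperature. The method is optionally carried out at a temperature that supports enzyme function, such as about 37°C.
In one embodiment, the method for determining the presence, absence, or one or more characteristics of a target analyte (for example a polynucleotide) comprises coupling the target analyte to a membrane, and the target analyte interacting with (for example contacting) the protein pore present in the membrane so that the target analyte moves with respect to the protein pore (for example through the protein pore). In one embodiment, the current passing through the protein pore as the target analyte moves with respect to the protein pore is measured, thereby determining the presence, absence, or one or more characteristics of the target analyte (for example the sequence of a polynucleotide).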
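Rate-controlling protein
A rate-controlling protein is a protein that can control the speed at which a target analyte (for example a polynucleotide) moves with respect to the protein pore (for example by slowing it down), so that the speed allows the presence, absence, or one or more characteristics of the target analyte to be detected (for example sequencing of a polynucleotide). The protein pore is used together with the rate-controlling protein to characterize the target analyte. In one embodiment, the rate-controlling protein slows the passage of the polynucleotide through the protein pore so that sequencing can be carried out. Rate-controlling proteins include the polynucleotide binding proteins described below.
Polynucleotide binding protein
The characterization method of the embodiments preferably comprises contacting the polynucleotide with a polynucleotide binding protein so that the protein controls the movement of the polynucleotide with respect to, for example through, the mutant of the protein monomer/protein pore.
More preferably, the method comprises (a) contacting the polynucleotide with the mutant of the protein monomer/protein pore and a polynucleotide binding protein so that the protein controls the movement of the polynucleotide with respect to, for example through, the mutant of the protein monomer/protein pore, and (b) taking one or more measurements as the polynucleotide moves with respect to the mutant of the protein monomer/protein pore, wherein the measurements are indicative of one or more characteristics of the polynucleotide, thereby characterizing the polynucleotide.
More preferably, the method comprises (a) contacting the polynucleotide with the mutant of the protein monomer/protein pore and a polynucleotide binding protein so that the protein controls the movement of the polynucleotide with respect to, for example through, the mutant of the protein monomer/protein pore, and (b) measuring the current passing through the mutant of the protein monomer/protein pore as the polynucleotide moves with respect to the mutant of the protein monomer/protein pore, wherein the current is indicative of one or more characteristics of the polynucleotide, thereby characterizing the polynucleotide.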
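The polynucleotide binding protein may be any protein that is capable of binding to the polynucleotide and controlling its movement through the pore. The polynucleotide binding protein typically interacts with the polynucleotide and modifies at least one property of the polynucleotide. The protein may modify the polynucleotide by cleaving it to form individual nucleotides or shorter chains of nucleotides, such as di- or trinucleotides. The protein may modify the polynucleotide by orienting it or moving it to a specific position, that is, by controlling its movement.
The polynucleotide binding protein is preferably derived from a polynucleotide handling enzyme. A polynucleotide handling enzyme is a polypeptide that is capable of interacting with a polynucleotide and modifying at least one property of the polynucleotide. The enzyme may modify the polynucleotide by cleaving it to form individual nucleotides or shorter chains of nucleotides, such as di- or trinucleotides. The enzyme may modify the polynucleotide by orienting it or moving it to a specific position. The polynucleotide handling enzyme does not need to display enzymatic activity as long as it is capable of binding the polynucleotide and controlling its movement through the pore. For instance, the enzyme may be modified to remove its enzymatic activity or may be used under conditions that prevent it from acting as an enzyme.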
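The polynucleotide handling enzyme is preferably a polymerase, an exonuclease, a helicase, or a topoisomerase, such as a gyrase. In one embodiment, the enzyme is preferably a helicase, such as Hel308Mbu, Hel308Csy, Hel308Tga, Hel308Mhu, TraI Eco, XPD Mbu, Dda, or a variant thereof. Any helicase may be used in the embodiments.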
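In one embodiment, any number of helicases may be used. For example, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more helicases may be used. In some embodiments, different numbers of helicases may be used.
The method of the embodiments preferably comprises contacting the polynucleotide with two or more helicases. The two or more helicases are typically the same helicase. The two or more helicases may be different helicases.
The two or more helicases may be any combination of the helicases mentioned above. The two or more helicases may be two or more Dda helicases. The two or more helicases may be one or more Dda helicases and one or more TrwC helicases. The two or more helicases may be different variants of the same helicase.
The two or more helicases are preferably attached to one another. The two or more helicases are more preferably covalently attached to one another. The helicases may be attached in any order and using any method.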
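Kit
The present invention also provides a kit for characterizing a target analyte (for example a target polynucleotide). The kit comprises the pore of the embodiments and the components of a membrane. The membrane is preferably formed from the components. The pore is preferably present in the membrane. The kit may comprise the components of any of the membranes disclosed above (such as an amphiphilic layer or a triblock copolymer membrane). The kit may further comprise a polynucleotide binding protein. Any of the polynucleotide binding proteins discussed above may be used.
In one embodiment, the membrane is an amphiphilic layer, a solid-state layer, or a lipid bilayer.
The kit may further comprise one or more anchors for coupling the polynucleotide to the membrane.
The kit is preferably for characterizing a double-stranded polynucleotide and preferably comprises a Y adaptor and a hairpin loop adaptor.
The Y adaptor preferably has one or more helicases attached, and the hairpin loop adaptor preferably has one or more molecular brakes attached. The Y adaptor preferably comprises one or more first anchors for coupling the polynucleotide to the membrane, the hairpin loop adaptor preferably comprises one or more second anchors for coupling the polynucleotide to the membrane, and the strength of coupling of the hairpin loop adaptor to the membrane is preferably greater than the strength of coupling of the Y adaptor to the membrane.
The kit may additionally comprise one or more other reagents or instruments that enable any of the embodiments mentioned above to be carried out. Such reagents or instruments include one or more of the following: suitable buffers (aqueous solutions), means to obtain a sample from a subject (such as a vessel or an instrument comprising a needle), means to amplify and/or express polynucleotides, or voltage clamp or patch clamp apparatus. Reagents may be present in the kit in a dry state such that a fluid sample resuspends the reagents. The kit may also optionally comprise instructions to enable the kit to be used in the method of the invention, or details regarding which organisms the method may be used for.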
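Apparatus (or device)
The present invention also provides an apparatus for characterizing a target analyte (for example a target polynucleotide). The apparatus comprises one or more mutants of the protein monomer/protein pores and one or more membranes. The mutant of the protein monomer/protein pore is preferably present in the membrane. The number of pores and membranes is preferably equal. Preferably, a single pore is present in each membrane.
The apparatus preferably also comprises instructions for carrying out the method of the embodiments. The apparatus may be any conventional apparatus for analyte analysis, such as an array or a chip. Any of the embodiments discussed with reference to the method of the embodiments apply equally to the apparatus. The apparatus may also comprise any of the features present in the kit described herein. The apparatus used in the embodiments may specifically be the Qitan Technology gene sequencer QNome-9604.
The prior art mentioned above is incorporated herein by reference in its entirety.
The following Examples illustrate the invention but are not limiting.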
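Example 1
In this Example, the wild-type porin is from Gulbenkiania indica, the amino acid sequence of the wild-type porin is SEQ ID NO:1, and the nucleotide sequence encoding this amino acid sequence is shown in SEQ ID NO:2. Mutant 1 of the porin monomer is the wild-type porin with mutations at several positions corresponding to SEQ ID NO:1, specifically Q62L, K67R, D71N, S72P, and Y74T. The protein pore comprising mutant 1 of the porin monomer is mutant pore 1. The amino acid sequence of mutant 1 of the protein monomer is shown in SEQ ID NO:16.
Example 2
In this Example, the wild-type porin is from Gulbenkiania indica, the amino acid sequence of the wild-type porin is SEQ ID NO:1, and the nucleotide sequence encoding this amino acid sequence is shown in SEQ ID NO:2. Mutant 2 of the porin monomer is the wild-type porin with mutations at several positions corresponding to SEQ ID NO:1, specifically Q62L, K67R, D71N, S72P, Y74 deletion, E110N, E119N, E126N, and K209P. The protein pore comprising mutant 2 of the porin monomer is mutant pore 2. The amino acid sequence of mutant 2 of the protein monomer is shown in SEQ ID NO:17.
Example 3
In this Example, the wild-type porin is from Gulbenkiania indica, the amino acid sequence of the wild-type porin is SEQ ID NO:1, and the nucleotide sequence encoding this amino acid sequence is shown in SEQ ID NO:2. Mutant 3 of the porin monomer is the wild-type porin with mutations at several positions corresponding to SEQ ID NO:1, specifically K67R, D71A, S72P, Y74 deletion, and S75 deletion. The protein pore comprising mutant 3 of the porin monomer is mutant pore 3. The amino acid sequence of mutant 3 of the protein monomer is shown in SEQ ID NO:18.
Example 4
In this Example, the wild-type porin is from Gulbenkiania indica, the amino acid sequence of the wild-type porin is SEQ ID NO:1, and the nucleotide sequence encoding this amino acid sequence is shown in SEQ ID NO:2. Mutant 4 of the porin monomer is the wild-type porin with mutations at several positions corresponding to SEQ ID NO:1, specifically K67R, T69S, A70P, D71A, S72Y, S73A, and Y74 deletion. The protein pore comprising mutant 4 of the porin monomer is mutant pore 4. The amino acid sequence of mutant 4 of the protein monomer is shown in SEQ ID NO:19.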
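Example 5
Homology modeling of the wild-type porin was performed using SWISS-MODEL; the amino acid sequence of the wild-type porin monomer is shown in SEQ ID NO:1. Figure 4A is a side view 400 of the predicted protein structure model, in which the darker portion shows one protein monomer 402. Figure 4B is a top view 404 of the surface structure model, in which the darker portion shows one protein monomer 406. Figure 4C is a ribbon structure model 408, in which the darker portion is a protein monomer 410.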
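Figure 5 shows the distribution of amino acid residues in the constriction zone of the wild-type channel and the constriction zone diameters. The largest diameter of the constriction zone channel between the two porin monomers 502 and 504 is
Figure PCTCN2021123209-appb-000004
followed by
Figure PCTCN2021123209-appb-000005
and the smallest diameter is
Figure PCTCN2021123209-appb-000006
Shown in the middle is the amino acid composition of the constriction zone structure, namely T69, S73, and Y74.
Figure 6A shows the surface potential map of the wild-type channel monomer, in which the color depth represents the strength of the charge. Figure 6B shows the ribbon model of the monomer and a stick model of the distribution of amino acid residues in its constriction zone, with an enlarged view of the amino acid composition and numbering of the constriction zone loop; portion 602 indicates the amino acid residues that point toward the central region of the protein channel.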
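Homology modeling of mutant pore 1 was performed using SWISS-MODEL. Figure 7 shows the distribution of amino acid residues in the constriction zone of mutant pore 1 and the constriction zone diameters. The stick model shows the distribution of key amino acid residues in the narrow region of the mutant channel. The mutated structure reduces the thickness of the constriction zone, and the amino acid residues pointing toward the center of the channel are threonine at position 70, serine at position 74, and threonine at position 75. The hydrogen-bonding interactions formed by the amino acid residues at positions 65-79 may be closely related to the correct assembly of the channel complex. The diameter of the narrowest region of the constriction zone channel between the two porin monomers 702 and 704 is approximately
Figure PCTCN2021123209-appb-000007
the diameter of the widest region is approximately
Figure PCTCN2021123209-appb-000008
and the intermediate diameter is approximately
Figure PCTCN2021123209-appb-000009
Figure 8 shows a cartoon representation of mutant pore 1 based on homology modeling. Region 1 corresponds to the cap-forming region, region 2 corresponds to the constriction and loops region, and region 3 corresponds to the transmembrane β-barrel region.
Figure 9 shows a negative-stain electron micrograph of mutant pore 1; the arrows indicate the target protein particles. Figure 10 shows the negative-stain electron microscopy two-dimensional classification result for mutant pore 1; the class indicated by the arrow shows that the oligomeric state of mutant pore 1 is a 9-mer.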
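Example 6 - Preparation of the DNA construct
The DNA construct BS7-4C3-PLT was prepared. The structure of BS7-4C3-PLT is shown in Figure 11, and the sequence information is as follows:
a: 30*C3
b: 5’-TTTTT TTTTT-3’ (i.e., SEQ ID NO:3)
c: rate-controlling protein
d: 4*C18
e: 5’-AATGT ACTTC GTTCA GTTAC GTATT GCT-3’ (i.e., SEQ ID NO:4)
f: 5’P-GC AATAC GTAAC TGAAC GAAGT TCACTATCGCATTCTCATGA-3’ (i.e., SEQ ID NO:5)
g: cholesterol tag
h: 5’-TCATG AGAAT GCGAT AGTGA-3’ (i.e., SEQ ID NO:6)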
i:5’-AAAAAAAAAAAAAAAAAAAAAAAAAAAA(即SEQ ID NO:7)/dSpacer/AAAAAAAAAAAA(即SEQ ID NO:8)/dSpacer/AAAAAAAAAAAAAATCTCTGAATCTCTGAATCTCTGAATCTCTAAAAAAAAAAAAGAAAAAAAAAAAACAAAAAAAAAAAATAAAAAAAAAAAAAGCAATACGTAACTGAACGAAGTACATTAAAAAAAAAA(即SEQ ID NO:9)-3’
j: 5’-ATCCTTTTTTTTTTAATGTACTTCGTTCAGTTACGTATTGCT-3’ (i.e., SEQ ID NO:10)
k:5’P-TTTTTTTTTTTTATTTTTTTTTTTTGTTTTTTTTTTTTCTTTTTTTTTTTTAGAGATTCAGAGATTCAGAGATTCAGAGATTTTTTTTTTTTTT(即SEQ ID NO:11)/dSpacer/TTTTTTTTTTTT(即SEQ ID NO:12)/iSpC3/TTTTTTTTTTTTTTTTTTTTTTTTTTTT(即SEQ ID NO:13)-3’
C3, C18, dSpacer, and iSpC3 are marker sequences introduced to indicate the resolution characteristics of pore sequencing.
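The oligonucleotides listed above can be checked for base-pairing with a short script. The sketch below computes reverse complements and confirms that the 3’ portion of strand f is the exact reverse complement of strand h, and that the first 22 nucleotides of f are contained in the reverse complement of strand e. That these segments actually anneal in the construct is an inference from the sequences, not something stated in the text; the intended arrangement is the one shown in Figure 11.

```python
# Translation table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Strands e, f, and h from the construct, written 5' to 3' without spaces.
strand_e = "AATGTACTTCGTTCAGTTACGTATTGCT"                # SEQ ID NO:4
strand_f = "GCAATACGTAACTGAACGAAGTTCACTATCGCATTCTCATGA"  # SEQ ID NO:5
strand_h = "TCATGAGAATGCGATAGTGA"                        # SEQ ID NO:6

# The last 20 nt of f pair with all of h.
h_pairs_with_f = strand_f.endswith(reverse_complement(strand_h))
# The first 22 nt of f pair with an internal stretch of e.
f_pairs_with_e = strand_f[:22] in reverse_complement(strand_e)
print(h_pairs_with_f, f_pairs_with_e)
```

The same helper can be reused to look for complementarity among the longer strands i, j, and k.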
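In this Example, the rate-controlling protein c in Figure 11 is the helicase Mph-MP1-E105C/A362C (carrying the mutations E105C/A362C), whose amino acid sequence is SEQ ID NO:14 and whose nucleic acid sequence is SEQ ID NO:15.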
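Example 7
Mutant pore 1 was used as the protein pore and tested using the single-pore sequencing method. After a single porin with the amino acid sequence of mutant 1 was inserted into the phospholipid bilayer, buffer (625 mM KCl, 10 mM HEPES pH 8.0, 50 mM MgCl2) was flowed through the system to remove any excess mutant 1 nanopores. The DNA construct BS7-4C3-PLT (1-2 nM final concentration) was added to the mutant 1 nanopore experimental system and mixed, and buffer (625 mM KCl, 10 mM HEPES pH 8.0, 50 mM MgCl2) was then flowed through the system to remove any excess DNA construct BS7-4C3-PLT. A premix of helicase (Mph-MP1-E105C/A362C, 15 nM final concentration) and fuel (ATP, 3 mM final concentration) was then added to the single mutant 1 nanopore experimental system, and sequencing by the mutant 1 porin was monitored at a voltage of +180 mV.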
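For reference, the buffer composition above (625 mM KCl, 10 mM HEPES, 50 mM MgCl2) can be converted to masses for a chosen volume. The sketch below is a plain molarity calculation; it assumes anhydrous KCl and MgCl2 and HEPES free acid, uses rounded molar masses, and ignores pH adjustment. The text does not say which salt forms were used, so the masses would differ for hydrates such as MgCl2 hexahydrate.

```python
# Approximate molar masses in g/mol (assumed forms: anhydrous salts,
# HEPES free acid).
MOLAR_MASS_G_PER_MOL = {"KCl": 74.55, "HEPES": 238.30, "MgCl2": 95.21}

# Buffer composition from the example, in mM.
RECIPE_MM = {"KCl": 625.0, "HEPES": 10.0, "MgCl2": 50.0}


def grams_needed(recipe_mm, volume_l):
    """Return grams of each component needed for the given volume in litres."""
    return {
        name: conc_mm / 1000.0 * volume_l * MOLAR_MASS_G_PER_MOL[name]
        for name, conc_mm in recipe_mm.items()
    }


masses = grams_needed(RECIPE_MM, volume_l=1.0)
for name, grams in masses.items():
    print(f"{name}: {grams:.3f} g per litre")
```

Scaling the `volume_l` argument gives the amounts for smaller batches.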
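Mutant pore 1 was open at ±180 mV. Figure 12A shows the open-pore current and gating characteristics of mutant pore 1 at ±180 mV. Figure 12B shows translocation of single-stranded nucleic acid through mutant pore 1 at +180 mV. The nucleic acid can pass through the pore. After the single-stranded nucleic acid was added, the downward lines show the nucleic acid translocation signals.
Using the single-pore sequencing method, the DNA construct BS7-4C3-PLT was sequenced with mutant pore 1; nucleic acid sequencing signals appeared after the sequencing system was added following pore insertion. Figures 13A and 13B show example current traces as the helicase Mph-MP1-E105C/A362C controls translocation of the DNA construct BS7-4C3-PLT through mutant pore 1. From these signal characteristics it can be concluded that mutant pore 1 has the potential for high-resolution nucleic acid sequencing.
Figure 14 is an enlarged view of part of the current trace shown in Figure 13A. The panel with the dashed box and arrows (middle panel) is the result of filtering the raw signal (for both traces, y-axis = current (pA), x-axis = time (s)). The portion indicated by the dashed arrows shows an enlarged view of the current trace. Figure 15 shows a chip-test current trace as the helicase Mph-MP1-E105C/A362C controls translocation of the DNA construct BS7-4C3-PLT through mutant pore 1. These results further indicate that mutant pore 1 provides high resolution for nucleic acid sequencing.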
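Example 8
As in Example 7, Example 8 uses mutant pore 2 for open-pore (blank) testing and translocation testing.
Figure 16A shows the open-pore current and gating characteristics of mutant pore 2 at ±180 mV. Figure 16B shows translocation of single-stranded nucleic acid through mutant pore 2 at +180 mV. The nucleic acid can pass through the pore. After the single-stranded nucleic acid was added, the downward lines show the nucleic acid translocation signals.
Using the single-pore sequencing method, the DNA construct BS7-4C3-PLT was sequenced with mutant pore 2; nucleic acid sequencing signals appeared after the sequencing system was added following pore insertion. Figures 17A and 17B show example current traces as the helicase Mph-MP1-E105C/A362C controls translocation of the DNA construct BS7-4C3-PLT through mutant pore 2. According to these signal characteristics, mutant pore 2 can be used for nucleic acid sequencing.
Figure 18 shows an enlarged view of part of the current trace. The panel with the dashed box and arrows is the result of filtering the raw signal (for both traces, y-axis = current (pA), x-axis = time (s)). The portion indicated by the dashed arrows shows an enlarged view of the current trace. Figure 19 shows a chip-test current trace as the helicase Mph-MP1-E105C/A362C controls translocation of the DNA construct BS7-4C3-PLT through mutant pore 2. These results indicate that mutant pore 2 can be used for nucleic acid sequencing.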
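Example 9
As in Example 7, Example 9 uses mutant pore 3 for open-pore (blank) testing and translocation testing.
Figure 20A shows the open-pore current and gating characteristics of mutant pore 3 at ±180 mV. Figure 20B shows translocation of single-stranded nucleic acid through mutant pore 3 at +180 mV. The nucleic acid can pass through the pore. After the single-stranded nucleic acid was added, the downward lines show the nucleic acid translocation signals.
Using the single-pore sequencing method, the DNA construct BS7-4C3-PLT was sequenced with mutant pore 3; nucleic acid sequencing signals appeared after the sequencing system was added following pore insertion. Figures 21A and 21B show example current traces as the helicase Mph-MP1-E105C/A362C controls translocation of the DNA construct BS7-4C3-PLT through mutant pore 3. From these signal characteristics it can be concluded that mutant pore 3 has the potential for high-resolution nucleic acid sequencing.
Figure 22 shows an enlarged view of part of the current trace. The panel with the dashed box and arrows is the result of filtering the raw signal (for both traces, y-axis = current (pA), x-axis = time (s)). The portion indicated by the dashed arrows shows an enlarged view of the current trace. This enlarged view of a region of a single signal further indicates that mutant pore 3 provides high resolution for nucleic acid sequencing.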
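Example 10
As in Example 7, Example 10 uses mutant pore 4 for open-pore (blank) testing and translocation testing.
Figure 23A shows the open-pore current and gating characteristics of mutant pore 4 at ±180 mV. Figure 23B shows translocation of single-stranded nucleic acid through mutant pore 4 at +180 mV. The nucleic acid can pass through the pore. After the single-stranded nucleic acid was added, the downward lines show the nucleic acid translocation signals.
Using the single-pore sequencing method, the DNA construct BS7-4C3-PLT was sequenced with mutant pore 4; nucleic acid sequencing signals appeared after the sequencing system was added following pore insertion. Figures 24A and 24B show example current traces as the helicase Mph-MP1-E105C/A362C controls translocation of the DNA construct BS7-4C3-PLT through mutant pore 4. According to these signal characteristics, mutant pore 4 can be used for nucleic acid sequencing.
Figure 25 shows an enlarged view of part of the current trace. The panel with the dashed box and arrows is the result of filtering the raw signal (for both traces, y-axis = current (pA), x-axis = time (s)). The portion indicated by the dashed arrows shows an enlarged view of the current trace. This enlarged view of a region of a single signal further demonstrates that mutant pore 4 can be used for nucleic acid sequencing.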
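Example 11
A recombinant plasmid containing the nucleic acid sequence of mutant 1 of the porin monomer (whose corresponding amino acid sequence is SEQ ID NO:16) was transformed into BL21(DE3) competent cells by heat shock. After 0.5 ml of LB medium was added and the cells were cultured at 30°C for 1 h, an appropriate amount of the culture was spread on an ampicillin-resistance solid LB plate and cultured overnight at 37°C. The next day, a single colony was picked, inoculated into 50 ml of ampicillin-resistance liquid LB medium, and cultured overnight at 37°C. The culture was transferred at a 1% inoculum into ampicillin-resistance TB liquid medium for scale-up and cultured at 37°C and 220 rpm, with the OD600 measured continuously. When OD600 = 2.0-2.2, the culture in TB medium was cooled to 16°C, and isopropyl β-D-thiogalactoside (IPTG) was added to a final concentration of 0.015 mM to induce expression. After 20-24 h of induction, the cells were collected by centrifugation. The cells were resuspended in lysis buffer and disrupted under high pressure, the protein was purified by Ni-NTA affinity chromatography, and the target elution fractions were collected. Mutants 2-4 of the porin monomer were purified by the same method.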
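The IPTG addition step can be worked through with the usual dilution relation C1·V1 = C2·V2. The sketch below computes how much stock solution gives the stated 0.015 mM final concentration. The stock concentration (100 mM) and culture volume (1 L) are hypothetical values chosen for illustration, since the text gives neither, and the small volume added is neglected.

```python
def stock_volume_ml(final_mm, culture_ml, stock_mm):
    """Volume of stock (ml) needed to reach final_mm in culture_ml.

    Uses C1 * V1 = C2 * V2 and neglects the volume added to the culture.
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * culture_ml / stock_mm


# Hypothetical: 100 mM IPTG stock added to a 1000 ml culture.
iptg_ml = stock_volume_ml(final_mm=0.015, culture_ml=1000.0, stock_mm=100.0)
print(f"{iptg_ml * 1000:.0f} microlitres of stock")
```

Under these assumed values the addition is about 150 microlitres; a different stock or culture volume scales the result proportionally.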
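By way of example, Figure 26 shows the protein purification result for mutant 1; lanes 1-6 show the SDS-PAGE results for the different separated fractions. Figure 27 shows the size-exclusion (molecular sieve) purification result for the mutant 1 protein; the arrow indicates the target protein peak.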

Claims (22)

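  1. A mutant of a porin monomer, wherein the amino acid sequence of the mutant of the porin monomer comprises the sequence shown in SEQ ID NO:1 or a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60%, or 50% identity thereto, and the amino acid sequence of the mutant of the porin monomer comprises a mutation at one or more positions corresponding to K67, D71, S72, and Y74 of SEQ ID NO:1.
  2. The mutant of the porin monomer according to claim 1, wherein the amino acid sequence of the mutant of the porin monomer comprises a mutation at one or more positions corresponding to 62-209, 62-74, 62-75, 65-79, 67-209, 67-75, or 67-74 of SEQ ID NO:1.
  3. The mutant of the porin monomer according to claim 1 or 2, wherein the amino acid sequence of the mutant of the porin monomer comprises:
    (1) an amino acid insertion, deletion, and/or substitution at one or more positions corresponding to Q62, K67, D71, S72, and Y74 of SEQ ID NO:1; (2) an amino acid insertion, deletion, and/or substitution at one or more positions corresponding to Q62, K67, D71, S72, Y74, E110, E119, E126, and K209 of SEQ ID NO:1; (3) an amino acid insertion, deletion, and/or substitution at one or more positions corresponding to K67, D71, S72, Y74, and S75 of SEQ ID NO:1; or (4) an amino acid insertion, deletion, and/or substitution at one or more positions corresponding to K67, T69, A70, D71, S72, S73, and Y74 of SEQ ID NO:1.
  4. The mutant of the porin monomer according to any one of the preceding claims, wherein the sequence shown in SEQ ID NO:1 is derived from Gulbenkiania indica.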
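  5. The mutant of the porin monomer according to any one of the preceding claims, wherein the amino acid mutations of the mutant of the porin monomer are selected from the following:
    (a) corresponding to SEQ ID NO:1, Q62 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 4 of N, E, D, and Q; S72 is mutated to 0 to 1 of P; and Y74 is mutated to 0 to 5 of S, C, U, T, and M;
    (b) corresponding to SEQ ID NO:1, Q62 is mutated to 0 to 5 of G, A, V, L, and I; K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 4 of N, E, D, and Q; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 3 of F, Y, and W; E110 is mutated to 0 to 4 of N, D, E, and Q; E119 is mutated to 0 to 4 of N, D, E, and Q; E126 is mutated to 0 to 4 of N, D, E, and Q; and K209 is mutated to 0 to 1 of P;
    (c) corresponding to SEQ ID NO:1, K67 is mutated to 0 to 3 of R, H, and K; D71 is mutated to 0 to 5 of G, A, V, L, and I; S72 is mutated to 0 to 1 of P; Y74 is mutated to 0 to 3 of F, Y, and W; and S75 is mutated to 0 to 5 of C, U, S, T, and M; and
    (d) corresponding to SEQ ID NO:1, K67 is mutated to 0 to 3 of R, H, and K; T69 is mutated to 0 to 5 of S, C, T, U, and M; A70 is mutated to 0 to 1 of P; D71 is mutated to 0 to 5 of G, A, V, L, and I; S72 is mutated to 0 to 3 of F, Y, and W; S73 is mutated to 0 to 5 of G, A, V, L, and I; and Y74 is mutated to 0 to 3 of F, Y, and W.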
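  6. The mutant of the porin monomer according to any one of the preceding claims, wherein the amino acid mutations of the mutant of the porin monomer are selected from the following:
    (a) Q62L, K67R, D71N, S72P, and Y74T, corresponding to SEQ ID NO:1;
    (b) Q62L, K67R, D71N, S72P, Y74 deletion, E110N, E119N, E126N, and K209P, corresponding to SEQ ID NO:1;
    (c) K67R, D71A, S72P, Y74 deletion, and S75 deletion, corresponding to SEQ ID NO:1; and
    (d) K67R, T69S, A70P, D71A, S72Y, S73A, and Y74 deletion, corresponding to SEQ ID NO:1.
  7. The mutant of the porin monomer according to any one of the preceding claims, wherein the amino acid sequence of the mutant of the porin monomer comprises or consists of SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, or SEQ ID NO:19.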
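  8. A mutant of a porin monomer, wherein the amino acid sequence of the mutant of the porin monomer comprises the sequence shown in SEQ ID NO:1 or a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 70%, 60%, or 50% identity thereto, and the mutant of the porin monomer comprises:
    (1) a mutation at one or more positions corresponding to Q62, K67, T69, A70, D71, S72, S73, Y74, S75, E110, E119, E126, and K209 of SEQ ID NO:1;
    (2) a mutation at one or more of the positions corresponding to Q62L, K67R, T69S, A70P, D71N/D71A, S72P/S72Y, S73A, Y74T/Y74 deletion, S75 deletion, E110N, E119N, E126N, and K209P of SEQ ID NO:1;
    (3) a mutation at K67, D71, S72, and/or Y74 corresponding to SEQ ID NO:1, and additionally a mutation at at least one of the positions Q62, T69, A70, S73, S75, E110, E119, E126, and K209;
    (4) a mutation at K67R, D71N/D71A, S72P/S72Y, and/or Y74T/Y74 deletion corresponding to SEQ ID NO:1; or
    (5) a mutation at K67R, D71N/D71A, S72P/S72Y, and/or Y74T/Y74 deletion corresponding to SEQ ID NO:1, and additionally a mutation at at least one of the positions Q62L, T69S, A70P, S73A, S75 deletion, E110N, E119N, E126N, and K209P.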
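  9. A protein pore comprising at least one mutant of the porin monomer according to any one of the preceding claims.
  10. The protein pore according to claim 9, wherein the protein pore comprises at least two of the mutants of the porin monomer.
  11. The protein pore according to any one of claims 9-10, wherein the constriction zone channel diameter of the protein pore is 0.7 nm-2.2 nm, 0.9 nm-1.6 nm, 1.4 nm-1.6 nm, or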
    Figure PCTCN2021123209-appb-100001
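  12. A complex for characterizing a target analyte, characterized in that it comprises the protein pore according to any one of claims 9-11 and a rate-controlling protein used together with it.
  13. A nucleic acid encoding the mutant of the porin monomer according to any one of claims 1-8, the protein pore according to any one of claims 9-11, or the complex according to claim 12.
  14. The nucleic acid according to claim 13, wherein the nucleotide sequence of the porin monomer is the sequence shown in SEQ ID NO:2.
  15. A vector or genetically engineered host cell comprising the nucleic acid according to any one of claims 13-14.
  16. Use of the mutant of the porin monomer according to any one of claims 1-8, the protein pore according to any one of claims 9-11, the complex according to claim 12, the nucleic acid according to any one of claims 13-14, or the vector or host cell according to claim 15 in detecting the presence, absence, or one or more characteristics of a target analyte, or in preparing a product for detecting the presence, absence, or one or more characteristics of a target analyte.
  17. A method of producing a protein pore or a polypeptide thereof, comprising transforming a host cell with the vector according to claim 15 and inducing the host cell to express the protein pore according to any one of claims 9-11 or a polypeptide thereof.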
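  18. A method for determining the presence, absence, or one or more characteristics of a target analyte, comprising:
    a. contacting the target analyte with the protein pore according to any one of claims 9-11, the complex according to claim 12, or the protein pore in the complex according to claim 12, so that the target analyte moves with respect to the protein pore; and
    b. taking one or more measurements as the target analyte moves with respect to the protein pore, thereby determining the presence, absence, or one or more characteristics of the target analyte.
  19. The method according to claim 18, wherein the method comprises:
    the target analyte interacting with the protein pore present in a membrane so that the target analyte moves with respect to the protein pore.
  20. A kit for determining the presence, absence, or one or more characteristics of a target analyte, comprising the mutant of the porin monomer according to any one of claims 1-8, the protein pore according to any one of claims 9-11, the complex according to claim 12, the nucleic acid according to any one of claims 13-14, or the vector or host according to claim 15, and the components of the membrane defined in claim 19.
  21. A device for determining the presence, absence, or one or more characteristics of a target analyte, comprising the protein pore according to any one of claims 9-11 or the complex according to claim 12, and the membrane defined in claim 19.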
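  22. The use, method, kit, or device according to any one of claims 16-21, wherein the target analyte comprises a polysaccharide, a metal ion, an inorganic salt, a polymer, an amino acid, a peptide, a protein, a nucleotide, an oligonucleotide, a polynucleotide, a dye, a drug, a diagnostic agent, an explosive, or an environmental pollutant;
    preferably, the target analyte comprises a polynucleotide,
    more preferably, the polynucleotide comprises DNA or RNA; and/or the one or more characteristics are selected from (i) the length of the polynucleotide; (ii) the identity of the polynucleotide; (iii) the sequence of the polynucleotide; (iv) the secondary structure of the polynucleotide; and (v) whether or not the polynucleotide is modified; and/or the rate-controlling protein in the complex comprises a polynucleotide binding protein.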
PCT/CN2021/123209 2021-10-12 2021-10-12 孔蛋白单体的突变体、蛋白孔及其应用 WO2023060419A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/123209 WO2023060419A1 (zh) 2021-10-12 2021-10-12 孔蛋白单体的突变体、蛋白孔及其应用

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/123209 WO2023060419A1 (zh) 2021-10-12 2021-10-12 孔蛋白单体的突变体、蛋白孔及其应用

Publications (1)

Publication Number Publication Date
WO2023060419A1 true WO2023060419A1 (zh) 2023-04-20

Family

ID=85988108

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/123209 WO2023060419A1 (zh) 2021-10-12 2021-10-12 孔蛋白单体的突变体、蛋白孔及其应用

Country Status (1)

Country Link
WO (1) WO2023060419A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108779170A (zh) * 2016-03-02 2018-11-09 牛津纳米孔技术公司 突变孔
CN110621692A (zh) * 2017-05-04 2019-12-27 牛津纳米孔技术公司 由两个CsgG孔组成的跨膜孔
US20210147486A1 (en) * 2017-06-30 2021-05-20 Oxford Nanopore Technologies Limited Novel protein pores
CN113480620A (zh) * 2021-08-18 2021-10-08 成都齐碳科技有限公司 孔蛋白单体的突变体、蛋白孔及其应用

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG SHAOYING, ZHAO ZHENGYI, HAQUE FARZIN, GUO PEIXUAN: "Engineering of protein nanopores for sequencing, chemical or protein sensing and disease diagnosis", CURRENT OPINION IN BIOTECHNOLOGY, LONDON, GB, vol. 51, 1 June 2018 (2018-06-01), GB , pages 80 - 89, XP055960007, ISSN: 0958-1669, DOI: 10.1016/j.copbio.2017.11.006 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21960170

Country of ref document: EP

Kind code of ref document: A1