WO2023160163A1 - Method for in situ detection of the binding positions of proteins and deoxyribonucleotides - Google Patents

Method for in situ detection of the binding positions of proteins and deoxyribonucleotides

Info

Publication number
WO2023160163A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
protein
strain
dna
transcription factor
Prior art date
Application number
PCT/CN2022/140081
Other languages
English (en)
French (fr)
Inventor
倪磊
金帆
李飞旋
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2023160163A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/21 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Pseudomonadaceae (F)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/74 Vectors or expression systems specially adapted for prokaryotic hosts other than E. coli, e.g. Lactobacillus, Micromonospora
    • C12N15/78 Vectors or expression systems specially adapted for prokaryotic hosts other than E. coli, e.g. Lactobacillus, Micromonospora for Pseudomonas
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04012 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4) dCMP deaminase (3.5.4.12)
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/22 Vectors comprising a coding region that has been codon optimised for expression in a respective host
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms; Processes using microorganisms
    • C12R2001/01 Bacteria or Actinomycetales; using bacteria or Actinomycetales
    • C12R2001/38 Pseudomonas
    • C12R2001/385 Pseudomonas aeruginosa

Definitions

  • The invention belongs to the technical field of synthetic biology and relates in particular to a method for in situ detection of the binding positions of proteins and deoxyribonucleotides.
  • The binding site of a transcriptional regulatory protein on genomic DNA is usually a relatively conserved nucleotide sequence. Locating the binding sites of a target transcription factor on genomic DNA helps researchers discover which genes that transcription factor regulates. Methods for detecting protein-DNA interactions are therefore a key technology for analyzing gene regulatory circuits and signal transduction pathways in living systems.
  • The most commonly used method for detecting protein-DNA interactions is chromatin immunoprecipitation followed by sequencing (ChIP-seq). In bacteria, a transcription factor fused to a tag sequence is first overexpressed; the DNA fragments bound by the transcription factor are then fragmented, enriched and separated, and their sequences are finally obtained by sequencing.
  • Others have developed the Calling-cards method, which fuses the target transcription factor to Sir4, a protein that guides a retrotransposon, so that the transposon inserts into the regions of the genome bound by the transcription factor. The insertion positions are then determined by enzyme digestion, circularization and sequencing, which in turn identifies the binding sites of the target transcription factor. The Calling-cards method currently has few users.
  • The object of the present invention is to provide a method for in situ detection of the binding positions of proteins and deoxyribonucleotides.
  • The present invention achieves a global scan of target-protein binding sites on the genome and uses bioinformatics methods to predict the conserved nucleotide sequence of the deoxyribonucleotides (hereinafter abbreviated as DNA) bound by the protein, so that the interaction between a target protein and DNA can be detected inside microbial cells. The specific technical scheme is as follows:
  • The first aspect of the present invention provides a method for in situ detection of the binding positions of proteins and deoxyribonucleotides, comprising:
  • 1) constructing a target strain expressing a cytosine deoxynucleotide deaminase (CDN)-target protein fusion protein, and a control strain expressing the target protein; the target strain and the control strain do not contain the uracil-DNA glycosylase gene;
  • 2) inducing protein expression in the target strain and the control strain, then extracting the genomic DNA of both strains;
  • 3) performing high-throughput sequencing of the genomic DNA of the target strain and the control strain, identifying the point mutations in each, removing the point mutations shared by the target strain and the control strain, and screening the remaining point mutations of the target strain for those located in non-coding regions, which are the target protein binding sites.
  • Further, in step 3), n groups each of target-strain and control-strain genomic DNA are sequenced to obtain the target protein binding sites. Specifically: high-throughput sequencing is performed on the genomic DNA of n groups of target strains and n groups of control strains; the point mutations of the target and control strains are identified; the point mutations shared by the target and control strains are removed; from the remaining point mutations of the target strains, those common to all n groups of target strains are selected; and the common point mutations located in non-coding regions are the target protein binding sites, where 2 ≤ n ≤ 5.
  • Further, the target strain expressing the CDN-target protein fusion protein is constructed by building a CDN-target protein fusion protein expression vector and introducing that vector into a strain that does not contain the uracil-DNA glycosylase gene;
  • the control strain expressing the target protein is constructed by building a target protein expression vector and introducing that vector into a strain that does not contain the uracil-DNA glycosylase gene.
  • In the fusion protein expression vector, cytosine deoxynucleotide deaminase (CDN) is located at the N-terminus of the fusion protein, the target protein is located at the C-terminus, and the two are linked by 8 glycines.
  • Inducing protein expression in the target strain and the control strain comprises: reviving the strains, picking colonies, and culturing with shaking in LB medium plus an inducer at 37°C for 6-20 hours;
  • preferably, the inducer is arabinose;
  • preferably, the LB medium contains sodium chloride 10 g/L, yeast extract 5 g/L and peptone 10 g/L.
  • The target protein is a transcription factor.
  • The method for in situ detection of the binding positions of proteins and deoxyribonucleotides further includes calculating the conserved DNA sequence bound by the target protein. The specific steps are:
  • taking each target protein binding site obtained in step 3) as the center and extending m base pairs upstream and downstream along the genomic DNA to obtain a DNA fragment sequence containing the binding site, where 10 ≤ m ≤ 200;
  • saving the DNA fragment sequences containing the binding sites to a text file and computing the conserved DNA sequence bound by the target protein with the bioinformatics tool MEME Suite.
  • The second aspect of the present invention provides a method for predicting the intracellular concentration of a protein transcribed and expressed from a promoter, comprising: mutating the promoter targeted by a transcription factor using the method of the invention for in situ detection of protein-deoxyribonucleotide binding positions, and predicting the intracellular concentration of the protein expressed from that promoter according to the formula $x = 1 - \exp(-K_5 \cdot [\mathrm{Protein}] \cdot t)$, where:
  • $x$ is the proportion of mutated DNA on the promoter targeted by the transcription factor;
  • $[\mathrm{Protein}]$ is the intracellular concentration of the protein transcribed and expressed from the promoter;
  • $t$ is the total time over which the mutation process occurs;
  • $K_5$ is the equilibrium constant, given by the formula in image PCTCN2022140081-appb-000001, in which $k_{TIC1}$ and $k_{TIC2}$ are constants, $k_{translation}$ is the translation rate, $\gamma_{mRNA}$ is the mRNA decay rate, and $\gamma_{Protein}$ is the protein degradation rate.
  • Further, the proportion $x$ of mutated DNA on the targeted promoter is expressed as $x = 1 - \exp(-k_m \cdot t)$, where $k_m$ is the mutation rate of the transcription factor binding site and $t$ is the total time over which the mutation process occurs.
  • Further, the proportion $x$ is expressed as $x = 1 - \exp(-K_3 \cdot \theta \cdot t)$, where $\theta$ is the proportion of DNA bound to the transcription factor protein, given by the formula in image PCTCN2022140081-appb-000002, in which $[\mathrm{LasR}]$ is the concentration of the transcription factor protein and $k_d$ is the equilibrium constant;
  • $K_3$ is a constant, expressed as $K_3 = [\mathrm{RNAP}] \cdot [\mathrm{DNA_{all}}] \cdot K_1 \cdot K_2 \cdot k_{TIC1}$;
  • $[\mathrm{RNAP}]$ is the concentration of RNA polymerase, $[\mathrm{DNA_{all}}]$ is the total concentration of target DNA in the bacterial cell, and $K_1$ and $K_2$ are the equilibrium constants of the process by which the transcription factor recruits RNA polymerase (reaction scheme in image PCTCN2022140081-appb-000003);
  • in that scheme, RNAP stands for RNA polymerase, LasR for the transcription factor protein, RNAP-LasR2-DNA for the complex of RNA polymerase with the transcription factor and DNA, and LasR-TIC for the transcription initiation complex in the open state bound to the transcription factor protein.
  • Preferably, $\theta$ is determined by the method of the present invention for in situ detection of the binding positions of proteins and deoxyribonucleotides.
  • The present invention provides a method for in situ detection of the binding positions of proteins and deoxyribonucleotides that exploits the ability of pyrimidine deoxynucleotide deaminase to mutate DNA in bacteria: once a target strain expressing the cytosine deoxynucleotide deaminase-target protein fusion protein has been constructed, only simple genome sequencing is required to obtain the final result.
  • The method needs only basic experimental techniques such as PCR and simple molecular cloning to construct the CDN-target protein fusion protein expression plasmid, which is convenient, quick and time-saving and requires no cumbersome experimental design. The operating threshold is low: after the strain has been constructed, the bacteria only need to be cultured and sent for sequencing.
  • The target protein binding sites obtained with the method can also help in understanding the transcription level of the target promoter bound by the transcription factor protein.
  • Figure 1 is the experimental workflow for detecting target protein-DNA binding sites with a fused CDN;
  • Figure 2 shows the base mutations in the rhlI gene promoter region after 3, 6, 9 and 12 hours of induction with shaking;
  • Figure 3 is an electrophoretic mobility shift assay verifying that the LasR protein binds the multiple promoters detected by sequencing;
  • Figure 4 shows the relationship between the proportion of promoter mutations and the transcription level of the corresponding promoter, for promoters with different transcription levels;
  • Figure 5 is the predicted conserved nucleotide sequence bound by LasR.
  • The principle of the present invention is to fuse cytosine deoxynucleotide deaminase to a target protein and to use the ability of pyrimidine deoxynucleotide deaminase to mutate DNA in bacteria, so that the DNA near the fused target protein is mutated. The positions of the mutations are then found by sequencing; these are the binding sites of the target protein on the genome.
  • The specific implementation of the present invention includes designing the fusion protein expression, constructing the target bacterial strain, culturing the bacteria, and sequencing analysis. Details are as follows:
  • Designing the fusion protein expression is a key step of the present invention and includes optimizing the expression of the cytosine deoxynucleotide deaminase gene and selecting the expression vector.
  • The coding sequence of the original cytosine deoxynucleotide deaminase gene was optimized according to the codon usage of the target strain with the web tool Jcat ( http://www.jcat.de/ ), so that the gene uses the codons most frequently used in the target strain, and was synthesized by a commercial company.
  • Polymerase chain reaction primers were then designed to amplify the optimized cytosine deoxynucleotide deaminase gene, the target protein gene and the plasmid vector used to induce expression of the fusion protein, ready for the Gibson assembly reaction in the next step.
  • Preferably, cytosine deoxynucleotide deaminase is placed at the N-terminus of the fusion protein and the target protein at the C-terminus, with 8 glycines connecting them.
  • The target protein gene fragment was joined to the linearized fragment of the plasmid vector used to induce expression of the fusion protein by Gibson assembly and transformed into the competent E. coli strain Top10 by chemical transformation. The bacteria were spread on gentamicin-resistant plates, clones were picked and verified by sequencing, and a correctly assembled plasmid expressing the target protein was obtained.
  • A strain in which the bacterial uracil-DNA glycosylase gene has been deleted is obtained.
  • The target plasmid carrying the fusion of the cytosine deoxynucleotide deaminase gene and the target protein gene, and the plasmid expressing the target protein, were each introduced into this strain by electroporation. Single colonies were picked on gentamicin-resistant plates, correct clones were identified by polymerase chain reaction, and the clones were cultured with shaking and preserved. This gives the target strain containing the target plasmid and the control strain expressing the target protein without the fused cytosine deoxynucleotide deaminase gene.
  • Preserved strains were stored in glycerol at a final concentration of 35% in a -80°C freezer.
  • Bacterial culture: first, the target strain containing the target plasmid and the control strain expressing the target protein without the fused CDN gene, both constructed above, were revived by streaking and cultured in a constant-temperature incubator at 37°C for 20 hours. Colonies were picked and cultured with shaking at 37°C in LB medium (sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L) plus the inducer arabinose at a final concentration of 0.4% (mass fraction), 1 ml of culture per tube, for a total induction time of 6-20 hours. Bacteria were collected by centrifugation and their genomic DNA was extracted with a commercial genome extraction kit.
  • CDN was placed at the N-terminus of the fusion protein and LasR at the C-terminus, with 8 glycines connecting them.
  • The amino acid sequence of the transcription factor LasR protein is shown in SEQ ID NO.1 and its nucleotide sequence in SEQ ID NO.2.
  • The amino acid sequence of cytosine deoxynucleotide deaminase is shown in SEQ ID NO.3. The coding sequence of the original cytosine deoxynucleotide deaminase gene was optimized according to the codon usage of the target strain Pseudomonas aeruginosa with the web tool Jcat ( http://www.jcat.de/ ); the optimized nucleotide sequence is shown in SEQ ID NO.4 and was synthesized by a commercial company.
  • The plasmid vector used to induce expression of the fusion protein is pJN105, a tool plasmid that induces expression of the target gene with arabinose by placing the target gene under the control of the plasmid's arabinose-responsive promoter.
  • Its sequence is shown in SEQ ID NO.5, where the bold text marks the arabinose promoter region and the black triangle marks the insertion position of the target gene.
  • Polymerase chain reaction primers were designed for the codon-optimized cytosine deoxynucleotide deaminase gene, the LasR gene fragment and the plasmid vector pJN105 used to induce expression of the fusion protein.
  • These three templates were amplified with the designed primers. After the LasR gene fragment and the codon-optimized cytosine deoxynucleotide deaminase gene had been amplified, the two DNA fragments were joined by overlap extension polymerase chain reaction; purification gave a DNA fragment containing the two fused genes with the 8-glycine linker sequence between them.
  • This DNA fragment was joined to the linearized fragment of the pJN105 plasmid vector by Gibson assembly and transformed into the competent E. coli strain Top10 by chemical transformation. The bacteria were spread on gentamicin-resistant plates, clones were picked and verified by sequencing, and a correctly assembled target plasmid expressing the CDN-target protein fusion protein was obtained.
  • The LasR gene fragment was likewise joined to the linearized fragment of the pJN105 plasmid vector by Gibson assembly and transformed into the competent E. coli strain Top10 by chemical transformation. The bacteria were spread on gentamicin-resistant plates, clones were picked and verified by sequencing, and a correctly assembled plasmid expressing the LasR protein was obtained.
  • The gene PA0750 on the Pseudomonas aeruginosa genome was knocked out by homologous recombination, giving a Pseudomonas aeruginosa strain lacking the intracellular uracil-DNA glycosylase gene.
  • The target plasmid expressing the cytosine deoxynucleotide deaminase-target protein fusion protein and the plasmid expressing the LasR protein were each introduced by electroporation into the Pseudomonas aeruginosa strain lacking the uracil-DNA glycosylase gene. Single colonies were picked on gentamicin-resistant plates, correct clones were identified by polymerase chain reaction, and the clones were cultured with shaking and preserved. This gives the target strain containing the target plasmid and the control strain expressing the LasR protein without the fused cytosine deoxynucleotide deaminase gene. Preserved strains were stored in glycerol at a final concentration of 35% in a -80°C freezer.
  • The target strain and the control strain constructed above were revived by streaking and cultured in a constant-temperature incubator at 37°C for 20 hours.
  • Colonies were picked and cultured with shaking at 37°C in LB medium (sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L) plus the inducer arabinose at a final concentration of 0.4% (mass fraction), 1 ml of culture per tube, for a total induction time of 12 hours.
  • Bacteria were collected by centrifugation and their genomic DNA was extracted with a commercial genome extraction kit.
  • Example 2: after reviving the target strain and the control strain constructed in Example 1, colonies were picked and cultured with shaking at 37°C in LB medium (sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L), 1 ml of culture per tube, with the inducer arabinose added to a final concentration of 0.4% (mass fraction), for a total of 3, 6, 9 or 12 hours. Bacteria were collected by centrifugation, their genomic DNA was extracted with a commercial genome extraction kit, and the data were analyzed by the method of Example 1.
  • On the promoter of the rhlI gene, a known LasR-bound promoter, the boxed positions are the mutated DNA bases (see Figure 2). As the induction time increased, the proportion of mutations in the promoter region rose significantly until mutation was nearly complete.
  • In a further example, after reviving the target strain and the control strain constructed in Example 1, colonies were picked and cultured with shaking at 37°C in LB medium (sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L). After induction for 12 hours with 0.4% arabinose and 10 μm/L 3-oxo-lauroyl homoserine lactone, the bacteria were collected for genome sequencing and the data were analyzed by the method of Example 1.
  • Mutations were found in a total of 16 promoter regions. Ten are known promoters: PA1003, PA1431, PA2426, PA2763, PA2769, PA3104, PA3326, PA3384, rhlI and PA3904.
  • Six promoters had not been reported before: PA0717, PA0727, PA0861, PA1131, PA3347 and PA5295.
  • The binding sites obtained by genome sequencing were verified by electrophoretic mobility shift assay (EMSA). As shown in Figure 3, all of the newly detected promoters have LasR binding activity.
  • The transcriptional activity at a given position can be estimated from the proportion of mutated bases at the measured mutation position. On this basis, the present invention establishes a theoretical model under the steady-state assumption and verifies it experimentally. The model is as follows:
  • Formula (1) is the binding equilibrium reaction between the transcription factor and the target promoter, with equilibrium constant $k_d$; LasR denotes the transcription factor protein. [formula image not reproduced]
  • Let $\theta$ be the proportion of DNA bound to the transcription factor protein; $\theta$ can then be expressed as: [formula image not reproduced]
  • Formula (4) describes the process by which the transcription factor recruits RNA polymerase, where RNAP denotes RNA polymerase, RNAP-LasR2-DNA denotes the complex of RNA polymerase with the transcription factor and DNA, and LasR-TIC denotes the transcription initiation complex in the open state bound to the transcription factor protein. [formula image not reproduced]
  • $K_1$ and $K_2$ are the equilibrium constants of the two reaction steps, respectively. From formula (4), the following relationship is obtained:
  • $[\mathrm{LasR\text{-}TIC}] = [\mathrm{RNAP}] \cdot [\mathrm{LasR_2\text{-}DNA}] \cdot K_1 \cdot K_2$ (5)
  • The mutation process is a first-order reaction that must be mediated by LasR-TIC, which can be written as: [formula image not reproduced]
  • Here $t$ is the total time over which the mutation process occurs, that is, the time for which protein expression is induced in the target strain and the control strain. Combining formulas (3) and (5), the proportion of mutated bases is finally expressed as $x = 1 - \exp(-K_3 \cdot \theta \cdot t)$.
  • In the expression process, mRNA denotes messenger RNA, Protein denotes the protein, $k_{transcription}$ is the transcription rate constant, $k_{translation}$ is the translation rate, $\gamma_{mRNA}$ is the mRNA decay rate, and $\gamma_{Protein}$ is the protein degradation rate.
  • The transcription process can be regarded as a zero-order reaction mediated by LasR-TIC, and the transcription rate constant can be expressed as: [formula image not reproduced]
  • where $K_4 = k_{translation} \cdot k_{TIC2} \cdot [\mathrm{RNAP}] \cdot [\mathrm{DNA_{all}}] \cdot K_1 \cdot K_2 / (\gamma_{mRNA} \cdot \gamma_{Protein})$, which is a constant.
  • The relationship between the intracellular concentration $[\mathrm{Protein}]$ of the protein transcribed and expressed from the promoter DNA targeted by the transcription factor protein and the mutation proportion $x$ on that promoter DNA can then be expressed as $x = 1 - \exp(-K_5 \cdot [\mathrm{Protein}] \cdot t)$.
  • LasR-bound promoters with different transcription levels were then constructed, fluorescent proteins were used to measure the concentration of protein expressed from each promoter, and next-generation sequencing results were used to calculate the mutation frequency at the corresponding time points, as shown in Figure 4.
  • The theoretical curve fits the experimental data well, supporting the predicted relationship between the transcription level of the promoter DNA and the proportion of mutations induced on it through the transcription factor protein.
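The constants defined above can be tied together in a short derivation. This is a sketch under one stated assumption: that the steady-state protein level is proportional to promoter occupancy, $[\mathrm{Protein}] = K_4\,\theta$, which is how $K_4$ as defined above would enter the model. The intermediate formulas are images not reproduced in this text, so the closed form for $K_5$ below is an inference from the surrounding definitions, not a quotation of the patent's formula.

```latex
\begin{aligned}
x &= 1 - \exp(-K_3\,\theta\,t),
  & K_3 &= [\mathrm{RNAP}]\,[\mathrm{DNA_{all}}]\,K_1 K_2\,k_{TIC1} \\
{[\mathrm{Protein}]} &= K_4\,\theta \quad \text{(assumed steady state)},
  & K_4 &= \frac{k_{translation}\,k_{TIC2}\,[\mathrm{RNAP}]\,[\mathrm{DNA_{all}}]\,K_1 K_2}{\gamma_{mRNA}\,\gamma_{Protein}} \\
\Rightarrow\quad x &= 1 - \exp\!\left(-\frac{K_3}{K_4}\,[\mathrm{Protein}]\,t\right)
  = 1 - \exp\!\left(-K_5\,[\mathrm{Protein}]\,t\right),
  & K_5 &= \frac{K_3}{K_4}
  = \frac{k_{TIC1}\,\gamma_{mRNA}\,\gamma_{Protein}}{k_{TIC2}\,k_{translation}}
\end{aligned}
```

Under this assumption the polymerase and DNA concentrations and $K_1 K_2$ cancel, which is consistent with $K_5$ being described only in terms of $k_{TIC1}$, $k_{TIC2}$, $k_{translation}$, $\gamma_{mRNA}$ and $\gamma_{Protein}$.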
  • The amino acid and nucleotide sequences to which the present invention relates are as follows:
  • Amino acid sequence of the transcription factor protein LasR used for testing (SEQ ID NO.1)
  • Nucleotide sequence of the transcription factor protein LasR used for testing (SEQ ID NO.2)
  • Amino acid sequence of cytosine deoxynucleotide deaminase (SEQ ID NO.3)
  • Nucleotide sequence of the optimized cytosine deoxynucleotide deaminase (SEQ ID NO.4)
  • Nucleotide sequence of the pJN105 plasmid vector used for cloning, where the bold text marks the arabinose promoter region and the black triangle marks the position where the target gene is inserted (SEQ ID NO.5)

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Analytical Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Medicinal Chemistry (AREA)
  • Plant Pathology (AREA)
  • Immunology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

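Provided is a method for in situ detection of the binding positions of proteins and deoxyribonucleotides, comprising: constructing a target strain expressing a cytosine deoxynucleotide deaminase-target protein fusion protein and a control strain expressing the target protein, neither of which contains the uracil-DNA glycosylase gene; inducing protein expression in the target strain and the control strain and then extracting their genomic DNA; and performing high-throughput sequencing of the genomic DNA, identifying the point mutations in the genomic DNA of the target strain and the control strain, removing the point mutations shared by the two strains, and screening the remaining point mutations of the target strain for those located in non-coding regions, which are the binding sites. The method exploits the ability of pyrimidine deoxynucleotide deaminase to mutate DNA in bacteria: once a target strain expressing the fusion protein has been constructed, only simple genome sequencing is needed to obtain the final result, which is convenient and fast.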

Description

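Method for in situ detection of the binding positions of proteins and deoxyribonucleotides
Technical Field
The present invention belongs to the technical field of synthetic biology and relates in particular to a method for in situ detection of the binding positions of proteins and deoxyribonucleotides.
Background Art
The binding site of a transcriptional regulatory protein on genomic DNA is usually a relatively conserved nucleotide sequence, and locating the binding positions of a target transcription factor on genomic DNA helps researchers discover which genes that transcription factor regulates. Methods for detecting protein-DNA interactions are therefore a key technology for analyzing the gene regulatory circuits and signal transduction pathways of living systems.
The method most commonly used at present to detect protein-DNA interactions is chromatin immunoprecipitation followed by sequencing (ChIP-seq). In bacteria, a transcription factor fused to a tag sequence is first overexpressed; the DNA fragments bound by the transcription factor are then fragmented, enriched and separated, and their sequences are finally obtained by sequencing. Others have developed the Calling-cards method, which fuses the target transcription factor to Sir4, a protein that guides a retrotransposon, so that the transposon inserts into the regions of the genome bound by the transcription factor; the insertion positions are then determined by enzyme digestion, circularization and sequencing, which identifies the binding sites of the target transcription factor. The Calling-cards method currently has few users.
The drawbacks of existing ChIP-seq are: 1) it takes a long time and a great deal of labor; 2) it has many steps, so small errors in individual steps can accumulate and lead to unsatisfactory final data, and for the same reason troubleshooting after a failed experiment is difficult; 3) many technical details need attention during the experiment, for example how the protein is fixed, how the DNA is purified, and the sonication conditions used to fragment the DNA all strongly affect the final result. Many experimenters find it hard to master the technique in a short time, so it is not easy to adopt widely.
Finding a method that detects protein-DNA interactions quickly is therefore of great significance.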
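Summary of the Invention
To address the deficiencies of the prior art, the object of the present invention is to provide a method for in situ detection of the binding positions of proteins and deoxyribonucleotides. Through theoretical derivation and the optimization and maturation of experimental methods, the invention achieves a global scan of target-protein binding sites on the genome and uses bioinformatics methods to predict the conserved nucleotide sequence of the deoxyribonucleotides (hereinafter abbreviated as DNA) bound by the protein, so that the interaction between a target protein and DNA can be detected inside microbial cells. The specific technical scheme is as follows:
The first aspect of the present invention provides a method for in situ detection of the binding positions of proteins and deoxyribonucleotides, comprising:
1) constructing a target strain expressing a cytosine deoxynucleotide deaminase-target protein fusion protein, and a control strain expressing the target protein;
the target strain and the control strain do not contain the uracil-DNA glycosylase gene;
2) inducing protein expression in the target strain and the control strain, then extracting the genomic DNA of the target strain and the control strain;
3) performing high-throughput sequencing of the genomic DNA of the target strain and the control strain, identifying the point mutations in the genomic DNA of each, removing the point mutations shared by the target strain and the control strain, and screening the remaining point mutations of the target strain for those located in non-coding regions, which are the target protein binding sites.
Further, in step 3), n groups each of target-strain and control-strain genomic DNA are subjected to high-throughput sequencing to obtain the target protein binding sites. Specifically: high-throughput sequencing is performed on the genomic DNA of n groups of target strains and n groups of control strains; the point mutations in the genomic DNA of the target and control strains are identified; the point mutations shared by the target and control strains are removed; from the remaining point mutations of the target strains, the point mutations common to all n groups of target strains are selected; and the common point mutations located in non-coding regions are the target protein binding sites;
where 2 ≤ n ≤ 5.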
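The filtering logic of step 3) with n replicate groups can be sketched with plain set operations. This is a minimal illustration, not the patent's actual pipeline: the `Mutation` tuple layout, the interval representation of non-coding regions, and the choice to treat a mutation seen in any control replicate as background are all assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: (genome position, reference base, alternate base).
Mutation = Tuple[int, str, str]
# A non-coding region as a half-open interval [start, end) on the genome.
Interval = Tuple[int, int]


def candidate_binding_sites(
    target_replicates: List[Set[Mutation]],
    control_replicates: List[Set[Mutation]],
    noncoding: Iterable[Interval],
) -> List[Mutation]:
    """Return mutations shared by all target replicates, absent from every
    control replicate, and located in a non-coding region."""
    if not target_replicates:
        return []
    # Background: any mutation seen in any control replicate (an assumption).
    background: Set[Mutation] = set().union(*control_replicates)
    # Remove background from each target replicate, then keep the consensus.
    cleaned = [rep - background for rep in target_replicates]
    consensus = set.intersection(*cleaned)
    regions = list(noncoding)
    return sorted(
        m for m in consensus
        if any(start <= m[0] < end for start, end in regions)
    )
```

In practice the mutation sets would come from a variant caller run on each sequenced sample; that step is outside the scope of this sketch.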
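Further, the target strain expressing the cytosine deoxynucleotide deaminase-target protein fusion protein is constructed by: building a cytosine deoxynucleotide deaminase-target protein fusion protein expression vector, then introducing that fusion protein expression vector into a strain that does not contain the uracil-DNA glycosylase gene;
the control strain expressing the target protein is constructed by: building a target protein expression vector, then introducing that target protein expression vector into a strain that does not contain the uracil-DNA glycosylase gene.
Further, in the cytosine deoxynucleotide deaminase-target protein fusion protein expression vector, cytosine deoxynucleotide deaminase is located at the N-terminus of the fusion protein, the target protein is located at the C-terminus, and the two are linked by 8 glycines.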
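The fusion layout described here (deaminase at the N-terminus, 8 glycines, target protein at the C-terminus) can be illustrated at the coding-sequence level. This is a hedged sketch: the function names are hypothetical, the choice of GGC as the glycine codon is an assumption (the text does not state which codons encode the linker), and the sequences in the usage note are toy placeholders rather than the patent's SEQ ID entries.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_stop(cds: str) -> str:
    """Remove a trailing stop codon so translation continues into the linker."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def build_fusion_cds(
    deaminase_cds: str,
    target_cds: str,
    linker_codon: str = "GGC",  # assumed glycine codon
    linker_len: int = 8,        # 8 glycines, as stated in the text
) -> str:
    """Concatenate deaminase (N-terminal), glycine linker and target (C-terminal)."""
    n_part = strip_stop(deaminase_cds)
    c_part = target_cds.upper()
    if len(c_part) % 3 != 0:
        raise ValueError("target coding sequence length must be a multiple of 3")
    return n_part + linker_codon * linker_len + c_part
```

For example, `build_fusion_cds("ATGAAATAA", "ATGCCCTGA")` drops the first stop codon and yields a 39-nucleotide open reading frame with 24 linker nucleotides in the middle.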
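Further, inducing protein expression in the target strain and the control strain comprises: reviving the strains, picking colonies, and culturing with shaking in LB medium plus an inducer at 37°C for 6-20 hours;
preferably, the inducer is arabinose;
preferably, the LB medium contains: sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L.
Further, the target protein is a transcription factor.
Further, the method for in situ detection of the binding positions of proteins and deoxyribonucleotides also includes calculating the conserved DNA sequence bound by the target protein, with the following specific steps:
taking each target protein binding site obtained in step 3) as the center and extending m base pairs upstream and downstream along the genomic DNA to obtain a DNA fragment sequence containing the target protein binding site, where 10 ≤ m ≤ 200;
saving the DNA fragment sequences containing the target protein binding sites to a text file and computing the conserved DNA sequence bound by the target protein with the bioinformatics tool MEME Suite.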
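The window-extraction step can be sketched as follows. This is an illustrative example only: the function names, the 0-based site coordinates, the FASTA record naming, and the clipping of windows at the genome ends are assumptions; the text specifies only the ±m extension with 10 ≤ m ≤ 200 and a text file as input to MEME Suite.

```python
from typing import Iterable, List, Tuple


def extract_windows(
    genome: str, sites: Iterable[int], m: int = 50
) -> List[Tuple[str, str]]:
    """Return (name, sequence) records spanning m bp either side of each site.

    Sites are 0-based positions; windows are clipped at the genome ends.
    """
    if not 10 <= m <= 200:
        raise ValueError("m must satisfy 10 <= m <= 200")
    records = []
    for pos in sites:
        start = max(0, pos - m)
        end = min(len(genome), pos + m + 1)
        records.append((f"site_{pos}", genome[start:end]))
    return records


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format records as FASTA text, a plain-text format motif finders accept."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The resulting text could be written to a file and submitted to MEME Suite for motif discovery; the exact MEME settings used by the authors are not given in this text.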
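The second aspect of the present invention provides a method for predicting the intracellular concentration of a protein transcribed and expressed from a promoter, comprising: mutating the promoter targeted by a transcription factor using the method of the invention for in situ detection of the binding positions of proteins and deoxyribonucleotides, and predicting the intracellular concentration of the protein transcribed and expressed from the promoter according to the following formula:
$x = 1 - \exp(-K_5 \cdot [\mathrm{Protein}] \cdot t)$
where
$x$ is the proportion of mutated DNA on the promoter targeted by the transcription factor;
$[\mathrm{Protein}]$ is the intracellular concentration of the protein transcribed and expressed from the promoter;
$t$ is the total time over which the mutation process occurs;
$K_5$ is the equilibrium constant, expressed as follows:
Figure PCTCN2022140081-appb-000001
$k_{TIC1}$ and $k_{TIC2}$ are constants, $k_{translation}$ is the translation rate, $\gamma_{mRNA}$ is the mRNA decay rate, and $\gamma_{Protein}$ is the protein degradation rate.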
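Further, the fraction $x$ of mutated DNA at the transcription-factor-targeted promoter is expressed as follows:
$x = 1 - \exp(-k_m \cdot t)$
where
$k_m$ is the mutation rate of the transcription factor binding site;
$t$ is the total time over which mutation takes place.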
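Further, the fraction $x$ of mutated DNA at the transcription-factor-targeted promoter is expressed as follows:
$x = 1 - \exp(-K_3 \cdot \theta \cdot t)$
where
$\theta$ is the fraction of DNA that is bound by the transcription factor protein, expressed as follows:
Figure PCTCN2022140081-appb-000002
$[\mathrm{LasR}]$ is the concentration of the transcription factor protein, and $k_d$ is the equilibrium constant;
$K_3$ is a constant, expressed as follows:
$K_3 = [\mathrm{RNAP}] \cdot [\mathrm{DNA_{all}}] \cdot K_1 \cdot K_2 \cdot k_{TIC1}$
$[\mathrm{RNAP}]$ is the concentration of RNA polymerase, $[\mathrm{DNA_{all}}]$ is the total intracellular concentration of the target DNA, and $K_1$ and $K_2$ are the equilibrium constants of the following process in which the transcription factor recruits RNA polymerase:
Figure PCTCN2022140081-appb-000003
RNAP denotes RNA polymerase, LasR denotes the transcription factor protein, RNAP-LasR$_2$-DNA denotes the complex of RNA polymerase with the transcription factor and DNA, and LasR-TIC denotes the open-state transcription initiation complex bound by the transcription factor protein;
preferably, $\theta$ is determined using the in situ detection method of the invention.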
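The beneficial effects of the invention are as follows:
(1) The invention provides a method for in situ detection of protein-DNA binding positions. It exploits the ability of cytosine deaminase to mutate DNA in bacteria: once a target strain expressing a cytosine deaminase-target protein fusion protein has been constructed, simple genome sequencing is all that is needed to obtain the final result. The expression plasmid for the fusion protein is built with basic laboratory techniques such as PCR and simple molecular cloning, which is quick and convenient and requires no elaborate experimental design. The barrier to using the method is low: after the strain has been constructed, the bacteria only need to be cultured and sent for sequencing.
(2) The target protein binding sites obtained with the in situ detection method of the invention can also help in understanding the transcription level of the target promoters bound by the transcription factor.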
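Brief Description of the Drawings
Figure 1 is the experimental workflow for detecting the binding sites of a target protein on DNA using a fused cytosine deaminase;
Figure 2 shows base mutations in the promoter region of the rhlI gene after 3, 6, 9 and 12 hours of induction in shaking culture;
Figure 3 shows electrophoretic mobility shift assays confirming that the LasR protein binds several of the promoters detected by sequencing;
Figure 4 shows the relationship between the mutated fraction of promoters with different transcription levels and the transcription level of the corresponding promoter;
Figure 5 shows the predicted conserved nucleotide sequence bound by LasR.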
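Detailed Description
For a clearer understanding of the invention, it is further described below with reference to the following examples and the drawings. The examples serve only to explain the invention and do not limit it in any way. In the examples, all starting reagents and materials are commercially available; experimental methods for which no specific conditions are given follow the conventional methods and conditions well known in the art, or the conditions recommended by the instrument manufacturer.
The principle of the invention is to fuse cytosine deaminase to a target protein and, in bacteria, to use the ability of cytosine deaminase to mutate DNA so that the DNA near the fused target protein is mutated. Sequencing then reveals the positions of the mutations, which are the binding sites of the target protein on the genome. Implementing the invention involves designing the fusion protein expression construct, constructing the target bacterial strain, culturing the bacteria, and sequencing and analysis. The details are as follows: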
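1) Designing fusion protein expression: this is the key step of the invention and comprises optimizing the expression of the cytosine deaminase gene and choosing the expression vector. First, according to the codon usage of the target strain, the web tool JCat (http://www.jcat.de/) is used to optimize the original coding sequence of the cytosine deaminase gene so that it uses the codons most frequently used by the target strain, and the optimized gene is synthesized by a commercial supplier. Then, polymerase chain reaction (PCR) primers are designed to amplify the optimized cytosine deaminase gene, the target protein gene and the plasmid vector used for inducible expression of the fusion protein, ready for the Gibson assembly reaction in the next step. Preferably, the cytosine deaminase is placed at the N-terminus of the fusion protein and the target protein at the C-terminus, joined by a linker of 8 glycine residues.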
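2) Constructing the target strain: after the target protein gene fragment and the codon-optimized cytosine deaminase gene have been amplified, the two DNA segments are joined by overlap-extension PCR and purified, giving a DNA fragment that contains both genes and the linker sequence between them. This fragment is then joined to the linearized plasmid vector for inducible expression of the fusion protein by Gibson assembly and introduced by chemical transformation into competent Escherichia coli Top10 cells. The bacteria are plated on gentamicin plates, and clones are picked and verified by sequencing to obtain the correctly assembled target plasmid.
In parallel, the target protein gene fragment is joined to the linearized plasmid vector for inducible expression by Gibson assembly and introduced by chemical transformation into competent Escherichia coli Top10 cells. The bacteria are plated on gentamicin plates, and clones are picked and verified by sequencing to obtain the correctly assembled plasmid expressing the target protein.
Separately, a host strain lacking the intracellular uracil-DNA glycosylase gene is constructed. The uracil-DNA glycosylase gene is first located on the bacterial genome and then knocked out by homologous recombination, to prevent the protein it encodes from repairing the mutated DNA bases.
Once the strain lacking the uracil-DNA glycosylase gene has been obtained, the target plasmid carrying the cytosine deaminase-target protein gene fusion and the plasmid expressing the target protein are each introduced into it by electroporation. Single colonies are picked from gentamicin plates, correct clones are identified by PCR, and the clones are grown in shaking culture and stored. This yields the target strain containing the target plasmid and the control strain, which expresses the target protein without the fused cytosine deaminase gene. The strains are stored in glycerol at a final concentration of 35% in a -80°C freezer.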
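3) Bacterial culture: the target strain containing the target plasmid and the control strain expressing the target protein without the fused cytosine deaminase gene are first revived by streaking and incubated for 20 hours in a 37°C incubator. Colonies are picked and grown with shaking at 37°C in LB medium (sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L) with the inducer arabinose at a final concentration of 0.4% (mass fraction), in a culture volume of 1 mL, for a total induction time of 6-20 hours. The bacteria are collected by centrifugation, and their genomic DNA is extracted with a commercial genomic DNA extraction kit.
4) Sequencing and analysis: three target-strain samples and three control-strain samples are sequenced. Bacterial genome sequencing is carried out by a commercial provider. Once the genome-wide point mutations have been obtained, the point mutations shared by the target strain and the control strain are first discarded, and the point mutations common to all three target-strain samples are then identified among the remainder. The shared point mutations located in non-coding regions are then selected; these are the binding sites of the target protein. Taking each mutation position as the center, the sequence is extended by 50 base pairs upstream and 50 base pairs downstream on the genome, giving a DNA fragment of about 100 base pairs for each point mutation. These sequences are saved in bulk to a text file, and the bioinformatics tool MEME Suite (meme-suite.org) is used to compute the conserved DNA sequence bound by the transcription factor protein.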
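Example 1
The feasibility of the method is illustrated below in detail, taking the detection of the DNA binding sites of the transcription factor LasR in Pseudomonas aeruginosa as an example. The experimental workflow is shown in Figure 1, and the specific procedure is as follows:
1) Designing expression of the cytosine deaminase-LasR fusion protein
In the fusion protein design, the cytosine deaminase is placed at the N-terminus and LasR at the C-terminus, joined by a linker of 8 glycine residues.
The amino acid sequence of the transcription factor LasR is shown in SEQ ID NO.1, and its nucleotide sequence in SEQ ID NO.2.
The amino acid sequence of the cytosine deaminase is shown in SEQ ID NO.3. According to the codon usage of the target strain Pseudomonas aeruginosa, the web tool JCat (http://www.jcat.de/) was used to optimize the original coding sequence of the cytosine deaminase gene. The optimized nucleotide sequence is shown in SEQ ID NO.4 and was synthesized by a commercial supplier.
The plasmid vector used for inducible expression of the fusion protein is pJN105, a tool plasmid that allows arabinose-inducible expression of a target gene: the arabinose-responsive promoter on the plasmid controls expression of the target gene. Its nucleotide sequence is shown in SEQ ID NO.5, in which the region in bold is the arabinose promoter sequence and the black triangle indicates the position at which the target gene is inserted.
PCR primers were designed to amplify the codon-optimized cytosine deaminase gene, the LasR gene fragment and the plasmid vector pJN105 used for inducible expression of the fusion protein.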
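2) Construction of the target strain and the control strain
The designed PCR primers were used to amplify the codon-optimized cytosine deaminase gene, the LasR gene fragment and the plasmid vector pJN105. After the LasR gene fragment and the codon-optimized cytosine deaminase gene had been amplified, the two DNA segments were joined by overlap-extension PCR and purified, giving a DNA fragment containing both gene fragments and the 8-glycine linker sequence between them.
This fragment was joined to the linearized pJN105 plasmid vector by Gibson assembly and introduced by chemical transformation into competent Escherichia coli Top10 cells. The bacteria were plated on gentamicin plates, and clones were picked and verified by sequencing to obtain the correctly assembled target plasmid expressing the cytosine deaminase-target protein fusion protein.
In parallel, the LasR gene fragment was joined to the linearized pJN105 plasmid vector by Gibson assembly and introduced by chemical transformation into competent Escherichia coli Top10 cells. The bacteria were plated on gentamicin plates, and clones were picked and verified by sequencing to obtain the correctly assembled plasmid expressing the LasR protein.
Gene PA0750 (gene number 750 on the Pseudomonas aeruginosa genome) was knocked out by homologous recombination, giving a Pseudomonas aeruginosa strain lacking the intracellular uracil-DNA glycosylase gene.
The target plasmid expressing the cytosine deaminase-target protein fusion protein and the plasmid expressing the LasR protein were each introduced by electroporation into the Pseudomonas aeruginosa strain lacking the uracil-DNA glycosylase gene. Single colonies were picked from gentamicin plates, correct clones were identified by PCR, and the clones were grown in shaking culture and stored. This yielded the target strain containing the target plasmid and the control strain, which expresses the LasR protein without the fused cytosine deaminase gene. The strains were stored in glycerol at a final concentration of 35% in a -80°C freezer.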
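3) Bacterial culture
The target strain containing the target plasmid and the control strain expressing the LasR protein without the fused cytosine deaminase gene were first revived by streaking and incubated for 20 hours in a 37°C incubator. Colonies were picked and grown with shaking at 37°C in LB medium (sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L) with the inducer arabinose at a final concentration of 0.4% (mass fraction), in a culture volume of 1 mL, for a total induction time of 12 hours. The bacteria were collected by centrifugation, and their genomic DNA was extracted with a commercial genomic DNA extraction kit.
4) Sequencing and analysis: three target-strain samples and three control-strain samples were sequenced. Bacterial genome sequencing was carried out by a commercial provider. Once the genome-wide point mutations had been obtained, the point mutations shared by the target strain and the control strain were first discarded, and the point mutations common to all three target-strain samples were then identified among the remainder. The shared point mutations located in non-coding regions were then selected; these are the binding sites of the target transcription factor. Taking each mutation site as the center, the sequence was extended by 50 base pairs upstream and 50 base pairs downstream on the genome, giving a DNA fragment of about 100 base pairs for each point mutation. These sequences were saved in bulk to a text file, and the bioinformatics tool MEME Suite (meme-suite.org) was used to compute the conserved DNA sequence bound by the transcription factor protein.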
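Example 2
The target strain and the control strain constructed in Example 1 were revived by streaking. Colonies were picked and grown with shaking at 37°C in LB medium (sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L), in a culture volume of 1 mL, with the inducer arabinose added to a final concentration of 0.4% (mass fraction), for total culture times of 3, 6, 9 and 12 hours. The bacteria were collected by centrifugation, their genomic DNA was extracted with a commercial genomic DNA extraction kit, and the analysis was carried out as in Example 1.
Figure 2 shows the mutations detected after different culture times in a promoter region known to be bound by LasR (the promoter of the rhlI gene). The boxed positions are the mutated DNA bases. As the induction time increases, the mutated fraction of the promoter region rises markedly until the region is almost completely mutated.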
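A time course like this one can be related to the first-order model $x = 1 - \exp(-k_m \cdot t)$ used elsewhere in this description. The sketch below shows one simple way to estimate $k_m$ from mutated fractions measured at several induction times, by least squares through the origin on the linearized form $-\ln(1 - x) = k_m \cdot t$. The function names are hypothetical, the fitting approach is an assumption rather than the procedure used in the patent, and no measured values from Figure 2 are reproduced here.

```python
import math
from typing import Sequence


def mutated_fraction(k_m: float, t: float) -> float:
    """First-order model: fraction of mutated DNA after time t."""
    return 1.0 - math.exp(-k_m * t)


def fit_mutation_rate(times: Sequence[float], fractions: Sequence[float]) -> float:
    """Estimate k_m from (t, x) pairs using -ln(1 - x) = k_m * t (fit through the origin)."""
    num = 0.0
    den = 0.0
    for t, x in zip(times, fractions):
        if not 0.0 <= x < 1.0:
            raise ValueError("fractions must lie in [0, 1)")
        y = -math.log(1.0 - x)
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("at least one non-zero time point is required")
    return num / den
```

A fully mutated site (x = 1) cannot be used in the linearized fit, which is why such values are rejected; time points close to saturation also carry large uncertainty.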
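Example 3
The target strain and the control strain constructed in Example 1 were revived by streaking. Colonies were picked and grown with shaking at 37°C in LB medium (sodium chloride 10 g/L, yeast extract 5 g/L, peptone 10 g/L) and induced for 12 hours with 0.4% arabinose and 10 μmol/L 3-oxo-dodecanoyl homoserine lactone. The bacteria were then collected for genome sequencing, and the analysis was carried out as in Example 1.
Mutations were found in a total of 16 promoter regions. Ten of these are known promoters: PA1003, PA1431, PA2426, PA2763, PA2769, PA3104, PA3326, PA3384, rhlI and PA3904. The other six had not been reported before: PA0717, PA0727, PA0861, PA1131, PA3347 and PA5295. The binding sites obtained from genome sequencing were validated by electrophoretic mobility shift assay (EMSA); as shown in Figure 3, all of the newly detected promoters have LasR-binding activity.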
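Example 4
Because the ability of cytosine deaminase to mutate DNA depends on the transcriptional activity of the target DNA, the mutated fraction measured at a mutated position can be used to estimate the transcription at that position. On this basis, the invention establishes a theoretical model built on a steady-state assumption and validates it experimentally. The model is as follows: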
Figure PCTCN2022140081-appb-000004
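Equation (1) is the binding equilibrium between the transcription factor and the target promoter, with equilibrium constant $k_d$. LasR denotes the transcription factor protein. Let $\theta$ be the fraction of DNA bound by the transcription factor protein; $\theta$ can then be expressed as: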
Figure PCTCN2022140081-appb-000005
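Let the total intracellular concentration of the target DNA be $[\mathrm{DNA_{all}}]$, which is a fixed value; then
$[\mathrm{LasR_2\text{-}DNA}] = [\mathrm{DNA_{all}}] \cdot \theta$    (3)
The process by which the transcription factor recruits ribonucleic acid (hereinafter RNA) polymerase can be written as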
Figure PCTCN2022140081-appb-000006
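where RNAP denotes RNA polymerase, RNAP-LasR$_2$-DNA denotes the complex of RNA polymerase with the transcription factor and DNA, and LasR-TIC denotes the open-state transcription initiation complex bound by the transcription factor protein. $K_1$ and $K_2$ are the equilibrium constants of the two reaction steps. Equation (4) gives the following relationship:
$[\mathrm{LasR\text{-}TIC}] = [\mathrm{RNAP}] \cdot [\mathrm{LasR_2\text{-}DNA}] \cdot K_1 \cdot K_2$    (5)
The mutation process is a first-order reaction that must be mediated by LasR-TIC, and can be written as: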
Figure PCTCN2022140081-appb-000007
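where the mutation rate $k_m$ is proportional to the concentration of LasR-TIC, that is, $k_m = k_{TIC1} \cdot [\mathrm{LasR\text{-}TIC}]$, with $k_{TIC1}$ a constant. From equation (6), the fraction $x$ of mutated DNA is:
$x = 1 - \exp(-k_m \cdot t)$
where $t$ is the total time over which mutation takes place, that is, the time for which protein expression is induced in the target strain and the control strain. Combining equations (3) and (5), the fraction of mutated bases is finally expressed as:
$x = 1 - \exp(-K_3 \cdot \theta \cdot t)$, where $K_3 = [\mathrm{RNAP}] \cdot [\mathrm{DNA_{all}}] \cdot K_1 \cdot K_2 \cdot k_{TIC1}$ is a constant    (7)
Following the central dogma, the expression of protein from DNA through transcription and translation can be written as the following set of reactions: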
Figure PCTCN2022140081-appb-000008
Figure PCTCN2022140081-appb-000009
Figure PCTCN2022140081-appb-000010
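where mRNA denotes messenger RNA and Protein denotes the protein. $k_{transcription}$ is the transcription rate constant, $k_{translation}$ is the translation rate, $\gamma_{mRNA}$ is the mRNA degradation rate and $\gamma_{Protein}$ is the protein degradation rate. According to the work published in 2006 by X. Sunney Xie and co-workers (DOI: 10.1103/PhysRevLett.97.168302), the mean intracellular protein concentration in bacteria is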
Figure PCTCN2022140081-appb-000011
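Transcription can be regarded as a zero-order reaction mediated by LasR-TIC, and the transcription rate constant can be expressed as:
$k_{transcription} = k_{TIC2} \cdot [\mathrm{LasR\text{-}TIC}]$   (9)
Here $k_{TIC2}$ is a constant. From equations (8) and (9), together with equations (3) and (5), it follows that
$[\mathrm{Protein}] = K_4 \cdot \theta$    (10)
where $K_4 = k_{translation} \cdot k_{TIC2} \cdot [\mathrm{RNAP}] \cdot [\mathrm{DNA_{all}}] \cdot K_1 \cdot K_2 / (\gamma_{mRNA} \cdot \gamma_{Protein})$ is a constant.
Further, the relationship between the intracellular concentration $[\mathrm{Protein}]$ of the protein expressed from a promoter targeted by the transcription factor and the mutated fraction $x$ of that promoter DNA can be expressed as follows: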
Figure PCTCN2022140081-appb-000012
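As equation (11) shows, the mutated fraction of the target promoter DNA bound by the transcription factor protein follows a negative-exponential (saturating) relationship with the concentration of the protein it expresses, and tends to 1 as time goes to infinity.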
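Equation (11) is available here only as a figure reference. As a hedged reconstruction, eliminating $\theta$ between equations (7) and (10) suggests the form below; the resulting $K_5$ uses exactly the symbols listed for $K_5$ in the Summary, but the precise layout of the original figure is an assumption.

```latex
\begin{aligned}
\theta &= \frac{[\mathrm{Protein}]}{K_4}
  && \text{from (10)} \\
x &= 1 - \exp\!\left(-\frac{K_3}{K_4}\,[\mathrm{Protein}]\,t\right)
   = 1 - \exp\!\left(-K_5\,[\mathrm{Protein}]\,t\right)
  && \text{substituting into (7)} \\
K_5 &= \frac{K_3}{K_4}
   = \frac{k_{TIC1}\,\gamma_{mRNA}\,\gamma_{Protein}}{k_{TIC2}\,k_{translation}}
\end{aligned}
```

The factors $[\mathrm{RNAP}]$, $[\mathrm{DNA_{all}}]$, $K_1$ and $K_2$ appear in both $K_3$ and $K_4$ and cancel in the ratio.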
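Based on Examples 1 and 4 of the invention, the theoretical curve relating the mutated fraction (y) of the target promoter DNA bound by the transcription factor protein LasR to the concentration (x) of the protein it expresses was determined to be y = 1 - exp(-K·x), with K = 2.075.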
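The fitted curve can be evaluated in both directions, as in the minimal sketch below. The value K = 2.075 is taken from the text; the function names are hypothetical, and the unit of the concentration axis is not stated in this passage, so the numbers are meaningful only on the same scale on which the fit was made.

```python
import math

K_FIT = 2.075  # fitted constant reported in the text for y = 1 - exp(-K * x)


def predicted_mutated_fraction(concentration: float, k: float = K_FIT) -> float:
    """Mutated fraction y expected for a given expressed-protein concentration x."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    return 1.0 - math.exp(-k * concentration)


def concentration_from_fraction(fraction: float, k: float = K_FIT) -> float:
    """Invert the curve: concentration x that corresponds to a mutated fraction y."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return -math.log(1.0 - fraction) / k
```

The inverse function is the direction used for prediction: a mutated fraction measured by sequencing is converted to an estimate of the protein concentration expressed from that promoter.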
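In this example, LasR-bound promoters with different transcription levels were also constructed, and a fluorescent protein was used to report the concentration of protein expressed from each promoter. The mutation frequency at the corresponding time point was then calculated from Sanger (first-generation) sequencing results; the results are shown in Figure 4. The theoretical curve fits the experimental data well, indicating that the predicted relationship between the transcription level of the promoter DNA and the fraction mutated through the transcription factor protein is correct.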
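Example 5
The sequences of the promoters found in Example 3 were saved in bulk to a text file, and the bioinformatics tool MEME Suite (meme-suite.org) was used to compute the conserved DNA sequence bound by LasR, as shown in Figure 5.
Clearly, the above examples are given only for the sake of clear illustration and do not limit the embodiments. A person of ordinary skill in the art can make other changes or variations of different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Obvious changes or variations derived from the above remain within the scope of protection of the invention.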
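The amino acid and nucleotide sequences referred to in the invention are as follows:
Amino acid sequence of the transcription factor protein LasR used for testing (SEQ ID NO.1)
Figure PCTCN2022140081-appb-000013
Nucleotide sequence of the transcription factor protein LasR used for testing (SEQ ID NO.2)
Figure PCTCN2022140081-appb-000014
Amino acid sequence of the cytosine deaminase (SEQ ID NO.3)
Figure PCTCN2022140081-appb-000015
Nucleotide sequence of the optimized cytosine deaminase (SEQ ID NO.4)
Figure PCTCN2022140081-appb-000016
Nucleotide sequence of the pJN105 plasmid vector used for cloning, in which the region in bold is the arabinose promoter sequence and the black triangle indicates the position at which the target gene is inserted (SEQ ID NO.5)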
Figure PCTCN2022140081-appb-000017
Figure PCTCN2022140081-appb-000018
Figure PCTCN2022140081-appb-000019

Claims (10)

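  1. A method for in situ detection of the positions at which a protein binds deoxyribonucleic acid (DNA), characterized by comprising:
    1) constructing a target strain that expresses a cytosine deaminase-target protein fusion protein, and a control strain that expresses the target protein;
    wherein neither the target strain nor the control strain contains a uracil-DNA glycosylase gene;
    2) after inducing protein expression in the target strain and the control strain, extracting the genomic DNA of the target strain and the control strain;
    3) performing high-throughput sequencing on the genomic DNA of the target strain and the control strain, analyzing the sequencing data to obtain the point mutations in the genomic DNA of each strain, discarding the point mutations shared by the target strain and the control strain, and selecting, from the remaining point mutations of the target strain, those located in non-coding regions, which are the binding sites of the target protein.
  2. The method according to claim 1, characterized in that in step 3) n sets each of genomic DNA from the target strain and the control strain are subjected to high-throughput sequencing to obtain the binding sites of the target protein, specifically comprising: performing high-throughput sequencing on the genomic DNA of the n target-strain sets and the n control-strain sets; analyzing the data to obtain the point mutations in the genomic DNA of the target strain and the control strain; discarding the point mutations shared by the target strain and the control strain; selecting, from the remaining point mutations of the target strain, those shared by all n target-strain sets; and selecting the shared point mutations located in non-coding regions, which are the binding sites of the target protein;
    wherein 2 ≤ n ≤ 5.
  3. The method according to claim 1, characterized in that the method of constructing the target strain that expresses the cytosine deaminase-target protein fusion protein comprises: constructing an expression vector for the cytosine deaminase-target protein fusion protein, and then introducing this fusion protein expression vector into a strain that does not contain a uracil-DNA glycosylase gene;
    the method of constructing the control strain that expresses the target protein comprises: constructing an expression vector for the target protein, and then introducing this target protein expression vector into a strain that does not contain a uracil-DNA glycosylase gene.
  4. The method according to claim 3, characterized in that, in the expression vector for the cytosine deaminase-target protein fusion protein, the cytosine deaminase is located at the N-terminus of the fusion protein, the target protein is located at the C-terminus of the fusion protein, and the cytosine deaminase and the target protein are joined by a linker of 8 glycine residues.
  5. The method according to claim 1, characterized in that the step of inducing protein expression in the target strain and the control strain comprises: reviving the strain, picking a colony, and culturing with shaking at 37°C for 6-20 hours in LB medium supplemented with an inducer;
    preferably, the inducer is arabinose;
    preferably, the LB medium consists of: sodium chloride 10 g/L, yeast extract 5 g/L and peptone 10 g/L.
  6. The method according to claim 1, characterized in that the target protein is a transcription factor.
  7. The method according to claim 1, characterized by further comprising computing the conserved DNA sequence bound by the target protein, with the following specific steps:
    taking each target protein binding site obtained in step 3) as the center, extending m base pairs upstream and m base pairs downstream on the genomic DNA to obtain a DNA fragment sequence containing the binding site, wherein 10 ≤ m ≤ 200;
    saving the DNA fragment sequences containing the binding sites in a text file, and using the bioinformatics tool MEME Suite to compute the conserved DNA sequence bound by the target protein.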
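  8. A method for predicting the intracellular concentration of the protein expressed from a promoter, characterized by comprising: mutating a transcription-factor-targeted promoter using the method according to claim 1, and predicting the intracellular concentration of the protein expressed from the promoter according to the following formula:
    $x = 1 - \exp(-K_5 \cdot [\mathrm{Protein}] \cdot t)$
    where
    $x$ is the fraction of DNA at the transcription-factor-targeted promoter that has been mutated;
    $[\mathrm{Protein}]$ is the intracellular concentration of the protein expressed from the promoter;
    $t$ is the total time over which mutation takes place;
    $K_5$ is an equilibrium constant, expressed as follows:
    Figure PCTCN2022140081-appb-100001
    $k_{TIC1}$ and $k_{TIC2}$ are constants, $k_{translation}$ is the translation rate, $\gamma_{mRNA}$ is the mRNA degradation rate, and $\gamma_{Protein}$ is the protein degradation rate.
  9. The prediction method according to claim 8, characterized in that the fraction $x$ of mutated DNA at the transcription-factor-targeted promoter is expressed as follows:
    $x = 1 - \exp(-k_m \cdot t)$
    where
    $k_m$ is the mutation rate of the transcription factor binding site;
    $t$ is the total time over which mutation takes place.
  10. The prediction method according to claim 8, characterized in that the fraction $x$ of mutated DNA at the transcription-factor-targeted promoter is expressed as follows:
    $x = 1 - \exp(-K_3 \cdot \theta \cdot t)$
    where
    $\theta$ is the fraction of DNA that is bound by the transcription factor protein, expressed as follows:
    Figure PCTCN2022140081-appb-100002
    $[\mathrm{LasR}]$ is the concentration of the transcription factor protein, and $k_d$ is the equilibrium constant;
    $K_3$ is a constant, expressed as follows:
    $K_3 = [\mathrm{RNAP}] \cdot [\mathrm{DNA_{all}}] \cdot K_1 \cdot K_2 \cdot k_{TIC1}$
    $[\mathrm{RNAP}]$ is the concentration of RNA polymerase, $[\mathrm{DNA_{all}}]$ is the total intracellular concentration of the target DNA, and $K_1$ and $K_2$ are the equilibrium constants of the following process in which the transcription factor recruits RNA polymerase:
    Figure PCTCN2022140081-appb-100003
    RNAP denotes RNA polymerase, LasR denotes the transcription factor protein, RNAP-LasR$_2$-DNA denotes the complex of RNA polymerase with the transcription factor and DNA, and LasR-TIC denotes the open-state transcription initiation complex bound by the transcription factor protein;
    preferably, $\theta$ is determined using the method according to claim 1.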
PCT/CN2022/140081 2022-02-22 2022-12-19 Method for in situ detection of protein-deoxyribonucleic acid binding positions WO2023160163A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210163961.9A CN115094127A (zh) 2022-02-22 2022-02-22 Method for in situ detection of protein-deoxyribonucleic acid binding positions
CN202210163961.9 2022-02-22

Publications (1)

Publication Number Publication Date
WO2023160163A1 true WO2023160163A1 (zh) 2023-08-31

Family

ID=83287479

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/140081 WO2023160163A1 (zh) 2022-02-22 2022-12-19 Method for in situ detection of protein-deoxyribonucleic acid binding positions

Country Status (2)

Country Link
CN (1) CN115094127A (zh)
WO (1) WO2023160163A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115094127A (zh) * 2022-02-22 2022-09-23 中国科学院深圳先进技术研究院 Method for in situ detection of protein-deoxyribonucleic acid binding positions
WO2024065721A1 (en) * 2022-09-30 2024-04-04 Peking University Methods of determining genome-wide dna binding protein binding sites by footprinting with double stranded dna deaminase

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
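CN102482639A (zh) * 2009-04-03 2012-05-30 医学研究会 Activation-induced cytidine deaminase (AID) mutants and methods of use
WO2017215619A1 (zh) * 2016-06-15 2017-12-21 中国科学院上海生命科学研究院 Fusion protein producing point mutations in cells, and its preparation and use
CN115094127A (zh) * 2022-02-22 2022-09-23 中国科学院深圳先进技术研究院 Method for in situ detection of protein-deoxyribonucleic acid binding positions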

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
CAREY LUCAS B., VAN DIJK DAVID, SLOOT PETER M. A., KAANDORP JAAP A., SEGAL ERAN: "Promoter Sequence Determines the Relationship between Expression Level and Noise", PLOS BIOLOGY, vol. 11, no. 4, 2 April 2013 (2013-04-02), pages e1001528, XP093087135, DOI: 10.1371/journal.pbio.1001528 *
DUKE JAMIE L., LIU MAN, YAARI GUR, KHALIL ASHRAF M., TOMAYKO MARY M., SHLOMCHIK MARK J., SCHATZ DAVID G., KLEINSTEIN STEVEN H.: "Multiple Transcription Factor Binding Sites Predict AID Targeting in Non-Ig Genes", THE JOURNAL OF IMMUNOLOGY, WILLIAMS & WILKINS CO., US, vol. 190, no. 8, 15 April 2013 (2013-04-15), US , pages 3878 - 3888, XP093087129, ISSN: 0022-1767, DOI: 10.4049/jimmunol.1202547 *
GALLAGHER LARRY A., VELAZQUEZ ELENA, BROOK PETERSON S., CHARITY JAMES C., HSU FOSHENG, RADEY MATTHEW C., GEBHARDT MICHAEL J., DE M: "Genome-wide protein-DNA interaction site mapping using a double strand DNA-specific cytosine deaminase", BIORXIV, 2 August 2021 (2021-08-02), XP093087128, [retrieved on 20230929], DOI: 10.1101/2021.08.01.454665 *
KAMANU FREDERICK KINYUA, MEDVEDEVA YULIA A., SCHAEFER ULF, JANKOVIC BORIS R., ARCHER JOHN A. C., BAJIC VLADIMIR B.: "Mutations and Binding Sites of Human Transcription Factors", FRONTIERS IN GENETICS, vol. 3, 1 June 2012 (2012-06-01), pages 100, XP093087132, DOI: 10.3389/fgene.2012.00100 *
LAGATOR MATO, SARIKAS SRDJAN, STEINRUECK MAGDALENA, TOLEDO-APARICIO DAVID, BOLLBACK JONATHAN P, GUET CALIN C, TKAČIK GAŠPER: "Predicting bacterial promoter function and evolution from random sequences", INSTITUTE OF SCIENCE AND TECHNOLOGY AUSTRIA, vol. 11, 9 January 2022 (2022-01-09), pages e64543, XP093087133, DOI: 10.7554/eLife.64543 *
STORMO G. D., ZUO Z., CHANG Y. K.: "Spec-seq: determining protein-DNA-binding specificity by sequencing", BRIEFINGS IN FUNCTIONAL GENOMICS, OXFORD UNIVERSITY PRESS, OXFORD, UK, vol. 14, no. 1, 1 January 2015 (2015-01-01), Oxford, UK , pages 30 - 38, XP093087134, ISSN: 2041-2649, DOI: 10.1093/bfgp/elu043 *
ZHAO MEI, ZHOU SHENGHU, WU LONGTAO, DENG YU: "Model-driven promoter strength prediction based on a fine-tuned synthetic promoter library in Escherichia coli", BIORXIV, 1 July 2020 (2020-07-01), XP093087130, [retrieved on 20230929], DOI: 10.1101/2020.06.25.170365 *

Also Published As

Publication number Publication date
CN115094127A (zh) 2022-09-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928416

Country of ref document: EP

Kind code of ref document: A1