CN118255868A

CN118255868A - Type I humanized collagen and high-yield strain thereof

Info

Publication number: CN118255868A
Application number: CN202211676324.8A
Authority: CN
Inventors: 张翀; 余心宇; 邢新会
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2022-12-26
Filing date: 2022-12-26
Publication date: 2024-06-28

Abstract

The application relates to a type I humanized collagen with high secretability, engineered bacteria that secrete and produce the type I humanized collagen, and a screening method and applications thereof. The humanized collagen and the engineered bacteria further improve the fermentation yield of humanized collagen and reduce the difficulty of extraction.

Description

Type I humanized collagen and high-yield strain thereof

Technical Field

The application relates to the field of protein synthesis, in particular to collagen with improved secretability and to engineered bacteria with a high yield of secreted protein.

Background

Recombinant proteins now play an indispensable role in daily life. Their products, including pharmaceutical proteins (polypeptide hormones, cytokines, monoclonal antibodies, etc.), protein polymers (collagen, elastin, etc.) and industrial enzymes (xylanases, amylases, lipases, etc.), are widely used in medicine, food, the chemical industry, research reagents and other fields.

At present, recombinant protein expression systems can be divided into three types according to the host cell: microbial, animal and plant. Microbial fermentation systems have the advantages of low cost, short production cycles, easier cultivation and easier commercial production. The vast number of microbial species also provides unlimited possibilities for developing novel protein expression systems.

As demand grows, engineered microbial strains need to be modified to further improve the fermentation yield of recombinant proteins and to reduce the difficulty of extraction and separation.

In addition, recombinant humanized collagen is a full-length or partial amino acid sequence fragment encoded by a gene of a specific type of human collagen and prepared by recombinant DNA technology, or a combination containing functional fragments of human collagen. In recent years, with the development of the medical and health care industries, the use of recombinant human collagen has increased continuously. Improving secretability can further reduce the separation difficulty, simplify the extraction steps and improve production efficiency, and it remains a problem to be solved in the industry.

Disclosure of Invention

To further enhance the yield of protein fermentation, in a first aspect the present application provides a type I humanized collagen comprising 88 Gly-X-Y motifs of the type I human collagen α1 chain, wherein X and Y are each any amino acid residue. In some embodiments, the type I humanized collagen consists of 88 Gly-X-Y motifs of the type I human collagen α1 chain. In some embodiments, the amino acid residue is a natural amino acid residue. In some embodiments, the amino acid residue may be an unnatural amino acid, e.g., a natural amino acid derivative that contains a keto group, an aldehyde group, an azide group, an alkyne group, an alkenyl group, an amide group, a nitro group, a phosphate, a sulfonate, or the like. In some embodiments, the type I humanized collagen comprises the amino acid sequence from position 347 to position 610 of the type I human collagen α1 chain, wherein the amino acid positions are numbered according to SEQ ID NO:1. In some embodiments, the type I humanized collagen further comprises one or more amino acid residues between one or more Gly-X-Y motifs. In some embodiments, the amino acid sequence of the type I humanized collagen comprises the amino acid sequence set forth in SEQ ID NO:2 or a conservative substitution variant of SEQ ID NO:2. In some embodiments, the amino acid sequence of the type I humanized collagen is as set forth in SEQ ID NO:2 or is a conservative substitution variant of SEQ ID NO:2.

In a second aspect, the present application provides a biological material useful for the efficient production of collagen. In some embodiments, the biological material comprises:

1) A nucleic acid molecule encoding the collagen of the first aspect;

2) An expression cassette comprising the nucleic acid molecule of 1);

3) A recombinant vector comprising the nucleic acid molecule of 1);

4) A phage, virus or bacterium comprising the nucleic acid molecule of 1), the expression cassette of 2), or the recombinant vector of 3).

In some embodiments, the nucleic acid molecule is RNA, DNA, or a hybrid of RNA and DNA. In some embodiments, the nucleic acid molecule comprises a nucleotide sequence as set forth in SEQ ID NO:3 or a synonymous mutant thereof. In some embodiments, the nucleic acid molecule is as set forth in SEQ ID NO:3 or a synonymous mutant thereof. In some embodiments, the nucleic acid molecule comprises SEQ ID NO:3 or a polynucleotide sequence having 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% or more sequence identity to SEQ ID NO:3.

In some embodiments, the biological material is a bacterium selected from the group consisting of: Corynebacterium, Escherichia, Bacillus, Brevibacterium, Mycobacterium, and Actinomycetes. In some embodiments, the bacterium is Corynebacterium glutamicum. In some embodiments, the bacterium is derived from Corynebacterium glutamicum ATCC13032 or progeny thereof. In some embodiments, the bacterium has a deletion of the cosR and/or rshA gene, or expression of its cosR and/or rshA gene is inhibited. In some embodiments, compared with its standard strain, the bacterium differs only in having a cosR or rshA gene deletion, or inhibited expression of the cosR or rshA gene. In some embodiments, compared with its standard strain, the bacterium differs only in having cosR and rshA gene deletions, or inhibited expression of the cosR and rshA genes. In some embodiments, compared with its standard strain, the bacterium differs only in having a cosR gene deletion and inhibited rshA gene expression. In some embodiments, compared with its standard strain, the bacterium differs only in having an rshA gene deletion and inhibited cosR gene expression.

In some embodiments, the biological material is an expression cassette comprising at least a promoter, a ribosome binding site (RBS), and a secretory signal peptide coding sequence. In some embodiments, the recombinant vector is a secretory expression vector comprising a promoter, an RBS, and a secretory signal peptide coding sequence. In some embodiments, the secretory signal peptide is selected from the group consisting of: SPcspB, SPcspA, SPporB, SPcg1514, SPtorA, SPompA and SPyncM. In some embodiments, the RBS is selected from any RBS in the iGEM BioBrick registry. In some embodiments, the RBS is selected from the iGEM BioBrick parts numbered BBa_K2753054, BBa_B0034, BBa_K2560010, BBa_K2560016, BBa_K3202021 and BBa_B0030. In some embodiments, the promoter is selected from the group consisting of: PL10, PL26, PI16, PI51, PH30, and PH36.
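As an illustration only, the element lists above can be enumerated as combinations. The following Python sketch assumes that one promoter, one RBS and one signal peptide are chosen independently for each secretory expression cassette, which the application does not state explicitly; the function name enumerate_cassettes and the dictionary layout are hypothetical.

```python
from itertools import product

# Element names as listed in the text above.
PROMOTERS = ["PL10", "PL26", "PI16", "PI51", "PH30", "PH36"]
RBS_PARTS = ["BBa_K2753054", "BBa_B0034", "BBa_K2560010",
             "BBa_K2560016", "BBa_K3202021", "BBa_B0030"]
SIGNAL_PEPTIDES = ["SPcspB", "SPcspA", "SPporB", "SPcg1514",
                   "SPtorA", "SPompA", "SPyncM"]


def enumerate_cassettes():
    """Return every promoter/RBS/signal-peptide combination (hypothetical helper)."""
    return [
        {"promoter": p, "rbs": r, "signal_peptide": s}
        for p, r, s in product(PROMOTERS, RBS_PARTS, SIGNAL_PEPTIDES)
    ]


cassettes = enumerate_cassettes()
print(len(cassettes))  # 6 promoters x 6 RBSs x 7 signal peptides
```

Under this independence assumption the design space holds 6 x 6 x 7 = 252 candidate cassettes.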

In a third aspect, the application provides an engineered bacterium that has a deletion of the cosR and/or rshA gene, or in which expression of the cosR and/or rshA gene is inhibited. In some embodiments, compared with its standard strain, the engineered bacterium differs only in having a cosR or rshA gene deletion, or inhibited expression of the cosR or rshA gene. In some embodiments, compared with its standard strain, the engineered bacterium differs only in having cosR and rshA gene deletions, or inhibited expression of the cosR and rshA genes. In some embodiments, compared with its standard strain, the engineered bacterium differs only in having a cosR gene deletion and inhibited rshA gene expression. In some embodiments, compared with its standard strain, the engineered bacterium differs only in having an rshA gene deletion and inhibited cosR gene expression. In some embodiments, the engineered bacterium is selected from the group consisting of: Corynebacterium, Escherichia, Bacillus, Brevibacterium, Mycobacterium, and Actinomycetes. In some embodiments, the engineered bacterium is Corynebacterium glutamicum. In some embodiments, the engineered bacterium is derived from Corynebacterium glutamicum ATCC13032 or progeny thereof. In some embodiments, the engineered bacterium comprises a secretory expression vector comprising a promoter, a ribosome binding site (RBS), and a secretory signal peptide coding sequence. In some embodiments, the secretory signal peptide is selected from the group consisting of: SPcspB, SPcspA, SPporB, SPcg1514, SPtorA, SPompA and SPyncM. In some embodiments, the RBS is selected from any RBS in the iGEM BioBrick registry. In some embodiments, the RBS is selected from the iGEM BioBrick parts numbered BBa_K2753054, BBa_B0034, BBa_K2560010, BBa_K2560016, BBa_K3202021 and BBa_B0030. In some embodiments, the promoter is selected from the group consisting of: PL10, PL26, PI16, PI51, PH30, and PH36.

In a fourth aspect, the application provides a method of screening for secreted protein high-yielding strains, comprising:

1) Obtaining a monoclonal colony;

2) Detecting the expression level of cosR and/or rshA genes at the mRNA level and/or protein level in the monoclonal colonies; and

3) When the expression level of cosR and/or rshA in the monoclonal colony is decreased compared with the standard strain, or no expression of cosR and/or rshA is detected, identifying the bacteria in the monoclonal colony as a secreted protein high-yielding strain.
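The decision rule in steps 2) and 3) can be sketched as follows. This is a minimal Python illustration, not the applicant's procedure: the function name, the input format (relative expression level per gene) and the max_ratio cut-off are assumptions, since the application does not specify how large a decrease must be.

```python
def is_high_yield_candidate(colony, standard, genes=("cosR", "rshA"), max_ratio=0.5):
    """Flag a colony if any listed gene is unexpressed or clearly reduced.

    colony, standard: dicts mapping gene name to a measured expression level
    (mRNA or protein) in the colony and in the standard strain.
    max_ratio: hypothetical cut-off for "decreased" (colony / standard).
    """
    for gene in genes:
        reference = standard[gene]
        if reference <= 0:
            raise ValueError(f"standard strain level for {gene} must be positive")
        level = colony[gene]
        if level == 0 or level / reference <= max_ratio:
            return True
    return False


standard_strain = {"cosR": 1.0, "rshA": 1.0}
print(is_high_yield_candidate({"cosR": 0.2, "rshA": 1.0}, standard_strain))
print(is_high_yield_candidate({"cosR": 0.9, "rshA": 1.1}, standard_strain))
```

The "and/or" wording of step 3) is modelled here as "any one gene suffices"; a stricter screen could require both genes to be reduced.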

In some embodiments, the secreted protein high-yielding strain is selected from the group consisting of: Corynebacterium, Escherichia, Bacillus, Brevibacterium, Mycobacterium, and Actinomycetes. In some embodiments, the secreted protein high-yielding strain is Corynebacterium glutamicum. In some embodiments, the secreted protein high-yielding strain is derived from Corynebacterium glutamicum ATCC13032 or progeny thereof.

In some embodiments, the secreted protein high-yielding strain occurs naturally. In some embodiments, the secreted protein high-yielding strain is obtained by physical or chemical mutagenesis. In some embodiments, the physical mutagenesis is irradiation. In some embodiments, the secreted protein high-yielding strain is further subjected to a knockout or knockdown operation on the cosR and/or rshA genes prior to screening.

In a fifth aspect, the present application provides a method for fermentatively producing a protein of interest, comprising using the biological material of the second aspect or the engineered bacterium of the third aspect. In some embodiments, the protein of interest is VHH, PINP, or the type I humanized collagen of the first aspect described above.

The preferred embodiments of the present application have been described in detail above, but the present application is not limited thereto. Within the scope of the technical idea of the application, a number of simple variants of the technical solution are possible, including combinations of the individual technical features in any other suitable way. Such simple variants and combinations should likewise be regarded as disclosed by the application, and all fall within its scope of protection. Aspects and embodiments of the application described herein include aspects and embodiments that "comprise", "consist of", and "consist essentially of" the stated features.

Drawings

FIG. 1 shows the DNA gel electrophoresis results for cgCOL A1 gene fragments of 32 different truncated lengths. The number of Gly-X-Y tripeptide motifs contained in the protein encoded by the DNA in each lane is marked below that lane.

FIG. 2 shows the screening results for Corynebacterium glutamicum containing the library of truncated cgCOL A1 gene fragments. p19HXT refers to a control strain containing the empty p19HXT vector. The figure compares the protein secretion levels of 64 strains in the library with that of the control strain.

FIG. 3 shows the VHH secretory expression levels of strains in which each of 24 candidate genes was inactivated. NO_sgRNA refers to the control strain containing the empty pEC-dCAS9-subs vector, and NT_sgRNA refers to the control strain containing a non-targeting sgRNA vector. The figure compares the VHH protein production of the 24 inactivated strains with that of the control strains. The dark bars show the results of the biarsenical-tetracysteine reaction of the fermentation supernatants of the different strains, and the light bars show the VHH protein secretion yield of the different strains (DCW refers to dry cell weight).

Panel A of FIG. 4 shows the VHH secretion expression levels of the cosR single-knockout strain, the rshA single-knockout strain, the gluB single-knockout strain, the cosR and gluB double-knockout strain, the cosR and rshA double-knockout strain, the rshA and gluB double-knockout strain, and the cosR, rshA and gluB triple-knockout strain. Panel B shows the cell growth of these knockout strains. wt refers to the wild-type control strain.

Panel A of FIG. 5 shows the polyacrylamide gel electrophoresis (SDS-PAGE) results comparing the PINP secretion yield of the cosR and rshA gene knockout strain CgΔcosRΔrshA with that of the non-knockout strain CgWT. Lanes 1, 2 and 7, 8 are fermentation supernatants of two CgWT/p19HPH strains, two CgΔcosRΔrshA/p19HPH strains, CgWT/p19HPH, and two CgΔcosRΔrshA/p19HPH strains. Panel B of FIG. 5 shows the Western blot results comparing the PINP secretion yields of CgΔcosRΔrshA and CgWT. Lanes 1, 2 and 7, 8 are the fermentation supernatants of two CgΔcosRΔrshA/p19HPH strains, of two CgWT/p19HPH strains, and of two CgΔcosRΔrshA/p19HPH strains, respectively, and of an ultrafiltration tube.

Panel A of FIG. 6 shows the SDS-PAGE results for secretion of CC1 by the cosR and rshA gene knockout strain CgΔcosRΔrshA. Lanes 1, 2, 3 and 4 are the fermentation supernatants of four CgΔcosRΔrshA/p19HCC1H strains, and lanes 5, 6, 7 and 8 are the intracellular proteins (disrupted cell supernatants) of the four CgΔcosRΔrshA/p19HCC1H strains. Panel C of FIG. 6 shows the Western blot results for secretion of CC1 by CgΔcosRΔrshA. Lanes 1, 2, 3 and 4 are the fermentation supernatants of four CgΔcosRΔrshA/p19HCC1H strains, and lanes 5, 6, 7 and 8 are the intracellular proteins (disrupted cell supernatants) of the four CgΔcosRΔrshA/p19HCC1H strains. Panel B of FIG. 6 shows the SDS-PAGE results for secretion of C1L2T by CgΔcosRΔrshA. Lanes 1, 2, 3 and 4 are the fermentation supernatants of four CgΔcosRΔrshA/p19HL2H strains, and lanes 5, 6, 7 and 8 are the intracellular proteins (disrupted cell supernatants) of the four CgΔcosRΔrshA/p19HL2H strains. Panel D of FIG. 6 shows the Western blot results for secretion of C1L2T by CgΔcosRΔrshA. Lanes 1, 2, 3 and 4 are the fermentation supernatants of four CgΔcosRΔrshA/p19HL2H strains, and lanes 5, 6, 7 and 8 are the intracellular proteins (disrupted cell supernatants) of the four CgΔcosRΔrshA/p19HL2H strains.

Detailed description of the preferred embodiments

In order to further improve the yield of recombinant proteins such as collagen, the application provides a type I humanized collagen with improved secretability, secreted protein high-yielding strains, a method for screening secreted protein high-yielding strains, and a method for producing recombinant proteins using such strains.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the technology belongs. All technical and patent disclosures cited herein are incorporated herein by reference in their entirety. Unless otherwise indicated, the application employs routine techniques of tissue culture, immunology, molecular biology, microbiology, cell biology, and recombinant DNA that are within the skill of the art. See, e.g., Sambrook and Russell (2001) Molecular Cloning: A Laboratory Manual, 3rd edition.

Terminology

As used herein, a "conservatively substituted variant" of a protein, polypeptide, or amino acid sequence refers to a protein in which one or more amino acid residues of the parent protein have been replaced by conservative substitution without altering the overall conformation and function of the protein or enzyme. Consequently, two proteins or amino acid sequences of similar function may differ in their degree of similarity, for example 70% to 99% similarity (identity) based on the MEGALIGN algorithm. "Conservatively substituted variants" also include polypeptides or enzymes having more than 60% amino acid identity, preferably more than 75%, more preferably more than 85%, and most preferably more than 90%, as determined by the BLAST or FASTA algorithms, and having the same or substantially similar properties or functions as the native or parent protein or enzyme. As used herein, a "conservative substitution" may replace a polar amino acid with another polar amino acid, such as glycine (G, Gly), serine (S, Ser), threonine (T, Thr), tyrosine (Y, Tyr), cysteine (C, Cys), asparagine (N, Asn), and glutamine (Q, Gln); a nonpolar amino acid with another nonpolar amino acid, such as alanine (A, Ala), valine (V, Val), tryptophan (W, Trp), leucine (L, Leu), proline (P, Pro), methionine (M, Met), and phenylalanine (F, Phe); an acidic amino acid with another acidic amino acid, such as aspartic acid (D, Asp) and glutamic acid (E, Glu); a basic amino acid with another basic amino acid, such as arginine (R, Arg), histidine (H, His), and lysine (K, Lys); a charged amino acid with another charged amino acid, such as aspartic acid (D, Asp), glutamic acid (E, Glu), histidine (H, His), lysine (K, Lys), and arginine (R, Arg); or a hydrophobic amino acid with another hydrophobic amino acid, such as alanine (A, Ala), leucine (L, Leu), isoleucine (I, Ile), valine (V, Val), proline (P, Pro), phenylalanine (F, Phe), tryptophan (W, Trp), and methionine (M, Met).
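The residue classes listed above lend themselves to a simple lookup. The Python sketch below treats a substitution as conservative when both residues appear together in at least one of the listed classes; the function name and this "shared class" rule are an illustrative reading of the definition, not a method stated in the application.

```python
# Residue classes exactly as enumerated in the definition above (one-letter codes).
GROUPS = {
    "polar": set("GSTYCNQ"),
    "nonpolar": set("AVWLPMF"),
    "acidic": set("DE"),
    "basic": set("RHK"),
    "charged": set("DEHKR"),
    "hydrophobic": set("ALIVPFWM"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if the two different residues share at least one class."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # no substitution took place
    return any(a in members and b in members for members in GROUPS.values())


print(is_conservative("D", "E"))  # both acidic
print(is_conservative("G", "W"))  # polar vs nonpolar/hydrophobic
```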

As used herein, a "synonymous mutant" of a nucleic acid molecule or polynucleotide sequence encoding a protein refers to a nucleic acid molecule or polynucleotide sequence obtained by replacing one or more codons in the nucleic acid molecule or polynucleotide sequence with codons that encode the same amino acid as the original codons.
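Whether one coding sequence is a synonymous mutant of another can be checked by translating both and comparing the proteins. The Python sketch below assumes the standard genetic code and in-frame sequences of equal length; the function names and the nine-nucleotide example sequences are hypothetical and are not taken from SEQ ID NO:3.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate an in-frame DNA or RNA coding sequence to one-letter protein."""
    dna = seq.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_synonymous_mutant(parent: str, candidate: str) -> bool:
    """True if the sequences differ in nucleotides but encode the same protein."""
    same_nucleotides = parent.upper().replace("U", "T") == candidate.upper().replace("U", "T")
    return (not same_nucleotides
            and len(parent) == len(candidate)
            and translate(parent) == translate(candidate))


print(translate("GGTCCGCCG"))                          # one Gly-Pro-Pro triplet
print(is_synonymous_mutant("GGTCCGCCG", "GGCCCACCT"))  # same protein, other codons
```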

As used herein, a strain "derived from" a bacterium means that the named bacterium is the parent strain before engineering. For example, "the secreted protein high-yielding strain is derived from Corynebacterium glutamicum ATCC13032" means that the secreted protein high-yielding strain is obtained by mutating or transducing Corynebacterium glutamicum ATCC13032.

As used herein, the term "gene deletion" refers to the absence of expression of the gene, or the absence of the function of the wild-type gene product, as a result of naturally occurring or artificially introduced mutations. Such mutations include, but are not limited to, a deletion, insertion, or substitution of a sequence fragment, or a deletion, insertion, or substitution of an individual nucleotide.

As used herein, describing a gene as "expression inhibited" means that expression of the gene is down-regulated by technical means, such that the level of transcription and/or translation of the gene is reduced without altering the sequence of the gene in the genome. Technical means for "expression inhibition" include, but are not limited to, RNAi, CRISPRi, etc.

As used herein, the term "knockout" or "knock-out" refers to a technical means or procedure for the purpose of gene deletion, including, but not limited to: altering the genomic sequence by introducing into a cell, through infection, transfection, or transformation, a DNA fragment that can undergo homologous recombination with the gene sequence to be knocked out, a transposon system, or a gene editing tool; or altering the genomic sequence by irradiation, chemical agents, etc.

As used herein, the term "knockdown" or "knock-down" refers to a technical means or process aimed at inhibiting gene expression, including, but not limited to: introducing into a cell, through infection, transfection, or transformation, a substance such as a nucleic acid or a functional protein that down-regulates the level of transcription or translation of a gene of interest without altering the genomic sequence; or down-regulating the level of transcription or translation of the gene of interest by chemical agents.

"Corynebacterium glutamicum (Corynebacterium glutamicum)" is a common producer in the amino acid fermentation industry and belongs to the genus Corynebacterium of the class Actinobacillus under gram-positive eubacteria in classification. Standard strains of Corynebacterium glutamicum include ATCC21493, ATCC13032, and the like. The genome and related functions of ATCC13032 are described in detail, for example, in Kalinowski J et al ,The complete Corynebacterium glutamicumATCC 13032genome sequence and its impact on the production of L-aspartate-derived amino acids and vitamins.JBiotechnol.2003Sep 4;104(1-3):5-25.doi:10.1016/s0168-1656(03)00154-8.PMID:12948626.

As used herein, the term "standard strain", also known as "model strain", refers to a strain deposited with a national or international culture collection, whose genetic properties are confirmed, guaranteed and traceable, and which is preserved as a pure, viable (reproducible) culture, together with its progeny whose functional state is consistent with it. When reference is made to the standard strain of a given strain, this means a standard strain of the same species as that strain. "Consistent functional state" means that, under the same culture conditions and when the same protein is expressed using the same secretory expression vector, the amount of the protein secreted by the progeny is not significantly different from that secreted by the standard strain; for example, no difference can be detected using a Western blot, spectrophotometer, microplate reader, or the like. "Consistent functional state" also means that, under the same culture conditions, the transcription level and/or expression level of the progeny's endogenous genes, for example cosR or rshA, is consistent with that of the standard strain; for example, no difference in transcription level can be detected using agarose gel electrophoresis or q-PCR, or no difference in expression level can be detected using a Western blot, spectrophotometer, microplate reader, or the like.

As used herein, unless specifically indicated, "progeny" of a standard strain refers to progeny directly propagated by division of the standard strain, or progeny of such progeny, without engineering.

As used herein, a "secreted protein high-yielding strain" is a strain that has a greater protein secretion capacity than its standard strain. Under the same culture conditions, when the same protein is expressed using the same secretory expression vector, the amount of the protein secreted by the secreted protein high-yielding strain is significantly increased compared with the amount secreted by the standard strain. For example, when detected by Western blot, the band of the protein in the culture broth of the secreted protein high-yielding strain is significantly darker and/or wider than the band of the protein in the culture broth of the standard strain; or, for example, when quantified by spectrophotometry, a microplate reader assay or the like, the amount of the protein secreted by the secreted protein high-yielding strain is increased by 5% or more, 10% or more, 15% or more, 20% or more, 25% or more, 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, 100% or more, or 200% or more compared with the amount secreted by the standard strain.
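The quantitative criterion above amounts to a percent-increase comparison against the standard strain. A minimal Python sketch follows, assuming both yields are measured in the same units under the same conditions; the function names and the example values are hypothetical, and the default threshold uses the lowest tier (5%) named in the definition.

```python
def percent_increase(candidate_yield: float, standard_yield: float) -> float:
    """Percent increase of the candidate's secreted yield over the standard strain."""
    if standard_yield <= 0:
        raise ValueError("standard strain yield must be positive")
    return (candidate_yield - standard_yield) * 100 / standard_yield


def is_high_yielding(candidate_yield: float, standard_yield: float,
                     min_increase_pct: float = 5.0) -> bool:
    """True if the increase meets the chosen threshold (5% is the lowest listed tier)."""
    return percent_increase(candidate_yield, standard_yield) >= min_increase_pct


print(percent_increase(150, 100))   # 50.0
print(is_high_yielding(104, 100))   # False: below the 5% tier
```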

As used herein, the term "cosR" may refer to the CosR-encoding gene or the CosR protein, or to a homologous protein having the same function as CosR or the gene encoding it. CosR is a MarR-type transcriptional regulator whose full name is C. glutamicum oxidant-sensing regulator; it senses peroxide stress and thereby mediates the oxidative stress resistance of Corynebacterium glutamicum. cosR homologous genes are found in bacteria of the genera Escherichia, Staphylococcus, Bacillus, Brevibacterium, Mycobacterium, Actinomyces, etc. An exemplary CosR is described in Si, Meiru et al. "CosR is an oxidative stress sensing a MarR-type transcriptional repressor in Corynebacterium glutamicum." The Biochemical Journal vol. 475,24 3979-3995. 19 Dec. 2018, doi:10.1042/BCJ20180677. An exemplary coding sequence for the CosR protein is shown in SEQ ID NO:24 or a sequence homologous to SEQ ID NO:24.

As used herein, the term "rshA" may refer to the RshA-encoding gene or the RshA protein, or to a homologous protein having the same function as RshA or the gene encoding it. RshA is a redox-sensitive anti-sigma factor that responds to oxidative and thermal stress by releasing sigma factor H, thereby regulating the expression of a range of genes. rshA homologous genes are found widely in Corynebacterium, Mycobacterium, Rhodococcus and other bacteria. An exemplary coding sequence for the RshA protein is shown in SEQ ID NO:25 or a sequence homologous to SEQ ID NO:25.

The Gly-X-Y motif is repeated many times in the primary structure of type I collagen and is a typical feature of that structure. The Gly-X-Y motif consists of three amino acids connected in sequence: the first amino acid, at the N-terminus of the motif, is Gly (glycine), while the middle amino acid X and the C-terminal amino acid Y can be any natural or artificial amino acid, and X and Y can be the same or different. In natural collagen, the Gly-X-Y motif is the basic unit for forming the triple helix structure.
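The in-frame repetition described above can be illustrated with a short Python sketch. It assumes a strictly uninterrupted run of triplets beginning at the first residue and one-letter amino acid codes, so it does not handle extra residues between motifs; the function name and the example peptides are hypothetical and are not taken from the sequence listing.

```python
def count_gxy_motifs(seq: str) -> int:
    """Count consecutive in-frame Gly-X-Y triplets starting at the first residue."""
    seq = seq.upper()
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i] != "G":
            break  # frame broken: the triplet does not start with glycine
        count += 1
    return count


print(count_gxy_motifs("GPPGAKGEP"))  # three triplets: GPP, GAK, GEP
print(count_gxy_motifs("GPPGAKAEP"))  # run stops at the third triplet
```

A chain made of 88 such triplets is 264 residues long, which is the length targeted by the application.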

As used herein, the term "type I humanized collagen" refers to a polypeptide or protein comprising more than two repeated Gly-X-Y motifs in human type I collagen.

As used herein, the term "secretability" is a property of a protein or polypeptide itself and refers to the ease with which the protein or polypeptide is secreted across the cell membrane when it is expressed by a cell, preferably a prokaryotic cell. For example, when protein A and protein B are each expressed under the same conditions using the same strain, if the content of protein A in the fermentation supernatant is higher than the content of protein B in the fermentation supernatant after a certain culture time, for example after 24 hours of culture, protein A is considered to have higher secretability than protein B.

As used herein, "nucleic acid molecule" refers to a polynucleotide molecule or nucleic acid fragment comprising any two or more nucleotides. When an expression cassette is described as comprising a "nucleic acid molecule", the nucleic acid molecule refers to a fragment within the expression cassette. As used herein, an "expression cassette" is a polynucleotide segment that can be used to transcribe or express a gene of interest in a cell. In its meaning as conventionally understood in the art, an "expression cassette" is a segment of a DNA molecule comprising at least a promoter, one or more open reading frames, and a terminator. The coding sequence of the fragment of interest may be in a single open reading frame or may be split into two or more portions located in two or more reading frames. The "expression cassette" of the present application also encompasses a fragment of an RNA molecule useful for expressing a protein of interest; the RNA molecule may be a mature mRNA or a pre-mRNA, and such an "expression cassette" comprises an RNA coding sequence for the protein of interest.

"Identity" of sequences as used herein refers to the degree of similarity between amino acid sequences or between nucleotide sequences as determined by sequence alignment software, such as BLAST. "percent identity" such as 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% identity refers to the degree of similarity between amino acid sequences or between nucleotide sequences, as determined by sequence alignment, being 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%. For example, by introducing gaps or the like, it is possible to determine the ratio of the number of positions having the same base or amino acid residue to the total number of positions after two sequences have the same residue at as many positions as possible. The percentage of "identity" may be determined using software programs known in the art. Preferably, the alignment is performed using default parameters. One preferred alignment program is BLAST. Preferred programs are BLASTN and BLASTP. Details of these programs can be found at the following internet addresses: ncbi.nlm.nih.gov/cgi-bin/BLAST.

As used herein, "complementarity" of nucleic acids refers to the ability of one nucleic acid to form hydrogen bonds with another nucleic acid through traditional Watson-Crick base pairing. Percent complementarity means the percentage of residues in a nucleic acid molecule that can form hydrogen bonds (i.e., Watson-Crick base pairs) with another nucleic acid molecule (e.g., 5, 6, 7, 8, 9 or 10 out of 10 residues being about 50%, 60%, 70%, 80%, 90% and 100% complementary, respectively). "Fully complementary" means that all consecutive residues of a nucleic acid sequence form hydrogen bonds with the same number of consecutive residues in a second nucleic acid sequence. As used herein, "substantially complementary" refers to a degree of complementarity of any one of at least about 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, 99% or 100% over a region of about 40, 50, 60, 70, 80, 100, 150, 200, 250 or more nucleotides, or to two nucleic acids that hybridize under stringent conditions. For a single base or nucleotide, pairing of A with T or U, or of C with G or I, is referred to as complementary or matching according to Watson-Crick base pairing rules, and vice versa; any other base pairing is referred to as non-complementary or non-matching.
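Percent complementarity as defined above can be computed position by position. The Python sketch below assumes two equal-length strands, both written 5' to 3', that pair in antiparallel orientation without gaps; the pairing set follows the rules stated in the text (A with T or U, C with G or I). The function name and example strands are hypothetical.

```python
# Complementary pairs per the stated rules, in both directions.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
         ("C", "G"), ("G", "C"), ("C", "I"), ("I", "C")}


def percent_complementarity(strand: str, other: str) -> float:
    """Percent of positions that pair when `other` is read antiparallel to `strand`."""
    a = strand.upper()
    b = other.upper()[::-1]  # reverse so position i of a faces its partner
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((x, y) in PAIRS for x, y in zip(a, b))
    return 100 * paired / len(a)


print(percent_complementarity("AAGGCCTTAC", "GTAAGGCCTT"))  # 100.0
print(percent_complementarity("AAGGCCTTAC", "CATTCGCCTT"))  # 50.0
```

The second call pairs at 5 of 10 positions, matching the "5 of 10 is about 50%" example in the definition.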

Humanized collagen

The application provides a type I humanized collagen with high secretion. By comparing collagen chains containing different numbers of Gly-X-Y motifs, the number of Gly-X-Y motifs that is optimal for synthesis and secretion in prokaryotic cells was selected. As a result, the inventors of the present application found that collagen chains have optimal secretion when they include 88 Gly-X-Y motifs. Thus, in some embodiments, the highly secreted type I humanized collagen of the present application comprises 88 Gly-X-Y motifs. In some embodiments, the highly secreted type I humanized collagen of the present application consists of 88 Gly-X-Y motifs. In some embodiments, the 88 Gly-X-Y motifs are 88 consecutive Gly-X-Y motifs on the type I human collagen α1 chain. In this embodiment, although the 88 Gly-X-Y motifs contained in the type I humanized collagen are contiguous in the type I human collagen α1 chain, one or more additional amino acids from the type I human collagen α1 chain may be present between the individual Gly-X-Y motifs.

As used herein, the "type I human collagen α1 chain" is encoded by the gene COL1A1 (Gene ID: 1277). An exemplary type I human collagen α1 chain amino acid sequence is set forth in SEQ ID NO: 1.

In some embodiments, the 88 consecutive Gly-X-Y motifs correspond to amino acids 347 to 610 of the type I human collagen α1 chain, wherein the amino acid numbering is the same as that of the amino acid sequence shown in SEQ ID NO: 1. In some embodiments, the amino acid sequence of the 88 consecutive Gly-X-Y motifs is as set forth in SEQ ID NO: 2, or has 75%, 80%, 85%, 87%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or more than 99% sequence identity to SEQ ID NO: 2.
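To make the notion of consecutive Gly-X-Y motifs concrete, the following Python sketch counts the motifs in a sequence that consists solely of such triplets (glycine at every third position, any residue at X and Y). It covers only the strict case without additional amino acids between motifs; the function name and the toy sequence are assumptions and do not reproduce any sequence of the application.

```python
def count_gly_x_y(sequence: str) -> int:
    """Return the number of Gly-X-Y triplets if `sequence` consists only of them.

    Raises ValueError if the length is not a multiple of 3 or if any triplet
    does not start with glycine (one-letter code "G").
    """
    seq = sequence.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("length must be a non-zero multiple of 3")
    for i in range(0, len(seq), 3):
        if seq[i] != "G":
            raise ValueError(f"triplet starting at position {i + 1} lacks Gly")
    return len(seq) // 3


# Toy sequence (hypothetical): three Gly-X-Y triplets.
toy_motif_count = count_gly_x_y("GPPGAKGEA")
```

A chain consisting of 88 such motifs would accordingly be 264 residues long.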

The type I humanized collagen of the present application can be prepared by recombinant DNA techniques, and the yield in a particular strain can be further improved by codon optimization for that strain. Codon optimization of the above SEQ ID NO: 2 gives the polynucleotide sequence shown in SEQ ID NO: 3. SEQ ID NO: 3 is particularly suitable for expressing the humanized collagen of the application in Corynebacterium glutamicum. To accommodate different host strains, one skilled in the art can obtain different codon-optimized polynucleotide sequences from the amino acid sequence of the type I humanized collagen provided by the present application, and nucleic acid molecules comprising these polynucleotide sequences are also contemplated as falling within the scope of the present application. The nucleic acid molecules may be RNA, DNA or hybrid molecules of RNA and DNA, according to the expression requirements. The nucleic acid molecule may be circular, linear, double-stranded or single-stranded. The nucleic acid molecule may also comprise a stem-loop or hairpin structure, or comprise both double-stranded and single-stranded structures. The application also provides a cell genome, such as a prokaryotic cell genome, comprising the humanized collagen coding sequence. The application further provides a plasmid containing the humanized collagen coding sequence, wherein the plasmid is a secretory expression vector.

Secretory expression vector

The present application provides secretory expression vectors. As used herein, a "secretory expression vector" is a vector adapted to be expressed in a prokaryotic cell and to cause the prokaryotic cell to secrete a foreign protein. A "secretory expression vector" comprises a coding sequence for a secretory signal peptide, a promoter and an RBS. The secretory signal peptide may direct the foreign protein outside the prokaryotic cell envelope. The coding sequence for the secretory signal peptide is typically located after the start codon and is an RNA region encoding a hydrophobic amino acid sequence. After the secretory signal peptide has directed the protein to its location, it is usually cleaved off by a signal peptidase, but the present application does not exclude the case where the secretory signal peptide is not cleaved off. In some embodiments, the secretory signal peptide is located at the N-terminus of the foreign protein. In some embodiments, the secretory signal peptide is located at the C-terminus of the foreign protein. In some embodiments, the secretory signal peptide consists of 15 to 30 amino acids. In some embodiments, the secretory signal peptide comprises: a positively charged N-terminus (basic amino terminus); an intermediate hydrophobic sequence, composed mainly of neutral amino acids, capable of forming a segment of α-helix; and a negatively charged C-terminus containing small amino acids, called the signal sequence cleavage site or processing region. Signal peptides of any prokaryotic secreted protein may be used in the present application, including, but not limited to, the cspB signal peptide (SPcspB), cspA signal peptide (SPcspA), porB signal peptide (SPporB), cg1514 signal peptide (SPcg1514), torA signal peptide (SPtorA), ompA signal peptide (SPompA) and yncM signal peptide (SPyncM). The amino acid sequence and polynucleotide sequence of an exemplary SPcspB are shown in SEQ ID NO: 6 and SEQ ID NO: 7, respectively.

A "nucleoprotein body binding site" or "RBS" is a small base sequence in a prokaryotic mRNA, typically a small purine-rich sequence located about 6 nucleotide positions upstream of the start codon. The RBS incorporates ribosomes in conjunction with the initiation codon to initiate translation and control the accuracy and efficiency of translation initiation. In the present application, RBS includes not only sequences rich in purine binding to ribosomes in mRNA sequences, but also sequences around which translation initiation accuracy and efficiency are affected. "nucleoprotein binding site" or "RBS" may also be used to refer to the DNA coding sequence of "nucleoprotein binding site" or "RBS" when in the DNA context. In some embodiments of the application, the RBS is selected from the biobrick numbered bba_k2753054, bba_b0034, bba_k2560010, bba_k2560016, bba_k3202021, and bba_b0030 elements of iGEM. In some embodiments, the RBS is as set forth in the end sequence listing SEQ ID NO: shown at 5. Wherein "biobrick number of iGEM", i.e., the number of standardized biological elements as embodied in the biobrick registry of iGEM. iGEM, international genetic engineering machine design, is an International academic competition in the field of synthetic biology created by the Massachusetts, and is known as International GENETICALLY ENGINEERED MACHINE Competition. Methods of searching and using the biobrick registry of iGEM are known in the art.

When the recombinant protein is expressed in a prokaryotic cell using the secretory expression vector of the present application, any promoter that can initiate transcription of a gene in the prokaryotic cell may be used in the secretory expression vector. For example, in Corynebacterium glutamicum, exemplary promoters include P_L10, P_L26, P_I16, P_I51, P_H30 and P_H36, among others. For information on P_L10, P_L26, P_I16, P_I51, P_H30 and P_H36, see Yim, Sung Sun et al. "Isolation of fully synthetic promoters for high-level gene expression in Corynebacterium glutamicum." Biotechnology and Bioengineering vol. 110, 11 (2013): 2959-69. doi:10.1002/bit.24954. An exemplary sequence of P_H36 is shown in SEQ ID NO: 4.

Exemplary secretory expression vectors may further comprise tag sequences, such as a coding sequence for a tetracysteine tag, a coding sequence for a His tag, and the like. The tag sequence can be used for isolating the target protein or for quantitatively detecting the target protein. In some embodiments, the secretory expression vector comprises the sequence set forth in SEQ ID NO: 8 (p19HXT) or SEQ ID NO: 9 (p19HXH), wherein p19HXT uses a tetracysteine tag and p19HXH uses a His tag.

In some embodiments, the secretory expression vector comprises an inserted exogenous protein-encoding sequence of interest and is useful for expressing the exogenous protein of interest. In some embodiments, the exogenous protein of interest is a humanized recombinant protein. In some embodiments, the exogenous protein of interest is a humanized collagen. In some embodiments, the exogenous protein of interest is an antibody or antigen-binding fragment thereof. In some embodiments, the exogenous protein of interest is a VHH.

Engineered bacteria

The application also provides an engineering bacterium. By CRISPRi treatment of various genes of various common fermentation engineering bacteria, a bacterial library with inhibited expression of different genes was established. Finally, through bioinformatics analysis and experimental screening, 24 genes whose inactivation improves the secreted protein yield of the engineering bacteria were identified.

Thus, the engineering bacterium of the present application has a deletion of one or more genes or repressed expression of one or more genes, wherein the one or more genes are selected from: cosR, efp, NCgl0613, NCgl1167, NCgl0448, NCgl2722, blt2924, NCgl2533, whcE, hisH, cobA, rshA, blt0956, NCgl2791, NCgl2047, NCgl1092, ilvA, ilvD, pepA, thrB, ilvA, NCgl2644, gluB, NCgl2227, NCgl1174, NCgl0777, urtC, NCgl0210, NCgl0607, NCgl0875, NCgl1034 and NCgl0484. Among the above 24 genes, inactivation or expression repression of the cosR gene or the rshA gene gives a further improvement in the secreted protein yield of the engineering bacterium relative to inactivation or expression repression of the other individual genes.

In order to explore whether inactivation of the cosR gene and the rshA gene has a synergistic effect in improving the secreted protein yield of engineering bacteria, the inventors of the present application carried out further studies using Corynebacterium glutamicum as an example. The results show that Corynebacterium glutamicum in which the cosR gene and the rshA gene were both knocked out has a higher secreted protein yield, whether for VHH, PINP or type I humanized collagen. Those skilled in the art will appreciate that the cosR gene and the rshA gene have homologs in many bacteria; thus, the aim of increasing the secreted protein yield can also be achieved in other engineering bacteria having cosR and rshA homologs by knocking out or knocking down the gene of the cosR homolog and/or the gene of the rshA homolog.

Thus, in the present application, the engineering bacteria are selected from the group consisting of bacteria of the genus Corynebacterium (Corynebacterium), the genus Escherichia (Escherichia), the genus Bacillus (Bacillus), the genus Brevibacterium (Brevibacterium), the genus Mycobacterium (Mycobacterium), and the genus Actinomyces (Actinomyces). Representative bacteria of the genus Corynebacterium include, for example, Corynebacterium glutamicum. Representative bacteria of the genus Escherichia include, for example, Escherichia coli. Representative bacteria of the genus Bacillus include, for example, Bacillus subtilis.

In some embodiments, the engineered bacterium is engineered from a standard strain, e.g., by knocking out or knocking down, in the Corynebacterium glutamicum standard strain ATCC13032 or progeny thereof, one or more genes selected from: cosR, efp, NCgl0613, NCgl1167, NCgl0448, NCgl2722, blt2924, NCgl2533, whcE, hisH, cobA, rshA, blt0956, NCgl2791, NCgl2047, NCgl1092, ilvA, ilvD, pepA, thrB, ilvA, NCgl2644, gluB, NCgl2227, NCgl1174, NCgl0777, urtC, NCgl0210, NCgl0607, NCgl0875, NCgl1034 and NCgl0484.

In some embodiments, the engineered bacterium further comprises an exogenously introduced nucleic acid encoding a protein of interest. In some embodiments, the nucleic acid encoding the protein of interest may be inserted into the genome of the engineered bacterium. In some embodiments, the nucleic acid encoding the protein of interest is a free nucleic acid located in the cytoplasm of the engineered bacterium. In some embodiments, the nucleic acid encoding the protein of interest may be expressed in the engineered bacterium. In some embodiments, the engineered bacterium does not express the protein of interest, but can be used to replicate the free nucleic acid. In some embodiments, the free nucleic acid is a plasmid. In some embodiments, the free nucleic acid is a secretory expression vector.

Screening method

The application provides a method for screening bacteria with high secreted protein yield. The screening can be carried out on naturally occurring bacteria or on bacteria in which mutations have been generated by irradiation, drugs or genetic engineering means. The genetic engineering means include, but are not limited to, knockout or knockdown manipulation.

The bacteria are selected from the group consisting of Corynebacterium (Corynebacterium), Escherichia (Escherichia), Bacillus (Bacillus), Brevibacterium (Brevibacterium), Mycobacterium (Mycobacterium), and Actinomyces (Actinomyces). For example, in some embodiments, the bacterium is Corynebacterium glutamicum. In some embodiments, the bacterium is the Corynebacterium glutamicum standard strain ATCC13032 or progeny thereof.

The screening method comprises the following steps:

1) Obtaining monoclonal colonies. Any method by which monoclonal colonies can be obtained, such as plating after dilution, may be used.

2) Detecting, in each monoclonal colony, the expression at the mRNA level and/or protein level of the cosR, efp, NCgl0613, NCgl1167, NCgl0448, NCgl2722, blt2924, NCgl2533, whcE, hisH, cobA, rshA, blt0956, NCgl2791, NCgl2047, NCgl1092, ilvA, ilvD, pepA, thrB, ilvA, NCgl2644, gluB, NCgl2227, NCgl1174, NCgl0777, urtC, NCgl0210, NCgl0607, NCgl0875, NCgl1034 and/or NCgl0484 genes, so as to screen for monoclonal strains having reduced levels of one or more of the genes relative to their standard strain.

3) When the mRNA and/or protein level of the one or more genes in a monoclonal colony is detected to be reduced or below the lower limit of detection, identifying the bacteria in the monoclonal colony as bacteria with high secreted protein yield.
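The decision rule in steps 2) and 3) can be sketched in Python as follows. This is only a minimal illustration: the application does not specify how large a reduction must be, so the `fold_cutoff` and `detection_limit` parameters, the function name, the two-gene subset and the expression values are all hypothetical.

```python
from typing import Dict, List, Sequence


def screen_colonies(
    colonies: Dict[str, Dict[str, float]],
    reference: Dict[str, float],
    genes: Sequence[str],
    fold_cutoff: float = 0.5,       # hypothetical: "reduced" = below 50% of reference
    detection_limit: float = 0.01,  # hypothetical lower limit of detection
) -> List[str]:
    """Return colonies in which at least one listed gene is reduced or undetectable."""
    hits = []
    for name, levels in colonies.items():
        for gene in genes:
            if gene not in levels or gene not in reference:
                continue  # gene not measured; cannot be judged
            level = levels[gene]
            if level < detection_limit or level < fold_cutoff * reference[gene]:
                hits.append(name)
                break  # one reduced gene is enough to flag the colony
    return hits


# Hypothetical relative expression levels (standard strain set to 1.0).
reference_levels = {"cosR": 1.0, "rshA": 1.0}
colony_levels = {
    "colony_1": {"cosR": 1.0, "rshA": 0.9},
    "colony_2": {"cosR": 0.1, "rshA": 1.1},
    "colony_3": {"cosR": 1.0, "rshA": 0.0},
}
candidate_colonies = screen_colonies(colony_levels, reference_levels, ["cosR", "rshA"])
```

In this toy data set, colony_2 (reduced cosR) and colony_3 (undetectable rshA) would be flagged as candidates.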

Method for fermenting proteins

The application also provides a method for fermenting the protein. The method comprises the following steps:

1) The secretory expression vector is constructed so as to contain an exogenous target protein coding sequence. The construction method comprises the steps of sequence amplification or synthesis, enzyme digestion, splicing and the like.

2) Introducing the expression vector constructed in 1) into the engineering bacterium.

3) Culturing the engineering bacteria.

4) Harvesting the target protein from the culture solution and/or the cells of the engineering bacteria.

In addition, the method may further comprise a protein purification step.

In some embodiments, the exogenous protein of interest is a recombinant protein. In some embodiments, the exogenous protein of interest is a humanized protein. In some embodiments, the recombinant protein is the aforementioned humanized collagen. In some embodiments, the recombinant protein is an antibody or antigen-binding fragment thereof. In some embodiments, the recombinant protein is a VHH.

In addition, the application further provides a composition comprising the engineering bacteria and the secretory expression vector. The composition may be marketed as a kit for fermentation of recombinant proteins.

It is to be understood that the present application encompasses the various aspects, embodiments, and combinations of the aspects and/or embodiments described herein. The above description and the examples which follow are intended to illustrate and not limit the scope of the application. Within the scope of the technical idea of the application, a number of simple variants of the technical solution of the application are possible, including combinations of the individual technical features in any other suitable way, which simple variants and combinations should likewise be regarded as being disclosed by the application, all falling within the scope of protection of the application.

The practice of the application will employ, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology, microbiology, recombinant DNA and immunology. These conventional techniques are described in the prior art; see, for example, Current Protocols in Molecular Biology (Frederick M. Ausubel, 2000, Wiley and Son Inc, Library of Congress, USA); Molecular Cloning: A Laboratory Manual, Third Edition (Sambrook et al., 2001, Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press); Oligonucleotide Synthesis (M. J. Gait ed., 1984); Mullis et al., U.S. Pat. No. 4,683,195; Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds., 1984); Transcription And Translation (B. D. Hames & S. J. Higgins eds., 1984); Culture Of Animal Cells (R. I. Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells And Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); the series Methods In Enzymology (J. Abelson and M. Simon, eds.-in-chief, Academic Press, Inc., New York), specifically Vols. 154 and 155 (Wu et al. eds.) and Vol. 185, "Gene Expression Technology" (D. Goeddel, ed.); Gene Transfer Vectors For Mammalian Cells (J. H. Miller and M. P. Calos eds., 1987, Cold Spring Harbor Laboratory); Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Handbook Of Experimental Immunology, Volumes I-IV (D. M. Weir and C. C. Blackwell, eds., 1986); and Manipulating the Mouse Embryo (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986).

Examples

Example 1 type I human collagen sequence engineering

Nucleotide sequence information of the type I human collagen α1 chain coding gene COL1A1 (Gene ID: 1277) was obtained from the NCBI database, and a cgCOL1A1 polynucleotide sequence suitable for expression in Corynebacterium glutamicum was obtained through codon optimization. The cgCOL1A1 gene has a total of 4395 base pairs, the encoded amino acid sequence contains 1464 amino acid residues, and the theoretical protein molecular weight is 138.9 kDa. In the application, the amino acid sequence composed of these 1464 amino acid residues is taken as the reference sequence and is designated SEQ ID NO: 1; the specific sequence is shown in the sequence listing at the end.
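The stated gene length and residue count are mutually consistent if the 4395 base pairs include a single stop codon. That reading is an assumption, not something the text states, and the short Python check below (with hypothetical names) only illustrates the arithmetic.

```python
CODON_LENGTH_BP = 3


def coding_length_bp(residues: int, stop_codons: int = 1) -> int:
    """Length of a coding sequence in bp for `residues` amino acids plus stop codons."""
    return (residues + stop_codons) * CODON_LENGTH_BP


# Assumption: the reported 4395 bp includes one stop codon.
cgcol1a1_length = coding_length_bp(1464)
```

Under that assumption, 1464 residues plus one stop codon give 1465 codons, i.e. 4395 bp.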

Primer binding sites capable of specific annealing were searched for in the nucleotide sequence of the cgCOL1A1 gene, and the type I human collagen α1 chain was truncated and pooled using the Gly-X-Y tri-amino-acid motif as the unit. A total of 32 truncated proteins of the type I human collagen α1 chain were designed. The truncation positions are amino acids 179-610, 197-610, 206-610, 233-610, 248-610, 263-610, 272-610, 296-610, 323-610, 347-610, 383-610, 431-610, 464-610, 500-610, 179-1102, 197-1102, 206-1102, 233-1102, 248-1102, 263-1102, 272-1102, 296-1102, 323-1102, 347-1102, 383-1102, 431-1102, 464-1102, 500-1102, 545-1102, 650-1102 and 707-1102 of the natural type I human collagen α1 chain, corresponding respectively to 144, 138, 135, 126, 121, 116, 113, 105, 96, 88, 76, 60, 49, 37, 308, 302, 299, 290, 285, 280, 277, 269, 260, 252, 240, 224, 213, 201, 186, 175, 151 and 132 "Gly-X-Y" tri-amino-acid motifs.
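The relationship between a truncation range and its number of Gly-X-Y motifs is simple arithmetic: an inclusive range from start to end spans end - start + 1 residues, i.e. one third as many motifs. The Python sketch below checks the ranges and motif counts as printed in this paragraph; the variable and function names are assumptions for illustration. The printed list contains 31 ranges but 32 motif counts, so the sketch also reports which count has no matching range and which start position would arithmetically produce it. That start position is an inference from the numbers, not a statement of the application.

```python
STARTS_TO_610 = [179, 197, 206, 233, 248, 263, 272, 296, 323, 347, 383, 431, 464, 500]
STARTS_TO_1102 = STARTS_TO_610 + [545, 650, 707]
RANGES = [(s, 610) for s in STARTS_TO_610] + [(s, 1102) for s in STARTS_TO_1102]

LISTED_COUNTS = [
    144, 138, 135, 126, 121, 116, 113, 105, 96, 88, 76, 60, 49, 37,
    308, 302, 299, 290, 285, 280, 277, 269, 260, 252, 240, 224, 213, 201,
    186, 175, 151, 132,
]


def motif_count(start: int, end: int) -> int:
    """Number of Gly-X-Y triplets in the inclusive residue range start..end."""
    residues = end - start + 1
    if residues % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of triplets")
    return residues // 3


def implied_start(end: int, motifs: int) -> int:
    """Start position that would give `motifs` triplets ending at `end`."""
    return end - 3 * motifs + 1


computed_counts = [motif_count(s, e) for s, e in RANGES]
# Counts that appear in the printed list but match none of the printed ranges.
unmatched_counts = [c for c in LISTED_COUNTS if c not in computed_counts]
inferred_starts = [implied_start(1102, c) for c in unmatched_counts]
```

Every printed range reproduces a printed count (for example, 347-610 gives 88 motifs). The only count without a printed range is 175, which would correspond to a range starting at residue 578 and ending at 1102.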

The gene fragments of the 32 truncated proteins were amplified by PCR. As shown in FIG. 1, the DNA gel electrophoresis results show that the 32 truncated gene fragments were correctly amplified. The Corynebacterium glutamicum secretory expression vector p19HXT (constructed in-house; the empty vector sequence is shown in SEQ ID NO: 10) was linearized with the restriction enzyme XhoI (NEB), and the 32 DNA fragments were mixed and assembled into the digested p19HXT using a Gibson Assembly kit (NEB) to obtain a mixed plasmid library of type I human collagen α1 chains with different truncation lengths.

The mixed plasmid library was electroporated into competent cells of Corynebacterium glutamicum. Sixty-four transformants were randomly picked and inoculated into LBB liquid medium. After fermentation in a deep-well plate at 30 °C for 24 hours, the fermentation supernatant was collected by centrifugation, and the strain with the highest protein yield was screened by a biarsenical-tetracysteine reaction experiment.

The formula of the LBB medium in the above steps is: 10 g/L peptone, 10 g/L sodium chloride, 5 g/L yeast extract and 10 g/L brain heart infusion.

The p19HXT vector used in the above steps contains the promoter P_H36, an RBS sequence, the signal peptide SPcspB and a tetracysteine tag. A recognition site for the restriction enzyme XhoI lies between the coding sequences of the signal peptide SPcspB and the tetracysteine tag, which allows a protein coding sequence to be inserted into the vector. The expressed protein thus carries the signal peptide at its N-terminus, which directs its secretion out of the cell, and the tetracysteine tag at its C-terminus, which is used for the biarsenical-tetracysteine reaction; the higher the yield of the target protein, the stronger the fluorescent signal produced by the reaction (Yu et al., (2022), Synthetic and Systems Biotechnology, 7(2): 765-774).

The biarsenical-tetracysteine reaction is carried out as follows: after fermentation, 100 μL of supernatant is collected by centrifugation and mixed with 100 μL of biarsenical compound solution (2 μM FlAsH-EDT2, 2 mM DTT). After incubation with shaking for one hour at room temperature, the fluorescent signal is detected with a multifunctional microplate reader, with the following detection parameters: excitation wavelength 485 nm, emission wavelength 528 nm.

As shown in FIG. 2, the results of the biarsenical-tetracysteine reaction show that the reaction signals of the fermentation supernatants of strains 38 and 8 were increased by 5.49 and 5.48 times, respectively, compared to the control strain (containing the p19HXT empty vector). Sequencing of the plasmids shows that the truncated cgCOL1A1 gene fragments contained in these two high-yield strains both encode the truncated protein of amino acids 347-610. The protein contains 264 amino acids, namely 88 Gly-X-Y tri-amino-acid motifs, has a theoretical protein molecular weight of 23.3 kDa, and is named recombinant type I humanized collagen CC1. The amino acid sequence and the nucleotide sequence of CC1 are shown in SEQ ID NO: 2 and SEQ ID NO: 3, respectively, of the sequence listing at the end.
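The comparison against the empty-vector control can be sketched in Python as follows. The raw fluorescence readings are invented for illustration (chosen so that two strains land near the reported values), the strain labels and function names are hypothetical, and "increased by N times" is read here as a simple signal ratio over the control, which is an assumption.

```python
from typing import Dict, List


def fold_over_control(readings: Dict[str, float], control_key: str) -> Dict[str, float]:
    """Ratio of each strain's fluorescence to that of the control strain."""
    control = readings[control_key]
    if control <= 0:
        raise ValueError("control reading must be positive")
    return {k: v / control for k, v in readings.items() if k != control_key}


def top_strains(folds: Dict[str, float], n: int) -> List[str]:
    """Names of the n strains with the highest fold signal, best first."""
    return sorted(folds, key=folds.get, reverse=True)[:n]


# Hypothetical fluorescence readings (arbitrary units).
example_readings = {
    "control": 1000.0,
    "strain_8": 5480.0,
    "strain_12": 2100.0,
    "strain_38": 5490.0,
}
example_folds = fold_over_control(example_readings, "control")
best_two = top_strains(example_folds, 2)
```

With these invented readings, strain_38 and strain_8 rank first and second, mirroring the way the two high-yield strains were picked out for sequencing.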

Example 2 CRISPRi based screening found that cosR and rshA gene inactivation increased recombinant protein production

CRISPR interference (CRISPRi) technology refers to down-regulating target gene expression by using an sgRNA-guided, endonuclease-inactivated dCas9 protein as a transcriptional barrier. We have previously constructed a plasmid vector, named pEC-dCAS9-subs, for CRISPRi in Corynebacterium glutamicum (Yu et al., (2022), Synthetic and Systems Biotechnology, 7(2): 765-774).

Through bioinformatics analysis and preliminary experimental screening, we identified 24 genes whose inactivation could increase the secreted protein yield of Corynebacterium glutamicum. An sgRNA was designed for each of the 24 candidate genes cosR, efp, NCgl0613, NCgl1167, NCgl0448, NCgl2722, blt2924, NCgl2533, whcE, hisH, cobA, rshA, blt0956, NCgl2791, NCgl2047, NCgl1092, ilvA, ilvD, pepA, thrB, ilvA, NCgl2644, gluB, NCgl2227, NCgl1174, NCgl0777, urtC, NCgl0210, NCgl0607, NCgl0875, NCgl1034 and NCgl0484, and integrated into the pEC-dCAS9-subs vector (GenBank: OQ065450), thereby obtaining vectors that can inactivate each of the 24 genes.

To verify the effect of inactivating the above genes on recombinant protein production, we first transformed the p19-H36-VHH-TC plasmid (Yu et al., (2022), Synthetic and Systems Biotechnology, 7(2): 765-774), which is used to secrete and express a C-terminally tetracysteine-tagged VHH protein, into Corynebacterium glutamicum. The VHH protein used in the examples of the present application is the camelid nanobody (VHH) cAb-HuL22, an antibody against human lysozyme. Next, the vectors for inactivating the above 24 genes, the empty vector pEC-dCAS9-subs and a control vector carrying a non-targeting sgRNA were each transformed into Corynebacterium glutamicum harboring the p19-H36-VHH-TC plasmid. For each, 3 transformants were inoculated into LBB liquid medium; after fermentation in shake tubes at 30 °C for 24 hours, the fermentation supernatants were collected by centrifugation, and the differences in VHH yield were analyzed by the biarsenical-tetracysteine reaction.

The biarsenical-tetracysteine reaction is carried out as follows: after fermentation, 100 μL of supernatant is collected by centrifugation and mixed with 100 μL of biarsenical compound solution (2 μM FlAsH-EDT2, 2 mM DTT). After incubation with shaking for one hour at room temperature, the fluorescent signal is detected with a multifunctional microplate reader, with the following detection parameters: excitation wavelength 485 nm, emission wavelength 528 nm. VHH protein standards at a gradient of concentrations are subjected to the biarsenical-tetracysteine reaction in parallel to draw a standard curve, which is used to convert the signal into the actual VHH protein yield.

As shown in FIG. 3, the results of the biarsenical-tetracysteine reaction show that inactivation of the above 24 candidate genes affects the yield of VHH protein to different degrees. Inactivation of the cosR gene improves the unit yield of VHH protein by 1.60 times, and inactivation of the rshA gene improves the unit yield of VHH protein by 1.54 times; these are the two genes giving the greatest improvement in VHH protein yield among the 24 candidate genes.
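The standard-curve conversion can be sketched in Python as a least-squares straight-line fit followed by inversion. The application does not state the form of the curve, so a linear response is assumed here; the standard concentrations, fluorescence values, units and function names are all hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standard concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fluorescence_to_concentration(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate concentration from fluorescence."""
    return (signal - intercept) / slope


# Hypothetical VHH standards: concentration (mg/L) vs. fluorescence (a.u.).
standard_conc = [0.0, 5.0, 10.0, 20.0, 40.0]
standard_signal = [100.0, 600.0, 1100.0, 2100.0, 4100.0]
curve_slope, curve_intercept = fit_line(standard_conc, standard_signal)
estimated_conc = fluorescence_to_concentration(1600.0, curve_slope, curve_intercept)
```

With these invented standards the fit gives a slope of 100 a.u. per mg/L and an intercept of 100 a.u., so a supernatant reading of 1600 a.u. converts to 15 mg/L. Readings outside the range of the standards should not be extrapolated.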

Example 3 Construction of CgΔcosRΔrshA engineering bacteria

Construction of the cosR and rshA knockout Corynebacterium glutamicum CgΔcosRΔrshA comprises the following steps:

1) Performing PCR with the genomic DNA of the Corynebacterium glutamicum standard strain ATCC13032 as the template and CosR-UP-F and CosR-UP-R as primers to obtain the homologous fragment CosR-UP of about 1000 bp upstream of the cosR gene, and purifying the PCR product;

2) Performing PCR with the genomic DNA of Corynebacterium glutamicum ATCC13032 as the template and CosR-Down-F and CosR-Down-R as primers to obtain the homologous fragment CosR-Down of about 1000 bp downstream of the cosR gene, and purifying the PCR product;

3) Performing PCR with the genomic DNA of Corynebacterium glutamicum ATCC13032 as the template and RshA-UP-F and RshA-UP-R as primers to obtain the homologous fragment RshA-UP of about 1000 bp upstream of the rshA gene, and purifying the PCR product;

4) Performing PCR with the genomic DNA of Corynebacterium glutamicum ATCC13032 as the template and RshA-Down-F and RshA-Down-R as primers to obtain the homologous fragment RshA-Down of about 1000 bp downstream of the rshA gene, and purifying the PCR product;

5) Linearizing the Corynebacterium glutamicum suicide plasmid pK18mobsacB (Journal of Biotechnology (2003) 287-299) with the restriction enzyme SmaI (NEB); assembling the CosR-UP and CosR-Down fragments into the digested pK18mobsacB with the Gibson Assembly kit (NEB) to obtain the recombinant plasmid pK18-CosR; and assembling the RshA-UP and RshA-Down fragments into the digested pK18mobsacB to obtain the recombinant plasmid pK18-RshA;

6) Transforming pK18-CosR into Corynebacterium glutamicum ATCC13032 and obtaining the cosR knockout strain CgΔcosR after a second round of screening on LB plates containing 10% (m/v) sucrose; then transforming pK18-RshA into the CgΔcosR strain and obtaining the cosR and rshA double knockout Corynebacterium glutamicum CgΔcosRΔrshA after a second round of screening on LB plates containing 10% (m/v) sucrose.

Example 4 Characterization of protein secretion levels and growth of CgΔcosRΔrshA

The p19-H36-VHH-TC plasmid (Yu et al., (2022), Synthetic and Systems Biotechnology, 7(2): 765-774) was transformed into Corynebacterium glutamicum CgΔcosRΔrshA to obtain a cosR and rshA double knockout strain that secretes VHH protein. In parallel, the p19-H36-VHH-TC plasmid was transformed into the cosR single knockout strain, the rshA single knockout strain, the gluB single knockout strain, the cosR and gluB double knockout strain, the rshA and gluB double knockout strain, and the cosR, rshA and gluB triple knockout strain, respectively, in order to compare the VHH secretion levels and growth of strains with different combinations of gene knockouts.

Six transformants of each strain were inoculated into LBB liquid medium. After fermentation in a deep-well plate at 30 °C for 24 hours, the fermentation supernatants were collected by centrifugation, the differences in VHH yield were analyzed by the biarsenical-tetracysteine reaction, and the differences in growth were analyzed by measuring the absorbance of the culture at a wavelength of 600 nm.

The experimental results are shown in FIG. 4A: the cosR single knockout increased VHH protein secretion by 17%, the rshA single knockout by 7%, the gluB single knockout by 3%, the cosR and gluB double knockout by 20%, the rshA and gluB double knockout by 14%, the cosR, rshA and gluB triple knockout by 20%, and the cosR and rshA double knockout by 28%. This shows that the cosR and rshA double knockout increases the secreted protein yield more than a single knockout of either gene and more than a double knockout of either gene together with the gluB gene. Furthermore, as shown in FIG. 4B, the rshA and gluB double knockout, the gluB single knockout and the rshA single knockout strains all had higher relative growth levels than the cosR and rshA double knockout strain. Taking FIGS. 4A and 4B together, the cosR and rshA double knockout strain does not grow the fastest relative to the other strains, so its cell density in a single well is not the highest when secreted protein production is measured, yet it has the highest secreted protein yield, which is unexpected to those skilled in the art.
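The comparison reported for FIG. 4A can be restated with a few lines of Python. The percentages are those given in the paragraph above; the dictionary keys are shorthand labels chosen for this sketch, and the "additive expectation" is merely the sum of the two single-knockout effects, offered as an illustrative benchmark rather than a statistical test used by the inventors.

```python
# Increase in VHH secretion (%) relative to the non-knockout strain, from FIG. 4A.
SECRETION_INCREASE_PERCENT = {
    "cosR": 17,
    "rshA": 7,
    "gluB": 3,
    "cosR+gluB": 20,
    "rshA+gluB": 14,
    "cosR+rshA+gluB": 20,
    "cosR+rshA": 28,
}

# Rank the knockout combinations from highest to lowest increase.
ranked = sorted(SECRETION_INCREASE_PERCENT.items(), key=lambda kv: kv[1], reverse=True)
best_combination, best_increase = ranked[0]

# Illustrative benchmark: sum of the two single-knockout effects.
additive_expectation = (
    SECRETION_INCREASE_PERCENT["cosR"] + SECRETION_INCREASE_PERCENT["rshA"]
)
```

The cosR and rshA double knockout ranks first at 28%, which exceeds the 24% obtained by simply adding the two single-knockout effects.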

EXAMPLE 5 secretion of recombinant type I humanized collagen PINP Using Corynebacterium glutamicum CgDelta cosR Delta rshA

1) The N-terminal of the type I human collagen has a non-triple helix region, which is called type I procollagen amino terminal propeptide (PINP), and because the N-terminal propeptide and the C-terminal propeptide can be excised by protease and enter into blood circulation in a large amount when the collagen in the human body is secreted outside the cell, the PINP is a clinical diagnosis marker of bone metabolism related diseases, and the recombinant PINP protein has larger industrial and commercial value as a standard substance in a PINP detection kit;

2) Carrying out PCR (polymerase chain reaction) by taking the DNA of cgCOL A1 genes as templates and taking PINP-F and PINP-R as primers to obtain a DNA fragment cgPINP for encoding PINP protein and purifying a PCR product; linearizing a secretion expression vector p19HXH of corynebacterium glutamicum by using restriction enzyme XhoI (NEB), splicing cgPINP fragments to the digested p19HXH (empty vector sequence shown as SEQ ID NO: 9) by using Gibson Assembly kit (NEB) to obtain recombinant plasmid p19HPH;

3) The p19HPH plasmid was electrotransformed into competent cells of Corynebacterium glutamicum CgDelta cosR Delta rshA and non-knocked-out wild type strain, respectively, to obtain CgDelta cosR Delta rshA/p19HPH and CgWT/p19HPH transformants, respectively;

4) Two transformants each of the CgΔcosRΔrshA/p19HPH and CgWT/p19HPH strains were inoculated into LBB liquid medium and fermented in shake flasks at 30 °C for 24 hours. The fermentation broth was centrifuged to collect the supernatant, and the yields of recombinant type I humanized collagen PINP from CgΔcosRΔrshA and from the non-knockout wild-type strain were compared by SDS-PAGE and Western blot.

The p19HXH vector used in the above steps contains the promoter P_H36, an RBS sequence, the signal peptide SPcspB and a 6×His tag. The recognition site of the restriction enzyme XhoI lies between the coding sequences of the signal peptide SPcspB and the 6×His tag, so that a protein-coding sequence can be inserted there. The expressed protein then carries the signal peptide at its N-terminus, which directs its secretion out of the cell, and the 6×His tag at its C-terminus, which is used for Western blot analysis and Ni-column affinity purification.
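The cassette layout described above can be modelled as an ordered list of parts. The following is a minimal sketch under stated assumptions: the `Part` class, the part labels and the `insert_at_xhoi` helper are hypothetical names introduced for illustration, no real sequences are used, and the XhoI site is simply replaced by the insert, which sets aside whether the site is retained after Gibson assembly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    translated: bool  # True if the part ends up in the expressed polypeptide


# Order of elements in the p19HXH cassette as described in the text.
P19HXH = [
    Part("P_H36 promoter", False),
    Part("RBS", False),
    Part("SPcspB signal peptide", True),
    Part("XhoI site", False),
    Part("6xHis tag", True),
]


def insert_at_xhoi(cassette, insert_name):
    """Return a new cassette with the single XhoI site replaced by a coding insert."""
    names = [p.name for p in cassette]
    if names.count("XhoI site") != 1:
        raise ValueError("expected exactly one XhoI site")
    i = names.index("XhoI site")
    return cassette[:i] + [Part(insert_name, True)] + cassette[i + 1:]


def expressed_order(cassette):
    """Names of the translated parts, from N-terminus to C-terminus."""
    return [p.name for p in cassette if p.translated]


p19HPH = insert_at_xhoi(P19HXH, "PINP")
print(" - ".join(expressed_order(p19HPH)))
```

In this model the translated product reads signal peptide, inserted protein, 6×His tag from the N-terminus to the C-terminus, matching the arrangement described for the vector.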

As shown in FIG. 5A, a PINP-His protein band was clearly visible in the fermentation supernatants of the two CgΔcosRΔrshA/p19HPH strains, whereas no PINP-His band was observed in the fermentation supernatants of the two CgWT/p19HPH strains. After three-fold concentration with a 3 kDa ultrafiltration tube, PINP-His bands were visible in the concentrated fermentation supernatants of both CgΔcosRΔrshA/p19HPH and CgWT/p19HPH. As shown in FIG. 5B, the Western blot results show that the PINP-His protein yield of the CgΔcosRΔrshA/p19HPH strain is significantly higher than that of the CgWT/p19HPH strain. The cosR and rshA knockout strain therefore secretes recombinant PINP protein more efficiently than the non-knockout strain.

Example 6 Test of the secretion of CC1 by CgΔcosRΔrshA

Recombinant type I humanized collagen CC1 was secreted using Corynebacterium glutamicum CgΔcosRΔrshA by the following steps:

1) PCR (polymerase chain reaction) was carried out using the DNA of the cgCOL A1 gene as the template and CC1-F and CC1-R as primers to obtain the DNA fragment cgCC1 encoding the CC1 protein, and the PCR product was purified; the Corynebacterium glutamicum secretion expression vector p19HXH was linearized with the restriction enzyme XhoI (NEB), and the cgCC1 fragment was joined to the digested p19HXH using a Gibson Assembly kit (NEB) to obtain the recombinant plasmid p19HCC1H;

2) The recombinant type I humanized collagen C1L2T protein, disclosed in a Chinese patent (publication No. CN113683678A), was used as the positive control in this experiment. PCR (polymerase chain reaction) was carried out using the DNA of the cgCOL A1 gene as the template and L2-F and L2-R as primers to obtain the DNA fragment cgL2 encoding the C1L2T protein, and the PCR product was purified; the Corynebacterium glutamicum secretion expression vector p19HXH was linearized with the restriction enzyme XhoI (NEB), and the cgL2 fragment was joined to the digested p19HXH using a Gibson Assembly kit (NEB) to obtain the recombinant plasmid p19HL2H;

3) The p19HCC1H plasmid was electrotransformed into competent cells of Corynebacterium glutamicum CgΔcosRΔrshA to obtain CgΔcosRΔrshA/p19HCC1H transformants; the p19HL2H plasmid was electrotransformed into competent cells of Corynebacterium glutamicum CgΔcosRΔrshA to obtain CgΔcosRΔrshA/p19HL2H transformants;

4) Four transformants each of the CgΔcosRΔrshA/p19HCC1H and CgΔcosRΔrshA/p19HL2H strains were inoculated into LBB liquid medium and fermented in shake flasks at 30 °C for 24 hours. The fermentation broth was centrifuged to collect the supernatant, and the secretion yield of recombinant type I humanized collagen CC1 from CgΔcosRΔrshA was verified by SDS-PAGE and Western blot.

As shown in FIG. 6A, a CC1-His protein band was clearly visible in the fermentation supernatants of the four CgΔcosRΔrshA/p19HCC1H strains, and a distinct hybridization band was observed by Western blot analysis (FIG. 6C), showing that the CgΔcosRΔrshA/p19HCC1H strain can secrete recombinant CC1 protein with high efficiency. In addition, comparison of the SDS-PAGE results (FIG. 6A and FIG. 6B) and the Western blot results (FIG. 6C and FIG. 6D) shows that the yield of CC1-His protein in the fermentation supernatant is significantly higher than that of C1L2T-His, which indicates that the recombinant type I humanized collagen CC1 of the application is secreted better than the control protein C1L2T.

Sequence listing:

Note: the polynucleotide sequences in the above sequence listing may represent DNA sequences or RNA sequences. When a sequence represents RNA, t represents uridine.

Claims

1. A type I humanized collagen comprising 88 Gly-X-Y motifs of the α1 chain of human type I collagen, wherein X and Y are each selected from any amino acid residue.
2. The collagen according to claim 1, comprising the amino acid sequence of amino acid positions 347 to 610 of the type I human collagen α1 chain, wherein the amino acid position numbering is consistent with SEQ ID NO: 1.
3. The collagen according to claim 1, which has an amino acid sequence as set forth in SEQ ID NO: 2.
4. A biomaterial characterized in that the biomaterial is any one of the following:

1) A nucleic acid molecule encoding the collagen of any one of claims 1-3;

2) An expression cassette comprising the nucleic acid molecule of 1);

3) A recombinant vector comprising the nucleic acid molecule of 1);

4) A phage, virus or bacterium comprising the nucleic acid molecule of 1), the expression cassette of 2) or the recombinant vector of 3).
5. The biomaterial according to claim 4, wherein said nucleic acid molecule is as set forth in SEQ ID NO:3 or the complement thereof.
6. The biomaterial according to claim 4, wherein said bacterium has the cosR and/or rshA gene deleted or cosR and/or rshA gene expression inhibited, said bacterium being selected from the group consisting of: Corynebacterium (Corynebacterium), Escherichia (Escherichia), Bacillus (Bacillus), Brevibacterium (Brevibacterium), Mycobacterium (Mycobacterium), or Actinomycetes (Actinomycetes).
7. An engineered bacterium having a cosR and/or rshA gene deletion or having cosR and/or rshA gene expression inhibited, wherein the bacterium is selected from the group consisting of: Corynebacterium (Corynebacterium), Escherichia (Escherichia), Bacillus (Bacillus), Brevibacterium (Brevibacterium), Mycobacterium (Mycobacterium), or Actinomycetes (Actinomycetes).
8. The bacterium according to claim 7, which is Corynebacterium glutamicum (Corynebacterium glutamicum).
9. A method of screening for bacteria with high secreted protein yield, comprising:

1) Obtaining a monoclonal colony;

2) Detecting expression of cosR and/or rshA genes at mRNA level and/or protein level in the monoclonal colonies;

3) When a decrease in the cosR and/or rshA gene expression level in the monoclonal colony compared with the standard strain is detected, or no expression of cosR and/or rshA is detected, identifying the bacteria in the monoclonal colony as bacteria with high secreted protein yield.
10. A method for fermentative production of a protein of interest, comprising fermentatively producing the protein of interest using the biomaterial according to any one of claims 4-6 or the engineered bacterium according to claim 7 or 8.
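The lengths stated in claims 1 and 2 are arithmetically consistent: positions 347 to 610 inclusive span 264 residues, which is 88 triplets of three residues. The following is a minimal sketch of that check. The function names are illustrative, and the toy sequence is a hypothetical stand-in, not the sequence of SEQ ID NO: 1 or SEQ ID NO: 2; whether the actual region is an uninterrupted in-frame Gly-X-Y repeat is not verified here.

```python
def residue_count(first: int, last: int) -> int:
    """Number of residues in an inclusive position range."""
    return last - first + 1


def count_gly_xy(seq: str) -> int:
    """Count consecutive Gly-X-Y triplets from the start of seq (G at every third position)."""
    n = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i] != "G":
            break
        n += 1
    return n


# Range from claim 2: amino acid positions 347 to 610, inclusive.
n_residues = residue_count(347, 610)
n_triplets = n_residues // 3

# Hypothetical toy sequence used only to exercise the triplet counter.
toy = "GPP" * 88

print(n_residues, n_triplets, count_gly_xy(toy))
```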