CN112921052A

CN112921052A - In vivo cell proliferation marking and tracing system and application thereof

Info

Publication number: CN112921052A
Application number: CN201911242892.5A
Authority: CN
Inventors: 周斌; 刘秀秀; 何灵娟; 蒲文娟
Original assignee: Center for Excellence in Molecular Cell Science of CAS
Current assignee: Center for Excellence in Molecular Cell Science of CAS
Priority date: 2019-12-06
Filing date: 2019-12-06
Publication date: 2021-06-08
Anticipated expiration: 2039-12-06
Also published as: CN112921052B

Abstract

The present invention relates to a polynucleotide sequence encoding a chimeric recombinase, the polynucleotide sequence comprising a fragment of a cell proliferation factor gene, a coding sequence of a first recombinase, a coding sequence of an estrogen receptor (ER) ligand binding domain, and a recognition site for a second recombinase. The invention also provides polynucleotide products, nucleic acid constructs, host cells and the like comprising the polynucleotide sequence. The invention also provides methods of marking or tracing proliferating cells in vivo using the polynucleotide products, nucleic acid constructs and host cells of the invention. By means of the artificially engineered chimeric recombinase, the invention provides a system for marking proliferating cells that achieves continuous, long-term cell marking and tracing. The invention tracks the proliferation of various types of cells in vivo for the first time, and is of great significance for understanding the dynamic changes of various cell populations in vivo.

Description

In vivo cell proliferation marking and tracing system and application thereof

Technical Field

The present invention relates to cell proliferation marking and tracing systems and uses thereof.

Background

Cell proliferation in vivo is a dynamic and variable process: cells of the same type proliferate to different extents over the same period. Capturing the proliferation state of cells in vivo is crucial to understanding physiological and pathological processes, and has therefore long been a focus of biological research. Conventional methods for capturing cell proliferation mainly comprise staining for molecular markers of proliferation and incorporation of DNA analogs or isotopes. All three have drawbacks. Staining for proliferation markers captures cell proliferation only over a short time. DNA analog incorporation can be used to study proliferation only over a limited period, because long-term incorporation of DNA analogs is likely to interfere with normal cell proliferation. Isotope incorporation has less effect on normal cell proliferation, but is difficult to detect.

Genetic lineage tracing has already been used in the field to trace proliferating cells in vivo. For example, researchers have used the promoter of the cell proliferation marker Ki67 to drive an inducible homologous recombinase that recombines a reporter and thereby labels cells; that is, Ki67-CreER tool mice are crossed with mice carrying a corresponding constitutively active reporter gene. When a cell proliferates and expresses Ki67, the activity of the Ki67 promoter drives expression of the downstream CreER; if tamoxifen is present at the same time, Cre enters the nucleus and recombines the LoxP sites of the reporter gene, so that the cell is labeled with the fluorescence of the reporter. Although this technique can track proliferating cells in vivo to some extent, it has a significant drawback: tracing with Ki67-CreER requires two conditions to be met simultaneously, namely Ki67 activity, which drives CreER expression, and the presence in vivo of tamoxifen, which brings CreER into the nucleus. For rapidly proliferating cells, tamoxifen acting in vivo for a period of time captures essentially all proliferating cells. For slowly proliferating cells, however, the in vivo action time of tamoxifen is comparatively short: a cell whose Ki67 activity drives CreER expression at a time when no tamoxifen is present to bring Cre into the nucleus is not fluorescently labeled, and its signal is missed. Administering tamoxifen continuously throughout long-term tracing is also unrealistic, because it would affect the physiological state of the mice, and because tamoxifen-induced nuclear entry of Cre is not fully efficient.

The present research aims to establish a genetic lineage tracing technique for tracing proliferating cells in vivo that captures signals easily missed because of the transient expression of proliferation genes, and that continuously reconstructs in vivo cell proliferation over a long period.

Disclosure of Invention

In a first aspect, the invention provides a nucleic acid molecule selected from

(1) A nucleic acid molecule comprising a 5' homology arm and a 3' homology arm, and, located between the 5' homology arm and the 3' homology arm, a first recombinase coding sequence, an estrogen receptor ER coding sequence, and a recognition site for a second recombinase, wherein the 5' homology arm and the 3' homology arm are capable of recombining the sequences between them into a genome such that those sequences are co-expressed with a cell proliferation factor gene in the genome, and

(2) the complementary sequence of the nucleic acid molecule of (1).

In one or more embodiments, the polynucleotide sequence of the nucleic acid molecule is, in order from the 5' end to the 3' end, a 5' homology arm, a first recombinase coding sequence, a recognition site for a second recombinase, an estrogen receptor ER coding sequence, a recognition site for a second recombinase, and a 3' homology arm, the 5' homology arm and the 3' homology arm being capable of recombining the sequences between them at the 5' or 3' end of the cell proliferation factor gene.

In one or more embodiments, the cell proliferation factor is Ki67 or PCNA.

In one or more embodiments, the nucleic acid sequence of the 5' homology arm is as set forth in nucleotides 1-3000 of SEQ ID NO 1.

In one or more embodiments, the nucleic acid sequence of the 3' homology arm is as set forth in nucleotides 5128-8127 of SEQ ID NO: 1.

In one or more embodiments, the first recombinase and the second recombinase are Cre and Dre or Dre and Cre, respectively, wherein the recognition site for Cre is LoxP and the recognition site for Dre is rox.

In one or more embodiments, the amino acid sequence of Dre is as set forth in amino acids 1-356 of SEQ ID NO. 2.

In one or more embodiments, the amino acid sequence of Cre is as set forth in SEQ ID NO: 3. In one or more embodiments, the nucleic acid sequence of Cre is as set forth in nucleotides 3067-4099 of SEQ ID NO: 1.

In one or more embodiments, the nucleic acid sequence of LoxP is shown in SEQ ID NO 4.

In one or more embodiments, the nucleic acid sequence of rox is set forth in SEQ ID NO 5.

In one or more embodiments, the amino acid sequence of the estrogen receptor ER is as set forth in amino acids 357-666 of SEQ ID NO: 2. In one or more embodiments, the nucleic acid sequence of the estrogen receptor ER is as set forth in nucleotides 4153-5085 of SEQ ID NO: 1.

The invention also provides a nucleic acid construct comprising a nucleic acid molecule as described herein.

The present invention also provides a recombinase system comprising (1) a nucleic acid molecule as described herein, (2) optionally, a nucleic acid molecule encoding a fusion protein of a second recombinase and an estrogen receptor ER, and (3) optionally, a nucleic acid molecule comprising the following structure: in order from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase, and a marker coding sequence.

The present invention also provides a recombinase system comprising (1) a nucleic acid construct as described herein, (2) optionally, a second nucleic acid construct having a polynucleotide sequence encoding a fusion protein of a second recombinase and an estrogen receptor ER, and (3) optionally, a third nucleic acid construct having a polynucleotide sequence comprising the following structure: in order from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase, and a marker coding sequence.

The invention also provides a host cell comprising one or more of: (1) a polynucleotide sequence as described herein; (2) a nucleic acid construct as described herein; (3) a system as described herein.

The present invention also provides a method for constructing a transgenic animal, comprising:

mating an F₀ first animal with an F₀ second animal or an F₀ third animal, such that homologous recombination occurs in the progeny animals, to obtain a transgenic animal comprising the first and second polynucleotide sequences or a transgenic animal comprising the first and third polynucleotide sequences, or

mating any two of the three kinds of F₀ animals, and then mating the resulting F₁ animal, in which homologous recombination has occurred, with the third F₀ animal, such that homologous recombination occurs in the F₂ generation, to obtain a transgenic animal comprising the first, second and third polynucleotide sequences.

In one or more embodiments, the genome of the F₀ first animal comprises a first polynucleotide sequence, wherein the first polynucleotide sequence is the polynucleotide sequence of the nucleic acid molecule described herein, or the first polynucleotide sequence comprises, in order from the 5' end to the 3' end, a first recombinase coding sequence, a second recombinase recognition site, an estrogen receptor (ER) coding sequence, and a second recombinase recognition site, and the first polynucleotide sequence is co-expressed with a cell proliferation factor gene in the genome.

In one or more embodiments, the genome of the F₀ second animal comprises a second polynucleotide sequence encoding a fusion protein of a second recombinase and an estrogen receptor ER.

In one or more embodiments, the genome of the F₀ third animal comprises a third polynucleotide sequence comprising, in order from the 5' end to the 3' end, a recognition site for a first recombinase, a termination sequence, a recognition site for a first recombinase, and a marker coding sequence.

In one or more embodiments, the animal is a mouse.

In one or more embodiments, the first and second recombinases, the estrogen receptor ER and the cell proliferation factor are as described herein.

The present invention also provides a method for constructing a transgenic animal comprising introducing any one, two or three of the first, second and third polynucleotide sequences into animal cells, culturing the cells, and screening the transgenic animal whose genome comprises any one, two or three of the first, second and third polynucleotide sequences, wherein

the first polynucleotide sequence is, in order from the 5' end to the 3' end, a first recombinase coding sequence, a recognition site for the second recombinase, an estrogen receptor ER coding sequence and a recognition site for the second recombinase, and the first polynucleotide sequence is co-expressed with a cell proliferation factor gene in the genome of the transgenic animal,

the second polynucleotide sequence encodes a fusion protein of a second recombinase and an estrogen receptor ER,

the third polynucleotide sequence comprises, in order from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase and a marker coding sequence.

In one or more embodiments, the cell is an animal ES cell.

In one or more embodiments, the animal is a mouse.

The invention also provides an in vivo long-term cell labeling method comprising labeling cells expressing a cell proliferation factor gene in an animal comprising a first, second and third polynucleotide sequence in the presence of an inducer that interacts with the estrogen receptor ER, wherein,

the second polynucleotide sequence encodes a fusion protein of a second recombinase and an estrogen receptor ER, and

the third polynucleotide sequence comprises, in order from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase and a marker coding sequence.

In one or more embodiments, the animal is a mouse.

The invention also provides the use of a nucleic acid molecule, nucleic acid construct, system or host cell as described herein in long-term cell labeling or cell tracking.

The invention also provides a kit comprising a nucleic acid molecule, nucleic acid construct, system or host cell as described herein, and reagents required to knock the nucleic acid molecule into the genome of the cell.

Drawings

FIG. 1 shows cell proliferation tracing by a conventional technique and by one embodiment of the present invention. (a) Cartoon representation of the Ki67 expression profile and fate map: Ki67 is dynamically expressed at the different time points T1 to Tn, and the Ki67 fate map captures all Ki67 expression from T1 to Tn. (b) Under tamoxifen induction, DreER enters the nucleus and excises the ER following Cre, so that once Ki67 drives Cre expression in a proliferating cell, Cre enters the nucleus and homologously recombines the reporter gene to label the cell, allowing seamless tracking of cell proliferation over a long period. (c) When cell proliferation is tracked with the traditional genetic lineage tracing technology, proliferation can be detected only over a short time because the window of tamoxifen action is short.

FIG. 2 shows the construction and validation of an exemplary chimeric recombinase in accordance with an embodiment of the present invention. (a) Schematic diagram of the construction strategy for the Ki67-CrexER knock-in mouse. (b) Immunofluorescence staining of small intestine sections from adult Ki67-CrexER knock-in mice. Staining of ESR together with Ki67 and EdU, and the corresponding statistics, show that the expression profile of ESR is substantially consistent with the proliferation markers. (c) Ki67-CrexER mice were mated with R26-GFP mice, the offspring were induced with tamoxifen (Tam) in adulthood, and two-photon confocal images of aortic sections immunofluorescently stained for GFP and VE-Cad were acquired 5 days later. The statistics show that Ki67-CrexER labels endothelial cells that occur in pairs, suggesting that the Ki67-CrexER knock-in captures the proliferative activity of cells in vivo. Scale bars: 100 μm in b and 500 μm in c. Each picture represents at least three biological replicate samples.

FIG. 3 shows a method of labeling and tracking proliferating cells (ProTracker) according to an embodiment of the present invention. (a) Cartoon representation of the conversion of Ki67-CrexER to Ki67-Cre under induction by DreER and tamoxifen. (b) The three mouse strains required for tracing cell proliferation with the method described herein. (c) Schematic of the timing of tamoxifen induction and sample collection for tracing proliferating cells by the traditional method and by ProTracker. (d) Whole-mount fluorescence and section immunofluorescence results showing the proliferation signals in the tissues and organs of adult mice traced for two days and for four weeks by the traditional method and by ProTracker; insets show bright-field whole-mount images of the same tissue or organ. For each pair of panels, the upper panel is the result of the traditional method (Ki67-CreER) and the lower panel is the result of the ProTracker system. Scale bars: 1 mm in whole-mount images and 100 μm in section fluorescence images. Each picture represents five independent biological replicate samples.

FIG. 4 shows that DreER does not undergo cross homologous recombination with R26-GFP. Mice obtained by mating R26-DreER with R26-GFP were induced with tamoxifen in adulthood; four weeks later, tissues and organs were collected and imaged for whole-mount fluorescence or for GFP immunofluorescence after sectioning. Insets show bright-field whole-mount images of the same tissue or organ. Scale bars: 1 mm for whole-mount GFP fluorescence, 100 μm otherwise. Each picture represents 5 biological replicates.

FIG. 5, Ki67-CrexER;R26-GFP and Ki67-CrexER;R26-DreER;R26-GFP mice show substantially no signal leakage. (a, b) Whole-mount fluorescence and section immunofluorescence staining images of tissues from 12-week-old Ki67-CrexER;R26-GFP and Ki67-CrexER;R26-DreER;R26-GFP mice. Scale bars: 1 mm in whole-mount images, 100 μm in tissue sections. Each picture represents 5 individual biological replicate samples.

FIG. 6, proliferation of hepatocytes in adult liver. (a) Cartoon schematic of the experimental strategy for tracing hepatocyte proliferation with the method of the invention. (b) Cartoon representation of the division of the liver into three different metabolic zones from the portal vein region to the central vein region, with arrows indicating the direction of blood flow. (c) Immunofluorescence staining for GFP, GS and E-CAD of liver tissue sections from ProTracker mice on day 0 and day 2 after tamoxifen induction. (d) Immunofluorescence staining for GFP, GS and E-CAD of liver tissue sections from ProTracker mice at 2, 4, 6, 8, 10 and 12 weeks after tamoxifen induction. (e) Quantification of the number of GFP⁺ cells per mm² in the different zones of the hepatic lobule at the different sampling time points. 1, 2 and 3 represent Zone 1 (E-CAD⁺), Zone 2 (E-CAD⁻GS⁻) and Zone 3 (GS⁺), respectively. Scale bar, 100 μm. Each picture represents 5 individual biological replicate samples.

FIG. 7, proliferation of cardiomyocytes in adult hearts. (a) Cartoon representation of lineage tracing of Ki67-expressing cells from day 1 to day n, with green representing the proliferation signal of Ki67 expression. (b) Cartoon representation of seamless tracking of proliferating cells. (c) GFP and TNNI3 immunofluorescence staining of heart sections from three-month lineage-traced ProTracker mice; 1 and 2 are enlargements of the regions marked in the left panel. (d) GFP and WGA fluorescence staining of heart tissue sections from three-month lineage-traced ProTracker mice; the outlined region in the left panel is enlarged on the right, and the GFP and TNNI3 immunofluorescence staining of the same enlarged region is shown at the lower right.

FIG. 8, capturing nuclear division and cell division of cardiomyocytes in the adult heart. (a) Lineage tracing of Ki67-expressing cells from day 1 to day n; green represents the proliferation signal of Ki67 expression, and the number of nuclei can subsequently be analyzed. (b) Cardiomyocytes were isolated from the hearts of three-month lineage-traced ProTracker mice and stained with Hoechst, and the number of nuclei was counted in GFP⁺ and GFP⁻ cardiomyocytes. (c) 3D representation of a multi-slice scan of GFP⁺ cardiomyocytes (xyz: 500 × 100 μm). (d) Magnified multi-slice scan showing a GFP⁺ binucleated cardiomyocyte; yellow arrows indicate the cardiomyocyte nuclei, and the inset is a cartoon representation of the binucleated cardiomyocyte. (e) Magnified xy and yz 3D views of a region of panel c, showing two immediately adjacent cardiomyocytes. (f) Magnified multi-slice scan showing that both of the two immediately adjacent GFP⁺ cardiomyocytes are mononucleated. Yellow arrows indicate cardiomyocyte nuclei; hollow white arrows indicate nuclei of non-cardiomyocytes next to the GFP⁺ cardiomyocytes. The inset is a cartoon representation of two adjacent mononucleated cardiomyocytes. (g) Quantification of adjacent cardiomyocytes among GFP⁺ cardiomyocytes. Scale bar, 100 μm. Each picture represents five independent biological replicate samples.

Detailed Description

It is understood that within the scope of the present invention, the above-described technical features of the present invention and the technical features described in detail below (e.g., the embodiments) can be combined with each other to constitute a preferred technical solution.

The present invention aims to provide a genetic lineage tracing technique for tracing proliferating cells in vivo that captures signals likely to be missed because of the transient expression of proliferation genes, and that reconstructs in vivo cell proliferation continuously over a long period.

The Ki67-CreER mouse established with traditional genetic lineage tracing technology cannot track slowly proliferating cells over a long period, because the window of tamoxifen action in vivo is short; it tracks only the cell proliferation that occurs while tamoxifen is acting (FIG. 1, c). Cell proliferation factor genes such as Ki67 are dynamically expressed in vivo. Staining for a single proliferation marker and the traditional lineage tracing method capture only the expression profile at a single time point, whereas only the fate map of the cell proliferation factor gene truly reflects in vivo cell proliferation over a given period (FIG. 1, a). Direct lineage tracing with Ki67-Cre, however, is not feasible, because all cells descend from the same zygote and a constitutively active Ki67-Cre would therefore label them all. To track slowly proliferating cells in vivo over a long period, the inducible CreER must be converted, under defined conditions, into a Cre that enters the nucleus as soon as it is expressed.

To solve this problem, the invention introduces a second recombination system, Dre-rox, which is independent of Cre-LoxP, to modify the CreER used in prior-art lineage tracing. Recognition sites (rox) for the Dre homologous recombinase are added at both ends of the ER DNA sequence, converting CreERT2 into Cre-rox-ERT2-rox, referred to herein as CrexER. When DreER and tamoxifen are both present in vivo, tamoxifen induces Dre to enter the nucleus and recombine the rox sites flanking the ER DNA, which converts CrexER from an inducible homologous recombinase into the homologous recombinase Cre, which enters the nucleus directly. Thereafter, as soon as a proliferation-specific gene in the cell drives Cre expression, Cre directly recognizes the reporter gene and labels the cell (FIG. 1, b). The invention uses this cell proliferation tracker (ProTracker) to trace the expression of proliferation-specific genes in cells continuously over a long time course.
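The difference between the two labeling schemes described above can be illustrated with a toy model. The following is only a minimal sketch under simplifying assumptions that are not part of the description: time is divided into discrete points, recombination is treated as fully efficient, and the function names `traditional_label` and `protracker_label` are hypothetical.

```python
from typing import List


def traditional_label(ki67: List[bool], tamoxifen: List[bool]) -> bool:
    """Ki67-CreER: a cell is labeled only if Ki67 expression and tamoxifen
    coincide at some time point (assumes 100% recombination efficiency)."""
    return any(k and t for k, t in zip(ki67, tamoxifen))


def protracker_label(ki67: List[bool], tamoxifen: List[bool],
                     dreer_present: bool = True) -> bool:
    """Ki67-CrexER with DreER: a tamoxifen pulse lets Dre remove the
    rox-flanked ER, after which any later Ki67 expression labels the cell."""
    converted = False  # becomes True once CrexER has been converted to Cre
    for k, t in zip(ki67, tamoxifen):
        if t and dreer_present:
            converted = True  # Dre-rox excision is treated as irreversible
        # Cre reaches the nucleus if it is expressed (Ki67 on) and either
        # the ER has been removed or tamoxifen is present at this moment.
        if k and (converted or t):
            return True
    return False


# A slowly cycling cell: Ki67 is on only at time point 4,
# while tamoxifen is present only at time point 0.
slow_ki67 = [False, False, False, False, True, False]
pulse = [True, False, False, False, False, False]

print(traditional_label(slow_ki67, pulse))   # missed by Ki67-CreER
print(protracker_label(slow_ki67, pulse))    # captured after conversion
```

In this sketch the slowly cycling cell is missed by the traditional scheme and captured by the converted allele, which mirrors the contrast drawn between FIG. 1, c and FIG. 1, b.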

The engineered tracer system of the invention comprises a nucleic acid molecule encoding a chimeric recombinase, the polynucleotide sequence of said nucleic acid molecule being selected from the group consisting of: a polynucleotide sequence comprising a fragment of a cell proliferation factor gene, a coding sequence for a first recombinase, a coding sequence for an estrogen receptor ER, and a recognition site for a second recombinase, and/or a complement of the polynucleotide sequence. In one embodiment, the polynucleotide sequence of the nucleic acid molecule is, in order from the 5' end to the 3' end, a first fragment of the cell proliferation factor gene, a coding sequence for a first recombinase, a recognition site for a second recombinase, a coding sequence for an estrogen receptor ER, a recognition site for a second recombinase, and a second fragment of the cell proliferation factor gene. Preferably, the nucleic acid molecule of the invention is selected from (1) a nucleic acid molecule comprising a 5' homology arm and a 3' homology arm, and, located between the 5' homology arm and the 3' homology arm, a first recombinase coding sequence, an estrogen receptor ER coding sequence, and a recognition site for a second recombinase, the 5' homology arm and the 3' homology arm being capable of recombining the sequences between them into a genome such that those sequences are co-expressed with a cell proliferation factor gene in the genome, and (2) a complement of the nucleic acid molecule of (1).

The polynucleotides herein may be in the form of DNA or RNA. The form of DNA includes cDNA or artificially synthesized DNA. The DNA may be single-stranded or double-stranded. The DNA may be the coding strand or the non-coding strand. In certain embodiments, the polynucleotide sequence is set forth in SEQ ID NO 1.

The "cell proliferation factor gene" or "cell proliferation specific gene" as used herein refers to a gene that is expressed only during cell proliferation. Cell proliferation as described herein includes, but is not limited to, mitosis, amitosis, meiosis, binary fission, and the like. In one embodiment, the cell proliferation factor is Ki67 and/or PCNA. Ki-67 is currently one of the most frequently used cell proliferation factors; it is expressed at every stage of cell proliferation and is not expressed in the G0 stage. PCNA is short for Proliferating Cell Nuclear Antigen, and is present only in normal proliferating cells and tumor cells.

Herein, the 5' homology arm and the 3' homology arm may each be a fragment of a cell proliferation factor gene, as long as the fragment can serve as a homology arm for homologous recombination. The fragment may comprise or be located after the promoter of the cell proliferation factor gene, or the fragment may be located before or within the 3' UTR of the gene or comprise the 3' UTR, such that expression of a sequence of interest (e.g., a chimeric recombinase as described herein) is placed under the control of a specific promoter and occurs in a particular tissue or organ or at a particular developmental stage. In one embodiment, the fragments are selected such that the cell proliferation factor gene is expressed or not expressed following homologous recombination. In one embodiment, the fragments are selected such that expression of the cell proliferation factor gene is not affected following homologous recombination. The fragments of the cell proliferation factor gene include a first and a second fragment, which serve as the 5' and 3' homology arms required for homologous recombination, respectively. In one embodiment, the first and second fragments are selected such that the sequence of interest is inserted downstream of the promoter of the cell proliferation factor gene. In one embodiment, the first and second fragments are selected such that the sequence of interest is inserted between the last exon and the 3' UTR of the cell proliferation factor gene. In one or more embodiments, the fragments of the cell proliferation factor gene are as set forth in nucleotides 1-3000 and 5128-8127 of SEQ ID NO: 1.

A recombinase suitable for use herein can be any recombinase, including a first recombinase and a second recombinase. The first and second recombinases herein may be different. In one embodiment, the first and second recombinases are Cre and Dre or Dre and Cre, respectively. The Cre recombinase herein may be a Cre recombinase known in the art, whose coding region has a full length of 1029 bp (EMBL database accession number X03453) and encodes a 38 kDa monomeric protein consisting of 343 amino acids. Like a restriction enzyme, the Cre recombinase not only has catalytic activity but also recognizes a specific DNA sequence, namely the LoxP site. The Cre recombinase mediates specific recombination between two LoxP sites, so that the gene sequence between the LoxP sites is deleted or recombined. Cre recombinases suitable for use herein also include mutants of Cre that retain recombinase activity. In certain embodiments, the amino acid sequence of recombinase Cre is as set forth in SEQ ID NO: 3 and the nucleic acid sequence is as set forth in nucleotides 3067-4099 of SEQ ID NO: 1. Dre is a homologous recombinase similar to Cre; it specifically recognizes another recombination site, rox, in the same way that Cre specifically recognizes the LoxP site. Dre recombinases suitable for use herein also include mutants of Dre that retain recombinase activity. In certain embodiments, the amino acid sequence of recombinase Dre is as set forth in amino acids 1-356 of SEQ ID NO: 2. "LoxP" and "rox" as described herein have their art-recognized meanings, and their sequences are well known in the art. Illustratively, the nucleic acid sequence of LoxP is shown in SEQ ID NO: 4, and the nucleic acid sequence of rox is shown in SEQ ID NO: 5, wherein N represents any nucleotide.
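The deletion of a sequence lying between two recognition sites of the same recombinase can be pictured at the level of construct elements. The sketch below is illustrative only: it models a construct as a list of element names, assumes that the two sites are in the same orientation so that recombination excises the intervening element and leaves one site behind, and uses a hypothetical helper `excise`. GFP stands in for the marker, as in the R26-GFP reporter of the figures.

```python
from typing import List


def excise(construct: List[str], site: str) -> List[str]:
    """Return the construct after recombination between the first two copies
    of `site`: the elements between them are deleted and one copy of the
    site remains. The construct is unchanged if fewer than two copies exist."""
    positions = [i for i, element in enumerate(construct) if element == site]
    if len(positions) < 2:
        return list(construct)
    first, second = positions[0], positions[1]
    # Keep everything up to and including the first site, then resume
    # after the second site.
    return construct[: first + 1] + construct[second + 1:]


# Element order taken from the description; names are placeholders.
crexer = ["Cre", "rox", "ER", "rox"]
reporter = ["LoxP", "STOP", "LoxP", "GFP"]

print(excise(crexer, "rox"))     # Dre acting on rox removes the ER
print(excise(reporter, "LoxP"))  # Cre acting on LoxP removes the STOP
print(excise(reporter, "rox"))   # Dre finds no rox sites in the reporter
```

The last call reflects the independence of the two systems: in this model a recombinase changes only constructs that carry its own recognition sites.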

Estrogen receptors (ERs) are members of the steroid hormone receptor protein superfamily. Chimeric recombinases can be generated by fusing a ligand-binding domain (LBD) of the estrogen receptor with a recombinase. Because of the presence of the estrogen receptor ligand binding region, the chimeric recombinase cannot enter the nucleus to bind to the recombination site and is localized only in the cytoplasm. Only after the addition of estrogen can the chimeric recombinase enter the nucleus and act. The ligand binding domain of the estrogen receptor may be mutated such that it is unable to bind physiological estrogen in the body and binds only to exogenous inducers. The inducer forms a stable complex with the ligand binding domain of the estrogen receptor and is transported into the nucleus. The inducers include estrogen analogs such as tamoxifen or 4-OHT. Estrogen receptor ligand binding regions suitable for use herein also include mutants known in the art in which the ligand binding region is mutated but the resulting mutated region retains the biological function of the ligand binding region (i.e., the chimeric recombinase is still capable of binding to an inducer after the mutation). The mutation may be an insertion, deletion or substitution, and the number of mutated amino acids may be one or more, for example within 20, preferably within 10, more preferably within 5. In some embodiments, the mutation is a substitution mutation. It is well known in the art that substitution of an amino acid with one that is chemically similar (i.e., conservative substitution) has little to no effect on the function of the resulting protein. Thus, in some preferred embodiments of the invention, the substitution is a conservative substitution.
Examples of conservative substitutions include, but are not limited to, substitutions between amino acids whose side chain groups have the same polarity, such as between non-polar amino acids such as Ala, Val, Leu, Ile, Pro, Phe, Trp and Met, or between polar hydrophilic amino acids such as Gly, Ser, Thr, Cys, Tyr, Asn and Gln, or between polar positively charged amino acids such as Lys, Arg and His, or between polar negatively charged amino acids such as Asp and Glu; and substitutions between aliphatic amino acids such as Ala, Val, Leu, Ile, Met, Asp, Glu, Lys, Arg, Gly, Ser, Thr, Cys, Asn and Gln, between aromatic amino acids such as Phe and Tyr, between heterocyclic amino acids such as His and Trp, etc. In exemplary embodiments, the amino acid sequence of the estrogen receptor ligand binding region described herein is as set forth in amino acids 357-666 of SEQ ID NO: 2; the nucleic acid sequence is as set forth in nucleotides 4153-5085 of SEQ ID NO: 1.
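As an illustration of the polarity-based grouping listed above, the check below tests whether two residues fall into the same side-chain polarity class. It is a minimal sketch: the class names and the helpers `polarity_class` and `is_conservative` are hypothetical, only the polarity grouping is modeled (not the aliphatic, aromatic and heterocyclic grouping), and sharing a class is treated as sufficient for a substitution to count as conservative.

```python
# Side-chain polarity classes as listed in the description
# (three-letter amino acid codes).
POLARITY_CLASSES = {
    "nonpolar": {"Ala", "Val", "Leu", "Ile", "Pro", "Phe", "Trp", "Met"},
    "polar_hydrophilic": {"Gly", "Ser", "Thr", "Cys", "Tyr", "Asn", "Gln"},
    "polar_positive": {"Lys", "Arg", "His"},
    "polar_negative": {"Asp", "Glu"},
}


def polarity_class(residue: str) -> str:
    """Return the name of the polarity class that contains `residue`."""
    for name, members in POLARITY_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two residues differ but share a polarity class."""
    return (original != replacement
            and polarity_class(original) == polarity_class(replacement))


print(is_conservative("Leu", "Ile"))  # both non-polar
print(is_conservative("Asp", "Glu"))  # both negatively charged
print(is_conservative("Lys", "Asp"))  # opposite charges
```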

The polypeptides forming the fusion protein of the invention may be linked directly or may be linked by a linker sequence. The linker may be a linking sequence that allows polycistronic expression from a single vector, such as an Internal Ribosome Entry Site (IRES) or a 2A peptide. It is well known in the art that a 2A peptide is a short peptide capable of inducing self-cleavage of proteins, and includes the F2A, P2A and T2A peptides, and the like. The 2A peptide may also be attached to the flanking polypeptides by a conventional linker containing G and S. In one or more embodiments, the coding sequence for the P2A peptide comprises or consists of nucleotides 3001-3066 of SEQ ID NO: 1.
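As a quick consistency check on the coordinates quoted above, the following sketch computes the lengths of the cited intervals, treating all positions as 1-based and inclusive (an assumption that matches common sequence-listing practice but is not stated in the text). The helper names and the `FEATURES` table are hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def extract(seq: str, start: int, end: int) -> str:
    """Slice a sequence using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


# Coordinates quoted in the description (nt = SEQ ID NO: 1, aa = SEQ ID NO: 2).
FEATURES = {
    "P2A_nt": (3001, 3066),
    "ER_LBD_nt": (4153, 5085),
    "ER_LBD_aa": (357, 666),
}

if __name__ == "__main__":
    for name, (start, end) in FEATURES.items():
        n = span_length(start, end)
        if name.endswith("_nt"):
            print(f"{name}: {n} nt = {n // 3} codons (remainder {n % 3})")
        else:
            print(f"{name}: {n} residues")
```

The P2A interval spans 66 nucleotides (22 codons). The ER ligand binding region spans 933 nucleotides (311 codons) against 310 residues; the one-codon difference would be consistent with the nucleotide interval including a stop codon, although the text does not say so.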

The polynucleotide sequence encoding the chimeric recombinase herein may also include one or more regulatory sequences operatively linked to the chimeric recombinase. The control sequence may be an appropriate promoter sequence. The promoter sequence is typically operably linked to the coding sequence of the protein to be expressed. The promoter may be any nucleotide sequence which shows transcriptional activity in the host cell of choice including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell. The control sequence may also be a suitable transcription terminator sequence, a sequence recognized by a host cell to terminate transcription. The terminator sequence is operably linked to the 3' terminus of the nucleotide sequence encoding the polypeptide. Any terminator which is functional in the host cell of choice may be used in the present invention.

Also included herein are nucleic acid molecules encoding a fusion protein of a second recombinase and an estrogen receptor ligand binding region. The second recombinase and estrogen receptor ligand binding region are as defined elsewhere herein. In certain embodiments, the polynucleotide sequence of the nucleic acid molecule encodes the fusion protein Dre-ER shown as SEQ ID NO: 2. In certain embodiments, the fusion protein of the second recombinase and the estrogen receptor ligand binding region is constitutively expressed in the host. In certain embodiments, the fusion protein of the second recombinase and the estrogen receptor ligand binding region is inducibly expressed or specifically expressed in the host. In certain embodiments, the mouse of the invention has the coding sequence for Dre-ER inserted at the Rosa26 gene site.

Also included herein are nucleic acid molecules comprising, in order from the 5' end to the 3' end, a recognition site of the first recombinase, a termination sequence, a recognition site of the first recombinase and a marker coding sequence, thereby allowing a recombinase to regulate the expression of the marker. Exemplary polynucleotide sequences of the nucleic acid molecules include the following structure: LoxP-termination sequence-LoxP-marker. In certain embodiments, the fusion protein encoded by the polynucleotide is constitutively expressed in a host. Suitable termination sequences for use herein can be any termination sequence known in the art. Suitable markers for use herein may be any marker known in the art. The marker can be a fluorescent protein, including but not limited to green fluorescent markers (e.g., GFP, ZsGreen), red fluorescent markers (e.g., tdTomato, DsRed, mCherry), Yellow Fluorescent Protein (YFP), Cyan Fluorescent Protein (CFP), and the like. In one or more embodiments, the marker is GFP, the amino acid sequence of which is shown in SEQ ID NO: 6. In certain embodiments, the mouse of the invention has the LoxP-stop-LoxP-GFP coding sequence inserted at the Rosa26 gene site.

Also provided herein are polynucleotide products. By "polynucleotide product" as used herein is meant a product comprising one or more polynucleotide sequences as described herein. The polynucleotide sequences contained in the polynucleotide product may be the same or different. The plurality of polynucleotide sequences may be related to each other in any number or independent of each other. In one embodiment, the polynucleotide product comprises a plurality of polynucleotide sequences that are separate from each other.

In exemplary embodiments, the polynucleotide products described herein comprise: a polynucleotide sequence comprising a fragment containing a cell proliferation factor gene, a first recombinase coding sequence, an estrogen receptor ER ligand binding region coding sequence, and a recognition site for a second recombinase; and optionally, a polynucleotide sequence encoding a fusion protein of a second recombinase and an estrogen receptor ER ligand binding region, or a complement thereof, and a polynucleotide sequence comprising the following structure: from the 5' end to the 3' end, a recognition site of the first recombinase, a termination sequence, a recognition site of the first recombinase and a marker coding sequence.

Also provided herein are one or more nucleic acid constructs comprising one or more polynucleotide sequences described herein. The nucleic acid construct may also contain one or more regulatory sequences or sequences required for homologous recombination with the genome operably linked to the sequence of the polynucleotide sequence. The nucleic acid construct may be a vector. For example, the polynucleotide sequences herein can be inserted into a recombinant expression vector or a gene knock-in vector. In some embodiments, the polynucleotide sequences herein are contained on the same nucleic acid construct. In certain embodiments, the polynucleotide sequences herein are contained on different nucleic acid constructs.

The term "recombinant expression vector" refers to a bacterial plasmid, bacteriophage, yeast plasmid, plant cell virus, mammalian cell virus such as adenovirus, retrovirus, or other vectors well known in the art. Any plasmid or vector may be used as long as it can replicate and is stable in the host. An important feature of expression vectors is that they generally contain an origin of replication, a promoter, a marker gene and translation control elements. The expression vector may also include a ribosome binding site for translation initiation and a transcription terminator. The polynucleotide sequences described herein are operably linked to an appropriate promoter in an expression vector to direct mRNA synthesis via the promoter. Representative examples of such promoters are: lac or trp promoter of E.coli; a lambda phage PL promoter; eukaryotic promoters include CMV immediate early promoter, HSV thymidine kinase promoter, early and late SV40 promoter, LTR of retrovirus, and other known promoters capable of controlling gene expression in prokaryotic or eukaryotic cells or viruses. Marker genes can be used to provide phenotypic traits useful for selection of transformed host cells, including but not limited to dihydrofolate reductase, neomycin resistance, and Green Fluorescent Protein (GFP) for eukaryotic cell culture, or tetracycline or ampicillin resistance for E.coli. When the polynucleotides described herein are expressed in higher eukaryotic cells, transcription will be enhanced if an enhancer sequence is inserted into the vector. Enhancers are cis-acting elements of DNA, usually about 10 to 300 base pairs, that act on a promoter to increase transcription of a gene.

The vectors described herein may be transformed into an appropriate host cell to enable expression of the proteins described herein. In certain embodiments, the polynucleotide or cell marker system described herein is contained in the genome of the host cell. The host cell may be a prokaryotic cell, such as a bacterial cell; a lower eukaryotic cell, such as a yeast cell or a filamentous fungal cell; or a higher eukaryotic cell, such as a mammalian cell. The host cell may also be a plant cell. Representative examples of host cells are: bacterial cells such as E. coli, Streptomyces and Salmonella typhimurium; fungal cells such as yeast and filamentous fungi; plant cells; insect cells such as Drosophila S2 or Sf9; and CHO, COS, 293 cells, or Bowes melanoma cells.

Transformation of a host cell with recombinant DNA can be carried out using conventional techniques well known to those skilled in the art. When the host is prokaryotic, e.g., E. coli, competent cells capable of DNA uptake can be harvested after the exponential growth phase and treated by the CaCl₂ method, the steps of which are well known in the art. Another method is to use MgCl₂. If desired, transformation can also be carried out by electroporation. When the host is a eukaryote, the following DNA transfection methods may be used: calcium phosphate coprecipitation, conventional mechanical methods such as microinjection, electroporation, liposome encapsulation, etc. For mice, the DNA transfection method may be fertilized egg injection.

After transformation of the host cell, the resulting transformant can be cultured by conventional methods to allow expression of the fusion protein described herein. The medium used in the culture may be selected from various conventional media depending on the host cell used. The recombinant fusion proteins herein can be isolated and purified using various isolation methods known in the art. Such methods are well known to those skilled in the art and include, but are not limited to: conventional renaturation treatment, treatment with a protein precipitant (such as salt precipitation), centrifugation, cell lysis by osmosis, sonication, ultracentrifugation, molecular sieve chromatography (gel filtration), adsorption chromatography, ion exchange chromatography, High Performance Liquid Chromatography (HPLC), and other various liquid chromatography techniques, and combinations thereof.

Accordingly, host cells comprising the proteins or polynucleotides or expression vectors described herein are also included herein. Such host cells can constitutively express the proteins described herein, can also express the proteins described herein under certain induction conditions, and can also specifically express the proteins described herein in different host cell types. Methods of how to make host cells constitutively express, inducibly express, or specifically express a protein of the invention are well known in the art. For example, in certain embodiments, inducible expression of a protein is achieved by constructing an expression vector of the invention using an inducible promoter. In certain embodiments, tissue-specific expression of a protein is achieved using a tissue-specific expression promoter or associating the coding sequence of the protein with a tissue-specific gene.

The knock-in vector is used to knock the polynucleotide sequences described herein into a region of interest in the genome. Typically, the knock-in vector contains, in addition to the polynucleotide sequence, a 5' homology arm and a 3' homology arm required for homologous recombination with the genome. In certain embodiments, the nucleic acid constructs herein comprise a first segment of a cell proliferation factor gene that serves as the 5' homology arm, a coding sequence for a chimeric recombinase described herein, and a second segment of a cell proliferation factor gene that serves as the 3' homology arm. In other embodiments, the nucleic acid constructs herein comprise a 5' homology arm, a polynucleotide sequence described herein, and a 3' homology arm. When using knock-in vectors, the CRISPR/Cas9 technique can be used at the same time to homologously recombine a polynucleotide sequence to a location of interest. In the CRISPR/Cas9 technique, a guide RNA designed against the target gene directs the Cas9 nuclease to modify the genome at the insertion position, which increases the efficiency of homologous recombination in the modified region, so that the target fragment contained in the knock-in vector is homologously recombined into the target site. The Cas9 nuclease may be any Cas9 nuclease well known in the art.

When the polynucleotide sequence described herein is recombined to a position of interest using CRISPR/Cas9 technology, a guide RNA target sequence for the genome of the site of interest is designed and transcribed in vitro according to the sequence to obtain a guide RNA for the gene. Also, knock-in vectors are constructed for recombination of fragments of interest, which may be the coding sequences of proteins (e.g., chimeric recombinases) described herein or polynucleotide sequences described herein. Then, the in vitro transcribed guide RNA and the constructed knock-in vector are co-transformed into a cell of interest (for example, a fertilized egg), and then cells in which the desired fragment is knocked in the site of interest in the genome are selected.
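The guide RNA design step mentioned above can be sketched as a simple scan for candidate target sites. This is a minimal illustration under stated assumptions: it assumes the commonly used SpCas9 nuclease, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of a 20-nt target, whereas the text only refers to Cas9 nucleases well known in the art. It scans the forward strand only and does no off-target or efficiency scoring. The function name is hypothetical, and the example sequence is made up rather than taken from the sequence listing.

```python
import re


def find_candidate_protospacers(seq: str, length: int = 20):
    """List forward-strand sites of the form <length-nt target><NGG PAM>.

    Returns tuples of (1-based start, protospacer, PAM).
    Assumes an SpCas9-style NGG PAM; other nucleases use other PAMs.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    return [(m.start() + 1, m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    example = "TTACGATCGATCGGATCCGATTACGAGGCTA"
    for start, protospacer, pam in find_candidate_protospacers(example):
        print(start, protospacer, pam)
```

In practice, candidates near the intended insertion position would be chosen, the reverse strand would be scanned as well, and each candidate would be checked for specificity before in vitro transcription.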

The invention also includes methods of introducing (e.g., by genetic recombination) a polynucleotide sequence described herein into the genome of a mouse, thereby obtaining a transgenic mouse. Methods for obtaining transgenic animals such as transgenic mice are known in the art, such as fertilized egg injection, embryonic stem cell injection, and the like. In addition, different transgenic mice can be mated to obtain progeny mice with multiple polynucleotides of interest, thereby creating a transgenic mouse model.

The invention provides methods for constructing transgenic animals comprising introducing a nucleic acid molecule, polynucleotide product or nucleic acid construct as described herein into animal cells containing a cell proliferation factor gene and selecting transgenic animals that undergo homologous recombination.

The invention provides a method for constructing a transgenic animal, which comprises the following steps: (1) providing a first transgenic animal having in its genome one or more nucleic acid constructs; (2) providing a second transgenic animal whose genome comprises another one or more nucleic acid constructs that are different from the one or more nucleic acid constructs; and (3) mating the first transgenic animal and the second transgenic animal, and performing homologous recombination in a progeny animal to obtain a progeny transgenic animal.

The invention provides a method for long-term in vivo cell labeling comprising providing an animal comprising a nucleic acid molecule, polynucleotide product or nucleic acid construct as described herein, and labeling cells expressing the cell proliferation factor gene in the animal in the presence of an inducing agent.

For example, when Ki67-CrexER, DreER, and Rosa26-LoxP-stop-LoxP-GFP are present together, tamoxifen induces DreER to enter the nucleus, where Dre recognizes the rox sites in the CrexER gene sequence, and homologous recombination converts CrexER to Cre, so that Cre is thereafter co-expressed and released whenever Ki67 is expressed. Cre recognizes LoxP and activates GFP expression (fig. 1, b and fig. 3, a). The cell marking method of the present invention enables continuous tracking of gene expression over a long period of time.
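The two-step logic described above can be summarized in a small state model. This is a schematic sketch only: time is divided into discrete steps, recombination is treated as fully efficient and irreversible, and the class and function names are hypothetical. The `protracker=False` branch models a cell carrying Ki67-CrexER and the GFP reporter without DreER, in which CrexER acts only while tamoxifen is present.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    cre_unlocked: bool = False  # CrexER converted to Cre by Dre-rox recombination
    gfp: bool = False           # LoxP-stop-LoxP excised, GFP permanently on


def step(cell: Cell, tamoxifen: bool, ki67_on: bool, protracker: bool) -> Cell:
    """Advance one time step of the schematic model."""
    if protracker:
        if tamoxifen:
            # DreER enters the nucleus and converts CrexER to Cre (irreversible).
            cell.cre_unlocked = True
        # Constitutively active Cre is made whenever Ki67 is expressed.
        cre_active = cell.cre_unlocked and ki67_on
    else:
        # Without DreER, CrexER needs both Ki67 expression and tamoxifen.
        cre_active = tamoxifen and ki67_on
    if cre_active:
        cell.gfp = True  # Cre removes the stop cassette; the label is heritable.
    return cell


def trace(tamoxifen_schedule, ki67_schedule, protracker: bool) -> bool:
    """Return whether the cell ends up GFP-positive."""
    cell = Cell()
    for tam, ki67 in zip(tamoxifen_schedule, ki67_schedule):
        step(cell, tam, ki67, protracker)
    return cell.gfp


if __name__ == "__main__":
    tamoxifen = [True, False, False, False]  # a single early pulse
    ki67 = [False, False, True, False]       # the cell cycles later
    print("without DreER:", trace(tamoxifen, ki67, protracker=False))
    print("with DreER:   ", trace(tamoxifen, ki67, protracker=True))
```

In this model, a cell that expresses Ki67 only after the tamoxifen pulse has ended is labelled when DreER is present but not when it is absent, which is the behaviour the paragraph describes as continuous long-term tracking.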

The invention also provides kits comprising a chimeric recombinase, fusion protein, coding sequence, nucleic acid molecule, polynucleotide sequence product, nucleic acid construct, or host cell as described herein, and reagents necessary for knocking-in the nucleic acid molecule of the invention into the genome of the cell.

Embodiments of the present invention will be described in detail with reference to examples. It will be appreciated by those skilled in the art that the following examples are illustrative of the invention only and should not be taken as limiting the scope of the invention. Where specific techniques or conditions are not indicated in the examples, they are carried out according to the techniques or conditions described in the literature in the art (for example, Molecular Cloning: A Laboratory Manual, third edition, by J. Sambrook et al., translated by Huang Peitang et al., Science Press) or according to the product instructions. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.

Examples

Example 1 construction of mice

Ki67-CrexER mouse

The strategy for constructing the genetic tool mouse Ki67-CrexER (figure 2, a) is to use CRISPR/Cas9 technology to knock in a 2A-CrexER2 expression cassette at the stop codon site of the Mki67 gene by homologous recombination. The brief procedure is as follows: Cas9 mRNA and gRNA (SEQ ID NO: 11) were obtained by in vitro transcription; a homologous recombination vector (donor vector) was constructed on the pBR322 plasmid backbone by the In-Fusion cloning method, which contained a 4.1 kb 5' homology arm, a 2.1 kb KI fragment and a 4.0 kb 3' homology arm (primer sequences used: SEQ ID NOS: 7-8). Cas9 mRNA, gRNA, and donor vector were microinjected into fertilized eggs of C57BL/6J mice to obtain F0 generation mice. F0 generation mice with correct homologous recombination were identified by long-fragment PCR; these F0 generation mice were mated with C57BL/6J mice to obtain positive F1 generation mice.

The constructed tool mice were validated by immunofluorescence staining for ESR (estrogen receptor) together with Ki67 or EdU (a thymidine analogue) (fig. 2, b). The results showed that in the small intestine, a rapidly proliferating tissue, both Ki67 expression and ESR expression were concentrated in the crypt structures at the base. ESR and EdU staining further showed that the two signals co-localized in the same cells, indicating that the constructed tool mouse faithfully marks proliferating cells. Whether the tool mice could undergo an inducible homologous recombination reaction was further verified by mating with R26-GFP mice (fig. 2, c). The results show that under tamoxifen induction, the Ki67-CrexER tool mouse can mark paired cells in the aortic endothelium of adult mice, which indicates that the tool mouse can undergo an inducible homologous recombination reaction and capture the proliferative activity of cells. The above results demonstrate that the genetic tool mouse Ki67-CrexER was constructed successfully.

R26-DreER mice

The R26-DreER mouse was constructed by ES cell targeting, in which a CAG promoter-DREERT2-polyA expression cassette was inserted site-specifically into the Rosa26 gene site. The brief procedure is as follows: an ES cell targeting vector was constructed by In-Fusion cloning, which contained a 1.087 kb 5' homology arm, the CAG promoter, the DREERT2 coding region, FRT-PGK-Neo-polyA-FRT, a 4.259 kb 3' homology arm, and an MC1-DTA-polyA negative selection marker (primer sequences: SEQ ID NOS: 9-10). After linearization, the vector was electroporated into C57/129 ES cells. After G418 and Ganc drug screening, a total of 144 resistant ES cell clones were obtained; by long-fragment PCR identification, 7 positive clones with correct homologous recombination were obtained. Positive ES cell clones were expanded and injected into blastocysts of C57/129 mice to obtain chimeric mice. Highly chimeric mice were mated with C57/129 mice to obtain 4 positive F1 generation mice containing Neo.

R26-GFP mouse

R26-GFP mice were obtained from the Allen Institute for Brain Science. The mouse has LoxP-stop-LoxP-GFP inserted into the Rosa26 gene site.

Example 2 construction and validation of the proliferating cell marker System ProTracker

The system consisting of Ki67-CrexER; R26-DreER; R26-GFP is called ProTracker (FIG. 3, b). When Ki67-CrexER and R26-DreER are both present, tamoxifen induces DreER to enter the nucleus, where Dre recognizes the rox sites in CrexER, and homologous recombination converts CrexER to Cre, so that Cre is co-expressed and released whenever Ki67 is expressed (fig. 1, b, fig. 3, a). Ki67-CrexER on its own can be regarded as a traditional method for tracking proliferation. Mice obtained by mating Ki67-CrexER with R26-GFP, and ProTracker mice, were induced with tamoxifen in adulthood and sampled at the beginning of the chase (two days) and four weeks later, and the difference between the proliferation signals traced by the two systems was observed (fig. 3, b and c). The results show that at the start of the chase the traditional method did not differ from ProTracker, and little GFP-positive cell signal was detected in tissues other than the small intestine, indicating that adult tissues and organs proliferate little (FIG. 3, d). ProTracker, however, enables long-term continuous tracking: in samples collected at four weeks, the proliferation signals tracked by the traditional method and by ProTracker were clearly different. In tissues and organs that proliferate relatively slowly, such as the heart, lung, liver, pancreas, and kidney, ProTracker traced significantly more signals than the traditional method, suggesting that, thanks to the accumulation of signals by the ProTracker system, almost all cells that proliferated during the tracing period can be seen. In muscle and brain, since the cells themselves proliferate slowly, neither system captured proliferation signals. In the small intestine, a rapidly proliferating tissue, almost all epithelial cells were captured by ProTracker, indicating that the ProTracker system is more efficient (FIG. 3, d).

The above results demonstrate that compared with the traditional method for tracing proliferation signals, ProTracker performs signal accumulation by seamlessly tracing cell proliferation signals induced by tamoxifen, finally presents all the proliferated cell signals, and restores the cell proliferation status in vivo for a certain period of time.
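Why signal accumulation matters most in slowly proliferating tissues can be illustrated with a toy calculation. This is not a model from the invention: it assumes that each cell independently has a fixed daily probability `p_daily` of expressing Ki67, that the traditional system records only during a short tamoxifen window of `window_days`, and that ProTracker records for the whole chase. All names and parameter values are hypothetical.

```python
def labeled_fraction(p_daily: float, days_recording: int) -> float:
    """Fraction of cells labelled at least once during the recording period."""
    return 1.0 - (1.0 - p_daily) ** days_recording


def compare(p_daily: float, chase_days: int, window_days: int):
    """Return (traditional, continuous) labelled fractions for one chase."""
    traditional = labeled_fraction(p_daily, min(window_days, chase_days))
    continuous = labeled_fraction(p_daily, chase_days)
    return traditional, continuous


if __name__ == "__main__":
    # Hypothetical daily proliferation probabilities, for illustration only.
    for label, p in [("slow tissue", 0.005), ("fast tissue", 0.3)]:
        for chase in (2, 28):
            trad, cont = compare(p, chase_days=chase, window_days=2)
            print(f"{label}, {chase:2d}-day chase: "
                  f"traditional={trad:.3f}, continuous={cont:.3f}")
```

Under these assumptions the two approaches coincide when the chase is no longer than the recording window, and diverge as the chase lengthens, which is qualitatively in line with the two-day and four-week comparison described in this example.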

Example 3 verification of reliability

R26-DreER tool mice were mated with the LoxP reporter mice R26-GFP and induced with tamoxifen in adulthood, and individual tissues were collected for whole-mount fluorescence or section immunofluorescence imaging (fig. 4). The results show that essentially no tissue or organ had a GFP fluorescence signal, indicating that DreER and LoxP in the cell proliferation tracking system of the invention, ProTracker, do not cross-react in homologous recombination.

In vivo proliferation tracing with ProTracker is initiated by tamoxifen induction. To demonstrate that the entire system is truly controlled by tamoxifen, littermates of the tool mice of the traditional tracing system and of the ProTracker system that were not induced with tamoxifen were used as controls. The results show that mice of the traditional proliferation tracing system without tamoxifen induction had essentially no green fluorescence signal (figure 5, a), indicating that the traditional tracing system has essentially no leakage. ProTracker mice without tamoxifen induction likewise had essentially no green fluorescent signal in any tissue or organ of the body, indicating that the ProTracker system is truly controlled by tamoxifen induction. This shows that the proliferating cell marking and tracking system of the present invention can truly and reliably track in vivo cell proliferation signals.

Example 4 tracking of proliferating cells of liver Using ProTracker

The ProTracker system constructed in example 2 was used to study cells proliferating in vivo. The liver is an organ with regenerative capacity that is composed of many hepatic lobules, in which hepatocytes have a certain proliferative ability. Meanwhile, as the liver is a metabolic organ, hepatocytes perform different metabolic functions in different regions of the hepatic lobule. Along the direction of blood flow from the portal vein to the central vein in the hepatic lobule, hepatocytes can be roughly divided into three regions: region 1, which is E-CAD+; region 3, which is GS+; and region 2, located between them (FIG. 6, b). Since cells in different regions exert different metabolic functions, it is also considered in the art that these cells have different proliferative capacities. Previous studies used molecular markers of hepatocytes in different regions to construct genetic tool mice and performed lineage tracing on the hepatocytes in those regions, concluding that hepatocytes in the central vein region have a stronger proliferative capacity under physiological homeostasis. However, lineage tracing based on region-specific hepatocyte markers can trace only a single group of hepatocytes at a time and cannot probe hepatocyte proliferation at a global level. To study hepatocyte proliferation at a global level, a genetic lineage tracing technique that is independent of region-specific hepatocyte markers is needed. The ProTracker system of the present invention meets this requirement and can be used to study the source and fate of newly generated hepatocytes.

Cell proliferation tracking was initiated by tamoxifen induction in adult ProTracker mice, and samples were collected at various time points after induction (FIG. 6, a) to observe in which region hepatocytes first began to proliferate and where the proliferating cells subsequently migrated. The results show that at the very beginning of the tracking (zero and two days after tamoxifen induction), essentially no proliferating hepatocyte signals were captured (fig. 6, c). When the tracking time was extended to two weeks, sporadic proliferation signals appeared in regions 1 and 2 of the hepatic lobules, and essentially no proliferation signals appeared in region 3. By the fourth week, the proliferation signal in region 2 had increased significantly, the proliferation signal in region 1 had also increased, and there was still essentially no proliferation in region 3. At the subsequent time points (week 6, week 8, week 10, and week 12), hepatocyte proliferation continued, with region 2 proliferating the most, region 1 second, and region 3 showing the fewest hepatocyte proliferation signals (fig. 6, d and e). We therefore believe that, under physiological homeostasis, the most actively proliferating hepatocytes in the hepatic lobule are those in region 2, which is located between the portal vein region and the central vein region.

Example 5 tracking of proliferating cells of the Heart Using ProTracker

Cardiomyocytes, as a terminally differentiated cell type, were initially considered incapable of proliferation. Recent studies suggest that cardiomyocytes in the adult heart can produce new cardiomyocytes by proliferation. However, these studies mainly used isotope incorporation methods, which on the one hand are difficult to detect and on the other hand, because cardiomyocytes are polyploid and isotope incorporation mainly detects nuclei, make it difficult to distinguish nuclear polyploidization from cell division in cardiomyocytes. In addition, whether newly generated cardiomyocytes show regional distribution, and whether the generation of new cardiomyocytes by one cell dividing into two can be detected, are still problems to be solved.

Cardiomyocytes proliferate relatively slowly, and no potential proliferation signal was captured by traditional proliferation marker staining or DNA analogue incorporation. The traditional lineage tracing technique also misses most proliferating cardiomyocytes because the window of tamoxifen action is short. In the ProTracker system of the present invention, by contrast, DreER enters the nucleus under the action of tamoxifen and converts Ki67-CrexER to Ki67-Cre (FIG. 7, b), which allows the activity of Ki67 to be captured continuously, so that signals accumulate from the very beginning of signal capture until the time of detection (FIG. 7, a). The accumulated proliferation signals are finally displayed in cardiomyocytes, which are large cells, and this facilitates an overall view of the locations of, and distances between, all the signals.

Adult ProTracker mice were induced with tamoxifen, and three months later heart tissue was harvested for immunofluorescent staining, which revealed many GFP⁺ cardiomyocytes in the heart. Statistical analysis of the immunofluorescence staining showed that about 0.7% ± 0.14% of cardiomyocytes had actively expressed Ki67 within three months. Furthermore, we found that most of the captured GFP⁺ cardiomyocytes were located on the side of the left ventricular wall close to the endocardium, and that in the ventricular septum there were significantly more GFP⁺ cardiomyocytes on the side facing the left ventricular cavity than on the side facing the right ventricular cavity. The right ventricular wall was essentially free of GFP⁺ cardiomyocytes. Thus, using the ProTracker system, we discovered a population of actively proliferating cardiomyocytes encircling the left ventricular cavity in the adult heart.

Actively proliferating cells express Ki67 in the G1, S, G2, and M phases of the cell cycle, so the GFP⁺ proliferation signal captured by ProTracker includes cell division as well as nuclear division (fig. 8, a). The captured GFP⁺ cardiomyocytes were therefore investigated further. Hoechst staining of isolated cardiomyocytes showed that GFP⁺ cardiomyocytes included mononuclear, binuclear and multinuclear cells (FIG. 8, b). Counting the nuclei of these isolated cardiomyocytes showed that, compared with GFP⁻ cardiomyocytes, GFP⁺ cardiomyocytes included more mononuclear cells and fewer binuclear and multinuclear cells, indicating that cell division occurs in cardiomyocytes. Continuous multi-slice confocal scanning captured both single cardiomyocytes and pairs of adjacent cardiomyocytes (fig. 8, c and e). The serial slice scans showed that the single cardiomyocytes captured were multinucleated and that the two adjacent cardiomyocytes were mononuclear (fig. 8, d and f). If the captured adjacent cardiomyocytes are considered to be the result of cell division, about 8% of the actively proliferating cells (GFP⁺ cardiomyocytes) in the adult heart underwent cell division within 12 weeks. Thus, using the ProTracker system to study proliferation in the adult heart, we found a population of proliferating cardiomyocytes encircling the left ventricular cavity and captured cardiomyocyte division in the adult heart.

By designing new genetic tool mice, the invention achieves the goal of seamlessly tracking in vivo cell proliferation signals, and the proliferation of hepatocytes and cardiomyocytes in adult mice was studied. The proliferation of various other types of cells in vivo can also be followed using this technique. In addition, by crossing with different DreER tool mice, the proliferation of a specific cell population in vivo can be tracked individually, which is of great significance for understanding the dynamic changes of various cell populations in vivo.

After reading the above teachings of the present invention, those skilled in the art may make various changes or modifications to the present invention, and such equivalents fall within the scope of the invention as defined by the appended claims.

Sequence listing

<110> Shanghai Life science research institute of Chinese academy of sciences

<120> in vivo cell proliferation marking and tracing system and application thereof

<130> 194954

<160> 11

<170> SIPOSequenceListing 1.0

<210> 1

<211> 8127

<212> DNA

<213> Artificial Sequence

<400> 1

actctgaagg aggtaaggac ccttgctgtc ttatattgtg tgagacccag aacattctaa 60

tctttatgcc caggttacaa aggcaaataa ggacatgcca gcagctcctg ctatgctagt 120

gggaaagcca tcattcgggc cacttacttt cttgtgtctg ttttggtaat tccactttga 180

atagtgaatt tacctattga gggtttcatt ccagaagagc tagctattta tatctaatac 240

cattttctga tgtttgtgat gtggcttatc ttggttatag atgacatttt cagcctctct 300

atgttggcaa cacttaacta aaacaaaggt gggcatgtca ggacctcaaa agatttcctt 360

taaaatagca gaatgagttg gctcagcagt tgagagcacc aaatactctt tcaaaggagc 420

cagttgttat tcctaggact cacatggctc acaactatct gtaactccag ttccagggga 480

tttgatggtc tcttctgacc tccttaaaca ccaagcaggc atatgcatag gtgaaggcaa 540

aaaacataaa aattaaaagg aataaatatg atttaaaaaa aataaataaa aaggcagaat 600

aagcccagta ttgtctatct cctaagcaaa taagtgaaaa taggtaaatt ctcctcagtg 660

ggaagtgtgt tgtttcagca tcccaagctc acagtactaa gctagtagac tgtaaaccca 720

tgtgctgtcc gtgctttgca tcattcccag tggcgtgctt gcattaagca gtagtctagg 780

accaggtaaa gatggggtgg gcaggagtga cacataattg ttctggacat tcaagttgaa 840

cctgaacata ctctgaacat tctgaaacag ctgatgggag ataaaggtat tgacagccgc 900

tcgaggcaga ggtgtgctga gctggcagtc ctacatgtga ctggcacagc acaagaagac 960

ccaggctttc cagaggtcat gaaacatccc atatagctga gggatggagc aagaggcatg 1020

gaagtctagt gcgaaggtgc agctaaagca cccagaatgg cggctctcaa cctcccaatg 1080

ctgtgactct ttaatacagt tcttctgtgg tgatccccaa ccataaaatt attttattgc 1140

tacttgatca ctaatgttgc tactgttatg aatcataatg taaatatctg gtatgcatga 1200

tatctgatat atgaaacaac ctgtgaaagg gttgtttgac ccccacaaag ggcttgagaa 1260

ccatggaagc gaggctcagg aaaggagagc aatacccagg aagtggaggt ggctgtccgg 1320

aagcagaaac tatctcatgg tggcaagcct tcattacaga caccaagaca caccaggaac 1380

tcagcctcat aggcttaaca agtatcttat ctttcctcag agctctaagc acagcttcat 1440

caccttgaaa gtagtacttt atcagaagga aatagaagga ataaaaccca ggttttttta 1500

gtcaaatgat cctgaacaca acaggcaagg cctgagggtg atcaggccag gtcatcgtgt 1560

catagacact caggtctctt ctcctcactg gtgcccagcg gatgtcatac actgacgagt 1620

tttcccagga ctatatcttt cttgtctgtt ctgttgtcca taggcccaga ttctacatat 1680

gtgtgtgtgg gaggggtggg tgcttcctgg cggtccatcc tgcaagtatg ctggagaagc 1740

aagcctctta tctggtgtgt gtgcctttct aacatgtgta cagtagatcc atctacctgt 1800

tattttctag aattcaacag cattcacata ccaagatctt cttgtccata ctgagcctca 1860

cacttaagag ttcctgtttt ccgtctccct ttcttaactg tccataatca ctcataaaac 1920

tgtgtctaaa gtatgcccag catcaccctc ggctttttct aatttttgtt ggatgggcct 1980

tgtgtttatc aggtaccaga actttgggtc atttgctcta agaggctatt gtgacctttt 2040

gctttctgta gattggatcc ccgcttccag gtagatgggc ctgcctctat ctccccactg 2100

ccttagagga cccactatcc ttttgcagcc acataggaga cctcaggaca cagtgactgt 2160

cctttgtctg tgggaagttg gctttaggat acttaagttt tcatctaggc cacagtcaaa 2220

ttttgtgaat gatgtttttt aattagtgaa ccacatacag tgatagagac cgtgtatgct 2280

ttagaaactt gtgaaagagc acagagtttg agttttaaaa actaagttaa aaaaaaacat 2340

ttaggaagaa acaaccttat ggtaagcatt gtaaaaggat ttccaacttt aatttttttc 2400

tttttaaaaa cactttgtag ccaggcagtg gtggtgcaca cctttaatcc cagcacttgg 2460

gaggcagagg caggtggatt tctgagttcg aggccagcct ggcctacaga gtgagttcca 2520

ggacagccag ggctatacag aggaaccctg tcttgataaa ccaaacaaac aaaaaagaaa 2580

aacactttgt ttttgttttg tgggtttttt tttttaatgt atactgagta ttttgccgtg 2640

tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tatcatgtgc ttgcctgcgg 2700

aggtcagtca gaaaagggca tctggttccc tggatggttg tgagccacca tgtggatgct 2760

gggagttaaa cttaggtcct ctgtaagtgc agaaagtgcc cctagacact gagccatctt 2820

tccagacctt caatgttaat ttatagatga gagacctgaa tgacacctag taaggacaag 2880

gggctcattg agttgaggtg atcacaaaaa ttgttccttc atattattta ggagtgtggt 2940

tttttttttc ccaaacagga tgaagacatt gtatgcacca agaagttaag aacaagaagt 3000

ggaagcggag ctactaactt cagcctgctg aagcaggctg gcgacgtgga ggagaaccct 3060

ggtcctatgg gctccaattt actgaccgta caccaaaatt tgcctgcatt accggtcgat 3120

gcaacgagtg atgaggttcg caagaacctg atggacatgt tcagggatcg ccaggcgttt 3180

tctgagcata cctggaaaat gcttctgtcc gtttgccggt cgtgggcggc atggtgcaag 3240

ttgaataacc ggaaatggtt tcccgcagaa cctgaagatg ttcgcgatta tcttctatat 3300

cttcaggcgc gcggtctggc agtaaaaact atccagcaac atttgggcca gctaaacatg 3360

cttcatcgtc ggtccgggct gccacgacca agtgacagca atgctgtttc actggttatg 3420

cggcggatcc gaaaagaaaa cgttgatgcc ggtgaacgtg caaaacaggc tctagcgttc 3480

gaacgcactg atttcgacca ggttcgttca ctcatggaaa atagcgatcg ctgccaggat 3540

atacgtaatc tggcatttct ggggattgct tataacaccc tgttacgtat agccgaaatt 3600

gccaggatca gggttaaaga tatctcacgt actgacggtg ggagaatgtt aatccatatt 3660

ggcagaacga aaacgctggt tagcaccgca ggtgtagaga aggcacttag cctgggggta 3720

actaaactgg tcgagcgatg gatttccgtc tctggtgtag ctgatgatcc gaataactac 3780

ctgttttgcc gggtcagaaa aaatggtgtt gccgcgccat ctgccaccag ccagctatca 3840

actcgcgccc tggaagggat ttttgaagca actcatcgat tgatttacgg cgctaaggat 3900

gactctggtc agagatacct ggcctggtct ggacacagtg cccgtgtcgg agccgcgcga 3960

gatatggccc gcgctggagt ttcaataccg gagatcatgc aagctggtgg ctggaccaat 4020

gtaaatattg tcatgaacta tatccgtaac ctggatagtg aaacaggggc aatggtgcgc 4080

ctgctggaag atggcgatct aactttaaat aattggcatt atttaaagtt actcgagcca 4140

tctgctggag acatgagagc tgccaacctt tggccaagcc cgctcatgat caaacgctct 4200

aagaagaaca gcctggcctt gtccctgacg gccgaccaga tggtcagtgc cttgttggat 4260

gctgagcccc ccatactcta ttccgagtat gatcctacca gacccttcag tgaagcttcg 4320

atgatgggct tactgaccaa cctggcagac agggagctgg ttcacatgat caactgggcg 4380

aagagggtgc caggctttgt ggatttgacc ctccatgatc aggtccacct tctagaatgt 4440

gcctggctag agatcctgat gattggtctc gtctggcgct ccatggagca cccagtgaag 4500

ctactgtttg ctcctaactt gctcttggac aggaaccagg gaaaatgtgt agagggcatg 4560

gtggagatct tcgacatgct gctggctaca tcatctcggt tccgcatgat gaatctgcag 4620

ggagaggagt ttgtgtgcct caaatctatt attttgctta attctggagt gtacacattt 4680

ctgtccagca ccctgaagtc tctggaagag aaggaccata tccaccgagt cctggacaag 4740

atcacagaca ctttgatcca cctgatggcc aaggcaggcc tgaccctgca gcagcagcac 4800

cagcggctgg cccagctcct cctcatcctc tcccacatca ggcacatgag taacaaaggc 4860

atggagcatc tgtacagcat gaagtgcaag aacgtggtgc ccctctatga cctgctgctg 4920

gaggcggcgg acgcccaccg cctacatgcg cccactagcc gtggaggggc atccgtggag 4980

gagacggacc aaagccactt ggccactgcg ggctctactt catcgcattc cttgcaaaag 5040

tattacatca cgggggaggc agagggtttc cctgccacag cttgataact aactttaaat 5100

aattggcatt atttaaagtt atgataagaa caagaagtta ccagaaaagt gaaactatgt 5160

agcaaagaca tttaagaagg aaaagtaaat ttgacttagt gataagttcc agtgtggttt 5220

tcacctccag tgtaaagatg aactgtaaat actactgcta ctgcctgagt ttaaggaagg 5280

aagctttgag ctttcctggt catactctct tcagacgcca atggaggtca tgaggaagat 5340

caccagggat ctcagcgcaa ttacagttta ggggtgagca ggcagaaatg tggccctctg 5400

tcctatccaa taaagctctg aaattcgctg cctcctttgg cctctctgac aactgcagct 5460

gctcccctct gccctcatga aggaggggaa ggtggtgccc ctccattcat tagacattgg 5520

ttgtgcagtt atatcagcca accttacaca ggatgactgt acggtggagt ggttggtttg 5580

taggctacac cattagtcac ttacgcaagt cagcctaatc ctctgggcct gtgacctttg 5640

ggagaaacat ctgacaagga tggctgccga gctcccttca ggggcacggg tcgctatgtt 5700

aaagagcggt tgatgtctgt gcttttcatt aggcctctgt attgagtgga ttggctgcct 5760

tgcctgtgga acctttgctg ctggggagtc tcctgtcccc actggagtct ccactccagt 5820

ctcctgtcct agcgtctgct tttatcacgg gctttctctg acctcttgcc tggcagcaca 5880

aggccatcct ggtgtctggt atgagatgct tatcttccaa gtttcacttt aaccctaaac 5940

tcttttctgt tggaaaccac tgcgcatttg catatgcaac tttgtgcttt tcactctgcc 6000

tgctagtccc ctttctgttt tccagcagta acatatctgc tggtgctgga agagagccta 6060

gagtgtgccc tggtcagcca ttgccctaac ctcttcactt ctccatctcc tgtctgagat 6120

acaggtgaag aacactgggt acgcaggtga gaaacactga gtggaggccg ggatttagca 6180

ttttgggtga gtctgggagt tctgccattt catctacctc aggaattctg taatcaagga 6240

atggcaactg gttattaata agggggcaaa agcttcatag ggtgggtaac agtggaactg 6300

gcaaaggaga ttgtgtagag cagaaggcac aggaaaagag cgcccttttt acctgttaga 6360

gggtgtgagg catgaaagtg cccttaattg acttaaatcc taaagtcaaa gtctttgaag 6420

taacaggaac cttgactatg aattgctctg atgtagaatt agaaatatca catgtatgtg 6480

ggaaattgta gtcaactgca tgctgattga atggaactgg gtgataaggg aaaggcctgc 6540

tcagttatag gaaattctgt ctgagccatg ttagcacatt ttctcactta ggacagatgt 6600

gacggctctg aagcagctgc tatgcaggca agaaggcaag agcagattag cagaacctat 6660

gtctgagctg ggcctggtga cataggtctg caaccccagc attgggatat ggagtcaggt 6720

atatgagtgc ccgaggcttg ctagccagcc accctagcca aagaggatcc agtagaaaga 6780

tgtcccaaat cagcctacat acatgtctgc ttgtgtgggc tgatgtgtgc acttggtatg 6840

tatatatgca cacacatgca gccaccatgt aacctaaaac gctcatttga gggtgatacc 6900

attgccaaga cattcttaga acacatcctc tatttatctc tgtgtgcaca tctgagaaag 6960

acccacttgt tggttgattg taacaaatat ccacccattc ctcaagtgtt tagctatggt 7020

ccctagcaat gtcagtttcc cagcagaaag catgatggga gattcccaag aaaggagtgc 7080

tgtacttttt gcctcccaga tctgtgactc ttcctgtttt gttgtcattt gctcctgccc 7140

ttctcataaa cagctactgt tttccctgcc tggaacttga cccagccccg cacttcatca 7200

attgtattca ttggaatgat gaacttagct ccaagaagct tcctggcctc tccactgcag 7260

ccactgtccc gggttaggaa cggcaggtcc ttagttgtca gcagcatcta ggcacctagt 7320

gagaatcggc atctgtatta gtcagggttc tctagagtca cagaacttag gaatagtctc 7380

tatatagtaa aggaatttat tgatgattta tagtagtcca attcccaaca atggttcagt 7440

agaagctgtg aatggaagtc caaagatcta gcagttactc agtcccacac ggcaagcagg 7500

cgaaggagca agagccagac tcccttcttc caatgtcctt atatggtctc cagcagaagg 7560

tgtagcccag attaaaggtc tgtcccacca cacctttaat cccagatgac cttgaactca 7620

gagatctccc tgtcttaatc ttctggaatc catagccact atgcctcaag atctccatac 7680

caagatccag atcagaaact tccatctccc agcctccaga ttagggtcac tggtgagcct 7740

tccaattctg gattgtagtt cattccaaat atagtcaagt tgacagctgg gaatagccac 7800

tacagcatcc taaatgcaat tttcatcccc ttgactaaac tgatttagtt taatagcatg 7860

taatctcagc ttgcctgatg attgcaatgt gacttggggc aaatctttaa caggcagttt 7920

tctggtctat agaatgatgt tctcagtgct ccatctcagg gtagttaaga tgaacagaat 7980

agaatactgc ttgcagctcc tgtagccttt ggccagtgct tggagtcaag ctgggtcatg 8040

agggctttct ccactgagaa ggtagaagga agatttggag caccgaagtc tcagcactag 8100

attttatatg atgtcctgaa cagggaa 8127

<210> 2

<211> 666

<212> PRT

<213> Artificial Sequence

<400> 2

Met Gly Ala Ser Glu Leu Ile Ile Ser Gly Ser Ser Gly Gly Phe Leu

1 5 10 15

Arg Asn Ile Gly Lys Glu Tyr Gln Glu Ala Ala Glu Asn Phe Met Arg

20 25 30

Phe Met Asn Asp Gln Gly Ala Tyr Ala Pro Asn Thr Leu Arg Asp Leu

35 40 45

Arg Leu Val Phe His Ser Trp Ala Arg Trp Cys His Ala Arg Gln Leu

50 55 60

Ala Trp Phe Pro Ile Ser Pro Glu Met Ala Arg Glu Tyr Phe Leu Gln

65 70 75 80

Leu His Asp Ala Asp Leu Ala Ser Thr Thr Ile Asp Lys His Tyr Ala

85 90 95

Met Leu Asn Met Leu Leu Ser His Cys Gly Leu Pro Pro Leu Ser Asp

100 105 110

Asp Lys Ser Val Ser Leu Ala Met Arg Arg Ile Arg Arg Glu Ala Ala

115 120 125

Thr Glu Lys Gly Glu Arg Thr Gly Gln Ala Ile Pro Leu Arg Trp Asp

130 135 140

Asp Leu Lys Leu Leu Asp Val Leu Leu Ser Arg Ser Glu Arg Leu Val

145 150 155 160

Asp Leu Arg Asn Arg Ala Phe Leu Phe Val Ala Tyr Asn Thr Leu Met

165 170 175

Arg Met Ser Glu Ile Ser Arg Ile Arg Val Gly Asp Leu Asp Gln Thr

180 185 190

Gly Asp Thr Val Thr Leu His Ile Ser His Thr Lys Thr Ile Thr Thr

195 200 205

Ala Ala Gly Leu Asp Lys Val Leu Ser Arg Arg Thr Thr Ala Val Leu

210 215 220

Asn Asp Trp Leu Asp Val Ser Gly Leu Arg Glu His Pro Asp Ala Val

225 230 235 240

Leu Phe Pro Pro Ile His Arg Ser Asn Lys Ala Arg Ile Thr Thr Thr

245 250 255

Pro Leu Thr Ala Pro Ala Met Glu Lys Ile Phe Ser Asp Ala Trp Val

260 265 270

Leu Leu Asn Lys Arg Asp Ala Thr Pro Asn Lys Gly Arg Tyr Arg Thr

275 280 285

Trp Thr Gly His Ser Ala Arg Val Gly Ala Ala Ile Asp Met Ala Glu

290 295 300

Lys Gln Val Ser Met Val Glu Ile Met Gln Glu Gly Thr Trp Lys Lys

305 310 315 320

Pro Glu Thr Leu Met Arg Tyr Leu Arg Arg Gly Gly Val Ser Val Gly

325 330 335

Ala Asn Ser Arg Leu Met Asp Ser Ala Ser Gly Ala Arg Arg Ile Cys

340 345 350

Val Arg Gly Ser Met Arg Ala Ala Asn Leu Trp Pro Ser Pro Leu Met

355 360 365

Ile Lys Arg Ser Lys Lys Asn Ser Leu Ala Leu Ser Leu Thr Ala Asp

370 375 380

Gln Met Val Ser Ala Leu Leu Asp Ala Glu Pro Pro Ile Leu Tyr Ser

385 390 395 400

Glu Tyr Asp Pro Thr Arg Pro Phe Ser Glu Ala Ser Met Met Gly Leu

405 410 415

Leu Thr Asn Leu Ala Asp Arg Glu Leu Val His Met Ile Asn Trp Ala

420 425 430

Lys Arg Val Pro Gly Phe Val Asp Leu Thr Leu His Asp Gln Val His

435 440 445

Leu Leu Glu Cys Ala Trp Leu Glu Ile Leu Met Ile Gly Leu Val Trp

450 455 460

Arg Ser Met Glu His Pro Val Lys Leu Leu Phe Ala Pro Asn Leu Leu

465 470 475 480

Leu Asp Arg Asn Gln Gly Lys Cys Val Glu Gly Met Val Glu Ile Phe

485 490 495

Asp Met Leu Leu Ala Thr Ser Ser Arg Phe Arg Met Met Asn Leu Gln

500 505 510

Gly Glu Glu Phe Val Cys Leu Lys Ser Ile Ile Leu Leu Asn Ser Gly

515 520 525

Val Tyr Thr Phe Leu Ser Ser Thr Leu Lys Ser Leu Glu Glu Lys Asp

530 535 540

His Ile His Arg Val Leu Asp Lys Ile Thr Asp Thr Leu Ile His Leu

545 550 555 560

Met Ala Lys Ala Gly Leu Thr Leu Gln Gln Gln His Gln Arg Leu Ala

565 570 575

Gln Leu Leu Leu Ile Leu Ser His Ile Arg His Met Ser Asn Lys Gly

580 585 590

Met Glu His Leu Tyr Ser Met Lys Cys Lys Asn Val Val Pro Leu Tyr

595 600 605

Asp Leu Leu Leu Glu Ala Ala Asp Ala His Arg Leu His Ala Pro Thr

610 615 620

Ser Arg Gly Gly Ala Ser Val Glu Glu Thr Asp Gln Ser His Leu Ala

625 630 635 640

Thr Ala Gly Ser Thr Ser Ser His Ser Leu Gln Lys Tyr Tyr Ile Thr

645 650 655

Gly Glu Ala Glu Gly Phe Pro Ala Thr Ala

660 665

<210> 3

<211> 344

<212> PRT

<213> Artificial Sequence

<400> 3

Met Gly Ser Asn Leu Leu Thr Val His Gln Asn Leu Pro Ala Leu Pro

1 5 10 15

Val Asp Ala Thr Ser Asp Glu Val Arg Lys Asn Leu Met Asp Met Phe

20 25 30

Arg Asp Arg Gln Ala Phe Ser Glu His Thr Trp Lys Met Leu Leu Ser

35 40 45

Val Cys Arg Ser Trp Ala Ala Trp Cys Lys Leu Asn Asn Arg Lys Trp

50 55 60

Phe Pro Ala Glu Pro Glu Asp Val Arg Asp Tyr Leu Leu Tyr Leu Gln

65 70 75 80

Ala Arg Gly Leu Ala Val Lys Thr Ile Gln Gln His Leu Gly Gln Leu

85 90 95

Asn Met Leu His Arg Arg Ser Gly Leu Pro Arg Pro Ser Asp Ser Asn

100 105 110

Ala Val Ser Leu Val Met Arg Arg Ile Arg Lys Glu Asn Val Asp Ala

115 120 125

Gly Glu Arg Ala Lys Gln Ala Leu Ala Phe Glu Arg Thr Asp Phe Asp

130 135 140

Gln Val Arg Ser Leu Met Glu Asn Ser Asp Arg Cys Gln Asp Ile Arg

145 150 155 160

Asn Leu Ala Phe Leu Gly Ile Ala Tyr Asn Thr Leu Leu Arg Ile Ala

165 170 175

Glu Ile Ala Arg Ile Arg Val Lys Asp Ile Ser Arg Thr Asp Gly Gly

180 185 190

Arg Met Leu Ile His Ile Gly Arg Thr Lys Thr Leu Val Ser Thr Ala

195 200 205

Gly Val Glu Lys Ala Leu Ser Leu Gly Val Thr Lys Leu Val Glu Arg

210 215 220

Trp Ile Ser Val Ser Gly Val Ala Asp Asp Pro Asn Asn Tyr Leu Phe

225 230 235 240

Cys Arg Val Arg Lys Asn Gly Val Ala Ala Pro Ser Ala Thr Ser Gln

245 250 255

Leu Ser Thr Arg Ala Leu Glu Gly Ile Phe Glu Ala Thr His Arg Leu

260 265 270

Ile Tyr Gly Ala Lys Asp Asp Ser Gly Gln Arg Tyr Leu Ala Trp Ser

275 280 285

Gly His Ser Ala Arg Val Gly Ala Ala Arg Asp Met Ala Arg Ala Gly

290 295 300

Val Ser Ile Pro Glu Ile Met Gln Ala Gly Gly Trp Thr Asn Val Asn

305 310 315 320

Ile Val Met Asn Tyr Ile Arg Asn Leu Asp Ser Glu Thr Gly Ala Met

325 330 335

Val Arg Leu Leu Glu Asp Gly Asp

340

<210> 4

<211> 34

<212> DNA

<213> Artificial Sequence

<400> 4

ataacttcgt atannntann ntatacgaag ttat 34

<210> 5

<211> 32

<212> DNA

<213> Artificial Sequence

<400> 5

taactttaaa taatnnnnat tatttaaagt ta 32

<210> 6

<211> 765

<212> PRT

<213> Artificial Sequence

<400> 6

Met Val Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu

1 5 10 15

Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly

20 25 30

Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile

35 40 45

Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr

50 55 60

Phe Thr Tyr Gly Val Gln Cys Phe Ala Arg Tyr Pro Asp His Met Lys

65 70 75 80

Gln His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu

85 90 95

Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu

100 105 110

Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly

115 120 125

Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr

130 135 140

Asn Tyr Asn Ser His Lys Val Tyr Ile Thr Ala Asp Lys Gln Lys Asn

145 150 155 160

Gly Ile Lys Val Asn Phe Lys Thr Arg His Asn Ile Glu Asp Gly Ser

165 170 175

Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly

180 185 190

Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu

195 200 205

Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe

210 215 220

Val Thr Ala Ala Gly Ile Thr Leu Gly Met Asp Glu Leu Tyr Lys Glu

225 230 235 240

Gly Arg Gly Ser Leu Leu Thr Cys Gly Asp Val Glu Glu Asn Pro Gly

245 250 255

Pro Ala Pro Gly Ser Met Ser Gly Gly Glu Glu Leu Phe Ala Gly Ile

260 265 270

Val Pro Val Leu Ile Glu Leu Asp Gly Asp Val His Gly His Lys Phe

275 280 285

Ser Val Arg Gly Glu Gly Glu Gly Asp Ala Asp Tyr Gly Lys Leu Glu

290 295 300

Ile Lys Phe Ile Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr

305 310 315 320

Leu Val Thr Thr Leu Cys Tyr Gly Ile Gln Cys Phe Ala Arg Tyr Pro

325 330 335

Glu His Met Lys Met Asn Asp Phe Phe Lys Ser Ala Met Pro Glu Gly

340 345 350

Tyr Ile Gln Glu Arg Thr Ile Gln Phe Gln Asp Asp Gly Lys Tyr Lys

355 360 365

Thr Arg Gly Glu Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile

370 375 380

Glu Leu Lys Gly Lys Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His

385 390 395 400

Lys Leu Glu Tyr Ser Phe Asn Ser His Asn Val Tyr Ile Arg Pro Asp

405 410 415

Lys Ala Asn Asn Gly Leu Glu Ala Asn Phe Lys Thr Arg His Asn Ile

420 425 430

Glu Gly Gly Gly Val Gln Leu Ala Asp His Tyr Gln Thr Asn Val Pro

435 440 445

Leu Gly Asp Gly Pro Val Leu Ile Pro Ile Asn His Tyr Leu Ser Thr

450 455 460

Gln Thr Lys Ile Ser Lys Asp Arg Asn Glu Ala Arg Asp His Met Val

465 470 475 480

Leu Leu Glu Ser Phe Ser Ala Cys Cys His Thr His Gly Met Asp Glu

485 490 495

Leu Tyr Arg Arg Ala Lys Arg Ser Gly Ser Gly Ala Thr Asn Phe Ser

500 505 510

Leu Leu Lys Gln Ala Gly Asp Val Glu Glu Asn Pro Gly Pro Met Val

515 520 525

Ser Lys Gln Ile Leu Lys Asn Thr Gly Leu Gln Glu Ile Met Ser Phe

530 535 540

Lys Val Asn Leu Glu Gly Val Val Asn Asn His Val Phe Thr Met Glu

545 550 555 560

Gly Cys Gly Lys Gly Asn Ile Leu Phe Gly Asn Gln Leu Val Gln Ile

565 570 575

Arg Val Thr Lys Gly Ala Pro Leu Pro Phe Ala Phe Asp Ile Leu Ser

580 585 590

Pro Ala Phe Gln Tyr Gly Asn Arg Thr Phe Thr Lys Tyr Pro Glu Asp

595 600 605

Ile Ser Asp Phe Phe Ile Gln Ser Phe Pro Ala Gly Phe Val Tyr Glu

610 615 620

Arg Thr Leu Arg Tyr Glu Asp Gly Gly Leu Val Glu Ile Arg Ser Asp

625 630 635 640

Ile Asn Leu Ile Glu Glu Met Phe Val Tyr Arg Val Glu Tyr Lys Gly

645 650 655

Arg Asn Phe Pro Asn Asp Gly Pro Val Met Lys Lys Thr Ile Thr Gly

660 665 670

Leu Gln Pro Ser Phe Glu Val Val Tyr Met Asn Asp Gly Val Leu Val

675 680 685

Gly Gln Val Ile Leu Val Tyr Arg Leu Asn Ser Gly Lys Phe Tyr Ser

690 695 700

Cys His Met Arg Thr Leu Met Lys Ser Lys Gly Val Val Lys Asp Phe

705 710 715 720

Pro Glu Tyr His Phe Ile Gln His Arg Leu Glu Lys Thr Tyr Val Glu

725 730 735

Asp Gly Gly Phe Val Glu Gln His Glu Thr Ala Ile Ala Gln Leu Thr

740 745 750

Ser Leu Gly Lys Pro Leu Gly Ser Leu His Glu Trp Val

755 760 765

<210> 7

<211> 29

<212> DNA

<213> Artificial Sequence

<400> 7

tggattttgc tgcagttatt tgtgtatag 29

<210> 8

<211> 27

<212> DNA

<213> Artificial Sequence

<400> 8

cacataaatt aaacatgact tggttac 27

<210> 9

<211> 23

<212> DNA

<213> Artificial Sequence

<400> 9

tccgagcgtg gtggagccgt tct 23

<210> 10

<211> 27

<212> DNA

<213> Artificial Sequence

<400> 10

tactaccttg ttctgataga aatattt 27

<210> 11

<211> 20

<212> DNA

<213> Artificial Sequence

<400> 11

ataaattaac attgaaggtc 20

Claims

1. A nucleic acid molecule selected from

(1) a nucleic acid molecule comprising a 5' homology arm and a 3' homology arm and, located between the 5' homology arm and the 3' homology arm, a coding sequence for a first recombinase, an estrogen receptor ER coding sequence, and a recognition site for a second recombinase, wherein the 5' homology arm and the 3' homology arm are capable of recombining the sequences between them into a genome such that those sequences are co-expressed with a cell proliferation factor gene in the genome, and

(2) a complementary sequence of the nucleic acid molecule of (1).

2. The nucleic acid molecule of claim 1, wherein the polynucleotide sequence of the nucleic acid molecule is, in order from the 5' end to the 3' end, a 5' homology arm, a coding sequence for a first recombinase, a recognition site for a second recombinase, an estrogen receptor ER coding sequence, a recognition site for the second recombinase, and a 3' homology arm, wherein the 5' homology arm and the 3' homology arm recombine the sequences between them at the 5' or 3' end of the cell proliferation factor gene,

preferably, the nucleic acid molecule has one or more characteristics selected from the group consisting of:

the cell proliferation factor is Ki67 or PCNA,

the nucleic acid sequence of the 5' homology arm is shown as nucleotides 1 to 3000 of SEQ ID NO: 1,

the nucleic acid sequence of the 3' homology arm is shown as nucleotides 5128 to 8127 of SEQ ID NO: 1,

the first recombinase and the second recombinase are Cre and Dre, or Dre and Cre, respectively, wherein the recognition site of Cre is LoxP and the recognition site of Dre is rox,

the amino acid sequence of Dre is shown as amino acids 1 to 356 of SEQ ID NO: 2,

the amino acid sequence of Cre is shown in SEQ ID NO: 3,

the nucleic acid sequence of LoxP is shown in SEQ ID NO: 4,

the nucleic acid sequence of rox is shown in SEQ ID NO: 5,

the amino acid sequence of the estrogen receptor ER is shown as amino acids 357 to 666 of SEQ ID NO: 2.
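The coordinates recited in claim 2 are 1-based and inclusive, so they have to be shifted when a sequence is handled in a 0-based language. The following minimal sketch shows one way to do that and to check the inverted-repeat arms of the LoxP and rox sites given as SEQ ID NO: 4 and SEQ ID NO: 5. The helper names (`slice_1based`, `feature_length`, `has_inverted_repeat_arms`) and the arm lengths of 13 and 14 nucleotides are assumptions made for illustration; the arm lengths are read off the positions of the `n` spacers in the listed sequences.

```python
# 1-based, inclusive coordinates as recited in claim 2.
FEATURES = {
    "5' homology arm (SEQ ID NO: 1)": (1, 3000),
    "3' homology arm (SEQ ID NO: 1)": (5128, 8127),
    "Dre (SEQ ID NO: 2)": (1, 356),
    "ER (SEQ ID NO: 2)": (357, 666),
}

# Recognition sites copied from SEQ ID NO: 4 and SEQ ID NO: 5 (spaces removed).
LOXP = "ataacttcgtatannntannntatacgaagttat"
ROX = "taactttaaataatnnnnattatttaaagtta"

_COMPLEMENT = str.maketrans("acgtn", "tgcan")


def feature_length(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def slice_1based(seq: str, start: int, end: int) -> str:
    """Return seq[start..end] using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


def reverse_complement(dna: str) -> str:
    """Reverse complement of a lowercase/uppercase DNA string (n stays n)."""
    return dna.lower().translate(_COMPLEMENT)[::-1]


def has_inverted_repeat_arms(site: str, arm_len: int) -> bool:
    """True if the last arm_len bases are the reverse complement of the first."""
    site = site.lower()
    return reverse_complement(site[:arm_len]) == site[-arm_len:]


for name, (start, end) in FEATURES.items():
    print(f"{name}: {feature_length(start, end)} residues")
print("LoxP arms are inverted repeats:", has_inverted_repeat_arms(LOXP, 13))
print("rox arms are inverted repeats:", has_inverted_repeat_arms(ROX, 14))
```

To extract a homology arm, the full SEQ ID NO: 1 string (with the spaces and position counters of the listing removed) would be passed to `slice_1based` together with the coordinates from `FEATURES`.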

3. A nucleic acid construct comprising the nucleic acid molecule of claim 1 or 2.

4. A recombinase system comprising

(1) The nucleic acid molecule of claim 1 or 2, and

(2) optionally, a nucleic acid molecule encoding a fusion protein of a second recombinase and an estrogen receptor ER,

(3) optionally, a nucleic acid molecule comprising the following structure: from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase and a marker-coding sequence, in that order; or

the system comprises

(1) the nucleic acid construct of claim 3, and

(2) optionally, a second nucleic acid construct having a polynucleotide sequence encoding a fusion protein of a second recombinase and an estrogen receptor ER,

(3) optionally, a third nucleic acid construct having a polynucleotide sequence comprising the following structure: from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase and a marker-coding sequence, in that order.
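The element orders recited for the constructs can be represented as ordered lists, and excision between two same-oriented recognition sites as removal of the intervening elements together with one copy of the site. The sketch below is illustrative only: the `excise` helper and the element labels are hypothetical, LoxP and rox are assumed as the recognition sites of the first and second recombinase, and both copies of each site are assumed to be in the same orientation.

```python
from typing import List, Sequence


def excise(construct: Sequence[str], site: str) -> List[str]:
    """Remove everything between the first two copies of `site`.

    One copy of the site is left behind, as expected for recombination
    between two sites in the same orientation. If fewer than two copies
    are present, the construct is returned unchanged.
    """
    positions = [i for i, element in enumerate(construct) if element == site]
    if len(positions) < 2:
        return list(construct)
    first, second = positions[0], positions[1]
    return list(construct[:first + 1]) + list(construct[second + 1:])


# Knock-in allele in the order recited in claim 2 (labels are illustrative).
knock_in = ["5' homology arm", "Cre", "rox", "ER", "rox", "3' homology arm"]

# Reporter construct in the order recited for the third nucleic acid molecule.
reporter = ["LoxP", "termination sequence", "LoxP", "marker"]

# Second recombinase (assumed Dre) acting on the rox sites removes ER.
knock_in_after_dre = excise(knock_in, "rox")

# First recombinase (assumed Cre) acting on the LoxP sites removes the
# termination sequence so that the marker can be expressed.
reporter_after_cre = excise(reporter, "LoxP")

print(knock_in_after_dre)
print(reporter_after_cre)
```

Keeping the constructs as plain lists makes it easy to confirm that an excision leaves the recombinase coding sequence or the marker intact and adjacent to the remaining site.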

5. A host cell, comprising one or more selected from the group consisting of:

(1) the nucleic acid molecule of claim 1 or 2;

(2) the nucleic acid construct of claim 3;

(3) the system of claim 4.

6. A method of constructing a transgenic animal comprising:

mating any two of three kinds of F₀ generation animals, then mating the resulting F₁ generation animal, in which homologous recombination has occurred, with the third kind of F₀ generation animal, so that homologous recombination occurs in the F₂ generation animals, thereby obtaining transgenic animals comprising the first, second and third polynucleotide sequences,

wherein,

the genome of the first F₀ animal contains a first polynucleotide sequence, wherein the first polynucleotide sequence is the polynucleotide sequence of the nucleic acid molecule of claim 1 or 2, or the first polynucleotide sequence comprises, from the 5' end to the 3' end, a first recombinase coding sequence, a recognition site for a second recombinase, an estrogen receptor (ER) coding sequence, and a recognition site for the second recombinase, and the first polynucleotide sequence is co-expressed with a cell proliferation factor gene in the genome;

the genome of the second F₀ animal contains a second polynucleotide sequence, said second polynucleotide sequence encoding a fusion protein of a second recombinase and an estrogen receptor (ER);

the genome of the third F₀ animal contains a third polynucleotide sequence, which has, in order from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase and a marker-coding sequence;

preferably, the animal is a mouse,

preferably, the first and second recombinases, the estrogen receptor ER and the cell proliferation factor are as defined in claim 1 or 2.

7. A method for constructing a transgenic animal comprising introducing any one, two or three of the first, second and third polynucleotide sequences into animal cells and culturing, and selecting a transgenic animal whose genome comprises any one, two or three of the first, second and third polynucleotide sequences, wherein

the third polynucleotide sequence has, in order from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase and a marker-coding sequence,

preferably, the cells are animal ES cells,

preferably, the animal is a mouse.

8. An in vivo long-term cell labeling method comprising labeling cells expressing a cell proliferation factor gene in an animal comprising a first, second, and third polynucleotide sequence in the presence of an inducer that interacts with the estrogen receptor ER, wherein,

the third polynucleotide sequence has, in order from the 5' end to the 3' end, a recognition site for the first recombinase, a termination sequence, a recognition site for the first recombinase and a marker-coding sequence,

preferably, the animal is a mouse.

9. Use of the nucleic acid molecule of claim 1 or 2, the nucleic acid construct of claim 3, the system of claim 4 or the host cell of claim 5 for long-term cell labeling or cell tracking.

10. A kit comprising the nucleic acid molecule of claim 1 or 2, the nucleic acid construct of claim 3, the system of claim 4, or the host cell of claim 5, and reagents required to knock the nucleic acid molecule into the genome of the cell.