CN112501205A

CN112501205A - Construction method and application of CEACAM1 gene humanized non-human animal

Info

Publication number: CN112501205A
Application number: CN202110173466.1A
Authority: CN
Inventors: 赵磊
Original assignee: Baccetus Beijing Pharmaceutical Technology Co ltd
Current assignee: Baccetus Beijing Pharmaceutical Technology Co ltd
Priority date: 2021-02-09
Filing date: 2021-02-09
Publication date: 2021-03-16
Anticipated expiration: 2041-02-09
Also published as: CN112501205B

Abstract

The invention provides a construction method of a CEACAM1 gene humanized non-human animal, which is characterized in that a nucleotide sequence coding human CEACAM1 protein is introduced into the genome of the non-human animal in a homologous recombination mode, the animal can normally express the humanized CEACAM1 protein, can be used as an animal model for researching the signal mechanism of human CEACAM1 and screening tumor and immune disease drugs, and has important application value for the research and development of new drugs of immune targets. The invention also provides a humanized CEACAM1 protein, a humanized CEACAM1 gene, a targeting vector of the CEACAM1 gene, a non-human animal obtained by the construction method and application thereof in the field of biomedicine.

Description

Construction method and application of CEACAM1 gene humanized non-human animal

Technical Field

The invention belongs to the field of animal genetic engineering and genetic modification, and particularly relates to a construction method of a CEACAM1 gene humanized non-human animal and application thereof in the field of biomedicine.

Background

Carcinoembryonic antigen-related cell adhesion molecule 1 (hereinafter referred to as CEACAM1), also known as CD66a or C-CAM, is a transmembrane glycoprotein belonging to the carcinoembryonic antigen (CEA) family. CEACAM1 is widely expressed on epithelial cells and vascular endothelial cells, and is also expressed on granulocytes, monocytes/macrophages, platelets, B cells, IL-2-activated T cells and dendritic cells. CEACAM1 has many biological functions: it can inhibit tumor growth and epithelial cell proliferation, induce epithelial cell apoptosis, inhibit T lymphocyte activation, stimulate B lymphocyte proliferation, inhibit the cytotoxic effects of T cells and NK cells, inhibit the activity of tumor-infiltrating lymphocytes, delay granulocyte and monocyte apoptosis, promote tumor cell invasion, enhance endothelial cell activity, promote angiogenesis, and regulate blood vessel remodeling.

In particular, CEACAM1 is thought to be an immune checkpoint molecule similar to PD-1 and CTLA-4, playing a key role in regulating T cell activation. Immune checkpoint pathways protect tissues from immune-mediated damage under non-inflammatory physiological conditions. When CEACAM1 on T lymphocytes is activated, primarily through CEACAM1-CEACAM1 homophilic engagement in trans, it inhibits TCR-mediated inflammatory pathways by recruiting phosphatases to its own cytoplasmic ITIM motif. Inhibition of immune checkpoint pathways in the cancer setting has therefore become a promising anti-cancer therapeutic strategy.

The experimental animal disease model is an indispensable research tool for researching etiology and pathogenesis of human diseases, developing prevention and treatment technologies and developing medicines. However, due to the differences between the physiological structures and metabolic systems of animals and humans, the traditional animal models cannot reflect the real conditions of human bodies well, and the establishment of disease models closer to the physiological characteristics of human bodies in animal bodies is an urgent need of the biomedical industry.

With the continuous development and maturation of genetic engineering technology, replacing or substituting homologous animal genes with human genes has become feasible, and developing humanized experimental animal models in this way is the future direction of animal models. A gene humanized animal model is an animal model in which a homologous gene in the animal genome is replaced with a normal or mutant human gene, so that its physiological or disease characteristics are closer to those of humans. Gene humanized animals have important application value in their own right; for example, humanized animal models based on cell or tissue transplantation can be improved by gene humanization. More importantly, because a human gene segment is inserted, the human protein can be fully or partially expressed in the animal, which can then serve as a target for drugs that recognize only the human protein sequence, making it possible to screen anti-human antibodies and other drugs at the animal level. However, because of physiological and pathological differences between animals and humans, together with the complexity of genes (for example, human and mouse CEACAM1 proteins share only 58% identity), constructing an "effective" humanized animal model for new drug development remains the greatest challenge.

In view of the complex mechanism of action of CEACAM1 and its great application value in tumor therapy, there is an urgent need in the art to develop a non-human animal model of the CEACAM1-related signaling pathway, in order to further explore its biological properties, improve the effectiveness of preclinical efficacy testing, raise the success rate of research and development, and minimize development failures. In addition, the non-human animal obtained by the method can be mated with other gene humanized non-human animals to obtain a multi-gene humanized animal model for screening and evaluating human drugs and combination therapies targeting this signaling pathway. The invention has broad application prospects in academic and clinical research.

Disclosure of Invention

In a first aspect of the present invention, there is provided a CEACAM1 gene humanized non-human animal or a construction method thereof, wherein the genome of the non-human animal comprises exons 2 to 6 of human CEACAM1 gene.

Preferably, the genome of said non-human animal comprises part of exon 2, all of exons 3 to 5 and part of exon 6 of the human CEACAM1 nucleotide sequence, further preferably also comprises intron 2-3 and/or intron 5-6, and more preferably comprises any intron between exons 2 and 6; wherein the part of exon 2 of the human CEACAM1 nucleotide sequence comprises at least the nucleotide sequence of exon 2 encoding the extracellular region of human CEACAM1 protein, and preferably comprises at least a nucleotide sequence 313bp, 314bp, 315bp, 316bp, 317bp, 318bp, 319bp, 320bp, 321bp or 322bp in length of exon 2 counted in the 3'-5' direction; and the part of exon 6 comprises at least the nucleotide sequence of exon 6 encoding the extracellular region of human CEACAM1 protein with 1-5 (e.g., 1, 2, 3, 4, 5) amino acids removed from the C-terminus, and preferably comprises at least a nucleotide sequence 14bp, 15bp, 16bp, 17bp, 18bp, 19bp, 20bp, 21bp, 22bp or 23bp in length of exon 6 counted in the 5'-3' direction.

Preferably, said construction method comprises inserting or substituting, at the non-human animal CEACAM1 locus, a nucleotide sequence comprising all or part of exons 2 to 6 of the human CEACAM1 nucleotide sequence; further preferably, inserting or substituting, at the non-human animal CEACAM1 locus, a nucleotide sequence comprising part of exon 2, all of exons 3 to 5 and part of exon 6 of the human CEACAM1 nucleotide sequence, more preferably also comprising intron 2-3 and/or intron 5-6, and still more preferably comprising any intron between exons 2 and 6; wherein the part of exon 2 of the human CEACAM1 gene comprises at least the nucleotide sequence of exon 2 encoding the extracellular region of human CEACAM1 protein, and preferably comprises at least a nucleotide sequence 313bp, 314bp, 315bp, 316bp, 317bp, 318bp, 319bp, 320bp, 321bp or 322bp in length of exon 2 counted in the 3'-5' direction; and the part of exon 6 comprises at least the nucleotide sequence of exon 6 encoding the extracellular region of human CEACAM1 protein with 1-5 (e.g., 1, 2, 3, 4, 5) amino acids removed from the C-terminus, and preferably comprises at least a nucleotide sequence 14bp, 15bp, 16bp, 17bp, 18bp, 19bp, 20bp, 21bp, 22bp or 23bp in length of exon 6 counted in the 5'-3' direction.

Preferably, the insertion or substitution at the CEACAM1 locus inserts into or substitutes a part of the nucleotide sequence of the endogenous CEACAM1 gene of the non-human animal comprising part of exon 2, all of exons 3 to 5 and part of exon 6, preferably also comprising intron 2-3 and/or intron 5-6, and further preferably comprising any intron between exons 2 and 6; wherein the part of exon 2 of the non-human animal CEACAM1 gene comprises at least the nucleotide sequence of exon 2 encoding the extracellular region of CEACAM1 protein, and preferably comprises at least a nucleotide sequence 313bp, 314bp, 315bp, 316bp, 317bp, 318bp, 319bp, 320bp, 321bp or 322bp in length of exon 2 counted in the 3'-5' direction; and the part of exon 6 comprises at least the nucleotide sequence of exon 6 encoding the extracellular region of CEACAM1 protein with 1-5 (e.g., 1, 2, 3, 4, 5) amino acids removed from the C-terminus, and preferably comprises at least a nucleotide sequence 14bp, 15bp, 16bp, 17bp, 18bp, 19bp, 20bp, 21bp, 22bp or 23bp in length of exon 6 counted in the 5'-3' direction.

Preferably, the non-human animal body expresses human or humanized CEACAM1 protein.

Preferably, the human or humanized CEACAM1 protein comprises an extracellular region of human CEACAM1 protein. Further preferably, the protein further comprises a signal peptide, a transmembrane region and/or a cytoplasmic region.

Preferably, the humanized CEACAM1 protein comprises an extracellular region portion of human CEACAM1 protein, and further preferably, the extracellular region portion comprises an extracellular region of human CEACAM1 protein with 0-5 (e.g., 0, 1, 2, 3, 4, 5) amino acids removed from the C-terminus.

Preferably, the construction method comprises inserting or substituting, at the non-human animal CEACAM1 locus, a nucleotide sequence encoding the extracellular region of human CEACAM1 protein. Further preferably, it comprises inserting or substituting at the non-human animal CEACAM1 locus, preferably substituting at the corresponding position, a nucleotide sequence encoding the extracellular region of human CEACAM1 protein with 0-5 (e.g., 0, 1, 2, 3, 4, 5) amino acids removed from the C-terminus.
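
To illustrate the "0-5 amino acids removed from the C-terminus" wording, the following minimal Python sketch enumerates the six truncation variants of an extracellular-region sequence. The function name and the toy sequence are hypothetical placeholders and are not taken from the sequence listing, which alone defines the actual extracellular region.

```python
def c_terminal_truncations(region: str, max_removed: int = 5) -> dict[int, str]:
    """Map each count of removed C-terminal residues (0..max_removed) to the truncated sequence."""
    if max_removed < 0:
        raise ValueError("max_removed must be non-negative")
    if max_removed >= len(region):
        raise ValueError("cannot remove the entire region")
    return {n: region[: len(region) - n] for n in range(max_removed + 1)}


# Toy placeholder sequence (hypothetical; NOT the CEACAM1 extracellular region).
toy_region = "ACDEFGHIKLMNPQRSTVWY"
variants = c_terminal_truncations(toy_region)

for removed, seq in variants.items():
    print(removed, len(seq), seq)
```

Each key is the number of residues removed, so key 0 is the untruncated region and key 5 is the shortest variant covered by the wording.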

More preferably, the construction method comprises inserting or substituting, at the non-human animal CEACAM1 locus, a nucleotide sequence encoding positions 35 to 423 of SEQ ID NO: 2.

Preferably, the construction method comprises inserting or substituting, at the CEACAM1 locus of a non-human animal, a nucleotide sequence comprising SEQ ID NO: 5.

Preferably, the construction method comprises insertion, inversion, knockout or substitution.

More preferably, the construction method is a substitution, in which the nucleotide sequence of the non-human animal CEACAM1 gene encoding positions 35 to 419 of SEQ ID NO: 1 is replaced.

Most preferably, the substitution replaces the nucleotide sequence at positions 25165846 to 25176090 of NCBI accession No. NC_000073.7 in the non-human animal CEACAM1 gene.
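
The coordinate ranges quoted above can be sanity-checked with simple arithmetic. The sketch below assumes that the ranges are 1-based and inclusive at both ends (the text does not state the convention), and the helper name is hypothetical.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of a 1-based range that includes both endpoints (assumed convention)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Genomic span on NC_000073.7 named in the text as the region to be replaced.
replaced_bp = inclusive_length(25165846, 25176090)

# Amino acid span of SEQ ID NO: 2 (positions 35 to 423) named in the text.
human_aa = inclusive_length(35, 423)

print(replaced_bp)  # 10245
print(human_aa)     # 389
```

Under that assumption the replaced genomic region is 10245 bp long and the human segment covers 389 amino acids.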

The non-human animal of the invention is a rodent; preferably, the rodent is a rat or a mouse.

In one embodiment of the invention, the construction method comprises replacing the corresponding region of the non-human animal CEACAM1 gene with a nucleotide sequence encoding amino acids 35 to 423 of SEQ ID NO: 2, or with a nucleotide sequence comprising SEQ ID NO: 5.

Preferably, the non-human animal expresses the human or humanized CEACAM1 protein in vivo, while expression of endogenous CEACAM1 protein is reduced or absent.

Preferably, the humanized CEACAM1 protein comprises all or part of the extracellular region of human CEACAM1 protein, further preferably comprises part of the extracellular region, more preferably comprises the extracellular region of human CEACAM1 protein with 0-5 (e.g., 0, 1, 2, 3, 4, 5) amino acids removed from the C-terminus, and still more preferably comprises the amino acid sequence shown at positions 35 to 423 of SEQ ID NO: 2 or in SEQ ID NO: 10, or an amino acid sequence having at least 70%, 80%, 85%, 90%, 95% or at least 99% identity to positions 35 to 423 of SEQ ID NO: 2 or to SEQ ID NO: 10.

Preferably, the humanized CEACAM1 protein further comprises a portion of a non-human animal CEACAM1 protein, preferably a signal peptide, extracellular region, transmembrane region and/or cytoplasmic region of the non-human animal CEACAM1 protein.

In one embodiment of the present invention, the humanized CEACAM1 protein comprises one of the following groups:

a) part or all of the amino acid sequence shown in SEQ ID NO: 10 or at positions 35 to 423 of SEQ ID NO: 2;

b) an amino acid sequence having at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or at least 99% identity to SEQ ID NO: 10 or to positions 35 to 423 of SEQ ID NO: 2;

c) an amino acid sequence differing from SEQ ID NO: 10 or from positions 35 to 423 of SEQ ID NO: 2 by no more than 10, 9, 8, 7, 6, 5, 4, 3, 2 or no more than 1 amino acid;

d) an amino acid sequence as shown in SEQ ID NO: 10 or at positions 35 to 423 of SEQ ID NO: 2, including substitution, deletion and/or insertion of one or more amino acid residues.
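
The identity and difference thresholds in groups b) and c) can be illustrated with a short Python sketch. It assumes the two sequences are already aligned and of equal length, with no gaps; a real comparison would normally use a sequence alignment tool. All function names and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
def count_differences(candidate: str, reference: str) -> int:
    """Count mismatched positions between two pre-aligned, equal-length sequences."""
    if len(candidate) != len(reference):
        raise ValueError("this sketch only compares equal-length, pre-aligned sequences")
    return sum(1 for c, r in zip(candidate, reference) if c != r)


def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of positions at which the two sequences agree."""
    if not reference:
        raise ValueError("reference must not be empty")
    matches = len(reference) - count_differences(candidate, reference)
    return 100.0 * matches / len(reference)


def satisfies_criteria(candidate: str, reference: str,
                       min_identity: float = 90.0,
                       max_differences: int = 10) -> dict[str, bool]:
    """Check the identity threshold (group b) and the difference limit (group c)."""
    return {
        "identity": percent_identity(candidate, reference) >= min_identity,
        "differences": count_differences(candidate, reference) <= max_differences,
    }


# Toy placeholder sequences (hypothetical), differing at the last position only.
reference = "ACDEFGHIKLMNPQRSTVWY"
candidate = "ACDEFGHIKLMNPQRSTVWF"

print(percent_identity(candidate, reference))    # 95.0
print(satisfies_criteria(candidate, reference))
```

The two criteria are reported separately because a long sequence can pass the percentage threshold while exceeding the absolute limit on differences, and a short one can do the reverse.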

Preferably, the genome of the non-human animal comprises a humanized CEACAM1 gene, and the humanized CEACAM1 gene encodes a humanized CEACAM1 protein.

Preferably, the humanized CEACAM1 gene comprises SEQ ID NO: 5, and further preferably, the mRNA sequence transcribed from the CEACAM1 gene contained in the non-human animal comprises the nucleotide sequence shown in SEQ ID NO: 9.

In one embodiment of the present invention, the humanized CEACAM1 gene comprises one of the following groups:

a) the mRNA sequence of the humanized CEACAM1 gene is part or all of the sequence shown in SEQ ID NO: 9;

b) the mRNA sequence of the humanized CEACAM1 gene has at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or at least 99% identity to SEQ ID NO: 9;

c) the mRNA sequence of the humanized CEACAM1 gene differs from SEQ ID NO: 9 by no more than 10, 9, 8, 7, 6, 5, 4, 3, 2 or no more than 1 nucleotide;

d) the mRNA sequence of the humanized CEACAM1 gene has the sequence shown in SEQ ID NO: 9, including substitution, deletion and/or insertion of one or more nucleotides.

Preferably, the construction method comprises inserting or replacing a nucleotide sequence comprising the humanized CEACAM1 gene at the non-human animal CEACAM1 locus.

Preferably, the construction method comprises inserting or replacing a nucleotide sequence encoding the humanized CEACAM1 protein into the non-human animal CEACAM1 locus.

Preferably, the insertion or substitution site is after an endogenous regulatory element of the CEACAM1 gene.

Preferably, the insertion is performed by first disrupting the coding frame of the endogenous CEACAM1 gene of the non-human animal and then inserting the human sequence, or the insertion of the human sequence itself both introduces a frameshift mutation into the endogenous CEACAM1 gene and completes the insertion.

Preferably, the humanized CEACAM1 gene is homozygous or heterozygous in the non-human animal.

Preferably, the genome of the non-human animal comprises a humanized CEACAM1 gene on at least one chromosome.

Preferably, at least one cell in the non-human animal expresses a human or humanized CEACAM1 protein.

Preferably, the CEACAM1 gene humanized non-human animal is constructed using gene editing techniques including gene targeting using embryonic stem cells, CRISPR/Cas9, zinc finger nuclease, transcription activator-like effector nuclease, homing endonuclease or other molecular biology techniques.

Preferably, the CEACAM1 gene humanized non-human animal is constructed using a targeting vector, wherein the targeting vector comprises all or part of the nucleotide sequence of exons 2 to 6 of human CEACAM1; more preferably, it comprises part of exon 2, all of exons 3 to 5 and part of exon 6, more preferably also comprises intron 2-3 and/or intron 5-6, and still more preferably comprises any intron between exons 2 and 6; wherein the part of exon 2 of the human CEACAM1 nucleotide sequence comprises at least the nucleotide sequence of exon 2 encoding the extracellular region of human CEACAM1 protein, and preferably comprises at least a nucleotide sequence 313bp, 314bp, 315bp, 316bp, 317bp, 318bp, 319bp, 320bp, 321bp or 322bp in length of exon 2 counted in the 3'-5' direction; and the part of exon 6 comprises at least the nucleotide sequence of exon 6 encoding the extracellular region of human CEACAM1 protein with 1-5 (e.g., 1, 2, 3, 4, 5) amino acids removed from the C-terminus, and preferably comprises at least a nucleotide sequence 14bp, 15bp, 16bp, 17bp, 18bp, 19bp, 20bp, 21bp, 22bp or 23bp in length of exon 6 counted in the 5'-3' direction.

Preferably, the targeting vector comprises a nucleotide sequence encoding amino acids 35 to 423 of SEQ ID NO: 2, or comprises SEQ ID NO: 5.

Preferably, the targeting vector further comprises a DNA fragment homologous to the 5' end of the transition region to be altered, i.e., the 5' arm, selected from 100-10000 nucleotides in length of the genomic DNA of the CEACAM1 gene of a non-human animal; preferably, said 5' arm has at least 90% homology to NCBI accession No. NC_000073.7; further preferably, the 5' arm sequence is identical to, or as shown in, SEQ ID NO: 3.

Preferably, the targeting vector further comprises a DNA fragment homologous to the 3' end of the transition region to be altered, i.e., the 3' arm, selected from 100-10000 nucleotides in length of the genomic DNA of the CEACAM1 gene of a non-human animal; preferably, said 3' arm has at least 90% homology to NCBI accession No. NC_000073.7; further preferably, the 3' arm sequence is identical to, or as shown in, SEQ ID NO: 4.

Preferably, said transition region to be altered is located at the CEACAM1 locus of a non-human animal. Further preferably, it is located from exon 2 to exon 6 of the CEACAM1 gene of non-human animal.

In one embodiment of the present invention, the construction method comprises introducing the targeting vector into a cell of a non-human animal, culturing the cell (preferably an embryonic stem cell), transplanting the cultured cell into an oviduct of a female non-human animal, allowing the female non-human animal to develop, and identifying and screening to obtain the non-human animal.

In a second aspect of the present invention, there is provided a CEACAM1 gene humanized non-human animal obtained by the above construction method.

In a third aspect of the invention, there is provided a targeting vector for CEACAM1 gene, said targeting vector comprising part of the nucleotide sequence of human CEACAM 1.

Preferably, said part of the human CEACAM1 nucleotide sequence comprises all or part of the nucleotide sequence of exons 2 to 6 of human CEACAM1; more preferably, it comprises part of exon 2, all of exons 3 to 5 and part of exon 6, more preferably also comprises intron 2-3 and/or intron 5-6, and still more preferably comprises any intron between exons 2 and 6; wherein the part of exon 2 of the human CEACAM1 nucleotide sequence comprises at least the nucleotide sequence of exon 2 encoding the extracellular region of human CEACAM1 protein, and preferably comprises at least a nucleotide sequence 313bp, 314bp, 315bp, 316bp, 317bp, 318bp, 319bp, 320bp, 321bp or 322bp in length of exon 2 counted in the 3'-5' direction; and the part of exon 6 comprises at least the nucleotide sequence of exon 6 encoding the extracellular region of human CEACAM1 protein with 1-5 (e.g., 1, 2, 3, 4, 5) amino acids removed from the C-terminus, and preferably comprises at least a nucleotide sequence 14bp, 15bp, 16bp, 17bp, 18bp, 19bp, 20bp, 21bp, 22bp or 23bp in length of exon 6 counted in the 5'-3' direction.

Preferably, the targeting vector further comprises a DNA fragment homologous to the 5' end of the transition region to be altered, i.e., the 5' arm, selected from 100-10000 nucleotides in length of the genomic DNA of the CEACAM1 gene of a non-human animal; preferably, said 5' arm has at least 90% homology to NCBI accession No. NC_000073.7; further preferably, the 5' arm sequence is identical to, or as shown in, SEQ ID NO: 3.

Preferably, the targeting vector further comprises a DNA fragment homologous to the 3' end of the transition region to be altered, i.e., the 3' arm, selected from 100-10000 nucleotides in length of the genomic DNA of the CEACAM1 gene of a non-human animal; preferably, said 3' arm has at least 90% homology to NCBI accession No. NC_000073.7; further preferably, the 3' arm sequence is identical to, or as shown in, SEQ ID NO: 4.

Preferably, said transition region to be altered is located at the CEACAM1 locus, and more preferably, said transition region to be altered is located on exons 2 to 6 of CEACAM1 gene.

Preferably, the targeting vector further comprises a marker gene; more preferably, the marker gene is a gene encoding a negative selection marker, and even more preferably, the gene encoding the negative selection marker is a gene encoding the diphtheria toxin A subunit (DTA).

In a specific embodiment of the present invention, the targeting vector further comprises a resistance gene for positive clone selection, and further preferably, the resistance gene for positive clone selection is the neomycin phosphotransferase coding sequence Neo.

In a specific embodiment of the present invention, the targeting vector further comprises a specific recombination system, and further preferably, the specific recombination system is an Frt recombination system (a conventional LoxP recombination system may also be selected) having two Frt recombination sites, one on each side of the resistance gene.
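
The targeting vector components described above (two homology arms of 100-10000 nucleotides, a Neo resistance gene flanked by two Frt sites, and a DTA negative selection marker) can be captured in a small data structure for bookkeeping. The Python sketch below is only an illustrative checklist: the class, field and method names are hypothetical, the arm lengths in the example are made-up values, and the physical order of the elements in the vector is deliberately not modelled.

```python
from dataclasses import dataclass

ARM_MIN_NT = 100     # lower bound on homology arm length stated in the text
ARM_MAX_NT = 10000   # upper bound on homology arm length stated in the text


@dataclass(frozen=True)
class TargetingVector:
    """Checklist-style description of a targeting vector (hypothetical model)."""
    five_prime_arm_nt: int
    three_prime_arm_nt: int
    positive_marker: str = "Neo"
    recombination_sites: tuple[str, ...] = ("Frt", "Frt")
    negative_marker: str = "DTA"

    def problems(self) -> list[str]:
        """Return a list of deviations from the ranges and layout described in the text."""
        issues = []
        arms = (("5' arm", self.five_prime_arm_nt), ("3' arm", self.three_prime_arm_nt))
        for name, length in arms:
            if not ARM_MIN_NT <= length <= ARM_MAX_NT:
                issues.append(f"{name} length {length} nt is outside {ARM_MIN_NT}-{ARM_MAX_NT} nt")
        sites = self.recombination_sites
        if len(sites) != 2 or len(set(sites)) != 1:
            issues.append("expected two recombination sites of the same type flanking the resistance gene")
        return issues


# Made-up arm lengths, for illustration only.
ok_vector = TargetingVector(five_prime_arm_nt=4000, three_prime_arm_nt=5000)
short_arm_vector = TargetingVector(five_prime_arm_nt=50, three_prime_arm_nt=5000)

print(ok_vector.problems())         # []
print(short_arm_vector.problems())
```

Passing `recombination_sites=("LoxP", "LoxP")` models the LoxP alternative mentioned in the text.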

In a fourth aspect of the invention, there is provided a cell comprising the targeting vector described above.

In a fifth aspect of the invention, there is provided the use of a targeting vector as described above, or a cell as described above, in CEACAM1 gene modification, preferably, said use includes but is not limited to inversion, knock-out, insertion or substitution.

The sixth aspect of the invention relates to a CEACAM1 gene humanized cell, wherein the genome of the CEACAM1 gene humanized cell comprises exons 2 to 6 of a human CEACAM1 gene. Preferably, the genome comprises a nucleotide sequence encoding amino acids 35 to 423 of SEQ ID NO: 2, or a nucleotide sequence comprising SEQ ID NO: 5, which is regulated by an endogenous CEACAM1 regulatory element; the CEACAM1 gene humanized cell can express human or humanized CEACAM1 protein, while expression of endogenous CEACAM1 protein is reduced or absent. Preferably, the human CEACAM1 gene is regulated by endogenous CEACAM1 regulatory elements.

The seventh aspect of the invention relates to a CEACAM1 gene-deleted cell, wherein the CEACAM1 gene-deleted cell deletes exons 2 to 6 of endogenous CEACAM1 gene.

In an eighth aspect, the present invention relates to a method for preparing a tumor-bearing animal model, which comprises the step of preparing a tumor-bearing animal model from the above-mentioned CEACAM1 gene-humanized non-human animal.

Preferably, the method for preparing the tumor-bearing animal model further comprises the step of implanting tumor cells into the above CEACAM1 gene humanized non-human animal or its offspring.

The ninth aspect of the invention provides a tumor-bearing animal model obtained by the preparation method.

In a tenth aspect the invention relates to a cell or cell line or primary cell culture derived from a non-human animal as described above or a tumor-bearing animal model as described above.

In an eleventh aspect, the present invention relates to a tissue or organ or culture thereof derived from the above-mentioned non-human animal or the above-mentioned tumor-bearing animal model.

Preferably, the tissue or organ or culture thereof is spleen, tumor or culture thereof.

In a twelfth aspect of the present invention, there is provided a humanized CEACAM1 protein, wherein the humanized CEACAM1 protein comprises all or part of human CEACAM1 protein, further preferably, the humanized CEACAM1 protein comprises all or part of extracellular region of human CEACAM1 protein, further preferably, comprises part of extracellular region, more preferably, the part of extracellular region comprises extracellular region of human CEACAM1 protein with 0-5 (e.g., 0, 1, 2, 3, 4, 5) amino acids removed from C-terminus.

Preferably, the humanized CEACAM1 protein comprises the amino acid sequence shown at positions 35 to 423 of SEQ ID NO: 2 or in SEQ ID NO: 10, or an amino acid sequence having at least 70%, 80%, 85%, 90%, 95% or at least 99% identity to positions 35 to 423 of SEQ ID NO: 2 or to SEQ ID NO: 10.

Preferably, the humanized CEACAM1 protein further comprises a portion of a non-human animal CEACAM1 protein, preferably a signal peptide, extracellular region, transmembrane region and/or cytoplasmic region of the non-human animal CEACAM1 protein.

Preferably, the humanized CEACAM1 protein comprises an amino acid sequence encoded by exon 2 to exon 6 of human CEACAM1 gene, and an amino acid sequence of non-human animal CEACAM1 protein.

In a thirteenth aspect of the present invention, there is provided a humanized CEACAM1 gene encoding the above-mentioned humanized CEACAM1 protein, said humanized CEACAM1 gene comprising exons 2 to 6 of human CEACAM1 gene, and a nucleotide sequence of non-human animal CEACAM1 gene.

Preferably, the humanized CEACAM1 gene comprises SEQ ID NO: 5.

Preferably, the mRNA sequence transcribed from the humanized CEACAM1 gene comprises the nucleotide sequence shown in SEQ ID NO: 9.

In a specific embodiment of the present invention, said humanized CEACAM1 gene comprises a human CEACAM1 nucleotide sequence portion selected from one of the following groups:

(A) comprises all or part of the nucleotide sequence shown in SEQ ID NO: 5;

(B) comprises a nucleotide sequence having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or at least 99% identity to the nucleotide sequence shown in SEQ ID NO: 5;

(C) comprises a nucleotide sequence differing from the nucleotide sequence shown in SEQ ID NO: 5 by no more than 10, 9, 8, 7, 6, 5, 4, 3, 2 or no more than 1 nucleotide;

(D) comprises the nucleotide sequence shown in SEQ ID NO: 5, including substitution, deletion and/or insertion of one or more nucleotides.

In a specific embodiment of the present invention, the mRNA transcribed from the nucleotide sequence of the humanized CEACAM1 gene is selected from one of the following groups:

(a) comprises part or all of the nucleotide sequence shown in SEQ ID NO: 9;

(b) comprises a nucleotide sequence having at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or at least 99% identity to the nucleotide sequence shown in SEQ ID NO: 9;

(c) comprises a nucleotide sequence differing from the nucleotide sequence shown in SEQ ID NO: 9 by no more than 10, 9, 8, 7, 6, 5, 4, 3, 2 or no more than 1 nucleotide; or

(d) comprises the nucleotide sequence shown in SEQ ID NO: 9, including substitution, deletion and/or insertion of one or more nucleotides.

In a fourteenth aspect, the present invention relates to a construct expressing the humanized CEACAM1 protein described above.

In a fifteenth aspect, the invention relates to a cell comprising the above construct.

In a sixteenth aspect, the invention relates to a tissue comprising the above-described cells.

Preferably, none of the above cells or cell lines or primary cell cultures, tissues or organs or cultures thereof is capable of developing into an individual animal.

In a seventeenth aspect of the present invention, there is provided a method of constructing a polygene-modified non-human animal, the method comprising:

(a) preparing and obtaining the non-human animal by applying the construction method;

(b) mating the non-human animal obtained in step (a) with an animal modified in a gene other than CEACAM1, or performing in vitro fertilization or direct gene editing, and screening to obtain the multi-gene humanized modified non-human animal.

Preferably, the multi-gene humanized modified non-human animal is a two-gene humanized non-human animal, a three-gene humanized non-human animal, a four-gene humanized non-human animal, a five-gene humanized non-human animal, a six-gene humanized non-human animal, a seven-gene humanized non-human animal, an eight-gene humanized non-human animal or a nine-gene humanized non-human animal.

Preferably, the animal modified in a gene other than CEACAM1 is selected from one or more of animals modified in the PD-1, PD-L1, TIGIT or CD226 genes.
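
As a simple illustration of "one or more" of the four partner genes named above, the following Python sketch lists every combination of CEACAM1 with a non-empty subset of those genes. The function and variable names are hypothetical, and the sketch says nothing about which combinations were actually bred.

```python
from itertools import combinations

# Partner genes named in the text.
PARTNER_GENES = ("PD-1", "PD-L1", "TIGIT", "CD226")


def partner_sets(genes: tuple[str, ...] = PARTNER_GENES) -> list[tuple[str, ...]]:
    """All non-empty subsets of the partner genes, smallest subsets first."""
    return [
        combo
        for size in range(1, len(genes) + 1)
        for combo in combinations(genes, size)
    ]


# Each model combines humanized CEACAM1 with one subset of partner genes.
models = [("CEACAM1",) + combo for combo in partner_sets()]

for model in models:
    print(len(model), "genes:", ", ".join(model))
```

With four partner genes there are 15 such combinations, ranging from two-gene to five-gene models.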

The eighteenth aspect of the present invention relates to the use of the above non-human animal, the above tumor-bearing animal model, the above cell or cell line or primary cell culture, the above tissue or organ or culture thereof, the above humanized CEACAM1 protein or the above humanized CEACAM1 gene in the preparation of a medicament for treating or preventing tumors.

In a nineteenth aspect, the present invention relates to a non-human animal as described above, a tumor-bearing animal model as described above, a cell or cell line or primary cell culture as described above, a tissue or organ as described above or a culture thereof, a humanized CEACAM1 protein as described above or an application of a humanized CEACAM1 gene as described above in studies related to CEACAM1 gene or protein, wherein the application comprises:

A) product development requiring immune processes involving human cells, and the manufacture or screening of human antibodies;

B) use as a model system for pharmacological, immunological, microbiological and medical research;

C) the production and use of animal experimental disease models involving immune processes of human cells, for etiological research, for the development of diagnostic strategies or for the development of therapeutic strategies;

D) in vivo studies for the screening, efficacy testing, efficacy evaluation, validation or assessment of human CEACAM1 signaling pathway modulators; or,

E) studies of CEACAM1 gene function, of human CEACAM1 antibodies, of drugs targeting human CEACAM1 and their efficacy, and of drugs for immune-related diseases and anti-tumor or anti-inflammatory drugs.

Preferably, the use comprises use in the preparation of a pharmaceutical composition or a test kit.

Preferably, the use is not a method of diagnosis or treatment of disease.

"Tumors" as referred to herein include, but are not limited to, lymphomas, B cell tumors, T cell tumors, myeloid/monocytic tumors, non-small cell lung cancer, leukemias, ovarian cancer, nasopharyngeal cancer, breast cancer, endometrial cancer, colon cancer, rectal cancer, stomach cancer, bladder cancer, lung cancer, bronchial cancer, bone cancer, prostate cancer, pancreatic cancer, liver and bile duct cancer, esophageal cancer, kidney cancer, thyroid cancer, head and neck cancer, testicular cancer, glioblastoma, astrocytoma, melanoma, myelodysplastic syndrome, and sarcomas. The leukemia is selected from acute lymphocytic (lymphoblastic) leukemia, acute myelogenous leukemia, chronic lymphocytic leukemia, multiple myeloma, plasma cell leukemia, and chronic myelogenous leukemia; the lymphoma is selected from Hodgkin's lymphoma and non-Hodgkin's lymphoma, including B-cell lymphoma, diffuse large B-cell lymphoma, follicular lymphoma, mantle cell lymphoma, marginal zone B-cell lymphoma, T-cell lymphoma, and Waldenstrom's macroglobulinemia; the sarcoma is selected from osteosarcoma, Ewing's sarcoma, leiomyosarcoma, synovial sarcoma, soft tissue sarcoma, angiosarcoma, liposarcoma, fibrosarcoma, rhabdomyosarcoma, and chondrosarcoma. In one embodiment of the invention, the tumor is selected from a B cell tumor, a T cell tumor and a myeloid/monocytic tumor. Preferably, the tumor is B- or T-cell acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), non-Hodgkin's lymphoma (NHL), multiple myeloma (MM), nasopharyngeal carcinoma or lung carcinoma.

The "immune-related diseases" described in the present invention include, but are not limited to, allergy, asthma, myocarditis, nephritis, hepatitis, systemic lupus erythematosus, rheumatoid arthritis, scleroderma, hyperthyroidism, idiopathic thrombocytopenic purpura, autoimmune hemolytic anemia, ulcerative colitis, autoimmune liver disease, diabetes, pain, or neurological disorders. In one embodiment of the invention, the immune-related disease is rheumatoid arthritis.

The term "inflammation" as used herein includes acute inflammation as well as chronic inflammation. Specifically, it includes, but is not limited to, degenerative inflammation, exudative inflammation (serous inflammation, fibrinous inflammation, suppurative inflammation, hemorrhagic inflammation, necrotizing inflammation, catarrhal inflammation), proliferative inflammation, and specific inflammation (tuberculosis, syphilis, leprosy, lymphogranuloma, etc.).

The CEACAM1 gene humanized non-human animal of the invention can normally express human or humanized CEACAM1 protein in vivo. It can be used for drug screening and efficacy evaluation against the human CEACAM1 target and for research on the treatment of immune-related diseases and tumors, which can accelerate new drug development and save time and cost. It provides a reliable basis for studying CEACAM1 protein function and for screening drugs for related diseases.

As used herein, "all or part" is understood as follows: "all" means the whole, and "part" means a portion of the whole or an individual unit making up the whole.

The humanized CEACAM1 protein comprises a portion derived from human CEACAM1 protein and a portion derived from non-human CEACAM1 protein. "All of the human CEACAM1 protein" means that the amino acid sequence is identical to the full-length amino acid sequence of the human CEACAM1 protein. "Part of the human CEACAM1 protein" means a sequence of 5-526 (preferably 10-389) contiguous or spaced amino acids that is identical to the amino acid sequence of human CEACAM1 protein or has more than 70% homology with it.

"All of the extracellular region of the human CEACAM1 protein" means that the amino acid sequence is identical to the full-length amino acid sequence of the extracellular region of the human CEACAM1 protein.

"Part of the extracellular region of the human CEACAM1 protein" as used herein means a sequence of 5-394 (preferably 5-389) contiguous or spaced amino acids that is identical to the amino acid sequence of the extracellular region of the human CEACAM1 protein or has more than 70% homology with it.

The humanized CEACAM1 gene comprises a portion derived from the human CEACAM1 nucleotide sequence and a portion of the non-human CEACAM1 gene. "All of the human CEACAM1 nucleotide sequence" means that the nucleotide sequence is identical to the full-length human CEACAM1 nucleotide sequence. "Part of the human CEACAM1 nucleotide sequence" means a sequence of 20-21177 bp (preferably 20-14906 bp or 20-1167 bp) contiguous or spaced nucleotides that is identical to the human CEACAM1 nucleotide sequence or has more than 70% homology with it.

As used herein, "exons xx to xxx" or "all of exons xx to xxx" includes the nucleotide sequences of the exons and of all introns between them. For example, "exons 2 to 6" includes all nucleotide sequences of exon 2, intron 2-3, exon 3, intron 3-4, exon 4, intron 4-5, exon 5, intron 5-6 and exon 6.

The term "intron x-xx" as used herein denotes the intron between exon x and exon xx. For example, "intron 2-3" means the intron between exon 2 and exon 3.
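The naming convention defined above can be illustrated with a minimal sketch. The function name `expand_exon_range` is hypothetical and is not part of the specification; it simply enumerates the elements covered by the phrase "exons xx to xxx" as that phrase is defined here.

```python
def expand_exon_range(first: int, last: int) -> list[str]:
    """List every exon and intervening intron covered by 'exons first to last'."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    elements = []
    for n in range(first, last + 1):
        elements.append(f"exon {n}")
        if n < last:
            # "intron n-(n+1)" is the intron between exon n and exon n+1
            elements.append(f"intron {n}-{n + 1}")
    return elements


if __name__ == "__main__":
    print(", ".join(expand_exon_range(2, 6)))
```

Called with 2 and 6, the sketch reproduces the list given in the example for "exons 2 to 6".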

"Part of an exon" as referred to herein means a sequence of several, tens or hundreds of contiguous or spaced nucleotides that is identical to the corresponding exon nucleotide sequence. For example, part of exon 2 of the human CEACAM1 nucleotide sequence comprises 5-360 bp, preferably 10-322 bp, of contiguous or spaced nucleotides identical to the exon 2 nucleotide sequence of human CEACAM1. In a specific embodiment of the present invention, the "part of exon 2" contained in the "humanized CEACAM1 gene" comprises at least the nucleotide sequence of exon 2 that encodes the extracellular domain of human CEACAM1 protein.

The "locus" of the present invention refers, in a broad sense, to the position of a gene on a chromosome and, in a narrow sense, to a DNA fragment of a given gene, where the gene may be a single gene or part of a single gene. For example, the "CEACAM1 locus" refers to a DNA fragment of any one of exons 1 to 7 of the CEACAM1 gene, preferably any one or a combination of two or more of exon 1, exon 2, exon 3, exon 4, exon 5, exon 6, exon 7, exon 8, exon 9, or the introns between them, or all or part of one or two or more thereof, and more preferably exons 2 to 6 of the CEACAM1 gene.

The "nucleotide sequence" of the present invention includes natural or modified ribonucleotide sequences and deoxyribonucleotide sequences, preferably DNA, cDNA, pre-mRNA, rRNA, hnRNA, miRNA, scRNA, snRNA, siRNA, sgRNA or tRNA.

The term "treating" (or "treatment") as used herein means slowing, interrupting, arresting, controlling, stopping, alleviating, or reversing the progression or severity of one sign, symptom, disorder, condition, or disease, but does not necessarily refer to the complete elimination of all disease-related signs, symptoms, conditions, or disorders. The term "treatment" or the like refers to a therapeutic intervention that ameliorates the signs, symptoms, etc. of a disease or pathological state after the disease has begun to develop.

"homology" as used herein means that, in the context of using a protein sequence or a nucleotide sequence, one skilled in the art can adjust the sequence as needed to obtain a sequence having (including but not limited to) 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 70%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.1%, 99.2%, 99.3%, 99.4%, 99.5%, 99.6%, 99.7%, 99.8%, 99.9% identity.

One skilled in the art can determine and compare sequence elements or degrees of identity to distinguish between additional mouse and human sequences.
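One common way to quantify a degree of identity is the percentage of identical positions in a pairwise alignment. The following is a minimal sketch under stated assumptions, not the method of the invention: the inputs are taken to be already aligned (equal length, "-" for gaps), the function name `percent_identity` is hypothetical, and the choice of denominator (all alignment columns) is one convention among several. The example compares the first 16 residues of SEQ ID NO: 1 and SEQ ID NO: 2 from the sequence listing, written in one-letter code and compared position by position without gaps for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumption: the inputs are already aligned (equal length, '-' for gaps);
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# First 16 residues of SEQ ID NO: 1 (mouse) and SEQ ID NO: 2 (human),
# converted from the three-letter codes of the sequence listing.
MOUSE_N_TERM = "MELASAHLHKGQVPWG"
HUMAN_N_TERM = "MGHLSAPLHRVRVPWQ"

if __name__ == "__main__":
    print(f"{percent_identity(MOUSE_N_TERM, HUMAN_N_TERM):.1f}% identity")
```

In practice the sequences would first be aligned with a dedicated alignment tool; the sketch only covers the counting step.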

In one aspect, the non-human animal is a mammal. In one aspect, the non-human animal is a small mammal, such as a member of the family Muridae or the superfamily Muroidea. In one embodiment, the genetically modified animal is a rodent. In one embodiment, the rodent is selected from a mouse, a rat, and a hamster. In one embodiment, the rodent is selected from the family Muridae. In one embodiment, the genetically modified animal is from a family selected from the family of the family. In a particular embodiment, the genetically modified rodent is selected from a true mouse or rat (superfamily Muroidea), a gerbil, a spiny mouse, and a crested rat. In one embodiment, the genetically modified mouse is from a member of the family Muridae. In one embodiment, the animal is a rodent. In a particular embodiment, the rodent is selected from a mouse and a rat. In one embodiment, the non-human animal is a mouse.

In a particular embodiment, the non-human animal is a rodent, specifically a mouse of a strain selected from BALB/c, A/He, A/J, A/WySN, AKR/A, AKR/J, AKR/N, TA1, TA2, RF, SWR, C3H, C57BR, SJL, C57L, DBA/2, KM, NIH, ICR, CFW, FACA, C57BL/A, C57BL/An, C57BL/GrFa, C57BL/KaLwN, C57BL/6, C57BL/6J, C57BL/6ByJ, C57BL/6NJ, C57BL/10, C57BL/10 sn, C57BL/10Cr, C57BL/Ola, C57BL, C58, A/Br, CBA/Ca and CBA/J.

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, transgenic biology, microbiology, recombinant DNA, and immunology. These techniques are explained in detail in the literature, for example: Molecular Cloning: A Laboratory Manual, 2nd Ed., ed. by Sambrook, Fritsch and Maniatis (Cold Spring Harbor Laboratory Press: 1989); DNA Cloning, Volumes I and II (D.N. Glover ed., 1985); Oligonucleotide Synthesis (M.J. Gait ed., 1984); Mullis et al., U.S. Pat. No. 4,683,195; Nucleic Acid Hybridization (B.D. Hames & S.J. Higgins eds., 1984); Transcription And Translation (B.D. Hames & S.J. Higgins eds., 1984); Culture Of Animal Cells (R.I. Freshney, Alan R. Liss, Inc., 1987); Immobilized Cells And Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide To Molecular Cloning (1984); the series Methods In Enzymology (J. Abelson and M. Simon, eds.-in-chief, Academic Press, Inc., New York), specifically Vols. 154 and 155 (Wu et al. eds.) and Vol. 185, "Gene Expression Technology" (D. Goeddel, ed.); Gene Transfer Vectors For Mammalian Cells (J.H. Miller and M.P. Calos eds., 1987, Cold Spring Harbor Laboratory); Immunochemical Methods In Cell And Molecular Biology (Mayer and Walker, eds., Academic Press, London, 1987); Handbook Of Experimental Immunology, Volumes I-IV (D.M. Weir and C.C. Blackwell, eds., 1986); and Manipulating the Mouse Embryo (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1986).

The foregoing is merely a summary of aspects of the invention and is not, and should not be taken as, limiting the invention in any way.

All patents and publications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication was specifically and individually indicated to be incorporated herein by reference. Those skilled in the art will recognize that certain changes may be made to the invention without departing from the spirit or scope of the invention.

The following examples further illustrate the invention in detail and are not to be construed as limiting the scope of the invention or the particular methods described herein.

Drawings

Embodiments of the invention are described in detail below with reference to the attached drawing figures, wherein:

FIG. 1: schematic comparison of mouse CEACAM1 locus to human CEACAM1 locus (not to scale);

FIG. 2: schematic representation (not to scale) of humanization transformation of the CEACAM1 gene in mice;

FIG. 3: CEACAM1 gene targeting strategy and targeting vector design schematic (not to scale);

FIG. 4: PCR identification results of CEACAM1 recombinant cells, where WT is the wild-type control, H₂O is the water control, PC is the positive control, and M is the marker;

FIG. 5: Southern blot results of cells after CEACAM1 recombination, where WT is the wild-type control;

FIG. 6: schematic representation (not to scale) of FRT recombination process of CEACAM1 gene humanized mouse;

FIG. 7: tail PCR identification results of F1 generation CEACAM1 gene humanized mice, where WT is the wild-type control, H₂O is the water control, and PC is the positive control;

FIG. 8: flow cytometry results for CEACAM1 protein on spleen B cells of C57BL/6 wild-type mice (WT) and CEACAM1 gene humanized homozygous mice (Homozygote), where mCEACAM1 denotes murine CEACAM1 protein and hCEACAM1 denotes humanized CEACAM1 protein.

Detailed Description

The invention will be further described with reference to specific embodiments, and the advantages and features of the invention will become apparent as the description proceeds. These examples are illustrative only and do not limit the scope of the present invention in any way. It will be understood by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention, and that such changes and modifications fall within the scope of the invention.

In each of the following examples, the equipment and materials were obtained from several companies as indicated below:

KpnI, MfeI and AseI enzymes were purchased from NEB, catalog numbers R3142L, R3589S and R0526S, respectively;

C57BL/6 mice and Flp tool mice were purchased from the National Rodent Laboratory Animal Seed Center of the National Institutes for Food and Drug Control (China);

Brilliant Violet 510 anti-mouse CD45 was purchased from Biolegend, cat # 103138;

PerCP/Cy5.5 anti-mouse TCR β chain was purchased from Biolegend, cat # 109228;

FITC anti-Mouse CD19 was purchased from Biolegend, cat # 1B 256058;

APC anti-mouse CD66a (CEACAM1a) Antibody was purchased from Biolegend, cat # 134509;

PE anti-human CD66a/c/e Antibody was purchased from Biolegend, cat # 342303.

Example 1 CEACAM1 Gene humanized mouse

A schematic comparison of the mouse CEACAM1 gene (NCBI Gene ID: 26365, Primary source: MGI:1347245, UniProt: Q925P3, located at positions 25161127 to 25177072 of chromosome 7 NC_000073.7, based on transcript NM_001039185.1 and its encoded protein NP_001034274.1 (SEQ ID NO: 1)) and the human CEACAM1 gene (NCBI Gene ID: 634, Primary source: HGNC:1814, UniProt ID: P13688-1, located at positions 42507306 to 42528482 of chromosome 19 NC_000019.10, based on transcript NM_001712.5 and its encoded protein NP_001703.2 (SEQ ID NO: 2)) is shown in FIG. 1.

To achieve the object of the present invention, a nucleotide sequence encoding human CEACAM1 protein may be introduced at the endogenous CEACAM1 locus of a mouse, so that the mouse expresses human or humanized CEACAM1 protein. Specifically, using gene editing technology and under the control of the mouse CEACAM1 gene regulatory elements, the sequence extending from part of mouse exon 2 to part of mouse exon 6 is replaced with a sequence extending from part of exon 2 to part of exon 6 of the human CEACAM1 gene, thereby humanizing the mouse CEACAM1 gene. A schematic diagram of the resulting humanized CEACAM1 locus is shown in FIG. 2.

The targeting strategy was designed as shown in FIG. 3. The targeting vector contains homology arm sequences upstream and downstream of the mouse CEACAM1 gene, as well as a fragment A comprising the human CEACAM1 sequence. The upstream homology arm sequence (5' homology arm, SEQ ID NO: 3) is identical to the nucleotide sequence from position 25176091 to 25182711 of NCBI accession No. NC_000073.7, and the downstream homology arm sequence (3' homology arm, SEQ ID NO: 4) is identical to the nucleotide sequence from position 25163099 to 25165433 of NCBI accession No. NC_000073.7. The human CEACAM1 nucleotide sequence on fragment A (SEQ ID NO: 5) is identical to the nucleotide sequence from position 42512457 to 42527362 of NCBI accession No. NC_000019.10. The junction between the downstream end of the human CEACAM1 sequence and the mouse sequence is designed as 5'-acagataatgctctaccacaagaaaatGGCCTCTCAGATGGCGCCAT-3' (SEQ ID NO: 6), wherein the last "t" of the sequence "aaat" is the last nucleotide of the human sequence and the first "G" of the sequence "GGCC" is the first nucleotide of the mouse sequence.
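The coordinate ranges quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the positions are 1-based and inclusive, the class and variable names are hypothetical, and the accession numbers are written in the NCBI underscore form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    accession: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive (assumption)

    @property
    def length(self) -> int:
        # With inclusive coordinates both end points are counted.
        return self.end - self.start + 1


REGIONS = [
    Region("human CEACAM1 gene", "NC_000019.10", 42507306, 42528482),
    Region("human sequence on fragment A (SEQ ID NO: 5)", "NC_000019.10", 42512457, 42527362),
    Region("5' homology arm (SEQ ID NO: 3)", "NC_000073.7", 25176091, 25182711),
    Region("3' homology arm (SEQ ID NO: 4)", "NC_000073.7", 25163099, 25165433),
]

if __name__ == "__main__":
    for region in REGIONS:
        print(f"{region.name}: {region.length} bp ({region.accession})")
```

Under these assumptions the human gene spans 21177 bp and the human sequence on fragment A spans 14906 bp, which agree with the upper bounds given in the definition of "part of the human CEACAM1 nucleotide sequence", and the 5' homology arm spans 6621 bp, which agrees with the length recorded for SEQ ID NO: 3 in the sequence listing.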

The targeting vector also comprises a resistance gene for positive clone screening, namely the neomycin phosphotransferase coding sequence Neo, flanked on both sides by two Frt recombination sites of a site-specific recombination system arranged in the same orientation, together forming a Neo cassette. The junction between the 5' end of the Neo cassette and the mouse gene is designed as 5'-gtccaggaagagagagaagggagggactccaagaagcagcaagactatgcGGTACCGAATTCCGAAGTTCCTATTCTCTAGAAAGTATAGGAACTT-3' (SEQ ID NO: 7), wherein the last "c" of the sequence "ctatgc" is the last nucleotide of the mouse sequence and the first "G" of the sequence "GGTA" is the first nucleotide of the Neo cassette; the junction between the 3' end of the Neo cassette and the mouse gene is designed as 5'-GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTCATCAGTCAGGTACATAATGGTGGATCCCAATTGAGGTAGCATGTCCTGCTGACTGAAGCAGC-3' (SEQ ID NO: 8), wherein the last "G" of the sequence "AATTG" is the last nucleotide of the Neo cassette and the first "A" of the sequence "AGGTA" is the first nucleotide of the mouse sequence. In addition, a gene encoding a negative selection marker (the diphtheria toxin A subunit coding gene, DTA) is constructed downstream of the 3' homology arm of the targeting vector. The mRNA sequence of the humanized mouse CEACAM1 after modification is shown in SEQ ID NO: 9, and the expressed protein sequence is shown in SEQ ID NO: 10.
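The junction descriptions for SEQ ID NO: 6, 7 and 8 each name the last nucleotides of one segment and the first nucleotides of the next. A minimal sketch of how such a description locates the boundary is given below; the helper `split_junction` is hypothetical, and the sequences are copied from the text as printed, so any transcription damage in the printed sequences would carry over.

```python
def split_junction(sequence: str, left_end: str, right_start: str) -> tuple[str, str]:
    """Split `sequence` where `left_end` is immediately followed by `right_start`."""
    marker = left_end + right_start
    if sequence.count(marker) != 1:
        raise ValueError(f"boundary {marker!r} must occur exactly once")
    cut = sequence.index(marker) + len(left_end)
    return sequence[:cut], sequence[cut:]


# SEQ ID NO -> (junction sequence, end of left segment, start of right segment)
JUNCTIONS = {
    6: ("acagataatgctctaccacaagaaaatGGCCTCTCAGATGGCGCCAT", "aaat", "GGCC"),
    7: ("gtccaggaagagagagaagggagggactccaagaagcagcaagactatgc"
        "GGTACCGAATTCCGAAGTTCCTATTCTCTAGAAAGTATAGGAACTT", "ctatgc", "GGTA"),
    8: ("GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTCATCAGTCAGGTACATAATGGTGGATCCC"
        "AATTGAGGTAGCATGTCCTGCTGACTGAAGCAGC", "AATTG", "AGGTA"),
}

if __name__ == "__main__":
    for seq_id, (sequence, left_end, right_start) in JUNCTIONS.items():
        left, right = split_junction(sequence, left_end, right_start)
        print(f"SEQ ID NO: {seq_id}: {len(left)} nt | {len(right)} nt")
```

Requiring the boundary motif to occur exactly once guards against an ambiguous split when a short motif happens to repeat within the junction sequence.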

Given that human CEACAM1 has multiple isoforms or transcripts, the methods described herein can also be applied to other isoforms or transcripts.

The targeting vector can be constructed by conventional methods, such as restriction digestion and ligation. The constructed targeting vector is first verified by restriction digestion and then sent to a sequencing company for sequence verification. The sequence-verified targeting vector is electroporated into embryonic stem cells of C57BL/6 mice, the resulting cells are screened using the positive clone screening marker gene, and integration of the exogenous gene is detected and confirmed by PCR and Southern blot to identify correct positive clones. Clones identified as positive by PCR (FIG. 4) are further tested by Southern blot (cell DNA is digested with KpnI, MfeI or AseI, respectively, and hybridized with 3 probes; the probes and target fragment lengths are shown in Table 1). The results are shown in FIG. 5: of the 10 clones identified as positive by PCR, the 8 clones other than 1-A11 and 2-F03 were verified by sequencing as positive clones with no random insertion, namely 1-A05, 1-C01, 1-C05, 1-F06, 1-H06, 1-E06, 2-B12 and 2-C11.

Table 1: specific probes and target fragment lengths

The PCR assay uses the following primers:

F1: 5'-GCTCGACTAGAGCTTGCGGA-3' (SEQ ID NO: 11),

R1: 5'-GGAGTCAATAGAGTGAATGCATGAGTGT-3' (SEQ ID NO: 12);

The Southern blot detection uses the following probe primers:

5' Probe:

5'Probe-F: 5'-TATCACAAGAGGGAATAAACCACAGGGT-3' (SEQ ID NO: 13),

5'Probe-R: 5'-ATTGCACCATGAGGTTGAACAGCAT-3' (SEQ ID NO: 14);

3' Probe:

3'Probe-F: 5'-ACTCCTACACACAGAGCACTAACAG-3' (SEQ ID NO: 15),

3'Probe-R: 5'-CAGGCCAGAGGAAATGTAACAAAGG-3' (SEQ ID NO: 16);

Neo Probe:

Neo Probe-F: 5'-CATAAGGTGGGATCTCTCAGACAGG-3' (SEQ ID NO: 17),

Neo Probe-R: 5'-GCTCTGAAGTCCAGTAGGATCATGT-3' (SEQ ID NO: 18).
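Basic properties of the primers listed above (length, GC fraction, reverse complement) can be tabulated with a short script. This is a minimal sketch for checking transcribed sequences, not part of the described method; the function names are hypothetical and the primer sequences are copied from SEQ ID NO: 11-18 as printed.

```python
PRIMERS = {
    "F1": "GCTCGACTAGAGCTTGCGGA",
    "R1": "GGAGTCAATAGAGTGAATGCATGAGTGT",
    "5'Probe-F": "TATCACAAGAGGGAATAAACCACAGGGT",
    "5'Probe-R": "ATTGCACCATGAGGTTGAACAGCAT",
    "3'Probe-F": "ACTCCTACACACAGAGCACTAACAG",
    "3'Probe-R": "CAGGCCAGAGGAAATGTAACAAAGG",
    "Neo Probe-F": "CATAAGGTGGGATCTCTCAGACAGG",
    "Neo Probe-R": "GCTCTGAAGTCCAGTAGGATCATGT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _check_dna(seq: str) -> str:
    """Return the sequence in upper case, rejecting anything other than A, C, G, T."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not a plain DNA sequence: {seq!r}")
    return seq


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = _check_dna(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement, read 5' to 3'."""
    return _check_dna(seq).translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
              f"reverse complement {reverse_complement(seq)}")
```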

The selected correct positive clone cells (from black mice) are introduced into isolated blastocysts (from white mice) according to techniques known in the art; the resulting chimeric blastocysts are transferred to culture medium for short-term culture and then transplanted into the oviduct of a recipient female mouse (white) to produce F0 generation chimeric mice (black and white). F1 generation mice are obtained by backcrossing F0 generation chimeric mice with wild-type mice, and F1 generation heterozygous mice are mated with each other to obtain F2 generation homozygous mice. Positive mice can also be mated with Flp tool mice to remove the positive clone screening marker gene (the process is shown schematically in FIG. 6), and the offspring are then mated with each other to obtain CEACAM1 gene humanized homozygous mice. The genotype of somatic cells of the offspring can be identified by PCR (primers are shown in Table 2). Identification results for exemplary F1 generation mice (from which the Neo marker gene has been removed) are shown in FIG. 7, in which the 12 mice numbered F1-01, F1-02, F1-03, F1-04, F1-05, F1-06, F1-07, F1-08, F1-09, F1-10, F1-11 and F1-12 are all positive heterozygous mice. This shows that the method can be used to construct CEACAM1 gene humanized mice that can be stably passaged and carry no random insertion.

Table 2: primer name and specific sequence

Expression of the humanized CEACAM1 protein in positive mice can be confirmed by conventional detection methods, such as flow cytometry. Specifically, one 8-week-old female C57BL/6 wild-type mouse and one 8-week-old female CEACAM1 gene humanized homozygous mouse were taken, spleen tissue was collected after euthanasia by cervical dislocation, and cells were stained with the anti-mouse CD45 antibody Brilliant Violet 510 anti-mouse CD45, the mouse T cell-specific antibody PerCP/Cy5.5 anti-mouse TCR β chain, the B cell-specific antibody FITC anti-Mouse CD19, the anti-mouse CEACAM1 antibody APC anti-mouse CD66a (CEACAM1a) Antibody and the anti-human CEACAM1 antibody PE anti-human CD66a/c/e Antibody, followed by flow cytometric detection. The results are shown in FIG. 8.

As can be seen from FIG. 8, murine CEACAM1 protein was detected on spleen cells of the C57BL/6 wild-type mouse (FIG. 8A), whereas humanized CEACAM1 protein was not detected (FIG. 8C); humanized CEACAM1 protein was detected on spleen cells of the CEACAM1 gene humanized homozygous mouse (FIG. 8D), whereas murine CEACAM1 protein was not detected (FIG. 8B).

Example 2 Preparation of double-humanized or multi-humanized mice

The CEACAM1 mice prepared by the present method can also be used to prepare double-humanized or multi-humanized mouse models. For example, in Example 1 above, the embryonic stem cells used for blastocyst microinjection can be taken from mice carrying other gene modifications such as PD-1, PD-L1, TIGIT or CD226; alternatively, embryonic stem cells can be isolated from humanized CEACAM1 mice and modified by gene recombination targeting, to obtain a two-gene or multi-gene modified mouse model carrying CEACAM1 and other gene modifications. Homozygous or heterozygous CEACAM1 mice obtained by the present method can also be mated with homozygous or heterozygous mice carrying other gene modifications, and their offspring screened; according to Mendelian inheritance, heterozygous mice carrying humanized CEACAM1 together with the other gene modification(s) are obtained with a certain probability, and these heterozygotes are then mated with each other to obtain two-gene or multi-gene modified homozygotes. Such two-gene or multi-gene modified mice can be used, for example, for in vivo efficacy validation of modulators targeting human CEACAM1 and the other genes.
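The "certain probability" expected under Mendelian inheritance can be made concrete with a small calculation. The sketch below is illustrative and rests on assumptions not stated in the text: the two modified loci are unlinked and assort independently, all genotypes are equally viable, "H" denotes a humanized allele and "h" an unmodified allele, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def cross_one_locus(parent_a: str, parent_b: str) -> dict[str, Fraction]:
    """Offspring genotype probabilities at one locus.

    Genotypes are two-letter strings: 'HH', 'Hh' or 'hh'.
    """
    outcomes: dict[str, Fraction] = {}
    for allele_a, allele_b in product(parent_a, parent_b):
        # sorted() puts 'H' before 'h', so heterozygotes are always 'Hh'
        genotype = "".join(sorted(allele_a + allele_b))
        outcomes[genotype] = outcomes.get(genotype, Fraction(0)) + Fraction(1, 4)
    return outcomes


def cross_two_loci(a1: str, b1: str, a2: str, b2: str) -> dict[tuple[str, str], Fraction]:
    """Joint genotype probabilities for two independently assorting loci.

    a1/b1 are the parental genotypes at locus 1, a2/b2 those at locus 2.
    """
    first = cross_one_locus(a1, b1)
    second = cross_one_locus(a2, b2)
    return {
        (g1, g2): p1 * p2
        for (g1, p1), (g2, p2) in product(first.items(), second.items())
    }


if __name__ == "__main__":
    # CEACAM1 homozygote x homozygote for a second modified gene
    print(cross_two_loci("HH", "hh", "hh", "HH"))
    # Intercross of the resulting double heterozygotes
    print(cross_two_loci("Hh", "Hh", "Hh", "Hh")[("HH", "HH")])
```

Under these assumptions, crossing a CEACAM1 homozygote with a homozygote for a second modified gene gives only double heterozygotes, and intercrossing double heterozygotes gives double homozygotes with probability 1/16. The same single-locus function gives 1/4 homozygotes for the F1 heterozygote intercross described in Example 1.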

The preferred embodiments of the present invention have been described in detail, however, the present invention is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present invention within the technical idea of the present invention, and these simple modifications are within the protective scope of the present invention.

It should be noted that the technical features described in the above embodiments can be combined in any suitable manner provided there is no contradiction; to avoid unnecessary repetition, the possible combinations are not described separately.

In addition, the various embodiments of the present invention may also be combined in any manner, and such combinations should likewise be regarded as disclosed by the present invention as long as they do not depart from its spirit.

Sequence listing

<110> Baiosai Diagram (Beijing) pharmaceutical science and technology Co., Ltd

<120> Construction method and application of CEACAM1 gene humanized non-human animal

<130> 1

<160> 25

<170> SIPOSequenceListing 1.0

<210> 1

<211> 521

<212> PRT

<213> Mouse (Mouse)

<400> 1

Met Glu Leu Ala Ser Ala His Leu His Lys Gly Gln Val Pro Trp Gly

1 5 10 15

Gly Leu Leu Leu Thr Ala Ser Leu Leu Ala Ser Trp Ser Pro Ala Thr

20 25 30

Thr Ala Glu Val Thr Ile Glu Ala Val Pro Pro Gln Val Ala Glu Asp

35 40 45

Asn Asn Val Leu Leu Leu Val His Asn Leu Pro Leu Ala Leu Gly Ala

50 55 60

Phe Ala Trp Tyr Lys Gly Asn Thr Thr Ala Ile Asp Lys Glu Ile Ala

65 70 75 80

Arg Phe Val Pro Asn Ser Asn Met Asn Phe Thr Gly Gln Ala Tyr Ser

85 90 95

Gly Arg Glu Ile Ile Tyr Ser Asn Gly Ser Leu Leu Phe Gln Met Ile

100 105 110

Thr Met Lys Asp Met Gly Val Tyr Thr Leu Asp Met Thr Asp Glu Asn

115 120 125

Tyr Arg Arg Thr Gln Ala Thr Val Arg Phe His Val His Pro Ile Leu

130 135 140

Leu Lys Pro Asn Ile Thr Ser Asn Asn Ser Asn Pro Val Glu Gly Asp

145 150 155 160

Asp Ser Val Ser Leu Thr Cys Asp Ser Tyr Thr Asp Pro Asp Asn Ile

165 170 175

Asn Tyr Leu Trp Ser Arg Asn Gly Glu Ser Leu Ser Glu Gly Asp Arg

180 185 190

Leu Lys Leu Ser Glu Gly Asn Arg Thr Leu Thr Leu Leu Asn Val Thr

195 200 205

Arg Asn Asp Thr Gly Pro Tyr Val Cys Glu Thr Arg Asn Pro Val Ser

210 215 220

Val Asn Arg Ser Asp Pro Phe Ser Leu Asn Ile Ile Tyr Gly Pro Asp

225 230 235 240

Thr Pro Ile Ile Ser Pro Ser Asp Ile Tyr Leu His Pro Gly Ser Asn

245 250 255

Leu Asn Leu Ser Cys His Ala Ala Ser Asn Pro Pro Ala Gln Tyr Phe

260 265 270

Trp Leu Ile Asn Glu Lys Pro His Ala Ser Ser Gln Glu Leu Phe Ile

275 280 285

Pro Asn Ile Thr Thr Asn Asn Ser Gly Thr Tyr Thr Cys Phe Val Asn

290 295 300

Asn Ser Val Thr Gly Leu Ser Arg Thr Thr Val Lys Asn Ile Thr Val

305 310 315 320

Leu Glu Pro Val Thr Gln Pro Phe Leu Gln Val Thr Asn Thr Thr Val

325 330 335

Lys Glu Leu Asp Ser Val Thr Leu Thr Cys Leu Ser Asn Asp Ile Gly

340 345 350

Ala Asn Ile Gln Trp Leu Phe Asn Ser Gln Ser Leu Gln Leu Thr Glu

355 360 365

Arg Met Thr Leu Ser Gln Asn Asn Ser Ile Leu Arg Ile Asp Pro Ile

370 375 380

Lys Arg Glu Asp Ala Gly Glu Tyr Gln Cys Glu Ile Ser Asn Pro Val

385 390 395 400

Ser Val Arg Arg Ser Asn Ser Ile Lys Leu Asp Ile Ile Phe Asp Pro

405 410 415

Thr Gln Gly Gly Leu Ser Asp Gly Ala Ile Ala Gly Ile Val Ile Gly

420 425 430

Val Val Ala Gly Val Ala Leu Ile Ala Gly Leu Ala Tyr Phe Leu Tyr

435 440 445

Ser Arg Lys Ser Gly Gly Gly Ser Asp Gln Arg Asp Leu Thr Glu His

450 455 460

Lys Pro Ser Ala Ser Asn His Asn Leu Ala Pro Ser Asp Asn Ser Pro

465 470 475 480

Asn Lys Val Asp Asp Val Ala Tyr Thr Val Leu Asn Phe Asn Ser Gln

485 490 495

Gln Pro Asn Arg Pro Thr Ser Ala Pro Ser Ser Pro Arg Ala Thr Glu

500 505 510

Thr Val Tyr Ser Glu Val Lys Lys Lys

515 520

<210> 2

<211> 526

<212> PRT

<213> human (human)

<400> 2

Met Gly His Leu Ser Ala Pro Leu His Arg Val Arg Val Pro Trp Gln

1 5 10 15

Gly Leu Leu Leu Thr Ala Ser Leu Leu Thr Phe Trp Asn Pro Pro Thr

20 25 30

Thr Ala Gln Leu Thr Thr Glu Ser Met Pro Phe Asn Val Ala Glu Gly

35 40 45

Lys Glu Val Leu Leu Leu Val His Asn Leu Pro Gln Gln Leu Phe Gly

50 55 60

Tyr Ser Trp Tyr Lys Gly Glu Arg Val Asp Gly Asn Arg Gln Ile Val

65 70 75 80

Gly Tyr Ala Ile Gly Thr Gln Gln Ala Thr Pro Gly Pro Ala Asn Ser

85 90 95

Gly Arg Glu Thr Ile Tyr Pro Asn Ala Ser Leu Leu Ile Gln Asn Val

100 105 110

Thr Gln Asn Asp Thr Gly Phe Tyr Thr Leu Gln Val Ile Lys Ser Asp

115 120 125

Leu Val Asn Glu Glu Ala Thr Gly Gln Phe His Val Tyr Pro Glu Leu

130 135 140

Pro Lys Pro Ser Ile Ser Ser Asn Asn Ser Asn Pro Val Glu Asp Lys

145 150 155 160

Asp Ala Val Ala Phe Thr Cys Glu Pro Glu Thr Gln Asp Thr Thr Tyr

165 170 175

Leu Trp Trp Ile Asn Asn Gln Ser Leu Pro Val Ser Pro Arg Leu Gln

180 185 190

Leu Ser Asn Gly Asn Arg Thr Leu Thr Leu Leu Ser Val Thr Arg Asn

195 200 205

Asp Thr Gly Pro Tyr Glu Cys Glu Ile Gln Asn Pro Val Ser Ala Asn

210 215 220

Arg Ser Asp Pro Val Thr Leu Asn Val Thr Tyr Gly Pro Asp Thr Pro

225 230 235 240

Thr Ile Ser Pro Ser Asp Thr Tyr Tyr Arg Pro Gly Ala Asn Leu Ser

245 250 255

Leu Ser Cys Tyr Ala Ala Ser Asn Pro Pro Ala Gln Tyr Ser Trp Leu

260 265 270

Ile Asn Gly Thr Phe Gln Gln Ser Thr Gln Glu Leu Phe Ile Pro Asn

275 280 285

Ile Thr Val Asn Asn Ser Gly Ser Tyr Thr Cys His Ala Asn Asn Ser

290 295 300

Val Thr Gly Cys Asn Arg Thr Thr Val Lys Thr Ile Ile Val Thr Glu

305 310 315 320

Leu Ser Pro Val Val Ala Lys Pro Gln Ile Lys Ala Ser Lys Thr Thr

325 330 335

Val Thr Gly Asp Lys Asp Ser Val Asn Leu Thr Cys Ser Thr Asn Asp

340 345 350

Thr Gly Ile Ser Ile Arg Trp Phe Phe Lys Asn Gln Ser Leu Pro Ser

355 360 365

Ser Glu Arg Met Lys Leu Ser Gln Gly Asn Thr Thr Leu Ser Ile Asn

370 375 380

Pro Val Lys Arg Glu Asp Ala Gly Thr Tyr Trp Cys Glu Val Phe Asn

385 390 395 400

Pro Ile Ser Lys Asn Gln Ser Asp Pro Ile Met Leu Asn Val Asn Tyr

405 410 415

Asn Ala Leu Pro Gln Glu Asn Gly Leu Ser Pro Gly Ala Ile Ala Gly

420 425 430

Ile Val Ile Gly Val Val Ala Leu Val Ala Leu Ile Ala Val Ala Leu

435 440 445

Ala Cys Phe Leu His Phe Gly Lys Thr Gly Arg Ala Ser Asp Gln Arg

450 455 460

Asp Leu Thr Glu His Lys Pro Ser Val Ser Asn His Thr Gln Asp His

465 470 475 480

Ser Asn Asp Pro Pro Asn Lys Met Asn Glu Val Thr Tyr Ser Thr Leu

485 490 495

Asn Phe Glu Ala Gln Gln Pro Thr Gln Pro Thr Ser Ala Ser Pro Ser

500 505 510

Leu Thr Ala Thr Glu Ile Ile Tyr Ser Glu Val Lys Lys Gln

515 520 525

<210> 3

<211> 6621

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 3

ggtgggaagc agtgactaag gaatggggtt gtgtatgacc aaagtatatt ttcacgtggg 60

tgtggagttg tcaaataatt aaagtacttt caaaaaagta agtggctcca atcctcttca 120

ctatttgaag aaaacagaaa aaaaaaaaaa aaacttagaa cacaaaagtg cctccaccga 180

ccctcattcg aaacctgggg gggtgggggg gggaggggag gggaggacaa gagcaagcaa 240

gctggtctac ctcccaaagg cttaatgttc tggaaactat taaacacttt caaatgtgaa 300

aaattaaatt gcttttaagt tttagaagaa tttgcattaa aaaacaatag ttggttttgt 360

gaatgaatac tggccctcat gctttagtca agtttactta agtcttctgg ataactttat 420

gtcaacttga cacaggctat agtcatgtca gagaagggag ctccaattga gaaaatgcat 480

tcttaagata cagctgtagg caaccctgta gggaactttt taaattattg atccatgagg 540

gagggcccag accattgtgg ttgctgctgt ccctgggctg gtggtcctgg gttctatagg 600

aaagcagggt aagcaagcca gtaagcagcc cccctccatg gcctctgcat cagctcctgc 660

ctccaggctc cttccctatt taagatcctg tcctgccttc ctttgatgat gagcagtaag 720

taatagggag gtgtaaacca aataaaccct tcggacccaa agttccctct gatcatgatg 780

tttcatcaca gcgacagtaa ccctcagaca gcaagtttgt gaattaaatc tacaaacgga 840

aatgagaata tgcagagcag gaaaacattt ataaactata ctcacaacaa ataactggta 900

caactcagaa aaagcccaat acctagttat catggaaata caaattaaac agctgtgaga 960

taccagccca ccccagttag aatagctata atcaaagtaa ctgataacaa atgttgggag 1020

aggtgttggt ggctttttaa taagtgtttg gttttctctt tacccgggta gcagctcccg 1080

aataatagac tggagtatta tatttattta aaggtttaag ggcacaactg agcagtatga 1140

acctatttta atcctctaag ctaatctggt ggcctcccag cccaaatccc caagatcctt 1200

gcattttatg gctttctctg ctccagctgt ttctccatca tgtctcatgg actctctccc 1260

cttggctaat ctctcttcct cctctctgtt tctctccctt ccttcctccc tctttccctc 1320

cctcccctgg atgggaggaa gtccagtcct attctataaa ccctgctcag tcattagctg 1380

atcagctttt attgacaaat cagagaataa atggggacca atgcttacac agccctgaag 1440

caggagacac agtatagata gttacctggc aaaggggcac agaaatcagc attatacaag 1500

gttaatattt aaacaatatg caataacatt atgcctacag agagagtgtg ggggggaaga 1560

gaaacacaca ctcattgctg gtgggtgtgc aaattaatag ccattataca aagctgtgtg 1620

gaagtctctt aaaaaatagc aataccatat aacgcagcta tttcactgga catataccca 1680

gagaacctgt accctaccat tgagatatct gcactcccat gattacagct gctccatccc 1740

atatagcaag gaagtagaac caacctagat ctccatcaat atatgaatgc acaatacaag 1800

tctaccacac actcacacat acacacacac acacacacac acacactcac acatacacac 1860

acacacacac acacacacac acactgtgaa gtattgttcg tctctacaga aagatgaatc 1920

acaaagttta cagggatgtg gatagactta gaatatatca tgttaaatga gatcatttga 1980

tctcaggaag aaaaccccat atgtcctttc ttgtttgtgg atcctagctc taatgtctgc 2040

atgtacacat gtaattacat atataacatt aggaaggaga acatgaaggg gtttggtctg 2100

gtttggtttg gtttggtttg gtttggtttt caagacaggg tttctctgtg tagccctggc 2160

tgtcctggaa ctcactctgt agaccaggct ggcctgaaac tcacagagat cctccttcct 2220

ctgcctcccg agtgttggag gtgtgcgcca ccgcctggca atggtgtggt agttctaagg 2280

agacaaagtg ctccctcatg ggctcacaca tatgatgtac ctggtgtggg ccctttggat 2340

gctctctcgc atgtaaccgg ttttgacgta aaacaggccc ttagcaagca gtgtccacgt 2400

gggactccat gcggggctgc tgccttactg ggaggcctgg atctgaagtt ctgtctttca 2460

tcctttgaat attcctgtac agtgtagttc ttgtaagtac caacaattag ggcccatagt 2520

tctttaaaca agaatttcac ctataaaaac tatagagagt gaagacactc ataagtatgc 2580

ttaaataatc ataattatga atttataaaa caaattccaa ggaatgctga ctcactgcta 2640

ctgcatctca aatactaaca caccaggaat tacacacact aagaagcttc ctttttgtgg 2700

tttttaatgt tacaattctt tttttaatat atgtatataa tttttattag gtattttcct 2760

catttacatt tccaatgcta tcccaaaagt cccccatacc atccccccga ctcccacttt 2820

ttggccctgg cgttcccctg tactggagca tataaagttt gcaagtccaa tgggcctctc 2880

tttccagtga tggccgacta ggccatcttt tgatacatat gcagctagag tcaagagctc 2940

cggggtactg gttagttcat aatgttgttc cacctatagg gttgcagatc cctttagctc 3000

cttgggtact ttctctagct cctccattgg gggccctgtg atccatccaa tagctgacta 3060

tgagcatcca cttctgtgtt ttctaggccc cggcacagtc tcacaagaga cagctatatc 3120

agggtccttt cagcaaaatc ttggtagtgt atgcaatggt gtcagtgttt ggaggctgat 3180

tatgggatgg ctccctggat atggcaatct ctagatggtc catccttttg tctcagctcc 3240

aaactttgtc tctgtaatgt tacaattcta aaatatgcta ttgttggtga atgaaagtgc 3300

tttgctttaa aatttagaag tcagaataaa gcaattatag agaaagcaca gttctaggat 3360

attgctgccc aggaacaaga tgtacaacta agttactaga cacatgggaa tgggagatca 3420

acctcattag tcaccgtgga actgcagatc aaatggcagt cagtctcatt tcacagatat 3480

aatagtctct agctaaaaaa aaaaaaaaaa ggaatggatc ctgaaaagat ctatacactt 3540

gggatctttc ctggaaacct tagtgctcaa ctcctccaca gttttgaaag ccttgttacc 3600

atccttaggg caacccagat gtgagctgca ggtctcctcc tgaacaccag ggggcgaaat 3660

tggaccatat aatcaacaga aaaaactcct gagccatgac caccccttgt atattttctt 3720

tatttacatt tcaaatgttt tcccctttcc aggtctcccc ttcagaaccc ccctatcctt 3780

acctccctcc ccctacctct atgaggatgc tcccacatcc acccattctc attctcctgc 3840

cctggcattc ccctacactg gagtattgaa cagcctcagg cctgagggcc tctcctccca 3900

ctgatatcca ataaggccat cttctgccac atgtttggcc agggccatgt gtcactaatg 3960

gatacagaaa atgtggtaca gccgggcggt agtggcgcac acctttaatc ccagcacttg 4020

ggaggcagag acaggcggaa ttctgagttc gaggccagcc tggtctacaa agtgagttcc 4080

aggacagcct agactataca gagaaaccct gtctcgaaaa acaaaaaagg aagaaagaaa 4140

gaaagaaaga aagaaagaaa gaaagaaaga aagaaagaaa gaaagaaaga aagaaagaaa 4200

gaaagaaaat gtggtacatt tacacaacag actactaatc agccattaaa aacaatgact 4260

tcatgaaatt cataggcaaa tggatggaac ttgagaatat catcctgagt gaggtaactc 4320

agtcacaaag gaacaaacaa acaagatatg tactcactga taagtggata ttagtccaaa 4380

agttcagaat acccaagatt caattcacag actatatgaa gcctaagaag aagaaagacc 4440

aaagtgtgga tgcttcactg cttcttagaa gggtgaacaa aatcttcaca ggaggaaata 4500

cggagacaaa gtgtggagca aagactgaag gaaaggccat ccagagactg ccccacctgg 4560

ggatccattc aacatatagc caccaaacct gtacacgatt gtggatgtca ggaagtgcta 4620

gctgatggaa gccggatctg tctgtctcct aagaagcttt accagagcct gacaagtaca 4680

gaggcaaggc gaaagctcac agtcaaccat tggactgagc atggggtccc agatggagga 4740

gttggagaag ggactgaagg agccgagggg ggtttgcagc ccagtggagg gagcaacagt 4800

gccaagaggc cagaccccca cagagctccc gggatctgga tgaccaacga aagaatacgc 4860

aagaacctat cttgcaacct ggattgcagt ttccatccca gttgacacac ctggagccag 4920

ctccctcacc taaccatact ctgtcctcct ggctctttcc tgtatctctg agaactccat 4980

tctgcaagga cagccgatgt ggtcccttcc atcctccatc atccatggga gagatgggaa 5040

atcccagggc attcaccaag gagaacagag ccatcctggg acggttacgg agggcacggg 5100

cagacctgaa tcacatttgg ccagcacagg aagctgggga ggtctccctg gcaccctcca 5160

taggaaggtg gagcacacag tcctctttcc aggacacaca ggtcacctcc tcctgcacac 5220

ccaggatatg aagcccctga gacaacttgt atcctaggat cagacacatg actggtgtca 5280

tcagtgacga tggatcaggt cctacccagt catcactcag ctaggccttt ccttaaccct 5340

ccagataact ctgccacttc ctgcctggag taaaccccac ctctgtgagc attgagagca 5400

gggcacagag ggctcccatg ggggtttgtg tcactctagg ctacaggaaa tgctggaact 5460

cctgctgcag ttgacagccc caaggccagg gcacagggca ctcctcagcc ttgctgctcg 5520

gagtatgttc tagaacactg aactggaaag aggaatagaa ggacgggagg cccacactga 5580

caggagttca gcattgtcag actcacaggc tccaccccca gcccacgtgg atctgggagg 5640

tgccctcccc tgggaggaga caaagctcct ttaagaaaag cagggcagat atcagggcag 5700

cctggcttag cagtagtgtt ggagaagaag ctagcaggca ggcagcagag acatggagct 5760

ggcctcagca catctccaca aagggcaggt tccctgggga ggactactgc tcacaggtaa 5820

ggagatattc cttcccagta gagagcaggg gagctcagag actggctggg ctcttctggg 5880

agaggaaagg aacctgagaa gggacatctg gcttctgctt gaagcttgac ggcaacagga 5940

agctctcagt gagtgtgaat tggctagtgg tgaggagtaa ctcagttctt tgctattgtt 6000

aagtttatcg cactggctag gatatcctga agactgaagt ccacaactct gtcagggtga 6060

ctctccacgt aaaaaccaaa ctggggcaag taaacagaat tattgcatac tacactgaga 6120

aatttgcaaa tacactaggc tctgggacac tgatttgcaa acaggtatat gagcatgttc 6180

caggcagcag ggattaagcc aaaacccacc tagccctgca cacaaagccc tagttctatg 6240

tgtaaaacgg cagacctgcg agggcatcta ggtggggtca ggttggcaag cacttcagaa 6300

aaaatcaaag cagagaaagg ccagaaaatg aaggtgcagg gtctttagag gagggggtca 6360

gagaagatag gcctacactc agcaaagact ctgagtgtga gctggggtct gagggcagtg 6420

agaggtgagc tgtgtgaatg ccggagaaga gcgtttccta cagtggaaag atgaggaatg 6480

gaggtgatct gctggccacc acccaatagg acacaggcac agcaaggctg agaggttttg 6540

caaaggtcct aagattgata ggtctttctc tcttccctct tagcctcact tttagcctcc 6600

tggagccctg ccaccactgc t 6621

<210> 4

<211> 2335

<212> DNA

<213> Artificial Sequence

<400> 4

aggtagcatg tcctgctgac tgaagcagca gcaggaagga cagggatagg ttgagaagaa 60

accctaggat gaagaaaaag cattaaaatc ctcttggtca agaacatgag tagaaacaga 120

agagccacct gcgttaagac agacagggca gggtcaagta aggcaggtgc aggagcacct 180

gagaagctac agcctgggga cacagctcac tcacagtaaa gagccagccc tgggtgtggt 240

catggcacca ggagcaaagg agagcaaact ccccgtccag aaaagaggga agaacaaata 300

ctagcaggca tcatgattag attaagcaca cagacaccag gaggaaaggc agtgccaagt 360

tgatgcgatg gggcatcact gacaccaagg aactctcagg aagaagctgc ccacagagtc 420

caccccaact ccaggagcaa aggccccgtg tgtgggaata gtagtcacag tgtctcatat 480

ttatttaggg gaagtgacca gcgagatctc acagagcaca aaccctcagc ctccaaccac 540

agtaagtaaa gccaatcaca tgatgagaat tggtgttata ccatgtttcc tgttctgcca 600

ggggatcctg ggattgtaga ccatgccttc ctctcagaat tttcatagaa gagagctcta 660

cttcccagtg ctaaggatcc tacaaatcat acccttccta gctacagaat ggcatgagat 720

agccttggag aagggctcat tccacctagg ctagacaggc cataaagaag tcatgtttcc 780

aggagccatc ctagtgtcat cttccctgca tctattacca atgtgggagt tctaccttgg 840

ggaaactttt ttgttttgtt ggatatgggt gccatggcct ctatgttcct ccttgctatt 900

ggggcatgct gctgagaatc aactctgggc atcttgtgaa ctgggtaaac gtggatggct 960

ctatcccact cctcctatga cactaatgca atgggtccag accagactgg ggagggggag 1020

ccaattcatg aactgatgat gggtcccctc ccccccttct tccccctcca cctgctagaa 1080

atcacaaagc cttattcatg tgcactgata tctttcttcc tcccctagat ctggctcctt 1140

ctgacaactc tcctaacaag gtgagcactg ccacttttgc tggctgtttg tgctacaaaa 1200

tgtctctgag gaaacttggg atatctgtat tgttttgatt tctttgtttg ttgagacagg 1260

atctcaccat gcagccttgg caacctttga actatgtaac caaagttggc cttgaactgt 1320

ggccatacac ttgccttagc ctttcatgtg ctgggatgac aagtgtgtgc taccacacca 1380

agctgagaaa agtattcttg aagagacaca actgtgaaat ccagtatggg tctctactcc 1440

tcaacactgc acagaaagac agactggtca atgggtccca tgagtctact acaagagtgt 1500

gtgttggaat tatctctgcc ctgtggttaa tttctggcta tgactcctag aattccatgg 1560

ctcttgtcaa ggaactaact ctgtgatcct caagtttggt cacctagaca atgggcattc 1620

tgcctacctc aaaaagagga caaaagaata cagcaagttg gcttgtgcac agcccggccc 1680

acgcagggca gcacatcata ctctctcctc cagtctagac ctgctctgac ctcaaagcag 1740

ggcttccatt ctaatgggtc tgagatcctc ttccttcatt ttataacaat tttaccaaga 1800

tttcatcatt taagcctttc aacagagtac acacatagac caaagagtgc tttaagtcat 1860

aaggacactt tcacagtggg aactttcaca aacctgtcct cctccccaga cacacattgc 1920

tgtcacattt cctacagagg gcaaaggcag tctgacagtg cacatcctcg cacacatctg 1980

gctttcattc tgctctgtag cccatctgga tgtctgactg tgagtgacag gagccctctc 2040

ccaccgctgc agcaaggtga ggctcaggcc tgtggtgctg agtcactgat caactctcat 2100

cctttcaggt ggatgacgtc gcatacactg tcctgaactt caattcccag caacccaacc 2160

ggccaacttc agccccttct tctccaagag ccacagaaac agtttattca gaagtaaaaa 2220

agaagtgagc ataatctgtc cgtctgtcct gctggctgca ccagtgatgc attcccggat 2280

tctgttcctc actggagggt ctcagcacac acacacacgt acacatgcgc gcgcg 2335

<210> 5

<211> 14906

<212> DNA

<213> Artificial Sequence

<400> 5

cagctcacta ctgaatccat gccattcaat gttgcagagg ggaaggaggt tcttctcctt 60

gtccacaatc tgccccagca actttttggc tacagctggt acaaagggga aagagtggat 120

ggcaaccgtc aaattgtagg atatgcaata ggaactcaac aagctacccc agggcccgca 180

aacagcggtc gagagacaat ataccccaat gcatccctgc tgatccagaa cgtcacccag 240

aatgacacag gattctacac cctacaagtc ataaagtcag atcttgtgaa tgaagaagca 300

actggacagt tccatgtata ccgtgagtat ttccccatga cctctgggtg ttggggggtt 360

agttctactt cccacacatg ggattgtcag gcctgggctg tgcctgtgtc ctcctctgca 420

ttatacccca tgttaaggtt tgagcatcta gtgcaggaca cactatggga cagacatcaa 480

aataccgaat gtcttactct ggagtcctga tcctgcagac atttgcttca gaggaaggac 540

aatctgatgt gggtaactta tcgaggggag accagtctca atctagcccc ccgggaccct 600

cctggtgaac atgtccctaa gaaagaccca gtaggactca gtcagggtct ggcctgaagg 660

gccttctggg atcctcacag acaagctcag ccctgggaat cccctgctcc aaaacactat 720

cccaaggttc tcaactcctg gtgaccctga ggagcctggg ccagggctag attgtggcct 780

cctgggcagg gctgactagg aacaagaatt tgagctatcc gagggctgtg gctcctggag 840

ctggtaacca gccagggttc aggcactaga gcctcatttg ggcaaggatg gagcctcatc 900

cttcacctta gtctcagcct ggagaggaca gatggacaag ctccccaggc catcagccaa 960

ctgccctggg ggcttaggag atgccatagg aagtttacca tccccaggag gaggaacaga 1020

ggagacagga tgctcctgaa agctccttgt ccaccaggga tcaggctgag aggcactctc 1080

agggaatcca acaacaatag atgctgcggc cgggagcggt ggctcacgcc tgtaatccca 1140

gtactttggg aggccgaggt ggatggacta tgaggtcagg agttcaagac cagcctgacc 1200

aacatggtga aatgccgtct ctactaaaaa tacaaaaaaa ttagccgggc atggtggtgc 1260

gtgcctgtaa tcccagctac tcaggaggct gaggcaggag aattgcttga acctgggagg 1320

cagaggttgc agtgagccaa gatcacacca ctgcacccta gcctgggtga cagagcgaga 1380

ctccatctca aaaaaataaa ataaaataag attgtttatt tgggcagctc ccctatgcag 1440

agctgaggtc aaataattat aaatatttta aggttaattc acagaccaca agataagcca 1500

aaaccattaa ccatttatac tatttcactg aggggaaact gagccacatg gcaataaata 1560

cagtcactat atcaaggtca cacagaccgt aactggaaga tcagttacag gagatttgtc 1620

ttcagccaca gtcttaacct ctcctctact ccaggggcct tatttggcat ctgtggaccc 1680

caataccaga gttggttcag ttcctttctt cttaggcatc tgcagcccag aggggagatc 1740

ctggtctggg gagtgagatc aaatggagaa ttaaccaagt tttattctgg aacttattca 1800

acaattagag tcagtgtcag aaatgttcag gcctgggtca ggtgtggtgg ctcacacctg 1860

taatcccagc actttgggat gccaaggtgg gcggatcact tgaggtcagg agttcaagac 1920

cagcctggcc aacatggtga aaccccatct ctactaaaaa tacaaaaaaa attagtcggg 1980

catggtggtg cgtgcctcta atcccagcta cttgggatgc tgaggtggta gaatcgcttg 2040

aacccaggag gcagaggttg cagtgagcca agatcatgtc actgtgctcc agcctgggag 2100

acagagtgag actctatctc aaaaaaaaaa aaaaaaaatg ttcaggccta tcccctcaga 2160

cccacacctc tgctccctgg acttgaaatc ctgtgtttcc tgatgtgttg gtgtcactcc 2220

cacaggagga taaaggaaag gactttgctt tctctcctca ctcacaccct gcaccagcca 2280

gggcccaata tgaaacacac actcagtagt tctctcacga atgaatgaat gattgaatga 2340

tccataacct ctttagagac tggatctggt tgcagaatcc tgggaggttc ttgccatacc 2400

tgctccccat ccctccagag actgagaccc atgttccatc ccctgaccat ctacccctaa 2460

agccacccca agtaccatat ttcatgtgac tctggggctg ctcggctgtg ggaaggtttt 2520

cagggctccc tggtcttggt tctgggacac cagaggctcc tggttgctgg ggtctctaag 2580

gtcacttagc ccccactgct cactgtcatg ggcatctctg actctctctg ctcctccatg 2640

tcctcatttt cctctccttt tatcccagct gaactgcaca gtttcctcca cccttaggcc 2700

tttcccagac actccgtcta acaaggttga ctgtcctgtt cccttcccgc tcacactgtg 2760

gccaggccca cctcccaggc aataggaaag gcacagaaat gagcccagag cagccacccc 2820

tgccaggtcc atcataagcc actgtcccca cgtccctcag tgaggctgac atgtggaaca 2880

ggccagggga cagggacgag cgactcctcc atacactctc tatactgact caccagggga 2940

tcagaggcag aaggacaggt ctgcagtccc caaagcccat gggctcattt cacccatttg 3000

actcctaact ccatccctgt tctgctgtgg gctcacatcc tctagtggtt cctgggacct 3060

tccccaggta gagctagcca ggcaggtgct gtctgatggg tttgctgccc attccaccta 3120

cacctgtgtc ctcatgatga caccattgtc ataaggtggg atctctcaga cagggagatg 3180

cattagccac tggtgtctca cttagactct gcccaatttg gatgaatttg gtcaaactct 3240

agttgtgttt cctgaggttg agagaaaagg aagaaagcat ttcacatgac tgttttgggt 3300

tcctcttttt acctgaattt ccacaggaat ttgtaacata gaactctttc taaatttact 3360

aacataacat acatcacttc tcctcatatg gcatcgtttc tctcttaatg caattgagaa 3420

ctctcagata tcctttctga caagttctgt cttcctaaca ggacccatga gtcccttcta 3480

cccagggagt cactgatttc aaactgactt cagcatcttt ctttgatcat aacatgatcc 3540

tactggactt cagagcttgg tttaagagtt tatcacgctc tctagagaat attcctctta 3600

ttaattggtg ttagatccag agtattaata acaattttca caaatggaat aatttaactt 3660

ctcaaattta tatcccagat ctaccaaaca cacacggtcc atgagggtca ggctttcagc 3720

aagttcatgc tccttccgtt ccactagctg gttattctgc atttgcaaaa aacccacata 3780

tttcagaaaa gatccacagt gtcatcatct gctcttttca gggggaaaag ggacattaaa 3840

gaccaaagac aaggaacgta gatatgctgt taaatccagg cagcccctgc ctgtcaccct 3900

cactcttttc tagatcattc cttggactct gctctatctt tagggggtca ctggctcaag 3960

tcagtcatca tcaaacacct gggaaaaact gccccacctt gtggttctgc tgcctgacga 4020

ctgagctacc ttcaggcttg cccctggtgt cccctgttat ttctgctgaa acatacagtc 4080

ccaggccagg ctgctcagta tcctcagggt ttaaggacaa taggaagtcc catcatcacc 4140

catctctagg atgtcctcag acagggaagc tgcagagaaa acacacctag tggggcaaag 4200

taggactgtg aagctggaag ggacccagca cctgtatgtt ccaggtgagg acccacaggt 4260

gggtcaggca ggcatcagcc agtcagggaa ggaccagaag tgcctggggc tgtgactccc 4320

agtcctcggt ctgtccacga cccaacactg ctgctcagtt cacacttgag aaagtctgtg 4380

cttctctcac acagagcagg cggcctcacg gtctctgagc cctcagatca ttgcacatct 4440

gtcttgtgaa acacacactt gccatgggct tttagggact tgggttggct gagaggtggg 4500

gagatgccaa ctctgattga aaaatgcccg gacggaatcc cagcactttg ggaggccgag 4560

gcgggcagat cacgaggtca ggaggtcgag accatcctgg ctaacagtga ccatcctggc 4620

taaaatacaa aaaactagcc gggcatggtg gcatgcacct gtagtcccag ctactcagga 4680

ggctgaggca ggagaatcgc ttgaacccgg gaggaagagg ttgcagtgag ccaagatcgt 4740

gccactgcac cccagcctca gcaacaaagc gagactctgt ctcaaaaaaa aaaagaaaga 4800

aagaaaaatg cccagccagg cacggtggct caggcctgta atcccagcac tttgagaggc 4860

cgaatcgggc ggattgccag agctcaggag tttgagacca gcctgggcaa cacggtgaaa 4920

ccccgtcagg catggtggca tgtgcctgta atcccagcta ctcaggaagc tgaggcagga 4980

gaattgcttg aatccaggag gtggaggttg cagtgagccg agatcgtgcc attgcactcc 5040

agcctgggtg acagagcgag gctccatctc caaaaaaaat aaataaataa gaaaagaaaa 5100

atgcctgtgg aggaatcaaa ggtgccacac agggcaatct tctctctgtt ttctgcatag 5160

cggagctgcc caagccctcc atctccagca acaactccaa ccctgtggag gacaaggatg 5220

ctgtggcctt cacctgtgaa cctgagactc aggacacaac ctacctgtgg tggataaaca 5280

atcagagcct cccggtcagt cccaggctgc agctgtccaa tggcaacagg accctcactc 5340

tactcagtgt cacaaggaat gacacaggac cctatgagtg tgaaatacag aacccagtga 5400

gtgcgaaccg cagtgaccca gtcaccttga atgtcacctg tgagtatctt ctgttcctct 5460

gtggcccagg ctgccagccc aaatccacgc agccagaggc caggcctctc agtccctctc 5520

aggtctaagg acgcagaccc ttaaccctgg acacccaggc tggccatgac ttcctttccc 5580

caggcaaacc tgggcagccc cagcctgaac caagaatagg aggggagagg ctgctcctgt 5640

cctgggaggc tcagggtcca cagcctgtga tgggagaaac aggtgaatgt ctcagaccca 5700

gactcagtgg acacaatgga ggtttggtta ggacttcagg gttgtgactt agtagagagg 5760

aacactgtgg cccttctcca gaccagcagc ttccccttcc ctctgatgac atcacctgtg 5820

gctttattct ctttgctcca gatggcccgg acacccccac catttcccct tcagacacct 5880

attaccgtcc aggggcaaac ctcagcctct cctgctatgc agcctctaac ccacctgcac 5940

agtactcctg gcttatcaat ggaacattcc agcaaagcac acaagagctc tttatcccta 6000

acatcactgt gaataatagt ggatcctata cctgccacgc caataactca gtcactggct 6060

gcaacaggac cacagtcaag acgatcatag tcactggtaa gtaattcctg gagcatcaac 6120

actaagatct ggggtacaag ctttctggtt ttcaaatagg agcagagaag aaattttctt 6180

ttgcagcctg tatccaacag gcacaaacaa gtccaaattc tcccctgaac cctctcaatt 6240

catctgtgca gactctcttc cctttgtttt tctgatttct cacagctgac cttaggtcca 6300

gcctggaatg tggggagggg gttctctcag ccccagaaag ccccgtgtag caggaggggc 6360

ttcacagagg gggaagcaga aagggtcctc aaggtcaatt tgcttctgtc actaacatgt 6420

ccctttctgt aacttcttgg ccttctttta cctattccat gagatataag gaatatgtga 6480

ggttttaaaa cagactcaca atagttttcc ctaaatgaga gaaggaaatg cccttcatca 6540

gggatgagca gctcagactc tgctccctgc tctactcccg gcttgcccgg tgattggctc 6600

tgccctgacc ccatgtgggg taggacgcag gtgtgtgcag aaggtgtcca ggtggcctgt 6660

catgaatcca gctaaatcaa gatggcagtc aatggctggg cgctgtggtt catgcctgtg 6720

atcccagtac tttggaaggc cgaggtgaga ggatcacctg aggtcaggag ttcgagacca 6780

gcctgaccaa catggcaaaa ctccatctct actaaaaata caaaaaaaaa aatttagcca 6840

ggcatggttg cacatgccac taggcatgcc actagggagg ctgaggcaca agaatcactt 6900

gaacctggga ggcagaggtt gcaatgagcc gagatggcac cactgcactc cagcctggac 6960

aacacagaga gactctgtct caaaaactaa ataaataaat aaataaaggc agtcaacacc 7020

tgagcctccc ctgggtcagg ctgcctccct gagcttgtcc tggctctgaa gtcaccagct 7080

gtatgaggct gtgggcacag cacatgggat agcacagagc acagcgagtg acccacactt 7140

ggagaaatcg ggagattcag ccacaggggc tctgcattgg agagaatggg caatgccaaa 7200

cagcgtgtat ttgtagagaa ggtaagaata tcagcctttt gttaacactg tgcctactct 7260

aggaatctcc ttcaccgtga tattctatcc acagaccagg aagtaaaact cctctttaca 7320

gtgggaaatc cttcggattg gaactccaga tagtaaggtc atgaagactg gatggggcat 7380

catcattccc taaaaaatta tttaatgaaa aaaaacacta ccttcccttt tgtatgtaaa 7440

gtgacagtca caggaaggat gcctgatcac agctccagga aagggtcagt gggaggccag 7500

gcacagtggc tcacgcctgt aatcccagca ctttgggagg ctgaggcggg tggatcatga 7560

ggtcaggaga tcaagaccat cctggctaac atggtgaaac cccgtctcta ctaaaaatac 7620

aaaaaattag ctgggcatgg tggcacatgc ctttagtccc agctactcgg gaggctgagg 7680

caggagaatg gcttgaaccc gggaggcaga gcttgcagtg agctgagatc atgccactgc 7740

attccggcct gggtgacaga gcgagactcc gtctgaaaaa aaaaaaaaaa aagtcagtgg 7800

gaaaaacatt ctacctgatg atgaggttgc tcggtctgtg cgctgagaag aagattccaa 7860

gtggagatat agagatatcc agagggtcac tctgagacga tctggggtca ggagggaggt 7920

gcagccctct ccttacaatt catcacctga acaaagacac tcgaccttct gcagagggtc 7980

agggctatcc cctggttggt gacctttgca cagctcactg tgggacctga gagctggcta 8040

aaatctcagg gaaaggagca tagccctagg ccccaggccc caaccctatt ctcagtaggt 8100

tatctcagat actctgcttg tccacagagc taagtccagt agtagcaaag ccccaaatca 8160

aagccagcaa gaccacagtc acaggagata aggactctgt gaacctgacc tgctccacaa 8220

atgacactgg aatctccatc cgttggttct tcaaaaacca gagtctcccg tcctcggaga 8280

ggatgaagct gtcccagggc aacaccaccc tcagcataaa ccctgtcaag agggaggatg 8340

ctgggacgta ttggtgtgag gtcttcaacc caatcagtaa gaaccaaagc gaccccatca 8400

tgctgaacgt aaactgtaag tgactcctca ccccttccta tatgtccctc taggattact 8460

ctgtcaatgg tgtgcaaaat ggataaaact cacaggaggc agaatatcaa tgaagagacc 8520

attatagcaa acagaattgc aaagtggtta agagctcagc tcaggccggg cacagtggct 8580

cacgcctgta atcccagcag tttgggaggc caaggcgggc ggatcacgag ggcaggagat 8640

cgagaccatc ctggctaata tggtgaaacc ccgtgtctac taaaaataca aaaaaaaatt 8700

agccgggcat ggtggcgggc gcctgtagtc ccagctactc gggaggctga ggcgggagaa 8760

tggcgtgaac ctgggaggcg gagctttcag tgagccgaga tggtgccact gcactccagt 8820

ctaggcaaca gagcaagact ctgtctcaaa aaaaaaaaaa aaaaaagagc tcaggctctg 8880

aatcaaatat acatatactt agttggtttt ttttggttgg ttgggttttt ttgtttgttt 8940

tgttgttttg agacagggtc tcactctgtc acccaggctg gagtgctgtg gtgtgatcaa 9000

agctcactcc cgcctcaatc ccctgggttc aagcaatcct gccacctcag cctccagagt 9060

agctgggact acaggttgca ccaccatgcc tggctaagtt tttaagtttt tttgtagagt 9120

tggggtttca ctgtgttgcc caggctggtc tcaatctcct ggtctcagcc tcggcctccc 9180

aaagtgctgg gattacagga atgagccact ctgcccaccc cgtatttaac tattctaagt 9240

acctctcata cagatggaag catgcaatat ttgtcctttt gtgtctggct tacttcattt 9300

agcacaatgt cttcaagctc catctatgtt gtagaatgta tcagaatttc attccttttg 9360

gagactgaat aatattatgc tgtgtagata gatacatcac attttgctta tccactcatc 9420

catctatgga cagttaggtt gcttccacct tttggctatt atgaataatg ctattacaaa 9480

catgggtata caaatatctg ttcaaatccc tgctttcagt tcttttaaat agatactcag 9540

aagtggaatt gctggatcaa atggtaatcc tgtttaattt tgaagaacca tcataccatt 9600

ttccacagtg gctataccat ttcacattcc caccagcaat gcactagagt tccaatttct 9660

ctacatcttc aaaaacattt gttgctttct ggttttgttt tgttttttat aatggccatc 9720

ctaatggtta taaggtggta tatcattgga gttttgattt gcacttccct aatgattagc 9780

aatatttagc atcttttcat gtgcttattt gccatttatc ttctttggag aaatgtttat 9840

tcaagtcctt tgcccatgtt ttaattaggt tgtttgggga tttttggttg agttgcagta 9900

gttctttata tattttggat attaatccct tatcagatat atgattctca aatattttct 9960

cccattctat aagaagtctt ttcacttttg tgataatgtg ctttgataca caaaagcttt 10020

taattttcat taagtccaat ttctctactt cttctttcgt tgcctatgct tttagtgtca 10080

tagccaagaa atcattgcca aattcaatgt tccaaagttt tcactctatc ttccaagagc 10140

tttatagttt tagctcttac atttaggtct tttatgcatt ttgaattaat ttttatatat 10200

ggtgttacat aaaggttcaa tttcattctt ttgcatggat atccagtttc ttcaatgcca 10260

tttgttgaaa agactatcct ttccccactg aatgatcttg gcacccttgt caaaaaacat 10320

ttggctatgt atgcaaacat ttctttctgg gctctatatt ttattccact ggtttctatt 10380

tctttttgcc agtaccatac tgttttgatt actgtagctt ttggattttg tttgtttgtt 10440

ttattgttgt tgtttgggtt tttttgtttt gttttgtttt tttgcttttc tttgtagaga 10500

tggcgtttca ccatgttgcc aaggctggtc tcaaactcct gagctcaagc aatccacccg 10560

cctccacctc ccaaagtgct aagattacag gtgtgatgat taccatagct ttgtaaaaaa 10620

ttttgaaacc aggaagtgtg agcccttcaa ctttgttctt tttcaagatt gctttggcta 10680

tccatggtcc cttcagagtc tatataaatt ttagaatgaa tttttctatt tctgcaaaaa 10740

atattactgg aattttgata gagattgcac tgaatctgta gatcactttg ggtagtactg 10800

tcatcttaac aatattaagt cttctaatcc atgaaaatgg ggtgtctttt caatttatgt 10860

cttatttaat ttcttttggc aatgttttgt attttcaggg tacaaatctt tcacctcttt 10920

ggttaagttt atttctaagt atttttaaag ctcttataaa tagaattttt ttcttaattt 10980

tcctttgaat tgttattagt atacaaaaat acaactgatt tttgcatgtg gattttgtat 11040

cctgccactt tgctaaattt attattctaa cagttttttt gtggaatctc tagggttttc 11100

tatatataag ttagtgtatt ctgcaaacag gtataatttt acttctttcc aatctagatg 11160

cttttttttt cttgcctaat tgttctgtct aggtcttcca atactacatt gaatagaaat 11220

ggcaaaagca ggcatccttg tcttgttctt gatcttaaag gaaaagtttt caatctttca 11280

ccattgacta tgatggtagc tgggggtttt cacatgtagc atttattatg ttgagaattt 11340

ccttctattc ctagtttcag tgttttttag catgaaagaa tgttgaattt tgtcaaatgc 11400

ttttatcgac tcattttcat tactggttat aggtctattc agattttcta tttattcatg 11460

attctatcat ggcaggtttt gtgtttctag gaatttgttc atttcatcca ggttatccaa 11520

tttgttggca ttcaattact catagtactc ttataatcct tattatttct gcagaattag 11580

tagtaatgtt ttactttcat ttctgacttt agtaatttga atcttctttc tttcttagtc 11640

aatctaatta acagttgtca attatagtga tcttttttga agaacaactt ttttttttcg 11700

gtttgagaca ggttctcact ctgtcaccga ggctgatcat ggctcaccac agcctcaact 11760

tcccgggttc aagcaatcct cctgcttcag cctcctgggt agctgaaact acagacaagc 11820

actaccacct ccggctaatt tttgtaattt tttgtagaga cagggtttca ccatcttgcc 11880

cagctggtct caaattcctg agctcaagtg atacacctgc ctcagcctcc caaattgctg 11940

ggattacagt catacaccac tgtacctggc ctacagttat aaatttcttt cttgcacaag 12000

attcttaact actctgagcc tcggattcct caaccgaaaa ttgcactgtg aatgcctgct 12060

ccatagtatt gcacgggttt ggggtttttg ttttgttttt gagacagggt gtcactctgt 12120

cacccagact ggagtgcagt ggtgcaaaca cagctcactg cagcctcaac ctcctgggct 12180

caagcaattc cctcacctta gcctcctaag tagtacatac taccacatct ggctaattta 12240

tttttatttt tgttttcaga gagacagaat ctcaccatgt tacccaggct ggactcgaac 12300

tcctgggctc aagcaatcct cccatctgtt tcccaaagtg ctgagattac aggtgtgagc 12360

caccacgcct ggccccatag tgttatttta aagatttaat gtaataataa accttcagca 12420

aaacaccaca cacagaggaa atgtttcata aatgttagct gctattacta ctactattat 12480

cattagcctt gaaatcaggt agtcctaggg tcaaatctca gatccacctc tcactagcca 12540

tctgacttta ggtaagcctt ttaccactct aagcttccat tttttcatgt ttaaaatgga 12600

aataatgtct acctgacagc actattttat ggatcaaata agatacatgt aaagcattta 12660

gcagcacagg gcctggcaca caggaagtac tccacaaaag tagctaacat agcattagtc 12720

accagcctga gttgactggt gagggttaag ccccaaatag ttgcaacaga tataaacaag 12780

aaataggcta gacacagtgg ctcacacctg taatcccaac atttgggagg ccgaggctgg 12840

aggatctctt gagcccagga atccaagacc agcctaggca atatagtgga accctatctc 12900

tacaaaaatt attttttttt aattagccag gtgggtgggc gtggtggctc acgcctgtaa 12960

tcccaacact ttgggaggtt gaggcaggcg ggtcacctga ggtcgggagt tcaagaccag 13020

cctgaccagg atggagaaac cccgtctcta ctaaaaatac aaaattagcc aggcgtggtg 13080

gcgcatgcct gtaatcccag caattcagga ggctgaggta ggagaatcgc ttgaacctgg 13140

gaggcagagg ttgcagtgag ccgagatcac gccattgcac tctagcctag gcaacaagag 13200

caaaactcgg tctcaaaaaa aaaaaaaaga aagaaaaaaa ttagccaggt gtggtggcat 13260

gtgcttatag tctcagctac tgaggagact gaggtgggag gatcacttga tcccaagagg 13320

ctacaatgag ccatgattgt gccactgcac tccagcctgg gtgatagagt gagaacctgc 13380

ctcaaaaaaa aaaaaaaaaa aaaagaagaa gaagaaatag atgcaaaagg tattatttat 13440

atattatata tatatatata tatatatatg gagggagaag cattatacaa gaaacccact 13500

gggacatggc tatgatcaaa tatgggaaag ggggaaaaaa ggaggtaaag caaagtctca 13560

agcctggtat gttagtttcc atctactgag atacagtgaa gatgggatta aacatacgag 13620

ataatttatt ggggaaaatg cctgtgaggg aaagtaaggc gagagtgaga ggaacctcag 13680

accatgatgc agatctgatt cctgtggaag agaaagagag gaaggaagtt ttagattgaa 13740

gtgcagtttt tgtttgtttg ttttttgaga cagagtctca ctctgttgcc caggcttgag 13800

tgcagtagtg tgatctcggc tcactgaaac ctctgccccc cgggttcaag cgattctcct 13860

gcctcagcct ctcaagtagc tgggattata ggcacctgcc accgcaccca gctaaatttt 13920

gtatttttag tagagatagg gtttcaccat cttggccagg ctggtcttga actcctgacc 13980

tcgtgatcca cccgcctcag cctcccaaag ttctgggatt acaggcgtca gccaccgcgc 14040

ccggcctgca gtgcagttct aagagcattt ctgcaaggct gacagggagt cctccagcca 14100

ttcacacttc agaataaaac agtcacacaa aactgggcta gctttcatac ccctgctggg 14160

agcctgtggg aagccagttc tctatgcaaa agaggtggtg aattcagaat gcaccaactg 14220

ccacaactga gacactgaga aaaagatgca accacgaaaa aggtggaaag ttctaatcac 14280

atacaaaata gcaatcagcc tttctcatat ttcaaagcct taaaaatggc tgagcgcaga 14340

aaagccaggg tggaattggc agaagagaga tcatcaacct agaaacatgg tgactggggt 14400

tgggcgcagt ggctcacgcc tataatccaa gcactttggg aggccgaggc aggcggatca 14460

tgaggtcagg agttcaagac cagtctgacc aatatggtga aaccccgtct ctactaaaaa 14520

aatacagaaa ttagccaggt gtggtggcac gtgcctgtag tccagcctga ggcaggagaa 14580

tcgcttgaac ctgggaggcg gaggttgcag tgagccaaga tcatgctact gcactccagc 14640

ctgagagaca gagcaagact ctgtctcaaa aaaaaaaaag aaaagaaaaa aagaaacatg 14700

gtgattgaaa aaaaaaaatt gcaaggatat agttagctaa tcaccttcca gcaaccttcc 14760

cacaacgaaa ctgtattcct tgaaggaaca attagaaact acttcattct gagagttgtt 14820

tcccagcccc cattgtaaaa taatttcact ttcatttctt ctcctctttt ctctccatga 14880

cagataatgc tctaccacaa gaaaat 14906

<210> 6

<211> 47

<212> DNA

<213> Artificial Sequence

<400> 6

acagataatg ctctaccaca agaaaatggc ctctcagatg gcgccat 47

<210> 7

<211> 96

<212> DNA

<213> Artificial Sequence

<400> 7

gtccaggaag agagagaagg gagggactcc aagaagcagc aagactatgc ggtaccgaat 60

tccgaagttc ctattctcta gaaagtatag gaactt 96

<210> 8

<211> 96

<212> DNA

<213> Artificial Sequence

<400> 8

gaagttccta ttctctagaa agtataggaa cttcatcagt caggtacata atggtggatc 60

ccaattgagg tagcatgtcc tgctgactga agcagc 96

<210> 9

<211> 3748

<212> DNA/RNA

<213> Artificial Sequence

<400> 9

aaagctcctt taagaaaagc agggcagata tcagggcagc ctggcttagc agtagtgttg 60

gagaagaagc tagcaggcag gcagcagaga catggagctg gcctcagcac atctccacaa 120

agggcaggtt ccctggggag gactactgct cacagcctca cttttagcct cctggagccc 180

tgccaccact gctcagctca ctactgaatc catgccattc aatgttgcag aggggaagga 240

ggttcttctc cttgtccaca atctgcccca gcaacttttt ggctacagct ggtacaaagg 300

ggaaagagtg gatggcaacc gtcaaattgt aggatatgca ataggaactc aacaagctac 360

cccagggccc gcaaacagcg gtcgagagac aatatacccc aatgcatccc tgctgatcca 420

gaacgtcacc cagaatgaca caggattcta caccctacaa gtcataaagt cagatcttgt 480

gaatgaagaa gcaactggac agttccatgt atacccggag ctgcccaagc cctccatctc 540

cagcaacaac tccaaccctg tggaggacaa ggatgctgtg gccttcacct gtgaacctga 600

gactcaggac acaacctacc tgtggtggat aaacaatcag agcctcccgg tcagtcccag 660

gctgcagctg tccaatggca acaggaccct cactctactc agtgtcacaa ggaatgacac 720

aggaccctat gagtgtgaaa tacagaaccc agtgagtgcg aaccgcagtg acccagtcac 780

cttgaatgtc acctatggcc cggacacccc caccatttcc ccttcagaca cctattaccg 840

tccaggggca aacctcagcc tctcctgcta tgcagcctct aacccacctg cacagtactc 900

ctggcttatc aatggaacat tccagcaaag cacacaagag ctctttatcc ctaacatcac 960

tgtgaataat agtggatcct atacctgcca cgccaataac tcagtcactg gctgcaacag 1020

gaccacagtc aagacgatca tagtcactga gctaagtcca gtagtagcaa agccccaaat 1080

caaagccagc aagaccacag tcacaggaga taaggactct gtgaacctga cctgctccac 1140

aaatgacact ggaatctcca tccgttggtt cttcaaaaac cagagtctcc cgtcctcgga 1200

gaggatgaag ctgtcccagg gcaacaccac cctcagcata aaccctgtca agagggagga 1260

tgctgggacg tattggtgtg aggtcttcaa cccaatcagt aagaaccaaa gcgaccccat 1320

catgctgaac gtaaactata atgctctacc acaagaaaat ggcctctcag atggcgccat 1380

tgctggcatc gtgattggag ttgtggctgg ggtggctcta atagcagggc tggcatattt 1440

cctctattcc aggaagtctg gcgggggaag tgaccagcga gatctcacag agcacaaacc 1500

ctcagcctcc aaccacaatc tggctccttc tgacaactct cctaacaagg tggatgacgt 1560

cgcatacact gtcctgaact tcaattccca gcaacccaac cggccaactt cagccccttc 1620

ttctccaaga gccacagaaa cagtttattc agaagtaaaa aagaagtgag cataatctgt 1680

ccgtctgtcc tgctggctgc accagtgatg cattcccgga ttctgttcct cactggaggg 1740

tctcagcaca cacacacacg tacacatgcg cgcgcgcaca cacacacaca cacacacaca 1800

cacacactta cacacacact catgcattca ctctattgac tccttcagtg tctatagaag 1860

aaaaggtgga tcctggagcc tacagaaaac tcaacccttc taggctttca aatttggctg 1920

agagtgaggt atcaaaattt ctcacccttt cactttcctg acccagattg ttgaaaattg 1980

acctattcag agcaccttca ttcccctccc aactccaagt cctgccctat cagagtctga 2040

cttgaatttc cataaacctt ggaggtcacc taagtgctta cgccaaacaa aacaaaacaa 2100

aacaaaacaa aacaaaacaa aacaaaacaa aacaaaccag aagcaggaaa tggccagtcc 2160

catatcttta aaggctgatt ggaagccacc atacatgaga agatcaaacc tccatgggca 2220

atctacacac ccgacaactg tcatgcttac ccatctggga cattcgagtc tctgaacctt 2280

gtgccctcac gcctgagccc ttctctgagc ctttctccag aaaatccact cacagcaact 2340

agagaggctc tttgtcagca actccaagca aactgctagg caggattcag aagaaaagac 2400

agcatctcta acatccacca ggaaggtgcc cagaaaagca gagctggtga ctttggactg 2460

acagacatct ggagtgtgaa aaagcagcac agagctaacc ttcggagagt gttgaaatta 2520

tttgaaaaga agccatattt ggaggtattg gagttttcct ctttctgaga caatccacta 2580

tttgaaaatt gtagctactg aattgcctct cagtatgcga gctgatcact ttgccttagg 2640

gccactagat ttctgtctcc cttagcccct caagcccttt tgatcatgag ttccaaacca 2700

aaaataaata aatgaacagt gaggcagtcc cttgcagtac cactgtcatg ggtcaggcta 2760

agcctcctgc ttttctgaat tagtcaagaa aagccttggt ttcccttttt ccatctcttt 2820

atcttgtctt tcagatactg gccagagcct ggacactctt cctctgagat ctccagcttc 2880

tctgccttct tgtgtttctt ttaaactcta acaaaaactg ttctcacctt caaaaaataa 2940

aataataaca agctttccac atccccacca aagagggacc cagctaggtt tctggaaacc 3000

cagcaccagc ctccagctgc ccttctgcag tgtttctgcc tctgtttccc tttcgttttg 3060

acttttttcc ttcttttgag acagagttcc agcatggagc ctgtgcaggt ttcaatccca 3120

cagtaacacc ttctgcagca ccccacctgc tcagactgca gccctggcca ccaggcctgg 3180

ctacctggac attctgtctg ccctgcactc tcaggaaacc ttggcctctg ctactgtctg 3240

tttggctcat tcaaagtgtg tccttaaagg aatgcagtca cccatgccag aggcagtgtt 3300

tacagcctgg aatgctctgc acttccagtg gaccagtgct ccaccggaag tgggctgtta 3360

gcagggtcct ctcacctggc cctggccttt ctgtagcctt gaatcctgcc ttccccacca 3420

gggcaccagg gatgagtgca gcagcaggag gagaggcaaa cagtcacctc aggaaccttc 3480

tgagctaagg cacaccctct gtgcctgtca agcaaaggtt gtattggata tcaagtgttt 3540

ggtctcacgc caagccaaca ggctttggag agaattaatt agttctccta ctcagggatt 3600

tctttcagtc ctaacacagc ctgtgtatat tttgcttcac ccacgcaatg ctggattatt 3660

taattttgcc cggcttaaga caaatctgag ttacttgtaa atttgctcta tgttcataat 3720

aaaaatgtat tatatatcac tgatagca 3748

<210> 10

<211> 525

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 10

Met Glu Leu Ala Ser Ala His Leu His Lys Gly Gln Val Pro Trp Gly

1 5 10 15

Gly Leu Leu Leu Thr Ala Ser Leu Leu Ala Ser Trp Ser Pro Ala Thr

20 25 30

Thr Ala Gln Leu Thr Thr Glu Ser Met Pro Phe Asn Val Ala Glu Gly

35 40 45

Lys Glu Val Leu Leu Leu Val His Asn Leu Pro Gln Gln Leu Phe Gly

50 55 60

Tyr Ser Trp Tyr Lys Gly Glu Arg Val Asp Gly Asn Arg Gln Ile Val

65 70 75 80

Gly Tyr Ala Ile Gly Thr Gln Gln Ala Thr Pro Gly Pro Ala Asn Ser

85 90 95

Gly Arg Glu Thr Ile Tyr Pro Asn Ala Ser Leu Leu Ile Gln Asn Val

100 105 110

Thr Gln Asn Asp Thr Gly Phe Tyr Thr Leu Gln Val Ile Lys Ser Asp

115 120 125

Leu Val Asn Glu Glu Ala Thr Gly Gln Phe His Val Tyr Pro Glu Leu

130 135 140

Pro Lys Pro Ser Ile Ser Ser Asn Asn Ser Asn Pro Val Glu Asp Lys

145 150 155 160

Asp Ala Val Ala Phe Thr Cys Glu Pro Glu Thr Gln Asp Thr Thr Tyr

165 170 175

Leu Trp Trp Ile Asn Asn Gln Ser Leu Pro Val Ser Pro Arg Leu Gln

180 185 190

Leu Ser Asn Gly Asn Arg Thr Leu Thr Leu Leu Ser Val Thr Arg Asn

195 200 205

Asp Thr Gly Pro Tyr Glu Cys Glu Ile Gln Asn Pro Val Ser Ala Asn

210 215 220

Arg Ser Asp Pro Val Thr Leu Asn Val Thr Tyr Gly Pro Asp Thr Pro

225 230 235 240

Thr Ile Ser Pro Ser Asp Thr Tyr Tyr Arg Pro Gly Ala Asn Leu Ser

245 250 255

Leu Ser Cys Tyr Ala Ala Ser Asn Pro Pro Ala Gln Tyr Ser Trp Leu

260 265 270

Ile Asn Gly Thr Phe Gln Gln Ser Thr Gln Glu Leu Phe Ile Pro Asn

275 280 285

Ile Thr Val Asn Asn Ser Gly Ser Tyr Thr Cys His Ala Asn Asn Ser

290 295 300

Val Thr Gly Cys Asn Arg Thr Thr Val Lys Thr Ile Ile Val Thr Glu

305 310 315 320

Leu Ser Pro Val Val Ala Lys Pro Gln Ile Lys Ala Ser Lys Thr Thr

325 330 335

Val Thr Gly Asp Lys Asp Ser Val Asn Leu Thr Cys Ser Thr Asn Asp

340 345 350

Thr Gly Ile Ser Ile Arg Trp Phe Phe Lys Asn Gln Ser Leu Pro Ser

355 360 365

Ser Glu Arg Met Lys Leu Ser Gln Gly Asn Thr Thr Leu Ser Ile Asn

370 375 380

Pro Val Lys Arg Glu Asp Ala Gly Thr Tyr Trp Cys Glu Val Phe Asn

385 390 395 400

Pro Ile Ser Lys Asn Gln Ser Asp Pro Ile Met Leu Asn Val Asn Tyr

405 410 415

Asn Ala Leu Pro Gln Glu Asn Gly Leu Ser Asp Gly Ala Ile Ala Gly

420 425 430

Ile Val Ile Gly Val Val Ala Gly Val Ala Leu Ile Ala Gly Leu Ala

435 440 445

Tyr Phe Leu Tyr Ser Arg Lys Ser Gly Gly Gly Ser Asp Gln Arg Asp

450 455 460

Leu Thr Glu His Lys Pro Ser Ala Ser Asn His Asn Leu Ala Pro Ser

465 470 475 480

Asp Asn Ser Pro Asn Lys Val Asp Asp Val Ala Tyr Thr Val Leu Asn

485 490 495

Phe Asn Ser Gln Gln Pro Asn Arg Pro Thr Ser Ala Pro Ser Ser Pro

500 505 510

Arg Ala Thr Glu Thr Val Tyr Ser Glu Val Lys Lys Lys

515 520 525

<210> 11

<211> 20

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 11

gctcgactag agcttgcgga 20

<210> 12

<211> 28

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 12

ggagtcaata gagtgaatgc atgagtgt 28

<210> 13

<211> 28

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 13

tatcacaaga gggaataaac cacagggt 28

<210> 14

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 14

attgcaccat gaggttgaac agcat 25

<210> 15

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 15

actcctacac acagagcact aacag 25

<210> 16

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 16

caggccagag gaaatgtaac aaagg 25

<210> 17

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 17

cataaggtgg gatctctcag acagg 25

<210> 18

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 18

gctctgaagt ccagtaggat catgt 25

<210> 19

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 19

cttactcatg gctgccacac tgaga 25

<210> 20

<211> 22

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 20

gccacaactc caatcacgat gc 22

<210> 21

<211> 24

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 21

gctactgcac tccagcctga gaga 24

<210> 22

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 22

attgctggca tcgtgattgg agttg 25
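
As a consistency check on the listing, the following minimal Python sketch compares SEQ ID NO: 20 and SEQ ID NO: 22 with the two rows of SEQ ID NO: 9 numbered 1380 and 1440 above. It is only a string comparison: the helper names (`clean`, `reverse_complement`, `locate`) are hypothetical, and the listing itself does not state how these oligonucleotides are used.

```python
# Rows of SEQ ID NO: 9 copied from the listing (positions 1321-1440).
ROWS = [
    "catgctgaac gtaaactata atgctctacc acaagaaaat ggcctctcag atggcgccat 1380",
    "tgctggcatc gtgattggag ttgtggctgg ggtggctcta atagcagggc tggcatattt 1440",
]
FRAGMENT_START = 1321  # 1-based position of the first base in ROWS[0]

SEQ_ID_20 = "gccacaactc caatcacgat gc"
SEQ_ID_22 = "attgctggca tcgtgattgg agttg"


def clean(text: str) -> str:
    """Keep only bases, dropping spaces and the trailing position counter."""
    return "".join(ch for ch in text.lower() if ch in "acgt")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]


def locate(query: str, fragment: str, start: int = FRAGMENT_START):
    """Return the 1-based position of query in SEQ ID NO: 9, or None."""
    index = fragment.find(query)
    return None if index < 0 else index + start


fragment = clean(ROWS[0]) + clean(ROWS[1])
pos_22 = locate(clean(SEQ_ID_22), fragment)
pos_20 = locate(reverse_complement(clean(SEQ_ID_20)), fragment)

print("SEQ ID NO: 22, sense strand, starts at", pos_22)
print("SEQ ID NO: 20, reverse complement, starts at", pos_20)
```

Under these assumptions, SEQ ID NO: 22 matches the sense strand of SEQ ID NO: 9 directly, while SEQ ID NO: 20 matches only after reverse complementation, at an overlapping position in the same region.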

<210> 23

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 23

caggtggctc ttctgtttct actca 25

<210> 24

<211> 25

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 24

gacaagcgtt agtaggcaca tatac 25

<210> 25

<211> 24

<212> DNA/RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 25

gctccaattt cccacaacat tagt 24

Claims

1. A method for constructing a CEACAM1 gene humanized non-human animal, comprising introducing a nucleotide sequence encoding amino acids 35 to 423 of SEQ ID NO: 2 into the CEACAM1 locus of the non-human animal.

2. The construction method according to claim 1, comprising introducing a nucleotide sequence comprising SEQ ID NO: 5 into the CEACAM1 locus of the non-human animal.

3. The construction method according to claim 1 or 2, wherein the nucleotide sequence at the CEACAM1 locus of the non-human animal that encodes amino acids 35 to 419 of SEQ ID NO: 1 is replaced, and the non-human animal is a mouse or a rat.

4. The construction method according to claim 1 or 2, wherein said non-human animal expresses a human or humanized CEACAM1 protein, with reduced or absent expression of endogenous CEACAM1 protein, said humanized CEACAM1 protein comprising the amino acid sequence shown in SEQ ID NO: 10.

5. The construction method according to claim 1 or 2, wherein the genome of the non-human animal comprises a humanized CEACAM1 gene, and the mRNA transcribed from the humanized CEACAM1 gene comprises the nucleotide sequence shown in SEQ ID NO: 9.

6. The construction method according to claim 1 or 2, wherein the non-human animal is constructed using a targeting vector comprising a nucleotide sequence encoding amino acids 35 to 423 of SEQ ID NO: 2, or comprising the nucleotide sequence shown in SEQ ID NO: 5, and the targeting vector further comprises a 5' arm and/or a 3' arm, the nucleotide sequence of the 5' arm being shown in SEQ ID NO: 3 and the nucleotide sequence of the 3' arm being shown in SEQ ID NO: 4.

7. A targeting vector for the CEACAM1 gene, comprising a nucleotide sequence encoding amino acids 35 to 423 of SEQ ID NO: 2, or comprising the nucleotide sequence shown in SEQ ID NO: 5, wherein the targeting vector further comprises a 5' arm and/or a 3' arm, the nucleotide sequence of the 5' arm being shown in SEQ ID NO: 3 and the nucleotide sequence of the 3' arm being shown in SEQ ID NO: 4.

8. A humanized CEACAM1 protein, wherein the amino acid sequence of the humanized CEACAM1 protein is shown in SEQ ID NO: 10.

9. A humanized CEACAM1 gene encoding the humanized CEACAM1 protein of claim 8, wherein the humanized CEACAM1 gene comprises SEQ ID NO: 5.

10. Use of a non-human animal obtained by the construction method of any one of claims 1 to 6, the humanized CEACAM1 protein of claim 8 or the humanized CEACAM1 gene of claim 9 in CEACAM1 gene or protein related studies, said use comprising:

C) the production and use of animal experimental disease models involving the immune processes of human cells, for etiological research;