CN1884521A - Method for finding novel gene and computer system platform using same and novel gene - Google Patents

Publication number
CN1884521A
Authority
CN
China
Prior art keywords
sequence
software
database
program
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2006100893399A
Other languages
Chinese (zh)
Inventor
于在林
郑志华
唐元华
富岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Sino Biotechnology Ltd Fortunerock Inc
U S FUYUAN GROUP
Beijing Weiming Fuyuan Gene Pharmaceutical Rearch Center Co Ltd
Original Assignee
Tianjin Sino Biotechnology Ltd Fortunerock Inc
U S FUYUAN GROUP
Beijing Weiming Fuyuan Gene Pharmaceutical Rearch Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Sino Biotechnology Ltd Fortunerock Inc, U S FUYUAN GROUP, Beijing Weiming Fuyuan Gene Pharmaceutical Rearch Center Co Ltd
Priority to CNA2006100893399A (patent CN1884521A)
Publication of CN1884521A
Priority to CNA2007800202904A (patent CN101460625A)
Priority to PCT/CN2007/070153 (patent WO2008000186A1)
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/775 Apolipopeptides
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61P SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P9/00 Drugs for disorders of the cardiovascular system
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/92 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving lipids, e.g. cholesterol, lipoproteins, or their receptors
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 Medicinal preparations containing peptides
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Wood Science & Technology (AREA)
  • Urology & Nephrology (AREA)
  • Microbiology (AREA)
  • Hematology (AREA)
  • Pathology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Food Science & Technology (AREA)
  • Cell Biology (AREA)
  • Endocrinology (AREA)
  • General Engineering & Computer Science (AREA)
  • Toxicology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Cardiology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)

Abstract

The invention relates to a computer analysis system platform for discovering new genes by combining bioinformatics analysis, computer simulation and prediction, and molecular biology techniques. In particular, using self-written programs and an analysis pathway applied to published human genome sequence data, the invention provides a simple and effective means of computer analysis and prediction, and uses it to discover and prepare a series of new genes with biological function. The invention also discloses two newly discovered genes similar to human apolipoprotein A1 binding protein (APOA1BP), named BFC06016 and BFC06104, whose GenBank accession numbers are DQ778079 and DQ778080, respectively. The two genes and the proteins they encode may be involved in cholesterol metabolism in the body and may serve as candidate drug target genes for the diagnosis and treatment of human cardiovascular disease, or as gene drugs, and therefore have clinical significance.

Description

Method for finding novel genes, computer system platform using the same, and novel genes
Technical field
The present invention relates to establishing a new computational biology analysis method and pathway for obtaining new functional genes. The results show that this analysis method can obtain new gene sequences that agree with the published human genome chromosomal DNA sequence. The method can be used to analyze and obtain new genes that have biological function and are relevant to human health and to the diagnosis and treatment of disease. The object of the invention is to find new genes, especially genes useful as gene drugs or drug targets.
Background art
1. Functional genomics research
A gene drug is a biologically active product that takes as its starting material a functional gene, or the product of such a gene, discovered in genomics research; that is made by biological, molecular biological, or related techniques such as biochemistry and biotechnology; and whose intermediates and final product are quality-controlled by corresponding analytical techniques. Such products can be used clinically for the treatment, prevention, and diagnosis of certain diseases. Recombinant protein drugs, vaccines, DNA drugs, RNA drugs, and gene therapy drugs are all gene drugs. A gene drug target is a functional gene discovered in genomics research, or its product (a functional protein), that serves as the starting material for making an antagonist or inhibitor by biological, chemical, physical, molecular biological, or related techniques such as biochemistry and biotechnology. For example, a specific antibody can be obtained that inactivates the functional protein through antigen-antibody binding, or a small-molecule compound can be screened that inhibits the biological activity of the gene product. The resulting substance (antibody or small-molecule compound) is then used as a drug for the treatment and diagnosis of human disease.
The "traditional" way of discovering gene drugs and drug target genes starts from the symptoms of a disease and looks for differences between healthy people and patients in physiological and biochemical indices. Human growth hormone is an example: patients are markedly shorter than healthy people of the same age, and various analyses showed that the functional defect results from an endogenous deficiency caused by insufficient secretion of human growth hormone. Treatment is achieved by supplying the missing protein artificially (in the early days, human growth hormone was extracted from human urine and then injected into the patient). Later, as science and technology developed, the isolated and purified natural protein was sequenced, the DNA sequence was deduced from the protein sequence, and the gene fragment was "picked out" by synthesis, hybridization, and detection (DNA detection), after which the complete sequence was obtained. The gene was then expressed in a heterologous system (such as Escherichia coli) by genetic engineering. The recombinant protein so prepared and purified became a genetically engineered drug after preclinical (animal) testing and clinical trials. This process can be called the "traditional" or "classical" gene drug discovery process.
2. Techniques and methods used for new gene discovery
Bioinformatics has advanced rapidly in recent years and has made remarkable progress. Published patents and research literature include: Zailin Yu et al. (2002), WIPO patent publication WO 02/052047 A2 and USPTO publication 20020155473A1; He Fuchu et al., Chinese patent publication CN1657537A; Tang YT et al. (2002), US Patent 6,365,371; Bandman O et al. (2000), US Patent 6,020,164; Hamady M et al. (2006), BMC Bioinformatics, published online January 4, 2006, doi:10.1186/1471-2105-7-1; Schattner P et al. (2006), RNA 12: 15-25; Skupski MP et al. (1999), Nucleic Acids Res. 27(1): 35-38; Aaron Levine et al. (2001), Nucleic Acids Res. 29(19): 4006-4013; Nishikawa T et al. (2000), Genome Informatics (11): 12-23; Legato J et al. (2003), Physiological Genomics (13): 179-181; Gary B et al. (2002), Nucleic Acids Res. 30(23): 5310-5317; Zondervan K et al. (2002), Fertil Steril. 78(4): 777-781; Kontkanen O et al. (2002), Expert Opin Ther Targets 6(3): 363-374; Kumar R et al. (2002), J Mol Biol. 319(3): 593-602; Chapman MA et al. (2004), Genome Res. 14(2): 313-318; Uenishi H et al. (2004), Nucleic Acids Res. (32): 484-488; Bass MP et al. (2004), Pac Symp Biocomput. (9): 93-103; Ritter M et al. (2002), Genomics 79: 693-702; Yonan AL et al. (2003), Genes Brain Behav. (5): 303-320; Zhang Deli et al. (2004), Acta Genetica Sinica 31(5): 431-443; Li Yongqing et al. (2001), Life Science 5(2): 141-145; Zhu Chuanbing et al. (2004), Journal of Natural Science of Hunan Normal University 27(3): 79-82; Qi Zhenyu et al. (2005), Chinese Journal of Experimental Surgery 22(7): 849-851; and Xie Zhengxiang et al. (2006), Chinese Journal of Medical Physics 23(1): 62-63. These describe, respectively, applications of bioinformatics and the discovery and analysis of new genes, and are incorporated herein by reference as literature relevant to the present invention.
Summary of the invention
The present invention relates to bioinformatics processing programs written for a computer and to the system platform technology built on them. In particular, the programs and technology can be used to find new genes and analyze their products, so that the relationship between gene expression and disease can be understood more clearly and the level of disease treatment improved.
The present invention carries out functional genomics research on gene drugs by a "reverse" procedure, opposite to the conventional "traditional" process described above, with the aim of greatly accelerating the screening of novel gene drugs. Compared with the conventional "traditional" gene drug discovery method and with other existing gene discovery techniques and methods, the design of the present invention is simpler and more direct, places lower demands on computer equipment, is easy to operate and learn, and can shorten the time needed to obtain results by several years.
First, the invention uses self-written computer software dedicated to screening gene drugs and drug target genes. The self-written programs take the published human genome DNA sequence and, through a series of software programs running on a Linux system platform, predict the coding sequences (open reading frames, ORFs) of new proteins (genes). The software system takes into account disease types and the mechanisms by which diseases arise and develop, together with genetic information, and combines their strengths. For example, it uses bioinformatics techniques to predict secreted peptides, signal peptides, and transmembrane regions, and it integrates various existing functional genomics methods and computational tools into one new self-written software system whose added functions and computing methods make it possible to predict, screen, and assemble novel genes.
Second, the candidate ORF sequences predicted by computer are studied by functional genomics, using high-throughput screening methods to screen for gene drugs at the cellular level and at the animal level. Molecular biology techniques are used to assemble, clone, and amplify the genes. The resulting DNA sequence information is then examined with comparative biology and pharmacology techniques and with laboratory methods such as gene regulation, gene knockout and insertion (knock-out/knock-in), transfection, antisense RNA, and siRNA, in order to study gene mapping, expression, overexpression, low-level expression, and differential expression. High-throughput screening methods such as quantitative PCR and biochip technology are used to determine, by screening and verification, the biological function of the newly predicted genes, thereby obtaining original gene drugs and drug target genes.
Third, the biochemical characteristics and cellular functions of the candidate gene drugs and drug target genes are studied further by immunohistochemistry, pathology, and other efficient predictive methods to determine their specific biological and clinical value, thereby obtaining new, original gene drug candidates. Preclinical cytological and animal studies and clinical trials are then carried out on them to verify the clinical effect of the discovered genes and their products.
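The ORF prediction step described above can be illustrated with a minimal Python sketch. It is not the program of the invention, which is not reproduced in this text. It scans both strands of a DNA string in all three reading frames for stretches that begin with ATG and end at a stop codon. The function names and the `min_codons` threshold are assumptions made for this example.

```python
# Illustrative sketch only; names and thresholds are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_codons: int = 30):
    """Yield (strand, start, end, sequence) for each ATG...stop ORF.

    start and end are 0-based offsets on the scanned strand; end is
    exclusive and the returned sequence includes the stop codon.
    min_codons counts codons before the stop codon.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, start, i + 3, seq[start:i + 3]
                    start = None  # look for the next ORF in this frame


if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTTAGGG", min_codons=3):
        print(orf)
```

Real gene prediction on genomic DNA must also handle introns and splice sites, which this sketch ignores; it only shows the reading-frame scan.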
The computer bioinformatics discovery and analysis platform of the present invention can be used to discover new human genes, and can also be used for, but is not limited to, gene discovery and analysis in animals, plants, and microorganisms.
Therefore, the present invention relates to bioinformatics analysis programs written for a computer and to a workable platform technology built on them. Existing published human genome research data and information, comprising large amounts of data and libraries, are analyzed with programs designed and run according to the invention, and new predicted genes are obtained from them. The aim is to overcome the technical and time limitations of obtaining new genes by conventional techniques. In short, compared with conventional bioinformatics analysis methods, the programs of the invention have the following advantages: 1) they can analyze data and obtain possible new genes in real time; 2) the operating procedure is simple and efficient; 3) the new genes obtained have biological function and have potential clinical application as gene drugs and gene drug targets.
The techniques and methods described in the present invention have been applied in practice to the published human genome sequence to analyze and obtain new genes. The results show that the computer prediction technique allowed the inventors to obtain a large number of possible genes with biological function. The present invention takes as examples only two newly found, previously unreported genes similar to human apolipoprotein A1 binding protein (Apolipoprotein A1 Binding Protein, APOA1BP), which show that the computational simulation and prediction approach described in the invention is feasible.
Human blood contains low-density lipoprotein (LDL) and high-density lipoprotein (HDL). The proteins in lipoproteins are called apolipoproteins. Lipoproteins bind cholesterol to form lipoprotein cholesterol, which carries cholesterol into and out of cells. Clinically, a reduced level of HDL cholesterol indicates a susceptibility to coronary heart disease, and an elevated level of LDL cholesterol indicates a susceptibility to coronary heart disease and cerebrovascular disease caused by atherosclerosis. The key step in reverse cholesterol transport is the transfer of cholesterol from inside the cell to extracellular lipoproteins, and apolipoproteins are important components of all classes of lipoprotein. Apolipoproteins are responsible for carrying the different lipoproteins to each part of the body. They are proteins located on the surface of lipoproteins, formed of amino acids joined in a defined sequence, and they occur in the various lipoproteins in several forms and in different proportions. The lipoproteins in turn have different functions and different metabolic pathways depending on the kinds of apolipoprotein they contain.
In 2002, Ritter M et al. reported a new apolipoprotein-interacting protein that they had discovered and named AI-BP (apoA-I binding protein). The gene encoding AI-BP, APOA1BP, is located on chromosome 1q21 and spans 2.5 kb, with 6 exons and 5 introns. Northern blot analysis showed that APOA1BP mRNA is ubiquitously expressed, with high expression in kidney, heart, liver, thyroid, adrenal gland, and testis. AI-BP protein is not found in the serum of healthy people, but high levels of AI-BP are present in serum samples from patients with sepsis syndrome. In healthy people, AI-BP protein is present in significant amounts in cerebrospinal fluid and urine. Stimulating renal proximal tubular cells with apoA-I or HDL induces concentration-dependent secretion of AI-BP, whereas stimulation with apoA-II, BSA, or LDL does not. This occurs only in the renal proximal tubule; apoA-I does not stimulate AI-BP secretion in other tissues. Experiments indicate that in renal tubular cells AI-BP plays an important role in the degradation or reabsorption of apoA-I (Ritter et al., Genomics, 79:693-702, 2002). Finding new genes for proteins that interact with apolipoproteins may help clarify the pathways involved in cholesterol metabolism and help prevent, control, and treat cardiovascular and related diseases.
Therefore, the present invention also discloses, for the first time, two new genes similar to the apolipoprotein binding protein, obtained with these programs and methods. Both are located on human chromosome 19. The two genes differ from the apolipoprotein-interacting protein gene disclosed to date in that 1) they are located on a different chromosome; 2) they have no secreted peptide; and 3) their protein amino acid sequences share only 40.0% (BFC06016) and 41.5% (BFC06104) homology with the known APOA1BP gene.
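The text does not state how the 40.0% and 41.5% figures were calculated. As one common convention, percent identity can be computed over the aligned, non-gap columns of a pairwise alignment. The following Python sketch assumes the two sequences have already been aligned and padded with "-" for gaps; the function name and the choice of denominator are assumptions for illustration, and other conventions (for example, dividing by the shorter sequence length) give different values.

```python
# Illustrative sketch only; the patent does not specify this calculation.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment: 5 compared columns, 4 identical.
    print(percent_identity("MKT-LV", "MRTALV"))
```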
1. A computer simulation and prediction system platform for new gene discovery
The invention describes obtaining various known or published bioinformatics data and libraries and processing them locally. The libraries and data obtained include, but are not limited to, the latest databases needed for bioinformatics analysis, downloaded from the remote NCBI databases. These comprise the human expressed sequence tag (EST) database, the non-redundant protein sequence database, the nucleotide database, the patent protein sequence database, the human chromosome sequence database, and others. All of the downloaded databases are formatted on the local computer, that is, converted into sequence database formats that the local programs can read.
These libraries contain published human chromosomal DNA sequencing results, human mRNA and cDNA sequencing result libraries, and published protein sequence databases.
All of the databases and libraries whose use is described in the present invention come from publicly available data. They have been verified and processed on the local computer so that the local computer can call them up at any time, and they can be integrated with the programs of the invention and processed programmatically.
The bioinformatics analysis programs used in the present invention mainly include, but are not limited to, the following.
Software for sequence alignment: blastall, the BLAST package from NCBI (the US National Center for Biotechnology Information), which performs basic alignment of gene sequences; Wu-blast, the BLAST package from Washington University, which performs better in searching for and analyzing new genes; Fasta, the sequence alignment package from EMBL (European Molecular Biology Laboratory); Clustalw, multiple sequence alignment software; and Sim4, software for aligning expressed sequences with chromosomal genomic sequences.
Software for database editing: pressdb, the nucleotide sequence database formatter dedicated to the Wu-blast program; setdb, the protein sequence database formatter dedicated to the Wu-blast program; and im_index, mainly used to build indexes for sequence databases so that large databases remain workable.
Software for sequence assembly: Cap4/Phrap, sequence assembly software from the University of Washington genome research center; and merger, simple sequence assembly software.
Software for amino acid sequence function prediction: tmpred, which predicts transmembrane regions in protein sequences; signalp, which predicts signal peptides in protein sequences; remap, sequence restriction site analysis software; restrict, software that reports restriction cleavage information for a sequence; showorf, DNA sequence translation software; pepinfo, which plots the content of amino acids with different properties in a protein sequence; pepstats, which tallies the content of each amino acid in a protein sequence and also reports molecular weight, isoelectric point, charge, and absorbance at 280 nm; pepwheel, which draws a helical wheel of the amino acid residues in a protein sequence; Proparam, mainly used for an overall determination of protein hydrophilicity/hydrophobicity; tmap, which plots the transmembrane regions of a protein; and ps_scan, software for analyzing protein active sites and functional domains.
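To make the hydrophilicity/hydrophobicity analysis more concrete, the following Python sketch computes an average hydropathy value and a sliding-window hydropathy profile for a protein sequence. It uses the Kyte-Doolittle scale, which is an assumption: the text does not say which scale the listed software applies. The function names and the default window size are likewise chosen only for this example.

```python
# Illustrative sketch only; the scale and window size are assumptions.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean scale value over all residues."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Mean hydropathy of each window of the given size along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    print(gravy("MKTLLV"))
    print(hydropathy_profile("MKTLLVAAGG", window=5))
```

Positive values indicate hydrophobic stretches and negative values hydrophilic ones. Residues outside the 20 standard amino acids raise a `KeyError` in this sketch.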
The invention provides an independent computational biology program for predicting new genes, together with an analysis of example results. The invention also includes all programs written to run on a local computer that make up the new gene discovery and analysis system platform of the invention. Specifically, these include, but are not limited to, the following.
Software for sequence editing: tbl2fasta_n/fasta2tbl_n, sequence format converters that can convert FASTA-format sequences to table format; gb2fasta, a sequence format converter from GenBank format to FASTA format; tt_comp_dna, a sequence editor that produces the reverse complement of a DNA sequence; translate, a sequence editor that translates DNA sequences; gb2cds, a sequence editor that extracts the CDS sequences from a GenBank-format sequence file; and tt_zip_2, a sequence editor mainly used to merge two simple sequence fragments while removing the part repeated between them.
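The source code of these editing tools is not given in the text. The following Python sketch shows what FASTA-to-table conversion, the reverse conversion, and DNA translation might look like. The table layout (one record per line, header and sequence separated by a tab), the function names, and the use of the standard genetic code are assumptions for illustration.

```python
# Illustrative sketch only; not the fasta2tbl_n/tbl2fasta_n/translate programs.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def fasta_to_table(fasta_text: str) -> str:
    """Convert FASTA text to one 'header<TAB>sequence' row per record."""
    rows, header, chunks = [], None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                rows.append(header + "\t" + "".join(chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        rows.append(header + "\t" + "".join(chunks))
    return "\n".join(rows)


def table_to_fasta(table_text: str, width: int = 60) -> str:
    """Convert 'header<TAB>sequence' rows back to line-wrapped FASTA text."""
    out = []
    for row in table_text.splitlines():
        if not row.strip():
            continue
        header, seq = row.split("\t", 1)
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out)


def translate(dna: str) -> str:
    """Translate DNA in frame 0; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    table = fasta_to_table(">s1 demo\nATG\nAAA\n>s2\nTTT\n")
    print(table)
    print(table_to_fasta(table, width=3))
    print(translate("ATGAAATAG"))
```

A one-record-per-line table is convenient for sorting and filtering with ordinary text tools, which is one plausible reason for converting between the two formats.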
Software for database operations: im_delete, a database editor that deletes any sequence from a database; im_insert, a database editor that inserts additional sequences into a sequence database; im_retrieve, a database editor that retrieves sequences from a large database singly or in batches; tt_get, which retrieves DNA and protein sequences from databases that have not yet been indexed; rfetch, which fetches sequence data directly from GenBank over the network; lfetch, which fetches sequence data directly from the local database over the local network; biofaseqindex, which builds an index for a FASTA-format database; biogbseqindex, which builds an index for a GenBank-format database; tt_subseq_genome, which extracts fragment sequences from genome sequences; and tt_sub_seq, a sequence editor for conveniently extracting a given fragment from a sequence.
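Indexing is what makes retrieval from a large sequence database practical. As a minimal sketch of the idea, and not of the im_index or biofaseqindex programs themselves, the following Python code records the byte offset of each record header in a FASTA file and then uses the offset to read one sequence without scanning the whole file. The function names and the in-memory dictionary index are assumptions for this example.

```python
# Illustrative sketch only; function names and index layout are hypothetical.
def build_fasta_index(path: str) -> dict:
    """Map each sequence ID (first word of the header) to its byte offset."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:
                    index[fields[0].decode()] = offset
    return index


def fetch_sequence(path: str, index: dict, seq_id: str) -> str:
    """Seek to a record by its indexed offset and return its sequence."""
    parts = []
    with open(path, "rb") as handle:
        handle.seek(index[seq_id])
        handle.readline()  # skip the header line
        for line in handle:
            if line.startswith(b">"):
                break  # next record begins
            parts.append(line.strip())
    return b"".join(parts).decode()
```

For example, after `idx = build_fasta_index("est.fa")`, a call such as `fetch_sequence("est.fa", idx, "some_id")` would return that record's sequence; the file name and ID here are placeholders. A production index would normally be written to disk so that it need not be rebuilt on every run.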
Software for plotting sequence alignment results: drawBlast, a BLAST result plotting program that draws a schematic alignment diagram from BLAST result data.
Software for data parsing: tt_tmpred_p, a parser dedicated to the analysis results produced by tt_tmpred; parser_bx, a parser for the output of blastn, blastp, blastx, and similar programs; parser_fasta, a parser for the output of the fasty alignment program; ps_signalp, a parser for the result data produced by the pepsigp program; and tt_pblast, a blastn result parser that enables automatic machine analysis of large volumes of output.
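The output format that these parsers read is not specified in the text. As an illustration of automated result parsing, the following Python sketch reads BLAST results in the 12-column tab-delimited format (produced, for example, by legacy `blastall -m 8` or by BLAST+ `-outfmt 6`) and keeps hits above chosen thresholds. The `Hit` type, the function names, and the threshold values are assumptions for this example.

```python
# Illustrative sketch only; assumes 12-column tabular BLAST output.
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent identity
    length: int       # alignment length
    mismatches: int
    gap_opens: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per data line, skipping blank and '#' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 12:
            raise ValueError("expected 12 tab-separated fields: " + line)
        yield Hit(f[0], f[1], float(f[2]), *map(int, f[3:10]),
                  float(f[10]), float(f[11]))


def filter_hits(hits: Iterable[Hit], min_identity: float = 90.0,
                max_evalue: float = 1e-10) -> List[Hit]:
    """Keep hits that meet both the identity and E-value thresholds."""
    return [h for h in hits
            if h.identity >= min_identity and h.evalue <= max_evalue]


if __name__ == "__main__":
    sample = ["# comment\n",
              "q1\tEST42\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t380\n",
              "q1\tEST77\t75.00\t80\t20\t2\t5\t84\t10\t89\t0.5\t40.1\n"]
    for hit in filter_hits(parse_tabular(sample)):
        print(hit.subject, hit.identity, hit.evalue)
```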
Software that assists the running of other programs: tt_cycle, a helper mainly used with programs that cannot run unattended on their own, so that the whole workflow can be automated.
Re-optimized software: ed_cap4, a recompiled Cap4 program that configures the cap4 runtime environment automatically; extractcontigs, which converts the score matrix data output by cap4 into FASTA-format files; pepsigp, a recompiled signalp in which the original program, able to predict a signal peptide for only one sequence at a time, is improved to perform fully automated batch prediction; primers_for_fulllength_clone, batch primer design software; tt_fasty_1, an improved fasty program whose main purpose is ease of operation; and tt_tmpred, a recompiled transmembrane region predictor for protein sequences, improved to allow batch analysis.
The software above, and especially the combination and coordinated operation of all of it, forms the basis of the present invention.
In another aspect, the invention provides a method of finding new genes, comprising the following steps:
1) obtaining from a published protein sequence database all protein sequences shorter than 300 AA, 400 AA, or 500 AA, preferably 300 AA, more preferably 400 AA, and most preferably 500 AA, and converting these sequences to a uniform format;
2) performing batch transmembrane region analysis on these protein sequences and excluding all sequences that contain a transmembrane region;
3) performing batch secretory signal peptide analysis on the retained sequences;
4) using the resulting sequence fragments as templates, aligning them against the expressed sequence tag (EST) library to obtain ESTs with a certain degree of match;
5) assembling the ESTs; and
6) comparing the result with the sequences in known databases to obtain new full-length genes.
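Two of the steps above lend themselves to a short illustration: the length filter of step 1 and the joining of overlapping ESTs in step 5. The following Python sketch is a simplification under stated assumptions. It requires an exact suffix-to-prefix match between two fragments, whereas assembly programs such as those named earlier tolerate sequencing errors; the function names and the `min_overlap` default are hypothetical.

```python
# Illustrative sketch only; exact-match merging is a simplification.
from typing import Dict, Optional


def filter_by_length(proteins: Dict[str, str], max_len: int = 300) -> Dict[str, str]:
    """Step 1: keep protein sequences shorter than max_len residues."""
    return {name: seq for name, seq in proteins.items() if len(seq) < max_len}


def merge_overlap(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Step 5 (simplified): join two fragments on their longest exact overlap.

    Returns the merged sequence when a suffix of `left` equals a prefix of
    `right` over at least `min_overlap` bases, otherwise None.
    """
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]  # drop the repeated part once
    return None


if __name__ == "__main__":
    short = filter_by_length({"p1": "M" * 299, "p2": "M" * 300})
    print(sorted(short))
    print(merge_overlap("ACGTACGG", "ACGGTTT", min_overlap=4))
```

Searching from the longest candidate overlap downward means the first match found is the longest one, so the repeated region between the two fragments is removed in full.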
One embodiment sets out the process and method for downloading from the remote NCBI databases the various nucleotide sequence libraries, the patent protein library, the human EST database, the non-redundant protein sequence database, the human chromosome sequence database, and other related databases.
One embodiment sets out the use of various bioinformatics analysis software and systems, and in particular the writing of a dedicated computer analysis system platform that makes the independent software analysis systems work together to discover and analyze new genes.
Another embodiment sets out the overall flow chart of the computer-run analysis programs for new gene discovery.
According to the embodiments, the computer programs and software engineering completed by the invention form an independent and complete bioinformatics processing system platform that can be duplicated, copied, and ported, and that can be used for, but is not limited to, the discovery and functional analysis of new genes, demonstration, teaching, commercial purposes, clinical treatment, and medical diagnosis.
2. the new discovery that is similar to aPoA 1BP gene
This information processing platform among the present invention is applied to the discovery and the analysis of new protein sequence, and (embodiment 3 is seen in concrete operations) obtained 38 possible new protein sequences.Wherein two are similar to known aPoA 1BP gene, disclose as an example.These two new genes are BFC06016 and BFC06104, have at the nucleotide sequence shown in Seq ID No.1 and the Seq ID No.3; They number are respectively DQ778079 and DQ778080 in the GenBank typing.Be respectively the sequence shown in SEQ ID NO:2 and the SEQ ID NO:4 by nucleotide sequence coded amino acid.Utilize various softwares and analysis of biological information technology, obtain the range protein analysis of data, the existence that comprises proteinic parent/hydrophobicity, secretion peptide whether, protein possible space structure picture, stride membrane structure analysis, protein spiral structure and possible function prediction etc.
In general, a newly discovered gene can be obtained by total synthesis of its DNA sequence and then used in biological and clinical application research and in product development. An embodiment of the present invention details the method and technique for synthesizing the DNA sequence of a complete gene. In essence, DNA fragments are synthesized stepwise by PCR and then assembled into the complete gene sequence, and the synthesis result is verified by DNA sequencing. The sequences of the present invention can be applied to drugs and diagnostics, preferably as drugs or drug target genes for the diagnosis and treatment of cardiovascular disease, and more preferably as gene drugs or gene therapy drug targets.
Description of the drawings
Fig. 1: General flow diagram, parts (A) and (B), of the computer analysis procedure used for new gene discovery.
Fig. 2: The two newly discovered genes similar to ApoA1BP: the DNA nucleotide sequence (A-1) and corresponding amino acid sequence (A-2) of BFC06016 (A), and the DNA nucleotide sequence (B-1) and corresponding amino acid sequence (B-2) of BFC06104 (B).
Fig. 3: Protein hydrophobicity/hydrophilicity prediction for the computer-predicted BFC06016 (A) and BFC06104 (B), carried out with the ProParam software.
Fig. 4: Protein transmembrane region analysis with the tmpred/tmap analysis software. The result for the protein of the BFC06016 gene shows that it has no transmembrane region (Fig. 4A); likewise, BFC06104 has no transmembrane region (Fig. 4B).
Fig. 5: Helical wheel of the amino acid residues in the protein sequence, displayed graphically with pepwheel. Fig. 5A shows the helical wheel analysis of the BFC06016 protein and Fig. 5B that of the BFC06104 protein.
Fig. 6: Content and distribution of amino acids of different properties in the protein sequence, computed with pepinfo. Fig. 6A shows the result for the BFC06016 gene and Fig. 6B the result for the BFC06104 gene.
Fig. 7: Mapping of the BFC06016 (A) and BFC06104 (B) genes onto the DNA sequence of human chromosome 19.
Fig. 8: Amino acid identity comparison among BFC06016, BFC06104 and the known ApoA1BP gene. An asterisk (*) indicates that the amino acid is identical in all three; a blank ( ) indicates that the amino acid differs among the three; a period (.) indicates amino acids that are not identical but belong to the same type; a colon (:) indicates amino acids that are not identical and belong to different types. The amino acid identity between BFC06016 and ApoA1BP is 40.0%; the amino acid identity between BFC06104 and ApoA1BP is 41.5%.
Fig. 9: Scheme for the total synthesis of the nucleotide sequence of a new gene predicted by computer.
Embodiments
Embodiment 1. Downloading and obtaining the databases required for bioinformatics analysis
The latest versions of the databases required for bioinformatics analysis are downloaded by connecting to the remote NCBI databases. They include the human expressed sequence tag database, the non-redundant protein sequence database, the nucleotide database, the patent protein sequence database and the human chromosome sequence database. All of the downloaded databases are then formatted on the local computer, which converts them into sequence-format databases that the local programs can read.
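The text does not say how the local formatting is carried out. One common ingredient of such formatting is an index that lets a program jump straight to a record in a large FASTA file. The sketch below illustrates that idea only; the function names `index_fasta` and `fetch` are hypothetical and are not the formatting tools named in this patent.

```python
import io
from typing import BinaryIO, Dict


def index_fasta(handle: BinaryIO) -> Dict[str, int]:
    """Map each record ID (first word after '>') to the byte offset of its header."""
    index = {}
    offset = handle.tell()
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            words = line[1:].split()
            if words:
                index[words[0].decode()] = offset
        offset += len(line)
    return index


def fetch(handle: BinaryIO, index: Dict[str, int], seq_id: str) -> str:
    """Return the sequence of one record by seeking to its indexed offset."""
    handle.seek(index[seq_id])
    handle.readline()  # skip the header line
    chunks = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break
        chunks.append(line.strip().decode())
    return "".join(chunks)


# Tiny in-memory stand-in for a downloaded database file (illustrative data).
database = io.BytesIO(b">a example\nACGT\nAC\n>b\nGG\n")
offsets = index_fasta(database)
print(fetch(database, offsets, "b"))
```

With a real download, the same two functions would be applied to a file opened in binary mode, and the index would typically be saved to disk so that it is built only once.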
Embodiment 2. Program collection and writing:
The bioinformatics analysis programs used in the present invention all come from public sources or commercial software. The main ones are:
blastall: the BLAST software package of NCBI (the U.S. National Center for Biotechnology Information), which performs approximate alignment of gene sequences;
Wu-blast: the BLAST software package of Washington University, which performs better in the retrieval and analysis of new genes;
Fasta: the sequence alignment software package of EMBL (European Molecular Biology Laboratory);
Cap4/Phrap: the sequence assembly software of the University of Washington genome research center;
Tmpred: predicts the transmembrane regions of a protein sequence;
Signalp: predicts the signal peptide of a protein sequence;
Clustalw: multiple sequence alignment software;
Pressdb: database editing software, the nucleotide sequence database formatter dedicated to the wu-blast programs;
Sim4: software for aligning expressed sequences with chromosomal genomic sequences;
Im_index: database editing software, mainly used to index sequence databases so that large databases can be operated on;
Setdb: database editing software, the protein sequence database formatter dedicated to the wu-blast programs;
Remap: restriction site analysis software for sequences;
Restrict: restriction digest statistics software for sequences;
Showorf: DNA sequence translation software;
Pepinfo: graphically displays the content of amino acids of different properties in a protein sequence;
Pepstats: counts the content of each amino acid in a protein sequence and reports the molecular weight, isoelectric point, charge and absorbance at 280 nm;
Pepwheel: graphically displays the helical wheel of all amino acid residues in a protein sequence;
Proparam: mainly used for an overall determination of the hydrophilicity/hydrophobicity of a protein;
Tmap: graphically displays the transmembrane regions of a protein.
The computer programs written for carrying out the present invention mainly comprise the following software:
tbl2fasta_n/fasta2tbl_n: sequence format conversion software, which converts sequences between FASTA format and table format;
Gb2fasta: sequence format conversion software, which converts GenBank-format sequences to FASTA format;
DrawBlast: a plotting program for blast results, which draws a rough alignment diagram from blast result data;
Ed_cap4: a recompiled Cap4 program, which configures the cap4 runtime environment automatically;
Extractcontigs: converts the score matrix data output by cap4 into FASTA-format files;
Im_delete: database editing software, which deletes any sequence from a database;
Im_insert: database editing software, which inserts additional sequences into a sequence database;
Im_retrieve: database editing software, which retrieves one or many sequences from a large database;
Pepsigp: a recompiled signalp, in which a program that could originally predict only one signal peptide at a time has been improved to allow fully automated batch prediction;
Primers_for_fulllength_clone: batch primer design software;
Ps_signalp: data parser software, which parses the result data produced by the pepsigp program;
Ps_scan: protein active site/functional domain analysis software;
Translate: database editing software, a DNA sequence translation program;
Tt_comp_dna: database editing software, a DNA sequence reverse complement program;
Tt_cycle: helper software, mainly used together with programs that cannot run automatically so that the whole operation is automated;
Tt_fasty_1: an improved fasty program, which passes complex parameters and some empirical values directly to fasty so that fasty can be combined with tt_cycle for convenient operation;
Tt_get: software for retrieving DNA and protein sequences from databases that have not yet been indexed;
Tt_pblast: result parser for blastn, which allows large volumes of output to be analyzed automatically by machine;
Tt_sub_seq: sequence editing software for conveniently extracting a given fragment of a sequence;
Tt_subseq_genome: software for extracting fragment sequences from genome sequences;
Tt_tmpred: a re-optimized protein transmembrane region prediction program, improved to allow batch analysis;
Tt_tmpred_p: data parser software dedicated to parsing the analysis results produced by tt_tmpred;
Tt_zip_2: sequence editing software, mainly used to merge two sequence fragments and filter out the repeated part between them;
biofaseqindex: database editing software, a program for indexing FASTA-format databases;
Biogbseqindex: database editing software, a program for indexing GenBank-format databases;
Gb2cds: sequence editing software for extracting the CDS sequences from GenBank-format sequence files;
Parser_bx: parser software for the output of blastn, blastp, blastx and similar programs;
Parser_fasta: parser software for the output of the fasty alignment program;
Rfetch: database operation software for retrieving sequence data directly from GenBank over the network;
Lfetch: database operation software for retrieving sequence data directly from the local database over the local network.
These software systems constitute the basis of the present invention.
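The format-conversion and reverse-complement utilities in the list above are described only by name and purpose. The following sketch shows what programs of that kind typically do, under the assumption that "table format" means one (header, sequence) row per record; the function names are illustrative and do not reproduce the patent's own code.

```python
from typing import Iterable, Iterator, Tuple


def fasta_to_table(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield one (header, sequence) row per FASTA record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def table_to_fasta(rows: Iterable[Tuple[str, str]], width: int = 60) -> Iterator[str]:
    """Yield FASTA lines for each (header, sequence) row, wrapped at `width`."""
    for header, seq in rows:
        yield ">" + header
        for start in range(0, len(seq), width):
            yield seq[start:start + width]


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


rows = list(fasta_to_table([">seq1 demo", "ACGT", "AC", ">seq2", "TT"]))
print(rows)
print(reverse_complement("ATGC"))
```

Keeping one record per row makes length filtering and batch processing simple, which is presumably why such a conversion sits at the start of the workflow in Embodiment 3.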
Embodiment 3. Operating procedure for obtaining new genes
The general flow diagram of the computer analysis procedure for new gene discovery of the present invention is shown in Fig. 1 (A) and (B). First, a script parses the patent protein database and extracts all protein sequences shorter than 300 aa (300 aa to 500 aa may also be used); the programs used are fasta2tbl_n and tbl2fasta_n. Next, tt_cycle working with tt_tmpred runs transmembrane region prediction on every qualifying protein sequence (for example, those shorter than 300 aa); the results are passed through a pipe directly to tt_tmpred_p for parsing, and every sequence containing a transmembrane region is excluded. The remaining sequences are passed to pepsigp for secretory signal peptide analysis, and the results are passed to ps_signalp for parsing. The amino acid sequence fragments obtained are used as queries in a tblastn comparison against the human expressed sequence tag database. With the parameters set to B=50000; V=50000; S=300, all sequence fragments that meet the parameter requirements are retrieved and passed through a pipe to tt_pblast for parsing, which yields all qualifying expressed sequence tags; these are passed to a script for polyA and polyT replacement filtering.
The environment required to run cap4 is then set up. All of these FASTA-format sequences are converted into files in an XML data interchange format with fastaclust2caml. Cap4 and Phrap are started at the same time and each assembles these sequences in full; extractcontigs restores the assembled data files to FASTA-format files, after which the sequences are collated.
These sequences are first compared by blastx, with suitable settings, against the non-secreted protein database; parser_bx parses the output and excludes every sequence that matches completely. The remaining sequences are then compared against the patent protein sequence database with tt_fasty_1, and the sequences left after parsing with parser_fasta are kept. A loop controlled by a time variable then runs the following analyses on the remaining sequences: a blastn comparison against the human chromosome sequence database as a check; a blastn comparison against the patent nucleotide sequence database to correct mutations or deletions in the sequence; blastn comparisons against the nucleotide sequence database and against the human expressed sequence tag database to resolve insufficient sequence length; and a blastx comparison against the non-redundant protein database to check whether the sequence has already been discovered. Comparing the results obtained by running these five analyses repeatedly yields the full-length gene sequence.
Sim4 is used to determine the specific location of the full-length gene on the chromosome; ProParam performs the protein hydrophobicity/hydrophilicity prediction; signalp performs the secretory signal peptide analysis of the protein; tmpred and tmap perform the protein transmembrane region analysis; garnier analyzes the secondary structure of the protein; pepwheel graphically displays the helical wheel of each amino acid residue in the protein sequence; pepinfo counts the content of amino acids of different properties in the protein sequence and shows their approximate distribution in a diagram; pepstats counts the content of each amino acid in the protein sequence and reports the molecular weight, isoelectric point, charge and absorbance at 280 nm. At the same time, relevant literature is collected through PubMed searches and used to predict the biological function of the discovered gene.
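Two of the filtering steps in this workflow, selecting proteins below a length cutoff and filtering polyA/polyT stretches from expressed sequence tags, can be sketched as follows. The patent does not specify how the replacement filtering works; this sketch assumes that runs of A or T are masked with N, and the threshold `min_run=10` is a hypothetical value chosen only for illustration.

```python
import re
from typing import Dict


def filter_by_length(proteins: Dict[str, str], max_len: int = 300) -> Dict[str, str]:
    """Keep only the protein sequences shorter than `max_len` residues."""
    return {name: seq for name, seq in proteins.items() if len(seq) < max_len}


def mask_poly_runs(est: str, min_run: int = 10) -> str:
    """Replace each run of at least `min_run` A's or T's with N's of equal length.

    Masking with N (rather than trimming) is an assumption; it keeps the
    coordinates of the remaining bases unchanged.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_run, min_run), re.IGNORECASE)
    return pattern.sub(lambda match: "N" * len(match.group()), est)


candidates = {"short": "M" * 120, "long": "M" * 450}
print(sorted(filter_by_length(candidates)))
print(mask_poly_runs("GCGT" + "A" * 12 + "GT"))
```

Masking low-complexity tails before assembly is a common precaution, since long A or T runs can otherwise cause unrelated expressed sequence tags to be joined.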
Embodiment 4. Obtaining the new genes BFC06016 and BFC06104, which are similar to ApoA1BP
By carrying out the above operating procedure for obtaining new genes at a terminal of the server, we obtained 38 computer-predicted protein sequences, which are candidate new genes. Two of them are new genes similar to the ApoA1BP gene and are numbered BFC06016 and BFC06104. Seq ID No.1 and Seq ID No.2 are the DNA sequence and amino acid sequence of BFC06016, shown in Fig. 2-A. Seq ID No.3 and Seq ID No.4 are the DNA sequence and amino acid sequence of BFC06104, shown in Fig. 2-B. Both genes have been deposited in the U.S. GenBank database under accession numbers DQ778079 and DQ778080, respectively. The genes were analyzed with the self-written gene analysis programs and with known bioinformatics software. ProParam was used for protein hydrophobicity/hydrophilicity prediction; the GRAVY (grand average of hydropathicity) values obtained, on a scale of +2 to -2, were -0.015 and -0.115, respectively. Signalp was used for secretory signal peptide analysis of the proteins (Fig. 3 shows the result for BFC06016, which has no secretory signal peptide; likewise, BFC06104 (Fig. 3B) has no secretory signal peptide). The tmpred/tmap analysis software was used for protein transmembrane region analysis; the result for the protein of the BFC06016 gene shows that it has no transmembrane region (Fig. 4A), and likewise BFC06104 has no transmembrane region (Fig. 4B). Pepwheel was used to display graphically the helical wheel of each amino acid residue in the protein sequence; Fig. 5A shows the helical wheel analysis of the BFC06016 protein and Fig. 5B that of the BFC06104 protein. Pepinfo was used to compute the content and distribution of amino acids of different properties in the protein sequence; Fig. 6A shows the result for the BFC06016 gene and Fig. 6B the result for the BFC06104 gene.
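For readers unfamiliar with the GRAVY value quoted above, the following sketch shows how such a value is conventionally computed: the Kyte-Doolittle hydropathy values of all residues are summed and divided by the sequence length. This is a generic illustration of the definition, not the code of the software used in the patent, and it makes no claim to reproduce the reported values.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue.

    Expects one-letter amino acid codes; non-standard letters raise KeyError.
    """
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[residue] for residue in protein) / len(protein)


# First six residues of SEQ ID NO:2 (Met Ser Ser Ala Ala Gly) in one-letter code.
print(round(gravy("MSSAAG"), 3))
```

Negative values indicate a protein that is hydrophilic on average and positive values a hydrophobic one, so the two reported values, both slightly below zero, describe proteins close to neutral overall.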
Embodiment 5. Comparison between the obtained human ApoA1 binding protein-like genes BFC06016 and BFC06104 and the known ApoA1BP
Sim4 software was used to determine the specific chromosomal location of the full-length genes. The known human apolipoprotein A1BP gene is located on human chromosome 1 (see Ritter et al., Genomics, 79:693-702, 2002). The BFC06016 and BFC06104 genes predicted by the computer analysis method designed in the present invention are both located on human chromosome 19; see Fig. 7-A and Fig. 7-B, respectively. The full-length cDNA sequences of the BFC06016 and BFC06104 genes obtained from a human cDNA library are Seq ID No.5 and Seq ID No.6, respectively. The amino acid sequence comparison among these two genes and the known human ApoA1BP is shown in Fig. 8. An asterisk (*) indicates that the amino acid is identical in all three; a blank ( ) indicates that the amino acid differs among the three; a period (.) indicates amino acids that are not identical but belong to the same type; a colon (:) indicates amino acids that are not identical and belong to different types. The amino acid identity between BFC06016 and ApoA1BP is 40.0%; the amino acid identity between BFC06104 and ApoA1BP is 41.5%.
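An identity percentage of the kind quoted above is derived from an alignment by counting identical columns. The patent does not state which denominator was used; the sketch below assumes the number of alignment columns in which at least one sequence has a residue, which is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both rows.

    Both inputs are rows of one pairwise alignment with '-' for gaps.
    Columns where both rows have a gap are ignored (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no residue columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy alignment (illustrative data, not taken from Fig. 8).
print(round(percent_identity("MSSA-G", "MSTAAG"), 1))
```

Because the choice of denominator (alignment length, shorter sequence or longer sequence) changes the result, identity figures are comparable only when the same convention is applied to every pair.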
Embodiment 6. Overview of molecular cloning techniques
Conventional molecular cloning techniques, including extraction of DNA and RNA, agarose and polyacrylamide gel electrophoresis, ligation of DNA fragments and restriction enzyme digestion, all follow the literature (Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1982). The enzymes used for the DNA polymerase chain reaction (PCR) (see Saiki et al., Science, 230:1350, 1985) and the PCR instrument required for the reaction are Perkin Elmer products and were used according to the manufacturer's operating procedures. The oligonucleotide primers required for DNA sequencing and DNA amplification were prepared by a specialized service provider. Competent E. coli cells were purchased from GIBCO/BRL. Purification of plasmid DNA and recovery of DNA fragments were carried out with commercial Qiagen purification columns. Pichia yeast or the BL21(DE3) strain was used for protein expression and preparation.
Embodiment 7. Total synthesis of the BFC06016 and BFC06104 genes
Taking the BFC06016 gene as an example, the following describes how to design the DNA oligonucleotide primer fragments and use PCR to carry out the total synthesis of the computer-predicted gene; the synthetic route is shown in Fig. 9.
Seq ID No.7: 5'-CACATATGAGCAGCGCAGCCGGCCCAGACCCGTCGGAGGCGCCCGAAGAGCGGC-3', positions 1-57, forward strand, 54 bases;
Seq ID No.8: 5'-GGGCGGCTGCCTCCGCGGTGCTGAGGAAATGCCGCTCTTCGGGCGCCTCCG-3', positions 37-87, reverse complement, 51 bases;
Seq ID No.9: 5'-CCGCGGAGGCAGCCGCCCTGGAGCGGGAGCTGCTGGAGGATTATCGCTTTGGGCGGC-3', positions 70-126, forward strand, 57 bases.
Seq ID No.10: 5'-CAGCCACGGCACTAGCATGACCGCACAGCTCCACGAGCTGCTGCCGCCCAAAGCGATA-3', positions 111-168, reverse complement, 58 bases.
Seq ID No.11: 5'-TGCTAGTGCCGTGGCTGTGACCAAGGCGTTCCCGTTGCCCGCTCTCTCCCGGAAGCAG-3', positions 152-209, forward strand, 58 bases.
Seq ID No.12: 5'-CTGCCCCGTTCTGCTCCGGGCCACACACGACCAGCACCGTCCTCTGCTTCCGGGAGAG-3', positions 195-252, reverse complement, 58 bases.
Seq ID No.13: 5'-GCAGAACGGGGCAGTGGGGCTGGTCTGTGCCCGGCACCTGCGGGTGTTTGAGTATGA-3', positions 239-295, forward strand, 57 bases.
Seq ID No.14: 5'-GCAGGTCCAGCGAGCGTGTGGGGTAGAAGATGGTGGGTTCATACTCAAACACCCGC-3', positions 278-333, reverse complement, 56 bases.
Seq ID No.15: 5'-CACGCTCGCTGGACCTGCTGCATCGGGACCTGACCACCCAGTGCGAGAAGATGGAC-3', positions 316-371, forward strand, 56 bases.
Seq ID No.16: 5'-ATGAGCTGCACCTCAGTGGGCAGGTAGCTCAGGAAGGGGATGTCCATCTTCTCGC-3', positions 358-412, reverse complement, 55 bases.
Seq ID No.17: 5'-CCTGCCCACTGAGGTGCAGCTCATTAACGAAGCCTATGGGCTGGTGGTGGATGCCGT-3', positions 389-445, forward strand, 57 bases.
Seq ID No.18: 5'-GGGGCCCCCGACCTCGCCCGGCTCCACGCCGGGGCCCAGTACGGCATCCACCACC-3', positions 431-485, reverse complement, 55 bases.
Seq ID No.19: 5'-GCCGGGCGAGGTCGGGGGCCCCTGCACCCGCGCGCTGGCCACGCTCAAGCTGCTGTCC-3', positions 464-521, forward strand, 58 bases.
Seq ID No.20: 5'-GCCTGAGGGGATGTCCAGGCTCACGAGGGGGATGGACAGCAGCTTGAGCGTGGCC-3', positions 500-554, reverse complement, 55 bases.
Seq ID No.21: 5'-CATCCCCTCAGGCTGGGACGCAGAGACCGGCAGCGATTCGGAGGACGGGCTGCGGCCTG-3', positions 542-600, forward strand, 59 bases.
Seq ID No.22: 5'-GCGCAGCGCTTGGGCGCCGCGAGAGACACCAGCACGTCAGGCCGCAGCCCGTCCTCCGA-3', positions 579-637, reverse complement, 59 bases.
Seq ID No.23: 5'-CGTGCTGGTGTCTCTCGCGGCGCCCAAGCGCTGCGCTGGCCGCTTCTCCGGGCGCCACC-3', positions 602-660, forward strand, 59 bases.
Seq ID No.24: 5'-CTTGCGGCGCACGTCATCGGGCACGAACCTGCCGGCCACGAAGTGGTGGCGCCCGGAGA-3', positions 646-704, reverse complement, 59 bases.
Seq ID No.25: 5'-TGACGTGCGCCGCAAGTTCGCTCTGCGCCTGCCGGGATACACGGGCACCG-3', positions 689-738, forward strand, 50 bases.
Seq ID No.26: 5'-TAGCGGCCGCTCACAGTGCCGCGACGCAGTCGGTGCCCGTGTATCCCGGC-3', positions 719-768, reverse complement, 50 bases.
Seq ID No.27: 5'-CACATATGATGAGCAGCGCAG-3', positions 1-21, forward strand, 21 bases.
Seq ID No.28: 5'-TAGCGGCCGCTCACAGTGCCGC-3', positions 747-768, reverse complement, 22 bases.
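PCR assembly of this kind relies on each forward-strand primer sharing a stretch of sequence with the reverse complement of the neighboring reverse primer. The following sketch shows one way to check that overlap for the first primer pair; the helper names are illustrative, and the check is a generic suffix/prefix comparison rather than the patent's primer design program.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    left, right = left.upper(), right.upper()
    for length in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:length]):
            return length
    return 0


# Seq ID No.7 (forward strand) and Seq ID No.8 (reverse complement strand),
# copied from the primer list.
SEQ7 = "CACATATGAGCAGCGCAGCCGGCCCAGACCCGTCGGAGGCGCCCGAAGAGCGGC"
SEQ8 = "GGGCGGCTGCCTCCGCGGTGCTGAGGAAATGCCGCTCTTCGGGCGCCTCCG"

overlap = suffix_prefix_overlap(SEQ7, reverse_complement(SEQ8))
print(f"Seq ID No.7 / No.8 overlap: {overlap} bases")
```

Running the same check on each consecutive forward/reverse pair is a quick way to confirm that a transcribed primer list can still be assembled end to end.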
The procedure is summarized as follows:
Starting from the first single-stranded oligonucleotide primer of the DNA sequence to be synthesized, every four oligonucleotide chains form one group, and PCR is used to synthesize one longer DNA fragment from each group. For example, Seq ID No.7, Seq ID No.8, Seq ID No.9 and Seq ID No.10 form one group. In a 25-microliter PCR buffer reaction volume, the four primers are present at 100 pM : 1 pM : 1 pM : 100 pM, respectively, together with 20 mM dNTP, an appropriate amount of water and 1 U of DNA polymerase (T4 Taq Polymerase). In the PCR instrument, 25 cycles of 94 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 30 seconds are run, followed by a final incubation at 72 °C for 5 minutes; the product is held at 4 °C until the synthesized DNA fragment is purified. This product is the first group product. The product of each group is prepared in the same way. The products of every two adjacent groups are then mixed in equal proportion and, in PCR buffer containing Taq enzyme and dNTP, first run through 5 PCR cycles; the single-stranded oligonucleotide primers at the two ends are then added (here, for the combination of the first and second group products, 100 pM each of Seq ID No.7 and Seq ID No.14). The same PCR cycling program is used to prepare the larger DNA fragment, although the 72 °C time in each cycle may be increased as appropriate. Following the scheme shown in Fig. 9 completes the synthesis of the full designed DNA sequence. The computer-predicted BFC06016 and BFC06104 gene sequences were synthesized and prepared with this procedure; the 5' end contains an Nde I restriction site. The full-length DNA synthesized by PCR was inserted into the pTA vector, which carries EcoRI and NotI sites on either side of the insertion site. DNA sequencing confirmed that the synthesized DNA sequence is correct. The plasmid was named pTA-BFC06016.
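The grouping and pairwise joining described above can be written out as a plan of rounds, each listing the outer primers that flank every product. The sketch assumes that the 20 assembly primers Seq ID No.7 to No.26 are used and that an unpaired group is carried into the next round unchanged; the text does not say how an odd group is handled, so that part is an assumption.

```python
from typing import List, Tuple


def plan_assembly(primer_ids: List[int], group_size: int = 4) -> List[List[Tuple[int, int]]]:
    """Return, for each round, the (first, last) primer IDs flanking each product.

    Round 0 holds the initial groups of `group_size` primers; each later round
    joins adjacent products pairwise. A leftover product is carried forward
    unchanged (assumed behavior).
    """
    products = [
        (primer_ids[start], primer_ids[min(start + group_size, len(primer_ids)) - 1])
        for start in range(0, len(primer_ids), group_size)
    ]
    rounds = [products]
    while len(products) > 1:
        merged = []
        for start in range(0, len(products), 2):
            pair = products[start:start + 2]
            merged.append((pair[0][0], pair[-1][1]))
        products = merged
        rounds.append(products)
    return rounds


for number, products in enumerate(plan_assembly(list(range(7, 27)))):
    print(number, products)
```

In this plan the first joined product is flanked by Seq ID No.7 and Seq ID No.14, which matches the outer primers named in the text for combining the first and second group products.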
References
Zailin Yu et al. (2002), WIPO patent publication No. WO 02/052047 A2; USPTO publication No. 20020155473A1.
Tang, YT et al. (2002), USPTO Patent: 6,365,371.
Bandman, O et al. (2000), USPTO Patent: 6,020,164.
Hamady M et al. (2006), BMC Bioinformatics 7:1; published online 4 January 2006; doi:10.1186/1471-2105-7-1.
Schattner P et al. (2006), RNA 12:15-25.
Skupski MP et al. (1999), Nucleic Acids Research, 27(1):35-38.
Aaron Levine et al. (2001), Nucleic Acids Res. 29(19):4006-4013.
Nishikawa T et al. (2000), Genome Informatics (11):12-23.
Legato J et al. (2003), Physiological Genomics (13):179-181.
Gary B et al. (2002), Nucleic Acids Res. 30(23):5310-5317.
Zondervan K et al. (2002), Fertil Steril. 78(4):777-781.
Kontkanen O et al. (2002), Expert Opin Ther Targets. 6(3):363-374.
Kumar R et al. (2002), J Mol Biol. 319(3):593-602.
Ritter M et al. (2002), Genomics. 79:693-702.
Chapman MA et al. (2004), Genome Res. 14(2):313-318.
Uenishi H et al. (2004), Nucleic Acids Res. (32):484-488.
Bass MP et al. (2004), Pac Symp Biocomput. (9):93-103.
Yonan AL et al. (2003), Genes Brain Behav. (5):303-320.
He Fuchu, Chinese patent publication No. CN1657537A.
Zhang Deli et al., Acta Genetica Sinica, 2004, 31(5):431-443.
Li Yongqing et al., Life Science, 2001, 5(2):141-145.
Zhu Chuanbing et al., Journal of Natural Science of Hunan Normal University, 2004, 27(3):79-82.
Qi Zhenyu et al., Chinese Journal of Experimental Surgery, 2005, 22(7):849-851.
Xie Zhengxiang et al., Chinese Journal of Medical Physics, 2006, 23(1):62-63.
Sequence listing
<110>Beijing Weiming Fuyuan Gene Drug Research Center Co., Ltd
Tianjin Puying Biotechnology Co., Ltd
Tianjin Fu Yuan group
<120>A computer-simulation prediction system platform for gene discovery and function determination of gene drugs and drug targets,
and the discovery of new genes resembling apolipoprotein binding protein
<130>GBI06CN0282
<160>28
<170>PatentIn version 3.3
<210>1
<211>750
<212>DNA
<213>Homo sapiens
<400>1
atgagcagcg cagccggccc agacccgtcg gaggcgcccg aagagcggca tttcctcagc 60
accgcggagg cagccgccct ggagcgggag ctgctggagg attatcgctt tgggcggcag 120
cagctcgtgg agctgtgcgg tcatgctagt gccgtggctg tgaccaaggc gttcccgttg 180
cccgctctct cccggaagca gaggacggtg ctggtcgtgt gtggcccgga gcagaacggg 240
gcagtggggc tggtctgtgc ccggcacctg cgggtgtttg agtatgaacc caccatcttc 300
taccccacac gctcgctgga cctgctgcat cgggacctga ccacccagtg cgagaagatg 360
gacatcccct tcctgagcta cctgcccact gaggtgcagc tcattaacga agcctatggg 420
ctggtggtgg atgccgtact gggccccggc gtggagccgg gcgaggtcgg gggcccctgc 480
acccgcgcgc tggccacgct caagctgctg tccatccccc tcgtgagcct ggacatcccc 540
tcaggctggg acgcagagac cggcagcgat tcggaggacg ggctgcggcc tgacgtgctg 600
gtgtctctcg cggcgcccaa gcgctgcgct ggccgcttct ccgggcgcca ccacttcgtg 660
gccggcaggt tcgtgcccga tgacgtgcgc cgcaagttcg ctctgcgcct gccgggatac 720
acgggcaccg actgcgtcgc ggcactgtga 750
<210>2
<211>249
<212>PRT
<213>Homo sapiens
<400>2
Met Ser Ser Ala Ala Gly Pro Asp Pro Ser Glu Ala Pro Glu Glu Arg
1 5 10 15
His Phe Leu Ser Thr Ala Glu Ala Ala Ala Leu Glu Arg Glu Leu Leu
20 25 30
Glu Asp Tyr Arg Phe Gly Arg Gln Gln Leu Val Glu Leu Cys Gly His
35 40 45
Ala Ser Ala Val Ala Val Thr Lys Ala Phe Pro Leu Pro Ala Leu Ser
50 55 60
Arg Lys Gln Arg Thr Val Leu Val Val Cys Gly Pro Glu Gln Asn Gly
65 70 75 80
Ala Val Gly Leu Val Cys Ala Arg His Leu Arg Val Phe Glu Tyr Glu
85 90 95
Pro Thr Ile Phe Tyr Pro Thr Arg Ser Leu Asp Leu Leu His Arg Asp
100 105 110
Leu Thr Thr Gln Cys Glu Lys Met Asp Ile Pro Phe Leu Ser Tyr Leu
115 120 125
Pro Thr Glu Val Gln Leu Ile Asn Glu Ala Tyr Gly Leu Val Val Asp
130 135 140
Ala Val Leu Gly Pro Gly Val Glu Pro Gly Glu Val Gly Gly Pro Cys
145 150 155 160
Thr Arg Ala Leu Ala Thr Leu Lys Leu Leu Ser Ile Pro Leu Val Ser
165 170 175
Leu Asp Ile Pro Ser Gly Trp Asp Ala Glu Thr Gly Ser Asp Ser Glu
180 185 190
Asp Gly Leu Arg Pro Asp Val Leu Val Ser Leu Ala Ala Pro Lys Arg
195 200 205
Cys Ala Gly Arg Phe Ser Gly Arg His His Phe Val Ala Gly Arg Phe
210 215 220
Val Pro Asp Asp Val Arg Arg Lys Phe Ala Leu Arg Leu Pro Gly Tyr
225 230 235 240
Thr Gly Thr Asp Cys Val Ala Ala Leu
245
<210>3
<211>900
<212>DNA
<213>Homo sapiens
<400>3
atgagcagcg cagccggccc agacccgtcg gaggcgcccg aagagcggca tttcctcagg 60
gccttggagc tgcagccccc acttgccgac atgggaagag cggagcttag ctcaaatgct 120
accacctccc ttgtccagag gaggaaacag gcctggggaa ggcagtcatg gctagagcag 180
atttggaacg cagggcctgt ttgccagagc accgcggagg cagccgccct ggagcgggag 240
ctgctggagg attatcgctt tgggcggcag cagctcgtgg agctgtgcgg tcatgctagt 300
gccgtggctg tgaccaaggc gttcccgttg cccgctctct cccggaagca gaggacggtg 360
ctggtcgtgt gtggcccgga gcagaacggg gcagtggggc tggtctgtgc ccggcacctg 420
cgggtgtttg agtatgaacc caccatcttc taccccacac gctcgctgga cctgctgcat 480
cgggacctga ccacccagtg cgagaagatg gacatcccct tcctgagcta cctgcccact 540
gaggtgcagc tcattaacga agcctatggg ctggtggtgg atgccgtact gggccccggc 600
gtggagccgg gcgaggtcgg gggcccctgc acccgcgcgc tggccacgct caagctgctg 660
tccatccccc tcgtgagcct ggacatcccc tcaggctggg acgcagagac cggcagcgat 720
tcggaggacg ggctgcggcc tgacgtgctg gtgtctctcg cggcgcccaa gcgctgcgct 780
ggccgcttct ccgggcgcca ccacttcgtg gccggcaggt tcgtgcccga tgacgtgcgc 840
cgcaagttcg ctctgcgcct gccgggatac acgggcaccg actgcgtcgc ggcactgtga 900
<210>4
<211>299
<212>PRT
<213>Homo sapiens
<400>4
Met Ser Ser Ala Ala Gly Pro Asp Pro Ser Glu Ala Pro Glu Glu Arg
1 5 10 15
His Phe Leu Arg Ala Leu Glu Leu Gln Pro Pro Leu Ala Asp Met Gly
20 25 30
Arg Ala Glu Leu Ser Ser Asn Ala Thr Thr Ser Leu Val Gln Arg Arg
35 40 45
Lys Gln Ala Trp Gly Arg Gln Ser Trp Leu Glu Gln Ile Trp Asn Ala
50 55 60
Gly Pro Val Cys Gln Ser Thr Ala Glu Ala Ala Ala Leu Glu Arg Glu
65 70 75 80
Leu Leu Glu Asp Tyr Arg Phe Gly Arg Gln Gln Leu Val Glu Leu Cys
85 90 95
Gly His Ala Ser Ala Val Ala Val Thr Lys Ala Phe Pro Leu Pro Ala
100 105 110
Leu Ser Arg Lys Gln Arg Thr Val Leu Val Val Cys Gly Pro Glu Gln
115 120 125
Asn Gly Ala Val Gly Leu Val Cys Ala Arg His Leu Arg Val Phe Glu
130 135 140
Tyr Glu Pro Thr Ile Phe Tyr Pro Thr Arg Ser Leu Asp Leu Leu His
145 150 155 160
Arg Asp Leu Thr Thr Gln Cys Glu Lys Met Asp Ile Pro Phe Leu Ser
165 170 175
Tyr Leu Pro Thr Glu Val Gln Leu Ile Asn Glu Ala Tyr Gly Leu Val
180 185 190
Val Asp Ala Val Leu Gly Pro Gly Val Glu Pro Gly Glu Val Gly Gly
195 200 205
Pro Cys Thr Arg Ala Leu Ala Thr Leu Lys Leu Leu Ser Ile Pro Leu
210 215 220
Val Ser Leu Asp Ile Pro Ser Gly Trp Asp Ala Glu Thr Gly Ser Asp
225 230 235 240
Ser Glu Asp Gly Leu Arg Pro Asp Val Leu Val Ser Leu Ala Ala Pro
245 250 255
Lys Arg Cys Ala Gly Arg Phe Ser Gly Arg His His Phe Val Ala Gly
260 265 270
Arg Phe Val Pro Asp Asp Val Arg Arg Lys Phe Ala Leu Arg Leu Pro
275 280 285
Gly Tyr Thr Gly Thr Asp Cys Val Ala Ala Leu
290 295
<210>5
<211>944
<212>cDNA
<213>Homo sapiens
<400>5
cctccctcca cggatgcgct taaaaggcgg tggcggtggc ggcagcgccc ggcgcccggg 60
ctcacctcgg ccatgagcag cgcagccggc ccagacccgt cggaggcgcc cgaagagcgg 120
catttcctca gcaccgcgga ggcagccgcc ctggagcggg agctgctgga ggattatcgc 180
tttgggcggc agcagctcgt ggagctgtgc ggtcatgcta tgtgccgtgg ctgtgaccaa 240
ggcgttcccg ttgcccgctc tctcccggaa gcagaggacg gtgctggtcg tgtgtggccc 300
ggagcagaac ggggcagtgg ggctggtctg tgcccggcac ctgcgggtgt ttgagtatga 360
acccaccatc ttctacccca cacgctcgct ggacctgctg catcgggacc tgaccaccca 420
gtgcgagaag atggacatcc ccttcctgag ctacctgccc actgaggtgc agctcattaa 480
cgaagcctat gggctggtgg tggatgccgt actgggcccc ggcgtggagc cgggcgaggt 540
cgggggcccc tgcacccgcg cgctggccac gctcaagctg ctgtccatcc ccctcgtgag 600
cctggacatc ccctcaggct gggacgcaga gaccggcagc gattcggagg acgggctgcg 660
gcctgacgtg ctggtgtctc tcgcggcgcc caagcgctgc gctggccgct tctccgggcg 720
ccacacttcg tggccggcag gtgcgtgccc gatgacgtgc gccgaaagtt cgctctgcgc 780
ctgccgggat acacgggcac cgactggcgt cgcggcactt gtgaccgcca cccgggggca 840
cacccggatg gaccctcggc aattaaacag cctcccacaa aaaaaaaaaa aaagaacaaa 900
aacaaaagaa ggaggaggac ctaagataaa cacagagaga gagc 944
<210>6
<211>711
<212>cDNA
<213>Homo sapiens
<400>6
agcgggactt gccgacatgg gaagagcgga gcttagctca aatgctacca cctcccttgt 60
ccagaggagg aaacaggcct ggggaaggca gtcatggcta gagcagattt ggaacgcagg 120
gcctgtttgc cagagcaccg cggaggcagc cgccctggag cgggagctgc tggaggatta 180
tcgctttggg cggcagcagc tcgtggagct gtgcggtcat gctagtgccg tggctgtgac 240
caaggcgttc ccgttgcccg ctctctcccg gaagcagagg acggtgctgg tcgtgtgtgg 300
cccggagcag aacggggcag tggggctggt ctgtgcccgg cacctgcggg tgtttgagta 360
tgaacccacc atcttctacc ccacacgctc gctggacctg ctcatcggga cctgaccacc 420
cagtgcgaga agatggacat ccccttcctg agctacctgc ccactgaggt gcagctcatt 480
aacgaagcct atgggctggt ggtggatgcc gtactgggcc ccggcgtgga gccgggcgag 540
gtcgggggcc cctgcacccg cgcgctggcc acgctcaagc tgctgtccat ccccctcgtg 600
agcctggaca tcccctcagg ctgggacgca gagaccggca gcgattcgga gggacgggct 660
gcggcctgac gtgctggtgt ctctcgcggc gcccaagcgc ttcgctggcc a 711
<210>7
<211>54
<212>DNA
<213〉artificial sequence
<220>
<223〉1-57 normal chain
<400>7
cacatatgag cagcgcagcc ggcccagacc cgtcggaggc gcccgaagag cggc 54
<210>8
<211>51
<212>DNA
<213>artificial sequence
<220>
<223>37-87 reverse complement
<400>8
gggcggctgc ctccgcggtg ctgaggaaat gccgctcttc gggcgcctcc g 51
<210>9
<211>57
<212>DNA
<213>artificial sequence
<220>
<223>70-126 forward strand
<400>9
ccgcggaggc agccgccctg gagcgggagc tgctggagga ttatcgcttt gggcggc 57
<210>10
<211>58
<212>DNA
<213>artificial sequence
<220>
<223>111-168 reverse complement
<400>10
cagccacggc actagcatga ccgcacagct ccacgagctg ctgccgccca aagcgata 58
<210>11
<211>58
<212>DNA
<213>artificial sequence
<220>
<223>152-209 forward strand
<400>11
tgctagtgcc gtggctgtga ccaaggcgtt cccgttgccc gctctctccc ggaagcag 58
<210>12
<211>58
<212>DNA
<213>artificial sequence
<220>
<223>195-252 reverse complement
<400>12
ctgccccgtt ctgctccggg ccacacacga ccagcaccgt cctctgcttc cgggagag 58
<210>13
<211>57
<212>DNA
<213>artificial sequence
<220>
<223>239-295 forward strand
<400>13
gcagaacggg gcagtggggc tggtctgtgc ccggcacctg cgggtgtttg agtatga 57
<210>14
<211>56
<212>DNA
<213>artificial sequence
<220>
<223>278-333 reverse complement
<400>14
gcaggtccag cgagcgtgtg gggtagaaga tggtgggttc atactcaaac acccgc 56
<210>15
<211>56
<212>DNA
<213>artificial sequence
<220>
<223>316-371 forward strand
<400>15
cacgctcgct ggacctgctg catcgggacc tgaccaccca gtgcgagaag atggac 56
<210>16
<211>55
<212>DNA
<213>artificial sequence
<220>
<223>358-412 reverse complement
<400>16
atgagctgca cctcagtggg caggtagctc aggaagggga tgtccatctt ctcgc 55
<210>17
<211>57
<212>DNA
<213>artificial sequence
<220>
<223>389-445 forward strand
<400>17
cctgcccact gaggtgcagc tcattaacga agcctatggg ctggtggtgg atgccgt 57
<210>18
<211>55
<212>DNA
<213>artificial sequence
<220>
<223>431-485 reverse complement
<400>18
ggggcccccg acctcgcccg gctccacgcc ggggcccagt acggcatcca ccacc 55
<210>19
<211>58
<212>DNA
<213>artificial sequence
<220>
<223>464-521 forward strand
<400>19
gccgggcgag gtcgggggcc cctgcacccg cgcgctggcc acgctcaagc tgctgtcc 58
<210>20
<211>55
<212>DNA
<213>artificial sequence
<220>
<223>500-554 reverse complement
<400>20
gcctgagggg atgtccaggc tcacgagggg gatggacagc agcttgagcg tggcc 55
<210>21
<211>59
<212>DNA
<213>artificial sequence
<220>
<223>542-600 forward strand
<400>21
catcccctca ggctgggacg cagagaccgg cagcgattcg gaggacgggc tgcggcctg 59
<210>22
<211>59
<212>DNA
<213>artificial sequence
<220>
<223>579-637 reverse complement
<400>22
gcgcagcgct tgggcgccgc gagagacacc agcacgtcag gccgcagccc gtcctccga 59
<210>23
<211>59
<212>DNA
<213>artificial sequence
<220>
<223>602-660 forward strand
<400>23
cgtgctggtg tctctcgcgg cgcccaagcg ctgcgctggc cgcttctccg ggcgccacc 59
<210>24
<211>59
<212>DNA
<213>artificial sequence
<220>
<223>646-704 reverse complement
<400>24
cttgcggcgc acgtcatcgg gcacgaacct gccggccacg aagtggtggc gcccggaga 59
<210>25
<211>50
<212>DNA
<213>artificial sequence
<220>
<223>689-738 forward strand
<400>25
tgacgtgcgc cgcaagttcg ctctgcgcct gccgggatac acgggcaccg 50
<210>26
<211>50
<212>DNA
<213>artificial sequence
<220>
<223>719-768 reverse complement
<400>26
tagcggccgc tcacagtgcc gcgacgcagt cggtgcccgt gtatcccggc 50
<210>27
<211>21
<212>DNA
<213>artificial sequence
<220>
<223>1-21 forward strand
<400>27
cacatatgat gagcagcgca g 21
<210>28
<211>22
<212>DNA
<213>artificial sequence
<220>
<223>747-768 reverse complement
<400>28
tagcggccgc tcacagtgcc gc 22
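
The position ranges given for sequences 7 to 26 alternate between forward-strand and reverse-complement fragments whose ranges overlap, which suggests that neighbouring oligonucleotides are meant to anneal at their ends. The following Python sketch is illustrative only and is not part of the sequence listing. It checks this for sequences 7 and 8 by reverse-complementing sequence 8 and measuring the longest end-to-start overlap. The function names `reverse_complement` and `longest_overlap` are hypothetical helpers, not programs named in this document.

```python
# Illustrative sketch: measure the overlap between two listed oligonucleotides.
# Helper names are hypothetical and are not taken from the patent.

COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


# Sequences 7 and 8 copied from the listing above (spaces removed).
oligo_7 = "cacatatgagcagcgcagccggcccagacccgtcggaggcgcccgaagagcggc"
oligo_8 = "gggcggctgcctccgcggtgctgaggaaatgccgctcttcgggcgcctccg"

overlap = longest_overlap(oligo_7, reverse_complement(oligo_8))
print(overlap)  # 21
```

The 21-nucleotide overlap matches the shared span of the stated ranges 1-57 and 37-87.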

Claims (10)

1. A method for discovering new genes, comprising the following steps:
1) obtaining, from a published protein sequence database, all protein sequences shorter than 300 AA, 400 AA, or 500 AA (300 AA as the first choice, preferably 400 AA, more preferably 500 AA), and converting these sequences to a uniform format;
2) performing batch transmembrane region analysis on the above protein sequences and excluding all sequences that contain a transmembrane region;
3) performing batch analysis of the secretory signal peptide coding sequence on the retained sequences;
4) using the resulting sequence fragments as models to search an expressed sequence tag (EST) library, obtaining ESTs that match to a certain degree;
5) assembling the expressed sequence tags; and
6) comparing the result with sequences in known databases to obtain new full-length genes.
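
As a minimal illustration of step 1) only, the following Python sketch reads FASTA-formatted protein records and keeps those shorter than a length cutoff, normalising them to upper case. It is a sketch under stated assumptions, not the implementation used in the invention: the FASTA input, the function names `read_fasta` and `short_proteins`, and the choice of the first header word as the identifier are all assumptions made for this example.

```python
# Sketch of step 1): keep proteins shorter than a cutoff, in one uniform format.
# Function names and the FASTA input format are assumptions for illustration.
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def short_proteins(text: str, max_len: int = 300) -> Dict[str, str]:
    """Map identifier -> upper-case sequence for records shorter than max_len."""
    kept = {}
    for header, seq in read_fasta(text):
        if len(seq) < max_len:
            words = header.split()
            identifier = words[0] if words else header
            kept[identifier] = seq.upper()
    return kept


sample = ">P1 short example\nmkt\nayi\n>P2 long example\n" + "A" * 350 + "\n"
print(short_proteins(sample, max_len=300))  # {'P1': 'MKTAYI'}
```

The cutoff is a parameter so that the 300 AA, 400 AA, and 500 AA alternatives of step 1) can be tried with the same code.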
2. A computer system platform for new gene discovery and functional analysis, the system being based on the Linux operating system and comprising: (a) networked bioinformatics and gene analysis programs on a local server; and (b) the following programs:
tbl2fasta_n/fasta2tbl_n: sequence format conversion software that converts sequences in FASTA format to table format;
Gb2fasta: sequence format conversion software that converts sequences in GenBank format to FASTA format;
DrawBlast: a BLAST result plotting program that draws a schematic alignment diagram from BLAST result data;
Ed_cap4: a recompiled Cap4 program that configures the cap4 runtime environment automatically;
Extractcontigs: converts the score matrix data output by cap4 into a FASTA-format file;
Im_delete: database editing software that deletes any sequence from a database;
Im_insert: database editing software that inserts additional sequences into a sequence database;
Im_retrieve: database editing software that retrieves sequences from a large database singly or in batches;
Pepsigp: recompiled signalp software, in which the original program, able to predict signal peptides for only one sequence at a time, is improved to perform fully automated batch prediction;
Primers_for_fulllength_clone: batch primer design software;
Ps_signalp: data parser software that parses the result data produced by the pepsigp program;
Ps_scan: protein active site/functional domain analysis software;
Translate: database editing software, a DNA sequence translation program;
Tt_comp_dna: database editing software, a DNA sequence reverse-complement program;
Tt_cycle: auxiliary software used mainly to let programs that cannot run automatically on their own take part in fully automated operation;
Tt_fasty_1: an improved fasty program whose main purpose is ease of operation;
Tt_get: software that retrieves DNA and protein sequences from databases for which no index has yet been built;
Tt_pblast: blastn result parsing software that enables automatic machine analysis of large volumes of result output;
Tt_sub_seq: sequence editing software for conveniently extracting a fragment of a sequence;
Tt_subseq_genome: software that extracts fragment sequences from genome sequences;
Tt_tmpred: recompiled protein transmembrane region prediction software, improved to support batch analysis;
Tt_tmpred_p: data parser software dedicated to parsing the analysis results produced by tt_tmpred;
Tt_zip_2: sequence editing software used mainly to merge two simple sequence fragments and filter out the repeated part between them;
Biofaseqindex: database editing software that builds an index for a FASTA-format database;
Biogbseqindex: database editing software that builds an index for a GenBank-format database;
Gb2cds: sequence editing software that extracts the CDS sequences from a GenBank-format sequence file;
Parser_bx: parser software that parses the results output by blastn, blastp, blastx, and similar programs;
Parser_fasta: parser software that parses the results output by the fasty alignment program;
Rfetch: database operation software that retrieves sequence data from GenBank directly over the network;
1fetch: database operation software that retrieves sequence data from a local database directly over the local network.
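
To make the role of a DNA translation program such as Translate concrete, the following Python sketch translates a DNA sequence with the standard genetic code. It is an illustrative sketch only: the document does not disclose how Translate is implemented, and the function name `translate`, its `frame` parameter, and the use of "X" for unrecognised codons are assumptions made here.

```python
# Sketch of a DNA translation routine using the standard genetic code.
# This is not the patent's Translate program; names and behaviour are assumed.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Codons enumerated in TCAG order line up with the amino-acid string above.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate `dna` starting at 0-based offset `frame`; '*' marks a stop."""
    seq = dna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 23 nucleotides of sequence 7 in the listing; the ATG starts at offset 5.
print(translate("cacatatgagcagcgcagccggc", frame=5))  # MSSAAG
```

Incomplete trailing codons are ignored, so the caller chooses the reading frame through the offset alone.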
3. Use of the system platform of claim 2 in the discovery and analysis of new genes.
4. The use of claim 3, for the discovery and analysis of new genes in humans, animals, plants, and microorganisms.
5. Two new genes similar to the ApoA1BP gene, BFC06016 and BFC06104, having the nucleotide sequences shown in SEQ ID No. 1 and SEQ ID No. 3, respectively.
6. Use of the new genes of claim 5 in diagnosis in the pharmaceutical field.
7. The use of claim 6, characterized in that the genes serve as drugs or drug target genes for the diagnosis and treatment of cardiovascular disease.
8. The use of claim 7, wherein the drug or drug target is a gene drug or a gene therapy drug target.
9. Proteins encoded by the genes of claim 5, having the amino acid sequences shown in SEQ ID No. 2 and SEQ ID No. 4, respectively.
10. Use of the proteins of claim 9 in the preparation of drugs and diagnostic reagents related to cardiovascular disease.
CNA2006100893399A 2006-06-21 2006-06-21 Method for finding novel gene and computer system platform using same and novel gene Pending CN1884521A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CNA2006100893399A CN1884521A (en) 2006-06-21 2006-06-21 Method for finding novel gene and computer system platform using same and novel gene
CNA2007800202904A CN101460625A (en) 2006-06-21 2007-06-21 A method for identifying novel gene and the resulting novel genes
PCT/CN2007/070153 WO2008000186A1 (en) 2006-06-21 2007-06-21 A method for identifying novel gene and the resulting novel genes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2006100893399A CN1884521A (en) 2006-06-21 2006-06-21 Method for finding novel gene and computer system platform using same and novel gene

Publications (1)

Publication Number Publication Date
CN1884521A true CN1884521A (en) 2006-12-27

Country Status (2)

Country Link
CN (2) CN1884521A (en)
WO (1) WO2008000186A1 (en)

Also Published As

Publication number Publication date
WO2008000186A1 (en) 2008-01-03
CN101460625A (en) 2009-06-17
WO2008000186A8 (en) 2009-07-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication