CN100552686C - Gene probe information annotation method - Google Patents


Info

Publication number
CN100552686C
Authority
CN
China
Prior art keywords
gene
information
probe
common source
detecting probe
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2006100259721A
Other languages
Chinese (zh)
Other versions
CN101063988A (en)
Inventor
金刚
谢松旻
王超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institutes for Biological Sciences SIBS of CAS
Original Assignee
Shanghai Institutes for Biological Sciences SIBS of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institutes for Biological Sciences SIBS of CAS
Priority to CNB2006100259721A
Publication of CN101063988A
Application granted
Publication of CN100552686C
Expired - Fee Related

Abstract

The invention discloses a gene probe information annotation method that combines link integration with data warehouse technology. It overcomes the limitations of traditional gene chip probe annotation systems, markedly increasing the number of public source databases the annotation system can cover and improving the timeliness of data updates. The technical scheme is: (1) input the identification information of the gene chip probe library to be prepared into a data warehouse; (2) extract the correspondence between the gene probe identification information and the identification information of the public source databases; (3) build links to the public source databases according to that correspondence, and use the links to extract the specific information related to the gene chip probes directly from the public source databases; (4) parse the specific information and output it. The invention is applicable to building gene chip technology platforms.

Description

Gene probe information annotation method
Technical field
The present invention relates to a method for building a gene chip technology platform, and in particular to a method for annotating gene chip probe information.
Background
The gene chip is one of the landmark advances in high technology in recent years, a technology born at the intersection of physics, microelectronics, and molecular biology. Gene chip technology is high-throughput: tens of thousands of DNA probes are integrated by microfabrication onto a chip about one square centimeter in size, allowing fast and efficient quantitative detection of mRNA and DNA sequences. Gene chips play a growing role in elucidating gene function, exploring the causes and mechanisms of disease, and discovering potential diagnostic and therapeutic targets.
Because gene chips are high-throughput and information-dense, the probe annotation system is a key step in building a gene chip technology platform. The main function of a gene chip annotation system is to annotate the tens of thousands of gene probes on a chip and to integrate the latest information on the sequence, function, and metabolic pathways of the genes, so as to meet the needs of automated analysis of chip results and of probe design. Well-known gene chip annotation systems today include the DAVID system developed by Button et al. at the U.S. National Institutes of Health, the DRAGON system developed by Wilkinson et al. at Johns Hopkins University, and the SOURCE system developed by Diehn et al. at Stanford University. All of these systems rest on data warehouse technology: they physically integrate the public source databases to build a "one-stop" record of the information related to each gene chip probe.
This technology, however, has serious limitations. The greatest limitation of a data warehouse is that it cannot be updated in real time. The life sciences advance rapidly, and public source databases add and revise large amounts of information every day, so a data warehouse that is refreshed only once every two to three months cannot incorporate the latest information about gene probes in time. Take the DAVID system as an example: its annotation results contain many unusable URLs (Uniform Resource Locators) that do not lead the user to correct annotation results.
A second limitation of the data warehouse approach stems from the inconsistent data formats of the public source databases: the more source databases and data types a warehouse includes, the harder it becomes to manage, so its annotation capacity is limited. The DRAGON system mentioned above is an example: because of its limited annotation capacity, it cannot annotate information from GenBank and LocusLink, which are among the most commonly used databases.
Given the importance of gene probe annotation to building a gene chip platform and to automated analysis of chip results, overcoming the limitations of the above annotation systems and establishing a more accurate and more complete annotation method is an urgent problem in the gene chip field.
Summary of the invention
The object of the invention is to solve the above problems by providing a gene probe information annotation method that overcomes the limitations of traditional gene chip probe annotation systems and the problems of data warehouse annotation. It integrates the latest information on the sequence, function, and metabolic pathways of the genes targeted by chip probes, which aids automated analysis of chip results and gene chip probe design.
The technical scheme of the invention is a gene probe information annotation method comprising: (1) inputting the identification information of the gene chip probe library to be prepared into a data warehouse; (2) extracting, through an interface program of the data warehouse, the correspondence between the gene probe identification information and the identification information of the public source databases; (3) building links to the public source databases according to the correspondence, and extracting the specific information related to the gene chip probes directly from the public source databases through the links; (4) parsing the specific information and outputting it.
In the above method, the data warehouses include Entrez of NCBI, Ensembl of EBI, and the UniProt data warehouse.
In the above method, the interfaces include the E-Utilities interface of Entrez, the EnsMart interface of Ensembl, and the SRS interface of UniProt.
In the above method, the specific information includes gene information for the probe, information on the protein encoded by the probe's gene, literature information for the probe, and data that aid automated analysis of chip results.
In the above method, the identification information of the gene chip probe library to be prepared in step (1) is an accession number, a UniGene Cluster identifier, or a LocusLink identifier.
In the above method, in step (3), the specific information is organized and then exported to a file in text format.
In the above method, in step (3), the links are URL links.
Compared with the prior art, the gene probe information annotation method of the invention has the following beneficial effects. The invention extracts the correspondences among the various data types from online data warehouse systems and then, guided by these correspondences, uses link integration to extract the specific information related to gene chip probes directly from the public source databases, thereby achieving both annotation and integration. Because link integration retrieves data from the public source databases in real time, it overcomes the stale-data problem of data warehouse annotation. Link integration on its own has a major weakness: it handles the correspondences between data structures poorly, whereas data warehouses maintain well-developed correspondences between data types. By combining the two technologies, the invention overcomes the limitations of traditional gene chip probe annotation systems, markedly increasing the number of public source databases the annotation system can cover and improving the timeliness of data updates.
Description of drawings
Fig. 1 is a flowchart of the gene probe information annotation method of the invention.
Fig. 2 is a data flow diagram of the gene probe information annotation method of the invention.
Embodiment
The invention is further described below with reference to the drawings and an embodiment.
Fig. 1 shows the flow of the gene probe information annotation method of the invention, and Fig. 2 shows its data flow. The steps of the method are described in detail below with reference to both figures.
Step S11: input the identification numbers (IDs) of the gene chip probe library to be prepared. Gene chip probe information usually uses the accession number as its recognizable unique ID. Because of their importance in chip information, the UniGene Cluster ID (hereinafter UniGene ID) and the LocusLink ID (that is, the Gene ID of the Entrez Gene database) appear increasingly often and can also serve as the initial unique ID. The input format is one probe ID per line; to keep the annotation results readable and correct, a text file containing the IDs can be read in the same one-per-line form. Here we take the P53 gene (a well-known tumor suppressor gene) as an example and input the ID of this probe: AF136271.
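The one-ID-per-line input of step S11 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the regular expressions used to tell the three ID types apart are assumptions made here, and real identifier formats are more varied.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration only; they are assumptions,
# not formats defined by the patent or by the databases themselves.
UNIGENE_RE = re.compile(r"[A-Z][a-z]{1,2}\.\d+")   # e.g. Hs.408312
ACCESSION_RE = re.compile(r"[A-Z]{1,2}\d{5,6}")    # e.g. AF136271
NUMERIC_RE = re.compile(r"\d+")                    # LocusLink / Gene ID


def classify_id(probe_id: str) -> str:
    """Guess which kind of initial unique ID a string is."""
    if UNIGENE_RE.fullmatch(probe_id):
        return "unigene"
    if ACCESSION_RE.fullmatch(probe_id):
        return "accession"
    if NUMERIC_RE.fullmatch(probe_id):
        return "locuslink"
    return "unknown"


def read_probe_ids(text: str) -> List[Tuple[str, str]]:
    """Parse one ID per line, skipping blank lines and repeated IDs."""
    seen = set()
    parsed = []
    for line in text.splitlines():
        probe_id = line.strip()
        if not probe_id or probe_id in seen:
            continue
        seen.add(probe_id)
        parsed.append((probe_id, classify_id(probe_id)))
    return parsed
```

Classifying each ID at input time lets the later steps decide which data warehouse query to run first, for example skipping the UniGene lookup when a UniGene ID was supplied directly.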
Step S12: submit the ID to the data warehouses. These include Entrez of NCBI, Ensembl of EBI, and the UniProt data warehouse. Each of these warehouses in turn collects information from many public source databases: for example, Entrez collects information from the MIM database, the GO database, and other databases; Ensembl collects information from the Promotor database; and UniProt collects information from the Pfam database, the Prosite database, and other databases.
Step S13: extract, through the interfaces of the data warehouses in turn, the correspondences between the probe ID and the IDs of the other public source databases. Among the above data warehouses, Entrez provides a CGI interface called E-Utilities, through which Entrez can be queried and data downloaded. Likewise, Ensembl provides the EnsMart interface and UniProt provides the SRS interface, which let users run customized, large-batch queries and retrieve information. These interfaces can be used to extract the correspondences among the various IDs.
Continuing with the P53 gene from S11, the probe ID is submitted through the E-Utilities CGI interface to Entrez UniGene, and the system's parser extracts the UniGene ID, giving the UniGene ID of P53: Hs.408312. If there is no corresponding UniGene ID, the message "data not found" is returned.
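A query of this kind can be sketched as a URL builder plus a small response parser. The sketch assumes the current E-utilities `esearch.fcgi` endpoint and its XML `IdList`/`Id` response layout; the patent does not say which E-Utilities call was used, the `unigene` database name reflects the patent's era and may no longer be served by NCBI, and the helper names are hypothetical. No network request is made here.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Base URL of NCBI E-utilities (check the NCBI documentation before use).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Placeholder returned when no corresponding ID exists, as in the text above.
NOT_FOUND = "data not found"


def build_esearch_url(db: str, term: str) -> str:
    """Compose an ESearch query URL for one database and one search term."""
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode({'db': db, 'term': term})}"


def extract_ids(xml_text: str) -> List[str]:
    """Pull the record IDs out of an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(".//IdList/Id") if node.text]


def first_id_or_placeholder(ids: List[str]) -> str:
    """Return the first ID, or the 'data not found' placeholder if empty."""
    return ids[0] if ids else NOT_FOUND
```

Keeping URL construction separate from parsing makes it possible to test both without contacting the server, and to swap in a different interface (for example EnsMart or SRS) behind the same two-step shape.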
The UniGene IDs are sorted and then submitted through the E-Utilities CGI interface to Entrez Gene, yielding the corresponding IDs in other public source databases. These include the IDs of gene-related public source databases such as the MIM ID, GO ID, PMID, RefSeq ID, and CDD ID.
The UniGene ID is then passed through the EnsMart CGI interface of Ensembl to extract the ID correspondences it stores, including the UniProt ID and the Ensembl ID. Here, the Ensembl ID corresponding to P53 is ENSG00000141510, and the corresponding UniProt ID is Q761V2.
Next, the UniProt ID is passed through the SRS CGI interface of the UniProt data warehouse to extract the corresponding IDs in protein-related public source databases, such as the Pfam ID and Prosite ID.
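The chain of lookups in step S13 (probe ID to UniGene ID, UniGene ID to gene-related and Ensembl-held IDs, UniProt ID to protein-related IDs) can be sketched with the warehouse interfaces passed in as plain functions. The function and parameter names are hypothetical, and in practice each callable would wrap a real interface such as E-Utilities, EnsMart, or SRS.

```python
from typing import Callable, Dict, List, Optional

NOT_FOUND = "data not found"

IdMap = Dict[str, List[str]]


def collect_mappings(
    probe_id: str,
    to_unigene: Callable[[str], Optional[str]],
    gene_ids: Callable[[str], IdMap],
    ensembl_ids: Callable[[str], IdMap],
    protein_ids: Callable[[str], IdMap],
) -> IdMap:
    """Gather the ID correspondences for one probe, warehouse by warehouse."""
    mappings: IdMap = {"probe": [probe_id]}

    # Probe ID -> UniGene ID; stop early if there is none.
    unigene_id = to_unigene(probe_id)
    if unigene_id is None:
        mappings["unigene"] = [NOT_FOUND]
        return mappings
    mappings["unigene"] = [unigene_id]

    # UniGene ID -> gene-related IDs, then IDs held by Ensembl.
    for source in (gene_ids(unigene_id), ensembl_ids(unigene_id)):
        for id_type, ids in source.items():
            mappings.setdefault(id_type, []).extend(ids)

    # Each UniProt ID -> protein-related IDs (for example Pfam, Prosite).
    for accession in list(mappings.get("uniprot", [])):
        for id_type, ids in protein_ids(accession).items():
            mappings.setdefault(id_type, []).extend(ids)
    return mappings
```

Passing the lookups as arguments keeps the chaining logic independent of any one warehouse, so a stub dictionary can stand in for a remote service during testing.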
Step S14: according to the ID correspondences obtained in step S13, compose URLs in the format each public source database specifies, and use them to extract the relevant information directly and remotely from the online databases.
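Composing one URL per database ID can be sketched with a table of URL templates. The templates below point at `example.org` and are placeholders invented for this illustration; the real URL format of each public source database must be taken from that database's own documentation.

```python
from typing import Dict, List
from urllib.parse import quote

# Hypothetical URL templates; the example.org addresses are placeholders,
# not the actual URL formats of the databases named in the keys.
URL_TEMPLATES: Dict[str, str] = {
    "mim": "https://example.org/mim/{id}",
    "ensembl": "https://example.org/ensembl/gene/{id}",
    "uniprot": "https://example.org/uniprot/{id}",
    "pfam": "https://example.org/pfam/{id}",
}


def build_links(mappings: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn {id_type: [ids]} into {id_type: [urls]} for known databases."""
    links: Dict[str, List[str]] = {}
    for id_type, ids in mappings.items():
        template = URL_TEMPLATES.get(id_type)
        if template is None:
            continue  # no link rule for this ID type
        # Percent-encode each ID so that it is safe inside a URL path.
        links[id_type] = [template.format(id=quote(i, safe="")) for i in ids]
    return links
```

Because the links are composed at annotation time from the current ID correspondences, the information they retrieve is as fresh as the source database itself, which is the point of link integration.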
Step S15: parse and organize the information obtained in step S14. After the data from each public source database have been retrieved, a text parser extracts the information related to the chip probes and organizes it into: (1) gene information for the probe, including descriptive and functional annotation of the sequence, such as the gene name, symbol, accession number, related literature, GO terms, chromosomal location, E.C. number, and many other features of the gene and its encoded product; (2) information on the protein encoded by the probe's gene, including the protein's structure, function, classification, domains, conserved regions, and sequence motifs; (3) literature information for the probe, including documents and citations in the OMIM database on diseases associated with individual genes, and the carefully curated recent literature in GeneRIF together with the IDs and abstracts of the corresponding articles in PubMed; (4) other important data that aid automated analysis of chip results, including batch probe annotation of promoter sequences in the Ensembl database, which can be used for gene prediction, and the KEGG IDs and GO IDs used to analyze the pathways involved.
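The four groups of step S15 can be sketched as a small record type plus a routing table from source database to group. The routing table and all names here are assumptions made for illustration; the patent lists what each group contains but does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Assumed routing of each source to one of the four groups in step S15.
CATEGORY_BY_SOURCE: Dict[str, str] = {
    "entrez_gene": "gene",
    "refseq": "gene",
    "pfam": "protein",
    "prosite": "protein",
    "omim": "literature",
    "generif": "literature",
    "pubmed": "literature",
    "ensembl_promoter": "analysis",
    "kegg": "analysis",
    "go": "analysis",
}


@dataclass
class ProbeAnnotation:
    """Parsed information for one probe, split into the four groups."""
    probe_id: str
    gene: Dict[str, List[str]] = field(default_factory=dict)
    protein: Dict[str, List[str]] = field(default_factory=dict)
    literature: Dict[str, List[str]] = field(default_factory=dict)
    analysis: Dict[str, List[str]] = field(default_factory=dict)


def organize(probe_id: str, records: Iterable[Tuple[str, str]]) -> ProbeAnnotation:
    """Route (source, value) pairs into the matching group of the record."""
    annotation = ProbeAnnotation(probe_id)
    for source, value in records:
        category = CATEGORY_BY_SOURCE.get(source)
        if category is None:
            continue  # source not covered by the routing table
        getattr(annotation, category).setdefault(source, []).append(value)
    return annotation
```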
Step S16: output the information from step S15. For a small batch of probes, the annotation results can be displayed directly; for a large batch, the annotation results can be saved to a file.
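The choice between direct display and saving to a file can be sketched as below. The tab-delimited layout, the function names, and the batch-size threshold of 50 are assumptions for this illustration; the patent only states that small batches are displayed and large batches are saved in text format.

```python
import sys
from typing import Dict, List, Optional, TextIO


def to_tab_delimited(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render rows as tab-delimited text with a header line."""
    lines = ["\t".join(columns)]
    for row in rows:
        # Missing values become empty cells so that columns stay aligned.
        lines.append("\t".join(row.get(column, "") for column in columns))
    return "\n".join(lines) + "\n"


def emit(
    rows: List[Dict[str, str]],
    columns: List[str],
    path: str,
    threshold: int = 50,
    stream: Optional[TextIO] = None,
) -> str:
    """Display small batches on the stream; save larger ones to a text file."""
    text = to_tab_delimited(rows, columns)
    if len(rows) <= threshold:
        target = sys.stdout if stream is None else stream
        target.write(text)
        return "displayed"
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return "saved"
```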
It should be understood that the inventive point of the invention lies in annotating gene chip probes by combining link integration with data warehouse technology. The specific data warehouses, public source databases, gene probes, and other details mentioned in the above embodiment are illustrative examples and do not limit the invention.
The above embodiment is provided to enable those of ordinary skill in the art to make or use the invention. Those of ordinary skill in the art may make various modifications or changes to the embodiment without departing from the inventive concept, so the scope of protection of the invention is not limited by the embodiment and should be the widest scope consistent with the inventive features recited in the claims.

Claims (7)

1. A gene probe information annotation method, characterized in that it comprises:
(1) inputting the identification information of the gene chip probe library to be prepared into a data warehouse;
(2) extracting, through an interface program of the data warehouse, the correspondence between the gene probe identification information and the identification information of public source databases;
(3) building links to the public source databases according to the correspondence, and extracting the specific information related to the gene chip probes directly from the public source databases through the links;
(4) parsing the specific information and outputting it.
2. The gene probe information annotation method according to claim 1, characterized in that the data warehouses include Entrez of NCBI, Ensembl of EBI, and the UniProt data warehouse.
3. The gene probe information annotation method according to claim 2, characterized in that the interfaces include the E-Utilities interface of Entrez, the EnsMart interface of Ensembl, and the SRS interface of UniProt.
4. The gene probe information annotation method according to claim 1, characterized in that the specific information includes gene information for the probe, information on the protein encoded by the probe's gene, literature information for the probe, and data that aid automated analysis of chip results.
5. The gene probe information annotation method according to claim 1, characterized in that the identification information of the gene chip probe library to be prepared in step (1) is an accession number, a UniGene Cluster identifier, or a LocusLink identifier.
6. The gene probe information annotation method according to claim 1, characterized in that, in step (3), the specific information is organized and then exported to a file in text format.
7. The gene probe information annotation method according to claim 1, characterized in that, in step (3), the links are URL links.
CNB2006100259721A 2006-04-24 2006-04-24 Gene detecting probe information annotate method Expired - Fee Related CN100552686C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100259721A CN100552686C (en) 2006-04-24 2006-04-24 Gene detecting probe information annotate method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100259721A CN100552686C (en) 2006-04-24 2006-04-24 Gene detecting probe information annotate method

Publications (2)

Publication Number Publication Date
CN101063988A CN101063988A (en) 2007-10-31
CN100552686C true CN100552686C (en) 2009-10-21

Family

ID=38965008

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100259721A Expired - Fee Related CN100552686C (en) 2006-04-24 2006-04-24 Gene detecting probe information annotate method

Country Status (1)

Country Link
CN (1) CN100552686C (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136780B (en) * 2019-05-14 2022-03-04 杭州链康医学检验实验室有限公司 Method for constructing probe specificity database based on comparison algorithm

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020194201A1 (en) * 2001-06-05 2002-12-19 Wilbanks John Thompson Systems, methods and computer program products for integrating biological/chemical databases to create an ontology network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
K2/Kleisli and GUS:Experiments in integrated access to genomic data sources. S.B. Davidson et al.IBM Systems Journal,Vol.40 No.2. 2001 *
WIT:integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Ross Overbeek et al.Nucleic Acids Research,Vol.28 No.1. 2000 *
生物数据仓库研究及应用. 丁建华,彭政,王飞.计算机工程与应用,第12期. 2005 *

Also Published As

Publication number Publication date
CN101063988A (en) 2007-10-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091021

Termination date: 20160424