WO2002103954A2 - Data mining platform for bioinformatics and other knowledge discovery domains - Google Patents
- Publication number
- WO2002103954A2 (international application PCT/US2002/019202)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- features
- feature
- gene
- genes
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
- G16B40/30—Unsupervised data analysis
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/20—Heterogeneous data integration
Definitions
- FIG. 9 is an exemplary screen shot of an interface for the Gene Search Assistant application for bioinformatics for use in searching published information.
- The following sequence illustrates the application of recursive feature elimination (RFE) to an SVM, using the weight magnitude as the ranking criterion.
- SRS6 SRS is a program developed at the European Bioinformatics Institute for the indexing and cross-referencing of databases of textual information. It provides unified access to molecular biology databases, integration of analysis tools, and advanced parsing tools for disseminating and reformatting information stored in ASCII text.
- Ranked lists of features can also be visualized as a matrix of colored coefficients.
- The columns of the matrix represent all of the values a given feature takes across all patterns.
- The columns are ordered according to the feature ranking.
- The rows of the matrix may be ordered, for example, to group the examples of the same class together.
- The matrix can be transposed.
- Ranked lists of feature subsets, particularly of equivalent features, can also be represented this way. Nested subsets of features with cardinality increments of one can be visualized by printing the feature identifiers in the order in which they are added to increase the cardinality of the feature subsets.
- The identifiers, or their background, can then optionally be colored according to the score of the subset containing all the features from the beginning of the list up to that feature.
- Feature f: a singleton.
- Color 1: illustrated as low-density dots.
- Feature f5 is indicated by a box filled with color 8 (illustrated as grid lines) to indicate the highest score.
- FIG. 14 illustrates the gene tree (observation graph) corresponding to the screen information in FIG. 11.
- This tree was generated from DNA microarray data of colon cancer and normal patients.
- Several runs of the RFE-SVM algorithm were used to generate alternative nested subsets of genes.
- The nodes are labeled with GANs (gene accession numbers).
- The quality of every subset of genes can be assessed, for example, by the success rate of a classifier trained with these genes.
- The shading (color) of the last node of a given path indicates the quality of the subset. In the present example, a scale of 64 shades, or colors, was used to map the leave-one-out success rate.
- A binary tree of depth 4 is constructed. This means that for every gene selection, only two alternatives are presented, and that up to four genes can be selected. Wider trees (with more children at every node) permit selection from a wider variety of genes. Deeper trees provide for selection of a larger number of genes.
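The RFE-SVM procedure named in the definitions above (at each step, retrain and eliminate the feature whose SVM weight has the smallest magnitude) can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the patent's implementation: the linear SVM is trained with a Pegasos-style subgradient method on the hinge loss with no bias term (so the data are assumed roughly centered), exactly one feature is removed per iteration, and the helper names `train_linear_svm` and `rfe_svm`, like the toy data, are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM without bias using Pegasos-style subgradient descent
    on the L2-regularized hinge loss (an assumed solver; any linear SVM could
    be substituted). Labels must be +1 or -1. Returns the weight vector."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Gradient step on the regularizer shrinks every weight.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # The hinge-loss subgradient applies only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def rfe_svm(X, y, **svm_kwargs):
    """Recursive feature elimination with the squared weight w_i**2 as the
    ranking criterion. Returns feature indices from most to least important
    (the feature eliminated last is ranked first)."""
    surviving = list(range(len(X[0])))
    eliminated = []
    while surviving:
        # Retrain on the surviving features only.
        Xs = [[row[j] for j in surviving] for row in X]
        w = train_linear_svm(Xs, y, **svm_kwargs)
        # Remove the surviving feature with the smallest weight magnitude.
        worst = min(range(len(surviving)), key=lambda k: w[k] ** 2)
        eliminated.append(surviving.pop(worst))
    return eliminated[::-1]


# Hypothetical toy data: feature 0 separates the classes perfectly,
# feature 1 is weakly informative, feature 2 is noise.
y = [1, 1, 1, 1, -1, -1, -1, -1]
noise1 = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
noise2 = [0.4, -0.3, 0.2, -0.1, 0.3, -0.4, 0.1, -0.2]
X = [[2.0 * yi, 0.3 * yi + e1, e2] for yi, e1, e2 in zip(y, noise1, noise2)]

ranking = rfe_svm(X, y)
print(ranking)  # feature 0 is ranked first
```

Removing one feature per iteration keeps the sketch close to the nested-subset idea (cardinality increments of one); for thousands of genes, removing a fraction of the remaining features per iteration is a common speed-up.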
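The matrix-of-colored-coefficients view and the coloring of nested subsets described in the definitions above can be prepared with a few lines of code. This sketch assumes a plain list-of-lists data matrix, sortable class labels, and a linear mapping of subset scores onto 8 colors (chosen only because the figure description mentions colors 1 through 8). The helpers `ranked_feature_matrix` and `subset_color_indices` are hypothetical; they compute the reordered matrix and per-identifier color indices, and the drawing itself is left to any plotting library.

```python
def ranked_feature_matrix(X, y, ranking, transpose=False):
    """Reorder a data matrix (rows = patterns, columns = features) for display.

    Column k holds every value taken by the k-th ranked feature across all
    patterns; rows are grouped so that examples of the same class are adjacent.
    Returns the reordered matrix and the row order used."""
    # sorted() is stable, so examples keep their original order within a class.
    row_order = sorted(range(len(X)), key=lambda i: y[i])
    matrix = [[X[i][j] for j in ranking] for i in row_order]
    if transpose:
        matrix = [list(column) for column in zip(*matrix)]
    return matrix, row_order


def subset_color_indices(scores, n_colors=8):
    """Map scores[k], the score of the nested subset made of the first k+1
    features of the ranked list, to a color index in 1..n_colors.

    The lowest score gets color 1 and the highest score gets n_colors."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    return [1 + round((s - lo) / span * (n_colors - 1)) for s in scores]


# Hypothetical example: 3 patterns, 3 features, labels +1/-1.
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = [1, -1, 1]
matrix, rows = ranked_feature_matrix(X, y, ranking=[2, 0, 1])
print(rows)    # [1, 0, 2]
print(matrix)  # [[6, 4, 5], [3, 1, 2], [9, 7, 8]]
print(subset_color_indices([0.5, 0.7, 0.9, 1.0]))  # [1, 4, 7, 8]
```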
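The gene tree (observation graph) described in the definitions above can be sketched as a prefix tree that merges several nested gene subsets, such as the lists produced by several RFE-SVM runs, while capping the number of children per node (2 for a binary tree) and the depth (4 genes). This is a sketch under assumptions, not the patent's construction: the helper names, the dict-of-dicts representation, the rule that later alternatives are dropped once a node is full, and the placeholder gene identifiers are all hypothetical. `shade` quantizes a leave-one-out success rate in [0, 1] onto the 64-shade scale mentioned above.

```python
def build_gene_tree(nested_subsets, max_children=2, max_depth=4):
    """Merge nested gene subsets into a tree (dict of dicts).

    Each subset is a list of gene identifiers in the order the genes were
    added; every root-to-node path is then one nested subset. A node accepts
    at most max_children alternatives; extra alternatives are dropped."""
    root = {}
    for genes in nested_subsets:
        node = root
        for gene in genes[:max_depth]:
            if gene not in node:
                if len(node) >= max_children:
                    break  # node already full: ignore the rest of this subset
                node[gene] = {}
            node = node[gene]
    return root


def paths(tree, prefix=()):
    """Yield every root-to-node path, i.e. every gene subset shown in the tree."""
    for gene, child in tree.items():
        path = prefix + (gene,)
        yield path
        yield from paths(child, path)


def shade(success_rate, n_shades=64):
    """Map a leave-one-out success rate in [0, 1] to a shade index 0..n_shades-1."""
    return min(int(success_rate * n_shades), n_shades - 1)


# Hypothetical nested subsets from several runs (identifiers are placeholders).
runs = [
    ["g1", "g2", "g3", "g4", "g5"],
    ["g1", "g2", "g6"],
    ["g1", "g7"],
    ["g8", "g9"],
    ["g10"],
]
tree = build_gene_tree(runs)
for path in paths(tree):
    print(" > ".join(path))
print(shade(0.9), shade(1.0))  # 57 63
```

Raising `max_children` gives a wider tree (more alternatives per selection) and raising `max_depth` a deeper one (more genes per subset), matching the trade-off described in the text.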
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Genetics & Genomics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002304006A AU2002304006A1 (en) | 2001-06-15 | 2002-06-17 | Data mining platform for bioinformatics and other knowledge discovery |
US10/481,068 US7444308B2 (en) | 2001-06-15 | 2002-06-17 | Data mining platform for bioinformatics and other knowledge discovery |
US11/928,641 US7542947B2 (en) | 1998-05-01 | 2007-10-30 | Data mining platform for bioinformatics and other knowledge discovery |
US13/079,198 US8126825B2 (en) | 1998-05-01 | 2011-04-04 | Method for visualizing feature ranking of a subset of features for classifying data using a learning machine |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US29875701P | 2001-06-15 | 2001-06-15 | |
US29886701P | 2001-06-15 | 2001-06-15 | |
US29884201P | 2001-06-15 | 2001-06-15 | |
US60/298,867 | 2001-06-15 | ||
US60/298,842 | 2001-06-15 | ||
US60/298,757 | 2001-06-15 |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2002/016012 Continuation-In-Part WO2002095534A2 (fr) | 1998-05-01 | 2002-05-20 | Methods for feature selection in a learning machine |
Related Child Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10481068 A-371-Of-International | 2002-06-17 | ||
US11/928,606 Continuation US7921068B2 (en) | 1998-05-01 | 2007-10-30 | Data mining platform for knowledge discovery from heterogeneous data types and/or heterogeneous data sources |
US11/928,641 Continuation US7542947B2 (en) | 1998-05-01 | 2007-10-30 | Data mining platform for bioinformatics and other knowledge discovery |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002103954A2 (fr) | 2002-12-27 |
WO2002103954A3 (fr) | 2003-04-03 |
Family ID: 27404588
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2002/019202 WO2002103954A2 (fr) | 1998-05-01 | 2002-06-17 | Data mining platform for bioinformatics and other knowledge discovery domains |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2002304006A1 (fr) |
WO (1) | WO2002103954A2 (fr) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7243100B2 (en) * | 2003-07-30 | 2007-07-10 | International Business Machines Corporation | Methods and apparatus for mining attribute associations |
WO2008078293A1 (fr) * | 2006-12-22 | 2008-07-03 | International Business Machines Corporation | Computer-implemented method, computer program and system for analyzing data records |
WO2010072382A1 (fr) * | 2008-12-22 | 2010-07-01 | Roche Diagnostics Gmbh | System and method for analyzing genomic data |
US10515715B1 (en) | 2019-06-25 | 2019-12-24 | Colgate-Palmolive Company | Systems and methods for evaluating compositions |
CN112116952A (zh) * | 2020-08-06 | 2020-12-22 | Wenzhou University | Gene selection method based on a grey wolf optimization algorithm with diffusion and chaotic local search |
US11521751B2 (en) * | 2020-11-13 | 2022-12-06 | Zhejiang Lab | Patient data visualization method and system for assisting decision making in chronic diseases |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6266668B1 (en) * | 1998-08-04 | 2001-07-24 | Dryken Technologies, Inc. | System and method for dynamic data-mining and on-line communication of customized information |
US20020052882A1 (en) * | 2000-07-07 | 2002-05-02 | Seth Taylor | Method and apparatus for visualizing complex data sets |
US20020083067A1 (en) * | 2000-09-28 | 2002-06-27 | Pablo Tamayo | Enterprise web mining system and method |
US20020095260A1 (en) * | 2000-11-28 | 2002-07-18 | Surromed, Inc. | Methods for efficiently mining broad data sets for biological markers |
US20020111742A1 (en) * | 2000-09-19 | 2002-08-15 | The Regents Of The University Of California | Methods for classifying high-dimensional biological data |
US20020120405A1 (en) * | 2000-09-27 | 2002-08-29 | Aled Edwards | Protein data analysis |
US20020119462A1 (en) * | 2000-07-31 | 2002-08-29 | Mendrick Donna L. | Molecular toxicology modeling |
US20020133504A1 (en) * | 2000-10-27 | 2002-09-19 | Harry Vlahos | Integrating heterogeneous data and tools |
US6470333B1 (en) * | 1998-07-24 | 2002-10-22 | Jarg Corporation | Knowledge extraction system and method |
US20020165845A1 (en) * | 2001-05-02 | 2002-11-07 | Gogolak Victor V. | Method and system for web-based analysis of drug adverse effects |
2002
- 2002-06-17: WO application PCT/US2002/019202 (WO2002103954A2, fr); status: not active, application discontinued
- 2002-06-17: AU application AU2002304006A (AU2002304006A1, en); status: not active, abandoned
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6470333B1 (en) * | 1998-07-24 | 2002-10-22 | Jarg Corporation | Knowledge extraction system and method |
US6266668B1 (en) * | 1998-08-04 | 2001-07-24 | Dryken Technologies, Inc. | System and method for dynamic data-mining and on-line communication of customized information |
US20020049704A1 (en) * | 1998-08-04 | 2002-04-25 | Vanderveldt Ingrid V. | Method and system for dynamic data-mining and on-line communication of customized information |
US20020052882A1 (en) * | 2000-07-07 | 2002-05-02 | Seth Taylor | Method and apparatus for visualizing complex data sets |
US20020119462A1 (en) * | 2000-07-31 | 2002-08-29 | Mendrick Donna L. | Molecular toxicology modeling |
US20020111742A1 (en) * | 2000-09-19 | 2002-08-15 | The Regents Of The University Of California | Methods for classifying high-dimensional biological data |
US20020120405A1 (en) * | 2000-09-27 | 2002-08-29 | Aled Edwards | Protein data analysis |
US20020083067A1 (en) * | 2000-09-28 | 2002-06-27 | Pablo Tamayo | Enterprise web mining system and method |
US20020133504A1 (en) * | 2000-10-27 | 2002-09-19 | Harry Vlahos | Integrating heterogeneous data and tools |
US20020095260A1 (en) * | 2000-11-28 | 2002-07-18 | Surromed, Inc. | Methods for efficiently mining broad data sets for biological markers |
US20020165845A1 (en) * | 2001-05-02 | 2002-11-07 | Gogolak Victor V. | Method and system for web-based analysis of drug adverse effects |
Non-Patent Citations (6)
Title |
---|
KEMP ET AL.: 'Using the functional data model to integrate distributed biological data sources' PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE SYSTEMS June 1996, pages 176 - 185, XP002958893 * |
MOORE S.K.: 'Harmonizing data, setting standards' GENOMICS INFORMATION SETS, IEEE SPECTRUM vol. 38, no. 1, January 2001, pages 111 - 112, XP002958891 * |
PAVLIDIS ET AL.: 'Gene functional classification from heterogeneous data' PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL BIOLOGY April 2001, pages 249 - 255, XP000988076 * |
SYED ET AL.: 'A study of support vectors on model independent example selection' PROCEEDINGS OF THE 5TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING July 1999, pages 272 - 276, XP002958894 * |
WALKER R.L.: 'Parallel clustering system using the methodologies of evolutionary computations' PROCEEDINGS OF THE 2001 CONGRESS ON EVOLUTIONARY COMPUTATION 2001, pages 831 - 838, XP002958892 * |
YANG ET AL.: 'Data-driven theory refinement algorithms for bioinformatics' INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS July 1999, pages 4064 - 4068, XP010372571 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7243100B2 (en) * | 2003-07-30 | 2007-07-10 | International Business Machines Corporation | Methods and apparatus for mining attribute associations |
WO2008078293A1 (fr) * | 2006-12-22 | 2008-07-03 | International Business Machines Corporation | Computer-implemented method, computer program and system for analyzing data records |
US7953677B2 (en) | 2006-12-22 | 2011-05-31 | International Business Machines Corporation | Computer-implemented method, computer program and system for analyzing data records by generalizations on redundant attributes |
WO2010072382A1 (fr) * | 2008-12-22 | 2010-07-01 | Roche Diagnostics Gmbh | System and method for analyzing genomic data |
US10839942B1 (en) | 2019-06-25 | 2020-11-17 | Colgate-Palmolive Company | Systems and methods for preparing a product |
US10839941B1 (en) | 2019-06-25 | 2020-11-17 | Colgate-Palmolive Company | Systems and methods for evaluating compositions |
US10515715B1 (en) | 2019-06-25 | 2019-12-24 | Colgate-Palmolive Company | Systems and methods for evaluating compositions |
US10861588B1 (en) | 2019-06-25 | 2020-12-08 | Colgate-Palmolive Company | Systems and methods for preparing compositions |
US11315663B2 (en) | 2019-06-25 | 2022-04-26 | Colgate-Palmolive Company | Systems and methods for producing personal care products |
US11342049B2 (en) | 2019-06-25 | 2022-05-24 | Colgate-Palmolive Company | Systems and methods for preparing a product |
US11728012B2 (en) | 2019-06-25 | 2023-08-15 | Colgate-Palmolive Company | Systems and methods for preparing a product |
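CN112116952A (zh) * | 2020-08-06 | 2020-12-22 | Wenzhou University | Gene selection method based on a grey wolf optimization algorithm with diffusion and chaotic local search |
CN112116952B (zh) * | 2020-08-06 | 2024-02-09 | Wenzhou University | Gene selection method based on a grey wolf optimization algorithm with diffusion and chaotic local search |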
US11521751B2 (en) * | 2020-11-13 | 2022-12-06 | Zhejiang Lab | Patient data visualization method and system for assisting decision making in chronic diseases |
Also Published As
Publication number | Publication date |
---|---|
AU2002304006A1 (en) | 2003-01-02 |
WO2002103954A3 (fr) | 2003-04-03 |
Similar Documents
Publication | Title |
---|---|
US8126825B2 (en) | Method for visualizing feature ranking of a subset of features for classifying data using a learning machine | |
US7444308B2 (en) | Data mining platform for bioinformatics and other knowledge discovery | |
JP7305656B2 (ja) | Systems and methods for modeling probability distributions | |
Guyon et al. | An introduction to variable and feature selection | |
Malley et al. | Statistical learning for biomedical data | |
Srinivasu et al. | Using recurrent neural networks for predicting type-2 diabetes from genomic and tabular data | |
Yip et al. | A survey of classification techniques for microarray data analysis | |
Kumar et al. | A case study on machine learning and classification | |
WO2002103954A2 (fr) | Data mining platform for bioinformatics and other knowledge discovery domains | |
Sánchez-Maroño et al. | Classification of microarray data | |
Altınçay | Decision trees using model ensemble-based nodes | |
WO2022212337A1 (fr) | Graph database techniques for machine learning | |
Shaer et al. | Learning to increase the power of conditional randomization tests | |
AU2020101987A4 (en) | DIMA-Dataset Discovery: DATASET DISCOVERY IN DATA INVESTIGATIVE USING MACHINE LEARNING AND AI-BASED PROGRAMMING | |
Ni et al. | HEAL: Brain-inspired Hyperdimensional Efficient Active Learning | |
Tan et al. | Machine learning and its application to bioinformatics: an overview | |
Nilsson | Nonlinear dimensionality reduction of gene expression data | |
Mumbuçoğlu | Classification of microarray gene expression cancer data by using artificial intelligence methods | |
Sevilla-Villanueva | A methodology for pre-post intervention studies: An application for a nutritional case study | |
Sun | Improving classification performance of microarray analysis by feature selection and feature extraction methods | |
Young II | Disease endotypes of type 1 diabetes: Exploration through machine learning and topological data analysis | |
Joshi et al. | Smart Health Prediction System Using Data Mining | |
Sasirekha et al. | Identification and Classification of Leukemia Using Machine Learning Approaches | |
Troisi et al. | Data analysis in metabolomics: from information to knowledge | |
Thanigainathan | USING ENSEMBLE CLUSTERING TO IDENTIFY PHENOTYPES OF DIABETES PATIENTS FOR EVALUATING DISEASE PROGRESSION |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 69(1) EPC |
122 | EP: PCT application non-entry in European phase | |
ENP | Entry into the national phase | Ref document number: 2006064415. Country of ref document: US. Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 10481068. Country of ref document: US |
WWP | WIPO information: published in national office | Ref document number: 10481068. Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: JP |
WWW | WIPO information: withdrawn in national office | Country of ref document: JP |