WO2014145234A2 - Systems and apparatus for integrated and comprehensive biomedical annotation of bioassay data - Google Patents

Systems and apparatus for integrated and comprehensive biomedical annotation of bioassay data

Publication number
WO2014145234A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
database
information
biological
drug
Prior art date
Application number
PCT/US2014/029958
Other languages
French (fr)
Other versions
WO2014145234A3 (en
Inventor
Minzi Y. RUAN
Yuhong Yang
Jason J. Ruan
Shenzhi YU
Shaoquan JI
Original Assignee
Vigenetech, Inc.
Application filed by Vigenetech, Inc.
Publication of WO2014145234A2
Publication of WO2014145234A3

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G16H30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/60 ICT specially adapted for the handling or processing of medical references relating to pathologies

Definitions

  • the invention generally relates to systems and apparatus for biological data analysis and annotation. More particularly, the invention relates to novel systems and apparatus for linking and integrating comprehensive functions of searching, recording, storage, organizing, classification, data reduction, retrieval, analysis and imaging of biological, medical and healthcare data.
  • Bio assays are biological testing tools for measuring the presence and/or
  • Bio assays generate analog or digital raw data, images, signals, spectra, or graphs from which useful patterns, trends, results and conclusions may be extracted.
  • the assays may fall in a variety of categories including genes, genetic mutations, gene expression and regulation, biological pathways as well as molecular interactions.
  • Typical biological assays include three stages: (1) selection and preparation of a biological sample (e.g., cells or tissues), (2) addition of biological or chemical agents or probes (e.g., proteins and organic molecules) to the sample, and (3) capture of the resulting responses or effects on the tested sample.
  • analysis of bioassay data includes two steps.
  • the first (Step 1) is to detect and identify significant and noteworthy factors from insignificant and inconsequential ones mainly by statistical methods.
  • the second step (Step 2) is to understand and decode and analyze the collected and filtered information in a relevant biological context.
  • Many commercial application systems or in-house built systems have been developed for bioassay data analysis. Most of them, however, suffer from the same problem: they are strong for the first step but weak or non-existent for the second. As a result, biomedical researchers or drug developers have to spend a large amount of time performing the second step. For example, for bioassays using microarray technology, hundreds or even thousands of genes can be identified as significant by statistical methods (Step 1). After that, in Step 2, researchers have to spend a large amount of time (from weeks to months) compiling these genes' biological functions, possible actions in human diseases, and roles in biological pathways. Once such a list is available, they have to use their domain knowledge to make biomedical sense of the collected data. For Step 2, conventional manual annotation is not only time consuming; it usually helps scientists see only the leaves or trees, not the integrated forest. To our knowledge, there are no available tools that help researchers understand the collected data in a biological context.
  • the invention provides unique systems and apparatus for linking and integrating vast online available information and resources to a researcher conducting specific biological experiment or analysis and to the experimental data collection the researcher has, for example, by using the biological annotation of bioassay data as a basic link, packing the available information in application based methods, presenting the data in an intuitive way, and analyzing the current data set with integration with available online information to better and more fully analyze the data and significance thereof.
  • the invention generally relates to a system for annotation of bioassay data comprising dynamically integrated functions of search, retrieval, organization, recording, analysis, classification, annotation, display and storage of biological, medical and healthcare data, wherein a user of the system has desktop access via personal or instrument computer to comprehensive and contextual biological information and analytical programs pertaining to an analysis via online-based functionalities comprising one or more of data retrieval, comparison, clustering, display, artificial intelligence analysis, and annotation.
  • the user's desktop or instrument computer is dynamically linked to and has direct access to the content of one or more remote databases.
  • the user's desktop or instrument computer dynamically retrieves content of one or more remote databases.
  • the user's desktop or instrument computer dynamically integrates local data content and content of one or more remote databases.
  • the local data content and remote data content are packaged to reflect biological properties, functions and profiles of analytes.
  • the local data content and remote data content are presented in tabular, map, color or grey barcodes reflecting biological properties, functions and profiles of analytes.
  • the local data content and remote data content are presented using color and/or gray barcodes.
  • the local data content and remote data content are analyzed using a 3-D image analysis tool.
  • the local data content and remote data content are analyzed using an artificial intelligence tool.
  • the bioassay data are analyzed using current pathway information dynamically obtained from remote database content.
  • the invention generally relates to a computer linked to an instrument control having post analysis software and through a web server connection to a plurality of remote information resources and databases on genome and biology via the Internet.
  • the invention generally relates to a method for integrating experimental data of multiple analytes from different technologies and platforms using gene symbols or IDs for advanced data analysis comprising clustering, classification and statistical analysis.
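As a minimal sketch of integrating measurements from several platforms by gene symbol, the following joins per-platform values into one record per gene. The function name, the platform names and the numeric values are hypothetical, and upper-casing symbols before joining is an assumption that suits human gene symbols but not necessarily other species.

```python
from collections import defaultdict

def integrate_by_gene_symbol(platform_data):
    """Merge per-platform measurements into one record per gene symbol.

    platform_data: {platform_name: {gene_symbol: value}}
    Returns {GENE_SYMBOL: {platform_name: value}}.
    """
    merged = defaultdict(dict)
    for platform, measurements in platform_data.items():
        for symbol, value in measurements.items():
            # Assumption: normalise case and whitespace so "brca1 " joins "BRCA1".
            merged[symbol.strip().upper()][platform] = value
    return dict(merged)

# Hypothetical values, for illustration only.
data = {
    "microarray": {"BRCA1": 2.4, "MLH1": -1.1},
    "tissue_array": {"brca1": 0.8, "TP53": 1.9},
}
integrated = integrate_by_gene_symbol(data)
```

The merged records can then be passed to whatever clustering, classification or statistical routine is applied downstream.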
  • the invention generally relates to a method for using gene symbols or
  • the invention generally relates to a method for presenting
  • the barcode represents a specific pathway, disease, drug, SNP and sequence with defined order.
  • the barcodes are shaded, gray or colored, and/or having scoring colors representing the analyte experiment value, comprising bioassay experimental expression, bio content concentration, and gene regulation.
  • the invention generally relates to a method of data reduction and integrated analysis comprising applying clustering to biology grouped barcode data among multiple biology groups to perform 3D clustering analysis.
  • the method includes using known data remotely obtained from online resources and databases to run classification to train the computer for an artificial intelligence algorithm or rule set.
  • the method performs classifying experimental data by the algorithm obtained from training.
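The train-then-classify idea can be illustrated with a nearest-centroid classifier. This is a stand-in chosen for brevity, not the algorithm of the invention, which the text does not specify; the feature vectors and the "disease"/"normal" labels are made up.

```python
import math

def train_centroids(samples, labels):
    """Compute one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def classify(vec, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))

# Hypothetical "known" training data (e.g. barcode values per sample).
known = [[0.9, 0.8, 0.1], [1.0, 0.7, 0.0], [0.1, 0.2, 0.9], [0.0, 0.3, 1.0]]
known_labels = ["disease", "disease", "normal", "normal"]
model = train_centroids(known, known_labels)
prediction = classify([0.8, 0.9, 0.2], model)
```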
  • the invention generally relates to a method for parsing and constructing a disease database for known human diseases as provided in OMIM and other public disease database, including: retrieving disease information from OMIM and other disease information data sources via an automated process; defining a database schema or structure for storing disease information; parsing and loading disease information data files into the above database schema; and performing corresponding indexing.
  • the invention generally relates to a method for parsing and constructing a drug database for known human drugs as provided in DrugBank and other drug database, including: retrieving drug information from DrugBank and other drug information data sources via an automated process; defining a database schema or structure for storing drug information; parsing and loading drug information data files into the above database schema; and performing corresponding indexing.
  • the invention generally relates to a method for parsing and constructing a SNP and sequence database as provided in dbSNP database and other sequence database, including: retrieving SNP and sequence information from data sources via an automated process; defining a database schema or structure for storing SNP and sequence information; parsing and loading SNP and sequence data files into the above database schema; and performing corresponding indexing.
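The define-schema, parse, load and index steps shared by the three database methods above can be sketched as follows for the disease case. SQLite (Python's `sqlite3` module) is used here only so the example is self-contained; the table layout, the tab-separated input format and the records are assumptions, not real OMIM entries or the schema of the invention.

```python
import sqlite3

def parse_line(line):
    """Parse one tab-separated line: id, disease name, gene symbol (assumed format)."""
    disease_id, name, gene = line.rstrip("\n").split("\t")
    return disease_id, name, gene

def build_disease_db(records):
    """Create an in-memory disease table, load records, and index by gene."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE disease (disease_id TEXT, name TEXT, gene_symbol TEXT)"
    )
    conn.executemany("INSERT INTO disease VALUES (?, ?, ?)", records)
    conn.execute("CREATE INDEX idx_disease_gene ON disease (gene_symbol)")
    conn.commit()
    return conn

# Made-up placeholder records for illustration.
raw = ["D0001\tExample disease A\tGENE1\n", "D0002\tExample disease B\tGENE2\n"]
conn = build_disease_db([parse_line(line) for line in raw])
rows = conn.execute(
    "SELECT name FROM disease WHERE gene_symbol = ?", ("GENE1",)
).fetchall()
```

The same pattern would apply to drug or SNP records with a different table definition.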
  • the invention generally relates to a method for a user's computing device or instrument computer submitting a list of genes from a client program to a web server and receiving the corresponding search results, including: packing the list of genes in a defined format; sending the list of genes to a web server using a defined protocol; performing comprehensive searches; receiving the search results using a defined protocol; and unpacking the search results.
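A rough sketch of the pack, send, receive and unpack round trip is shown below. JSON is assumed as the "defined format" purely for illustration (the description elsewhere mentions SOAP), and `fake_server` is a hypothetical stand-in for the web server so that the example runs without a network.

```python
import json

def pack_gene_list(genes):
    """Serialise a de-duplicated, sorted gene list into a JSON request body."""
    return json.dumps({"genes": sorted(set(genes))})

def unpack_results(payload):
    """Decode a JSON response into {gene: annotations}."""
    return json.loads(payload)["results"]

def fake_server(request_body):
    """Stand-in for the web server: returns empty annotations for each gene."""
    genes = json.loads(request_body)["genes"]
    return json.dumps({"results": {g: {"pathways": []} for g in genes}})

request = pack_gene_list(["MLH1", "BRCA1", "BRCA1"])
results = unpack_results(fake_server(request))
```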
  • the invention generally relates to a method for a web app server of performing searches of biomedical knowledge for a list of genes on the web server,
  • the invention generally relates to a method for searching a database for a given gene via web service and returning the results in a defined format, wherein the database is a pathway database, a disease database, a drug database, a SNP and sequence database, a human genome database, or a gene ontology (GO) information database, for example.
  • the invention generally relates to a method for user interfacing (UI) for a summary view of multiple analytes in the client computer program or a web browser, wherein for each analyte, a list of the top 5 (which can be any user-selected value, e.g., 10) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, human SNPs and sequence is made in a tabular form.
  • the invention generally relates to a method for user interfacing for a view of single analyte in the client computer program, wherein a list of top 5 (which can be any user selected value, e.g., 10) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, human SNPs and sequence of the analyte is made in a tabular form.
  • the invention generally relates to a method for user interfacing for a pathway map or other biology group map or form of multiple analytes, wherein in the pathway map, all nodes with the genes in the list are marked by colors that match the experimentally determined bioassay results.
  • Various markings may be used, such as red marking for gene up-regulation and green for gene down-regulation, with the brightness of the color matching the experimentally determined level in a quantitative way.
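One way to realise the red/green marking with quantitative brightness is a small mapping from a signed regulation value to an RGB triple. The function name and the `max_abs` saturation level are assumptions made for this sketch.

```python
def regulation_color(value, max_abs=3.0):
    """Map a signed regulation value to an (R, G, B) tuple.

    Positive values (up-regulation) give red, negative values
    (down-regulation) give green; brightness scales with magnitude
    and saturates at max_abs.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    level = min(abs(value) / max_abs, 1.0)
    brightness = round(255 * level)
    if value > 0:
        return (brightness, 0, 0)
    if value < 0:
        return (0, brightness, 0)
    return (0, 0, 0)
```

A node in the pathway map would then be filled with `regulation_color(v)` for its measured value `v`.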
  • the invention generally relates to a method for user interfacing (UI) for a pathway view or other biology group structure view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding pathway and wherein clicking the pathway column brings up the corresponding pathway map with highlight of gene symbols.
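The analyte-by-pathway table with check marks can be sketched as a list of rows built from a pathway membership mapping. The names below are hypothetical placeholders, and the click behaviour of the UI is not modelled.

```python
def membership_table(analytes, pathways, members):
    """Return table rows: one row per analyte, one column per pathway.

    members maps a pathway name to the set of analytes it contains.
    A cell holds a check mark when the analyte belongs to the pathway.
    """
    rows = [["analyte"] + list(pathways)]
    for analyte in analytes:
        cells = ["\u2713" if analyte in members.get(p, set()) else "" for p in pathways]
        rows.append([analyte] + cells)
    return rows

# Hypothetical membership data.
members = {"Pathway A": {"GENE1", "GENE2"}, "Pathway B": {"GENE2"}}
table = membership_table(["GENE1", "GENE2"], ["Pathway A", "Pathway B"], members)
```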
  • the invention generally relates to a method for user interfacing for a drug view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing drugs, and wherein the UI shows a check mark for a cell if the corresponding analyte is the target of the corresponding drug and wherein clicking the drug column brings up the corresponding DrugBank and other public information resources web page of the drug.
  • the invention generally relates to a method for user interfacing for a disease view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing diseases, and wherein the UI shows a check mark for a cell if the corresponding analyte is involved in the corresponding disease and wherein clicking the disease column brings up the corresponding OMIM, HGMD and other disease resources web pages.
  • the invention generally relates to a method and user interfacing for a SNP view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing SNPs, and wherein the UI shows a check mark for a cell if the corresponding analyte is next to the SNP on human genome and wherein clicking the SNP column brings up the corresponding genome web page.
  • the invention provides unique systems and apparatus for linking vast online available information and resources to a researcher conducting specific biological experiment or analysis and to the experimental data collection the researcher has.
  • the systems and apparatus of the invention use the biological annotation of bioassay data as a basic link, pack the available information in application based methods, present the data in an intuitive way, and analyze the current data set with integration with available online information, in order to reveal hidden and fundamental biology content and significance.
  • the invention provides unique systems and apparatus for biological annotation of bioassay data, which provide for integrated, comprehensive and contextual recording, storage, organizing, annotation, classification, retrieval, analysis and imaging of biological, medical and healthcare data.
  • biomarker refers to an indicator of a biological state. It is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.
  • a biomarker is a molecule that allows for the detection and isolation of a particular cell type.
  • a biomarker can be a traceable substance that is introduced into an organism as a means to examine organ function or other aspects of health.
  • a biomarker being measured or analyzed is also referred to as an
  • the system of the invention allows one to make a dynamic link directly between a biologist's or a bench-top instrument's computer and all relevant information, databases and resources in the public domain.
  • information, databases and resources are available before and after an experiment and data analysis.
  • the information can be packed in various ways to give the researcher an overview as well as guidance regarding the experiment and analysis.
  • the information can be presented so as to make it easier to see both the details and the big picture in the larger context, and the collective information rather than just an individual data point or result, which enables one to group data based on pathways, diseases, drugs and genetics.
  • the information and data can be packaged and presented in any suitable form, for example, in tabular, map, color or grey barcodes with the group based biological functions and profiles.
  • Biology-guided data clustering based on color or gray barcodes can be processed by advanced image analysis and artificial intelligence tools to achieve higher-level data reduction and reveal otherwise hard-to-deduce insights.
  • the system allows a user to set up and perform the analysis on the user's bench-top or desktop computer, where all relevant, available information is at the user's disposal via online access. For example, this allows one to perform clustering analysis and use current pathway information to guide the classification.
  • the system can also accommodate different operating systems and platforms because they all run the same API, which lets each individual computer run much more powerful programs regardless of the computer's OS and capacity.
  • the system is built to connect (1) a program running on an individual PC to publicly available data on the Internet, or (2) a user with a set of data in a web browser to biological information on the Internet.
  • The Application Web Server contains a database for packed biological information as well as algorithms to form packed information dynamically for specific applications.
  • The user's PC has a database to store the packed biological information associated with the user's own experiment and data set.
  • the public-domain databases, such as GO or Pathway, are connected to the application with Web App Server support.
  • An exemplary main work flow is schematically depicted in FIG. 3.
  • FIG. 4 shows an exemplary data system for gene annotation linking to DNA, protein or other types of biological assay.
  • The GO database represents the public online Gene Ontology annotation database. The data flow allows assay annotation built on other biological databases and on pathway, drug, disease and sequencing data and other biological information.
  • the analyte data can be grouped according to pathway, disease, drug, sequence or certain biological functions or categories.
  • the data can be presented in tabular forms and maps, such as pathway map to show what analytes belong to the same pathway or group.
  • All analyte data obtained from experiments, databases, or other resources will be placed into groups, such as same pathway, same biology function association group and will be listed, for example, in a defined order in a barcode.
  • Each analyte will be represented as one bar in the barcode. For example, as shown in FIG.
  • an analyte can be represented in a barcode of (1) a color or (2) gray, with the color or gray shades representing a value of the analyte, such as the degree of expression level or analyte concentration.
  • the analyte value can also be represented in (3) discrete colors, such as scoring.
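The gray-shade and discrete-scoring encodings described above can be illustrated with two small helpers. The scaling range, the thresholds and the function names are assumptions for this sketch; rendering the bars as an actual image is left out.

```python
def gray_barcode(values, lo, hi):
    """Scale each analyte value to a gray level 0-255 (one bar per analyte).

    Values outside [lo, hi] are clipped to the range before scaling.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    bars = []
    for v in values:
        clipped = min(max(v, lo), hi)
        bars.append(int(255 * (clipped - lo) / (hi - lo)))
    return bars

def score_barcode(values, thresholds):
    """Assign each value a discrete score: the number of thresholds it meets or exceeds."""
    return [sum(v >= t for t in thresholds) for v in values]

# Hypothetical analyte values listed in a fixed, defined order.
gray = gray_barcode([0.0, 5.0, 10.0, 12.0], lo=0.0, hi=10.0)
scores = score_barcode([0.5, 2.0, 7.5], thresholds=[1.0, 5.0])
```

Because the analyte order within a barcode is fixed, two barcodes for the same pathway can be compared bar by bar.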
  • a set of supervised clustering analysis can be performed for further data reduction, guided by pathway, biological function and genetic information.
  • the clustering analysis can run similar barcode data across multiple pathways, diseases and biological groups, and to perform biological supervised 3D clustering analysis.
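As a generic illustration of clustering barcode data, the sketch below groups barcodes whose Euclidean distance falls under a threshold (single linkage via union-find). It is not the biologically supervised 3D clustering of the invention, whose details are not given here; the sample names, values and threshold are hypothetical.

```python
import math

def cluster_barcodes(barcodes, threshold):
    """Group barcodes that are closer than threshold (single linkage).

    barcodes: {name: [bar values in a fixed analyte order]}
    Returns a sorted list of sorted name lists, one per cluster.
    """
    names = sorted(barcodes)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(barcodes[a], barcodes[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

clusters = cluster_barcodes(
    {"s1": [0.0, 0.0, 1.0], "s2": [0.0, 0.1, 1.0], "s3": [1.0, 1.0, 0.0]},
    threshold=0.5,
)
```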
  • artificial intelligence classification methods can be applied to barcode images or data sets to perform fingerprint analysis and biology measurement for further data reduction to uncover insights and patterns.
  • An exemplary computer application system of the invention includes a Windows client program or a Web App, a MySQL, SQL or Oracle database, and a set of web services.
  • a system provides integrated and comprehensive biomedical annotation and application based data searching and packing method and rules of bioassay data.
  • For bioassay data such as microarray and tissue array data, hundreds or even thousands of genes can be identified as significant for biomedical research goals.
  • the system disclosed herein will search and identify major and important facts and knowledge about these genes, present the findings in a user friendly and effective way, and perform novel data analyses by integrating statistical tools and biomedical knowledge.
  • FIG. 1 depicts a basic architecture of an exemplary system according to the invention.
  • This exemplary system includes two major components: the Application (App) server and the client program.
  • This server can run on a Linux or Microsoft platform, includes a MySQL, SQL or Oracle relational database, and provides a number of web services.
  • the web services can be written in C#, Java or another web-service-capable language and deployed either in Tomcat (http://tomcat.apache.org/) as SOAP (Simple Object Access Protocol) services or in a customized web service architecture framework, and can be accessible to a variety of computer platforms including Microsoft Windows, Linux, Mac OS X and mobile portable systems.
  • image analysis programs written in Microsoft C# and running on Microsoft Windows machines are able to remotely invoke such web services or SOAP services on the Web App Server and request application-based defined services.
  • a major function of the Web App Server is to take a gene list request from the client program, search public biomedical databases on the Internet with application-based rules, filters and algorithms, integrate and pack the search results in a clear defined format, and send the obtained gene annotations back to the client program.
  • one web service can find its gene ontology annotations, which include a gene's possible biological function, biological process, and cellular component. This web service can search QuickGO's web site
  • the second web service can search NCBI's gene database
  • Gene ID is an identifier unique to each gene.
  • the third web service can search KEGG's pathway
  • the fourth web service can search OMIM
  • the fifth web service can search Drugbank database
  • the sixth web service can search NCBI's SNP database
  • the seventh web service goes through each KEGG pathway HTML page and looks for whether it contains the genes in the gene list. If not, the page is ignored. If yes, it can cache the HTML content in the Web App Server database.
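The page-scanning step of the seventh web service can be sketched as a filter over already downloaded HTML strings. Downloading from KEGG and writing to the server database are not shown; the page contents, page identifiers and whole-word matching rule are assumptions for illustration.

```python
import re

def pages_containing_genes(pages, genes):
    """Return {page_id: matched genes} for pages that mention any listed gene.

    Pages with no match are ignored; matching is on whole words so that
    "BRCA1" does not match inside a longer token.
    """
    patterns = {g: re.compile(r"\b" + re.escape(g) + r"\b") for g in genes}
    to_cache = {}
    for page_id, html in pages.items():
        hits = sorted(g for g, pat in patterns.items() if pat.search(html))
        if hits:
            to_cache[page_id] = hits
    return to_cache

# Hypothetical page fragments.
pages = {
    "p1": "<area title='BRCA1'> <area title='MLH1'>",
    "p2": "<area title='TP53'>",
    "p3": "<area title='BRCA12'>",
}
cached = pages_containing_genes(pages, ["BRCA1", "MLH1"])
```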
  • the search results can be cached or temporarily saved in the Web App Server database.
  • the Web App Server can check and see if the biological annotations are already available. If yes, it can directly retrieve such information from the Web App Server database. Otherwise, web services can be invoked to search and save the
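The check-the-cache-first behaviour can be sketched as follows. A plain dictionary stands in for the Web App Server database, and `fetch` is a hypothetical callable representing the web services that search the public databases.

```python
def get_annotations(gene, cache, fetch):
    """Return cached annotations for gene, invoking fetch only on a cache miss."""
    if gene not in cache:
        cache[gene] = fetch(gene)
    return cache[gene]

calls = []

def fake_fetch(gene):
    """Stand-in for the remote search services; records each invocation."""
    calls.append(gene)
    return {"gene": gene, "pathways": []}

cache = {}
first = get_annotations("BRCA1", cache, fake_fetch)
second = get_annotations("BRCA1", cache, fake_fetch)
```

In this run the second lookup is served from the cache, so `fake_fetch` is invoked only once.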
  • the second major component of the contextual bioassay data analysis system of the invention is the client computer program.
  • This program can be written in Microsoft C# and other programming language and deployed on Microsoft Windows platform, or HTML/Javascript and deployed on any platform with a modern web browser, and it includes several major parts.
  • FIG. 8 schematically illustrates an exemplary server database schema.
  • the first part provides a summary view for a list of genes.
  • the sample is usually called an analyte, and it can represent a gene, for example.
  • This part, or computer graphic user interface (GUI), takes a list of analytes that were found to be statistically significant in preliminary statistical data analysis, sends the list to the Web App Server via a standard Internet protocol such as SOAP, retrieves the corresponding major biological annotations, and then presents researchers with a summary table.
  • each row represents a gene or analyte
  • each column can represent the top numbers of entries of the corresponding annotation, for example, top 5 KEGG pathways.
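A summary row of this kind can be sketched by truncating each annotation category to its top entries. The sketch assumes the entries arrive already ranked; the category names and entries are placeholders.

```python
def summary_row(annotations, top_n=5):
    """Keep the first top_n entries of each annotation category.

    annotations: {category: [entries, assumed already ranked]}
    """
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    return {category: entries[:top_n] for category, entries in annotations.items()}

# Hypothetical annotations for one analyte.
row = summary_row(
    {"pathways": ["P1", "P2", "P3", "P4", "P5", "P6", "P7"], "drugs": ["D1"]},
    top_n=5,
)
```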
  • the second part of the client program component provides in-depth, detailed and specific view of a gene list to identify possible contextual relationship between genes.
  • We demonstrate this here using KEGG biological pathways. The idea and the computer program are applicable to other annotations as well.
  • this table provides a quick view of possible contextual relationship between genes.
  • Analyte 1 appears in many pathways, and Analyte 8 in only one. This is a good indication that Analyte 8 is more specific, and it may be biologically more interesting if researchers are looking for a highly specific disease biomarker.
  • Analytes 6 and 7 share two pathways, which may indicate that they are linked. Researchers can make other meaningful inferences that allow for productive bioassay data analysis.
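The two inferences above (how many pathways an analyte appears in, and which analytes share pathways) can be computed directly from a membership mapping. The pathway names below are placeholders chosen to mirror the example, and the `min_shared` cut-off is an assumption.

```python
from itertools import combinations

def pathway_counts(memberships):
    """Number of pathways per analyte; a low count suggests higher specificity."""
    return {analyte: len(paths) for analyte, paths in memberships.items()}

def shared_pathways(memberships, min_shared=2):
    """Return analyte pairs that share at least min_shared pathways."""
    pairs = {}
    for a, b in combinations(sorted(memberships), 2):
        common = memberships[a] & memberships[b]
        if len(common) >= min_shared:
            pairs[(a, b)] = common
    return pairs

# Placeholder memberships mirroring the example in the text.
memberships = {
    "Analyte 1": {"P1", "P2", "P3", "P4"},
    "Analyte 6": {"P5", "P6"},
    "Analyte 7": {"P5", "P6"},
    "Analyte 8": {"P7"},
}
counts = pathway_counts(memberships)
linked = shared_pathways(memberships)
```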
  • the third part of the client program sends the Web App Server a list of genes and requests the HTML contents and images of possible biological pathways that contain these genes.
  • these three genes (BRCA1, FA 1, MLH1) happen to be in the same Fanconi anemia pathway.
  • This program downloads the image map of this pathway from KEGG
  • this program colors BRCA1 and FA 1 red and MLH1 blue in the pathway map according to the flowchart shown in FIG. 11.
  • This program finds all genes in the gene list in the pathway map and highlights each gene name, which is labeled in a rectangular text box (FIG. 10). If the pathway map is from a public database and downloaded in web HTML format, the program can use the map as is and only change the color of the gene name label (FIG. 12), shaded at a level representing the expression, intensity, concentration and/or other experimental measurements of the gene. The shades can be gray, color, or distinct scoring colors to visually display the experimental values in the image. After that, this program produces a modified image and presents it to researchers (FIG. 5 and FIG. 16). The system of the invention greatly helps researchers make sense of their microarray experiment results.
  • The search rule-algorithm set can be extended to more application-related databases and data sources. All that is needed is to create another set of search criteria, resource links, and rules in the application-based table. New data can be linked to the client program in a manner similar to the Pathway method described in the previous section.
  • other pathway databases can be employed, such as Science magazine's biological signal pathway maps and WikiPathways maps, as well as other biomedical knowledge such as biological function, biological process, biological component, and disease network.
  • the knowledge and the corresponding representation do not need to be static image maps like the above example. They can be a dynamically generated (via computer programs or other devices) plot or a network with connected nodes.
  • the invention generally relates to a system for annotation of bioassay data comprising dynamically integrated functions of search, retrieval, organization, recording, analysis, classification, annotation, display, recording and storage and of biological, medical and healthcare data, wherein a user of the system has desktop access via personal or instrument computer to comprehensive and contextual biological information and analytical programs pertaining to an analysis via online-based functionalities comprising one or more of data retrieval, comparison, clustering, display, artificial intelligence analysis, and annotation.
  • the invention generally relates to a method for parsing and constructing a disease database for known human diseases as provided in OMIM and other public disease database, including: retrieving disease information from OMIM and other disease information data sources via an automated process; defining a database schema or structure for storing disease information; parsing and loading disease information data files into the above database schema; and performing corresponding indexing.
  • the invention generally relates to a method for parsing and constructing a drug database for known human drugs as provided in DrugBank and other drug database, including: retrieving drug information from DrugBank and other drug information data sources via an automated process; defining a database schema or structure for storing drug information; parsing and loading drug information data files into the above database schema; and performing corresponding indexing.
  • the invention generally relates to a method for parsing and constructing a SNP and sequence database as provided in dbSNP database and other sequence database, including: retrieving SNP and sequence information from data sources via an automated process; defining a database schema or structure for storing SNP and sequence information; parsing and loading SNP and sequence data files into the above database schema; and performing
  • the invention generally relates to a method for a user's computing device or instrument computer to submit a list of genes from a client program to a web server and receive the corresponding search results, including: packing the list of genes in a defined format; sending the list of genes to a web server using a defined protocol; performing comprehensive searches; receiving the search results using a defined protocol; and unpacking the search results.
  • the invention generally relates to a method for a web app server of performing searches of biomedical knowledge for a list of genes on the web server,
  • the invention generally relates to a method for searching a database for a given gene via web service and returning the results in a defined format, wherein the database is a pathway database, a disease database, a drug database, a SNP and sequence database, a human genome database, or a GO (gene ontology) information database, for example.
  • the invention generally relates to a method for user interfacing (UI) for a summary view of multiple analytes in the client computer program or the client computer's web browser interface, wherein for each analyte, a list of the top numbers of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, human SNPs and sequences is made in a tabular form (FIG. 17).
  • the invention generally relates to a method for user interfacing for a view of single analyte in the client computer program or client computer's web browser interface, wherein a list of top numbers (e.g., 5) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs and human SNPs of the analyte is made in a tabular form.
  • the invention generally relates to a method for user interfacing for a pathway map of multiple analytes, wherein in the pathway map, all nodes with the genes in the list are marked by colors that match the experimentally determined bioassay results (e.g., with red marking for gene up-regulation and green for gene down-regulation, and with the brightness of the color shade or a different scoring color matching the experimentally determined level in a quantitative way) (FIG. 12).
  • the invention generally relates to a method for user interfacing for a biology group or category view, such as a pathway view, drug list view, disease view, or SNP and sequencing view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding biology criterion or category, such as a pathway, disease, drug or sequencing group, and wherein clicking the pathway column brings up the corresponding detail map or link of all relevant genes in such group, such as a pathway map with highlighted gene symbols.
  • the invention generally relates to a method for user interfacing for a pathway view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the
  • the invention generally relates to a method for user interfacing for a drug view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing drugs, and wherein the UI shows a check mark for a cell if the corresponding analyte is the target of the corresponding drug and wherein clicking the drug column brings up the corresponding a set of web links to certain defined public drug information database (e.g.,
  • the invention generally relates to a method for user interfacing for a disease view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing diseases, and wherein the UI shows a check mark for a cell if the
  • the invention generally relates to a method and user interfacing for a SNP and sequence view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing SNPs and sequences, and wherein the UI shows a check mark for a cell if the corresponding analyte is next to the SNP on the human genome and wherein clicking the SNP column brings up the corresponding genome web page.
  • the invention generally relates to a method for user interfacing for sample with an integrated biology content results view of pathway, drug, disease view and SNP, sequencing of multiple relevant analytes
  • the UI is a table with rows representing samples and result integrated data columns representing different biology groups, like different pathways, and wherein the UI shows a set of columns for each biological group (e.g., shown in (2) in FIG. 16)
  • One column, (3), lists the pathway-relevant analytes' gene symbols in a defined order.
  • One column, (4), lists a barcode which shows a bar for each relevant analyte in this pathway.
  • The shade or color of the bar represents the intensity or biology experiment value for this analyte.
  • One column, (5) lists the result of integrated data reduction value, like a finger print analysis results, classification scoring or biology qualidation measurement, like level of the toxicities or phase of disease.
  • the invention generally relates to a method for a data structure or format that groups multiple data results of different analytes into a barcode image format, shown as (4) in FIG. 16.
  • the invention generally relates to a method for data grouping in biological categories (e.g., shown as (2) in FIG. 16). The method can group the sample data, (1) in FIG. 16, into different groups with the relevant analyte measurements, then assess the data collectively according to the biological group as a barcode, as a piece of digital image data, or as a fingerprint. Analysis can then be run on these collective data among the biology groups, for example a group of different pathways, to assess the collective impact among different biology groups or pathways.
  • Such a method constructs a 3-dimensional analysis, like a 3-D clustering,
  • the invention generally relates to a process method for plugging in an application-based algorithm to run a specific integrated analysis, for example to differentiate types of cancers, estimate disease phases or stages, or measure drug toxicity levels, shown as (5) in FIG. 16.
  • the invention generally relates to a process method for comparing the data online to analyze experimental data intelligently. For example, the method gathers analyte data for different cancers from the public domain, integrates the data into the barcode or an integrated data set, uses it to train the artificial intelligence function of the computer program, and then applies such intelligence to classify the experimental data.
  • the singular forms "a,” “an,” and “the” include plural reference, unless the context clearly dictates otherwise.

Abstract

The invention provides unique systems and apparatus for linking and integrating vast online available information and resources to a researcher conducting specific biological experiment or analysis and to the experimental data collection the researcher has, for example, by using the biological annotation of bioassay data as a basic link, packing the available information in application based methods, presenting the data in an intuitive way, and analyzing the current data set with integration with available online information to better and more fully analyze the data and significance thereof.

Description

SYSTEMS AND APPARATUS FOR INTEGRATED AND COMPREHENSIVE
BIOMEDICAL ANNOTATION OF BIOASSAY DATA
Priority Claims and Related Patent Applications
[0001] This application claims the benefit of priority from U.S. Provisional Application Serial No. 61/799,906, filed on March 15, 2013, the entire content of which is incorporated herein by reference in its entirety.
Technical Fields of the Invention
[0001] The invention generally relates to systems and apparatus for biological data analysis and annotation. More particularly, the invention relates to novel systems and apparatus for linking and integrating comprehensive functions of searching, recording, storage, organizing, classification, data reduction, retrieval, analysis and imaging of biological, medical and healthcare data.
Background of the Invention
[0002] Biological assays are biological testing tools for measuring the presence and/or concentration of biologically relevant markers or pharmaceutical substances in a patient or research sample or specimen. Detailed information on the biological markers or drug substances can be obtained, which can guide biological research, drug development, as well as disease prevention, diagnosis and prognosis.
[0003] High throughput technologies generate enormous amounts of data, especially in genomics and biology. Many government-supported projects and databases hold vast amounts of information, with the volume and variety of available information increasing rapidly every day. This vast collection of data is very useful but is difficult for a researcher to utilize. Currently, one must put in significant effort and time to gain access to and utilize the information at a variety of sources. Additionally, such information is not in an integrated format and is difficult to use. It would be a major technological advancement if a user could have an easy method to connect to and access the vast online resources and databases. Unfortunately, at present, bioassay data analyses are disconnected from the various online information and resources. Data review and analyses are typically done without direct reference to other relevant and pertinent information, which makes it very hard to put an individual data point in the context of as much as possible of the currently available information and resources.
[0004] Biological assays generate analog or digital raw data, images, signals, spectra, or graphs from which useful patterns, trends, results and conclusions may be extracted. The assays may fall in a variety of categories including genes, genetic mutations, gene expression and regulation, biological pathways as well as molecular interactions. Typical biological assays include three stages: (1) selection and preparation of a biological sample (e.g., cells or tissues), (2) addition of biological or chemical agents or probes (e.g., proteins and organic molecules) to the sample, and (3) capture of the resulting responses or effects on the tested sample.
[0005] In the past, one biomarker was usually tested at a time for a sample to generate one data point. During the past decade, thanks to the widespread application of robotics and exponentially increasing computational power, tens to thousands of biomarkers can be tested for one sample at the same time to provide a large amount of information about the sample, which may be used for research and medical purposes.
[0006] Typically, analysis of bioassay data includes two steps. The first (Step 1) is to separate significant and noteworthy factors from insignificant and inconsequential ones, mainly by statistical methods. The second (Step 2) is to decode and analyze the collected and filtered information in a relevant biological context.
[0007] Many commercial application systems or in-house built systems have been developed for bioassay data analysis. Most of them, however, suffer from the same problem: they are strong for the first step, but weak or non-existent for the second. As a result, biomedical researchers or drug developers have to spend a large amount of time performing the second step. For example, for bioassays using microarray technology, hundreds or even thousands of genes can be identified to be significant by statistical methods (Step 1). After that, in Step 2, researchers have to spend a large amount of time (from weeks to months) compiling these genes' biological functions, possible actions in human diseases, and roles in biological pathways. Once such a list is available, they have to use their domain knowledge to make biomedical sense of the collected data. For Step 2, conventional manual annotation is not only time consuming; it usually can only help scientists see leaves or trees, but not the integrated forest. To our knowledge, there are no available tools that help researchers understand the collected data in a biological context.
[0008] Thus, there remains a need for novel systems and apparatus for connected and integrated and comprehensive searching, retrieval, organizing, classification, analysis, storage and presentation of biological, medical and healthcare data.
Summary of the Invention
[0009] The invention provides unique systems and apparatus for linking and integrating vast online available information and resources to a researcher conducting specific biological experiment or analysis and to the experimental data collection the researcher has, for example, by using the biological annotation of bioassay data as a basic link, packing the available information in application based methods, presenting the data in an intuitive way, and analyzing the current data set with integration with available online information to better and more fully analyze the data and significance thereof.
[0010] In one aspect, the invention generally relates to a system for annotation of bioassay data comprising dynamically integrated functions of search, retrieval, organization, recording, analysis, classification, annotation, display and storage of biological, medical and healthcare data, wherein a user of the system has desktop access via personal or instrument computer to comprehensive and contextual biological information and analytical programs pertaining to an analysis via online-based functionalities comprising one or more of data retrieval, comparison, clustering, display, artificial intelligence analysis, and annotation.
[0011] In certain embodiments, the user's desktop or instrument computer is dynamically linked to and has direct access to the content of one or more remote databases.
[0012] In certain embodiments, the user's desktop or instrument computer dynamically retrieves content of one or more remote databases.
[0013] In certain embodiments, the user's desktop or instrument computer dynamically integrates local data content and content of one or more remote databases.
[0014] In certain embodiments, the local data content and remote data content are packaged to reflect biological properties, functions and profiles of analytes.
[0015] In certain embodiments, the local data content and remote data content are presented in tabular, map, color or grey barcodes reflecting biological properties, functions and profiles of analytes.
[0016] In certain embodiments, the local data content and remote data content are presented using color and/or gray barcodes.
[0017] In certain embodiments, the local data content and remote data content are analyzed using a 3-D image analysis tool.
[0018] In certain embodiments, the local data content and remote data content are analyzed using an artificial intelligence tool.
[0019] In certain embodiments, the bioassay data are analyzed using current pathway information dynamically obtained from remote database content.
[0020] In another aspect, the invention generally relates to a computer linked to an instrument control having post analysis software and through a web server connection to a plurality of remote information resources and databases on genome and biology via the Internet.
[0021] In yet another aspect, the invention generally relates to a method for integrating experimental data of multiple analytes from different technologies and platforms using gene symbols or IDs for advanced data analysis comprising clustering, classification and statistical analysis.
[0022] In yet another aspect, the invention generally relates to a method for using gene symbols or IDs to group experimental data into pathways, diseases, drugs, SNPs and sequences and/or further biology groups using gene symbols or IDs as a factor in grouping the experimental data.
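The grouping described in [0022] can be illustrated with a minimal Python sketch. It assumes that measurements arrive as a mapping from gene symbol to value and that a gene-to-group annotation table is already available; the function name `group_by_pathway`, the table `GENE_TO_PATHWAYS` and the placeholder group names are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical annotation table; in practice it would be filled from a
# pathway, disease or drug database. Group names are placeholders.
GENE_TO_PATHWAYS = {
    "BRCA1": ["PATHWAY_A", "PATHWAY_B"],
    "MLH1": ["PATHWAY_C"],
}


def group_by_pathway(measurements, gene_to_pathways):
    """Group {gene_symbol: value} into {pathway: {gene_symbol: value}}.

    Gene symbols are upper-cased so that data from different platforms
    can be joined on the same key. Genes without annotation are skipped.
    """
    groups = defaultdict(dict)
    for symbol, value in measurements.items():
        key = symbol.strip().upper()
        for pathway in gene_to_pathways.get(key, []):
            groups[pathway][key] = value
    return dict(groups)
```

The same function could serve for disease, drug or SNP groups by passing a different annotation table, since only the gene symbol is used as the join key.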
[0023] In yet another aspect, the invention generally relates to a method for presenting experimental data from a bioassay sample into a barcode-like image, wherein each bar represents one analyte.
[0024] In certain embodiments, the barcode represents a specific pathway, disease, drug, SNP and sequence with defined order.
[0025] In certain embodiments, the barcodes are shaded, gray or colored, and/or having scoring colors representing the analyte experiment value, comprising bioassay experimental expression, bio content concentration, and gene regulation.
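One possible way to turn a biology group into a gray barcode, as described in [0023] to [0025], is sketched below. The linear scaling to 0-255 gray levels, the clipping range and the function name `to_barcode` are illustrative assumptions; the disclosure does not specify how values are mapped to shades.

```python
def to_barcode(ordered_genes, values, vmin, vmax):
    """Return one gray level (0-255) per gene, in the group's defined order.

    ordered_genes: gene symbols in the fixed order defined for the group.
    values:        {gene_symbol: measured value} for one sample.
    vmin, vmax:    assumed display range; values outside it are clipped.
    Genes with no measurement yield None, so bar positions stay aligned
    across samples.
    """
    span = vmax - vmin
    if span <= 0:
        raise ValueError("vmax must be greater than vmin")
    bars = []
    for gene in ordered_genes:
        if gene not in values:
            bars.append(None)
            continue
        clipped = min(max(values[gene], vmin), vmax)
        bars.append(round(255 * (clipped - vmin) / span))
    return bars
```

Keeping the gene order fixed per group is what makes barcodes from different samples comparable position by position.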
[0026] In yet another aspect, the invention generally relates to a method of data reduction and integrated analysis comprising applying clustering to biology grouped barcode data among multiple biology groups to perform 3D clustering analysis.
[0027] In certain embodiments, the method includes using known data remotely obtained from online resources and databases to run classification and train the computer, yielding an artificial intelligence algorithm or rule set.
[0028] In certain embodiments, the method classifies experimental data using the algorithm obtained from training.
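The train-then-classify idea in [0027] and [0028] can be sketched with a nearest-centroid classifier over barcode vectors. This technique is chosen here only for illustration; the disclosure does not name a specific learning algorithm, and the function names and labels below are hypothetical.

```python
def train_centroids(labeled_barcodes):
    """Average the barcode vectors of each label.

    labeled_barcodes: iterable of (label, [bar values]) from known data.
    Returns {label: centroid vector}. All vectors must have equal length.
    """
    sums, counts = {}, {}
    for label, bars in labeled_barcodes:
        if label not in sums:
            sums[label] = [0.0] * len(bars)
            counts[label] = 0
        sums[label] = [s + b for s, b in zip(sums[label], bars)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(bars, centroids):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(bars, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

Training data would come from public resources, and `classify` would then be applied to barcodes built from the user's own experiment.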
[0029] In yet another aspect, the invention generally relates to a method for parsing and constructing a disease database for known human diseases as provided in OMIM and other public disease databases, including: retrieving disease information from OMIM and other disease information data sources via an automated process; defining a database schema or structure for storing disease information; parsing and loading disease information data files into the above database schema; and performing corresponding indexing.
[0030] In yet another aspect, the invention generally relates to a method for parsing and constructing a drug database for known human drugs as provided in DrugBank and other drug databases, including: retrieving drug information from DrugBank and other drug information data sources via an automated process; defining a database schema or structure for storing drug information; parsing and loading drug information data files into the above database schema; and performing corresponding indexing.
[0031] In yet another aspect, the invention generally relates to a method for parsing and constructing a SNP and sequence database as provided in the dbSNP database and other sequence databases, including: retrieving SNP and sequence information from data sources via an automated process; defining a database schema or structure for storing SNP and sequence information; parsing and loading SNP and sequence data files into the above database schema; and performing corresponding indexing.
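The schema, load and index steps shared by [0029] to [0031] can be sketched as follows for the disease case. The sketch uses Python's built-in sqlite3 module in place of the MySQL, SQL or Oracle databases mentioned elsewhere in this description, and it assumes the retrieval and parsing steps have already produced simple records; the table names, column names and record layout are hypothetical.

```python
import sqlite3


def build_disease_db(records):
    """Create the schema, load parsed records and index them.

    records: iterable of dicts with keys "id", "name" and "genes"
             (an assumed output format of the parsing step).
    Returns an open in-memory SQLite connection.
    """
    conn = sqlite3.connect(":memory:")
    # Step: define the database schema.
    conn.execute(
        "CREATE TABLE disease (disease_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE gene_disease ("
        "gene_symbol TEXT NOT NULL, "
        "disease_id TEXT NOT NULL REFERENCES disease(disease_id))"
    )
    # Step: load the parsed data.
    for rec in records:
        conn.execute(
            "INSERT OR REPLACE INTO disease VALUES (?, ?)", (rec["id"], rec["name"])
        )
        conn.executemany(
            "INSERT INTO gene_disease VALUES (?, ?)",
            [(gene.upper(), rec["id"]) for gene in rec["genes"]],
        )
    # Step: index the column used for gene lookups.
    conn.execute("CREATE INDEX idx_gene_disease_symbol ON gene_disease (gene_symbol)")
    conn.commit()
    return conn


def diseases_for_gene(conn, symbol):
    """Return disease names linked to a gene symbol, sorted by name."""
    rows = conn.execute(
        "SELECT d.name FROM disease d "
        "JOIN gene_disease gd ON gd.disease_id = d.disease_id "
        "WHERE gd.gene_symbol = ? ORDER BY d.name",
        (symbol.upper(),),
    )
    return [row[0] for row in rows]
```

A drug or SNP database could follow the same pattern with a different pair of tables.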
[0032] In yet another aspect, the invention generally relates to a method for a user's computing device or instrument computer to submit a list of genes from a client program to a web server and receive the corresponding search results, including: packing the list of genes in a defined format; sending the list of genes to a web server using a defined protocol; performing comprehensive searches; receiving the search results using a defined protocol; and unpacking the search results.
[0033] In yet another aspect, the invention generally relates to a method for a web app server of performing searches of biomedical knowledge for a list of genes on the web server, comprising: unpacking the list of genes; calling the corresponding web service to perform the specific search for each gene; processing, combining and packing the search results; and sending the search results back to the client program.
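The pack, search and unpack round trip of [0032] and [0033] can be sketched in Python with JSON as the "defined format". JSON, the message fields and all function names here are assumptions for illustration; the disclosure mentions SOAP elsewhere but does not fix a wire format, and the network transport is left out so the example stays self-contained.

```python
import json


def pack_gene_list(genes):
    """Client side: normalize, de-duplicate and serialize the gene list."""
    cleaned = sorted({g.strip().upper() for g in genes if g.strip()})
    return json.dumps({"version": 1, "genes": cleaned}).encode("utf-8")


def unpack_gene_list(payload):
    """Server side: recover the gene list from the request payload."""
    return json.loads(payload.decode("utf-8"))["genes"]


def handle_request(payload, services):
    """Server side: call each search service per gene and pack the results.

    services: {service name: callable(gene_symbol) -> JSON-serializable result},
              standing in for the per-database web services.
    """
    genes = unpack_gene_list(payload)
    results = {
        gene: {name: search(gene) for name, search in services.items()}
        for gene in genes
    }
    return json.dumps({"version": 1, "results": results}).encode("utf-8")


def unpack_results(payload):
    """Client side: recover {gene: {service: result}} from the reply."""
    return json.loads(payload.decode("utf-8"))["results"]
```

In a deployed system the bytes returned by `pack_gene_list` would be sent to the server over the chosen protocol, and the bytes returned by `handle_request` would travel back the same way.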
[0034] In yet another aspect, the invention generally relates to a method for searching a database for a given gene via web service and returning the results in a defined format, wherein the database is a pathway database, a disease database, a drug database, a SNP and sequence database, a human genome database, or a gene ontology (GO) information database, for example.
[0035] In yet another aspect, the invention generally relates to a method for user interfacing (UI) for a summary view of multiple analytes in the client computer program or a web browser, wherein for each analyte, a list of the top 5 (which can be any user-selected value, e.g., 10) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, human SNPs and sequences is made in a tabular form.
[0036] In yet another aspect, the invention generally relates to a method for user interfacing for a view of single analyte in the client computer program, wherein a list of top 5 (which can be any user selected value, e.g., 10) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, human SNPs and sequence of the analyte is made in a tabular form.
[0037] In yet another aspect, the invention generally relates to a method for user interfacing for a pathway map or other biology group map or form of multiple analytes, wherein in the pathway map, all nodes with the genes in the list are marked by colors that match the experimentally determined bioassay results. Various markings may be used, such as red marking for gene up-regulation and green for gene down-regulation, with the brightness of the color matching the experimentally determined level in a quantitative way.
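The color convention in [0037] can be sketched as a small mapping from a signed expression change to an RGB color. The use of a signed log ratio as input, the saturation threshold `max_abs` and the function name are assumptions made for this example.

```python
def regulation_color(log_ratio, max_abs=2.0):
    """Map a signed expression change to an (R, G, B) tuple.

    Positive values (up-regulation) become red, negative values
    (down-regulation) become green, and brightness grows linearly with
    the magnitude until it saturates at max_abs (an assumed threshold).
    Zero maps to black, meaning no change.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    strength = min(abs(log_ratio) / max_abs, 1.0)
    level = round(255 * strength)
    if log_ratio > 0:
        return (level, 0, 0)
    if log_ratio < 0:
        return (0, level, 0)
    return (0, 0, 0)
```

Each pathway node whose gene appears in the user's list would be filled with the color returned for that gene's measured change.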
[0038] In yet another aspect, the invention generally relates to a method for user interfacing (UI) for a pathway view or other biology group structure view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding pathway and wherein clicking the pathway column brings up the corresponding pathway map with highlight of gene symbols.
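The analyte-by-pathway table of [0038] can be sketched as a membership matrix. The data layout, the use of "x" as the check mark and the function name are illustrative assumptions; rendering and click handling are outside the scope of the sketch.

```python
def membership_table(analytes, pathway_members):
    """Build the header and rows of an analyte-by-pathway check-mark table.

    analytes:        list of gene symbols, one per table row.
    pathway_members: {pathway name: set of gene symbols in that pathway}.
    Returns (header, rows); a cell holds "x" when the analyte belongs to
    the pathway and "" otherwise.
    """
    pathways = sorted(pathway_members)
    header = ["analyte"] + pathways
    rows = [
        [analyte] + ["x" if analyte in pathway_members[p] else "" for p in pathways]
        for analyte in analytes
    ]
    return header, rows
```

The drug, disease and SNP views described in the following paragraphs could reuse the same structure with drugs, diseases or SNPs as the column set.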
[0039] In yet another aspect, the invention generally relates to a method for user interfacing for a drug view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing drugs, and wherein the UI shows a check mark for a cell if the corresponding analyte is the target of the corresponding drug and wherein clicking the drug column brings up the corresponding DrugBank and other public information resources web page of the drug.
[0040] In yet another aspect, the invention generally relates to a method for user interfacing for a disease view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing diseases, and wherein the UI shows a check mark for a cell if the corresponding analyte is involved in the corresponding disease and wherein clicking the disease column brings up the corresponding OMIM, HGMD and other disease resources web pages.
[0041] In yet another aspect, the invention generally relates to a method and user interfacing for a SNP view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing SNPs, and wherein the UI shows a check mark for a cell if the corresponding analyte is next to the SNP on human genome and wherein clicking the SNP column brings up the corresponding genome web page.
Detailed Description of the Invention
[0042] The invention provides unique systems and apparatus for linking vast online available information and resources to a researcher conducting specific biological experiment or analysis and to the experimental data collection the researcher has. The systems and apparatus of the invention use the biological annotation of bioassay data as a basic link, pack the available information in application based methods, present the data in an intuitive way, and analyze the current data set with integration with available online information, in order to reveal hidden and fundamental biology content and significance. The invention provides unique systems and apparatus for biological annotation of bioassay data, which provide for integrated, comprehensive and contextual recording, storage, organizing, annotation, classification, retrieval, analysis and imaging of biological, medical and healthcare data.
[0043] As used herein, the term "biological marker" or "biomarker", refers to an indicator of a biological state. It is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. In cell biology, for example, a biomarker is a molecule that allows for the detection and isolation of a particular cell type. In medicine, for example, a biomarker can be a traceable substance that is introduced into an organism as a means to examine organ function or other aspects of health. In the context of bioassays, a biomarker being measured or analyzed is also referred to as an "analyte".
[0044] The system of the invention allows one to make a dynamic link directly between a biologist's or a bench-top instrument's computer and all relevant information, databases and resources in the public domain. Such information, databases and resources are available before and after an experiment and data analysis. The information can be packed in various ways to give the researcher an overview as well as guidance regarding his experiment and analysis.
[0045] The information can be presented so as to make it easier to see both the details and the big picture in the larger context, and the collective information rather than just an individual data point or result, which enables one to group data based on pathways, diseases, drugs and genetics. The information and data can be packaged and presented in any suitable form, for example, in tabular, map, color or gray barcodes with the group-based biological functions and profiles. Biology-guided data clustering based on color or gray barcodes can be processed by advanced image analysis and artificial intelligence tools to achieve higher-level data reduction and reveal insights that are otherwise hard to deduce. Thus, the system allows a user to set up and perform the analysis on the user's bench-top or desktop computer, where he can have at his disposal all relevant, available information via online access. For example, this allows one to perform clustering analysis and use current pathway information to guide the classification. The system can also accommodate different operating systems and platforms, as they will be running the same API, which lets each individual computer run much more powerful programs without regard to the computer's OS and capacity.
[0046] As schematically illustrated in FIG. 1, the system is built to connect (1) a program running on an individual PC to publicly available data on the Internet, or to let a user with a set of data search biology information on the Internet from a web browser.
[0047] The Application Web Server will contain a database for packed biology information as well as algorithms to form packed information dynamically for specific applications (FIG. 2). The user PC has a database to store the packed biology information associated with his own experiment and data set. The public domain databases, such as GO, Pathway or OMIM, will be connected for the application with Web App Server support.
[0048] An exemplary main work flow is schematically depicted in FIG. 3.
[0049] FIG. 4 shows an exemplary data system for gene annotation linking to DNA, protein or other types of biological assay. The GO database represents the public online Gene Ontology annotation database. The data flow allows assay annotation built on other biological databases and on pathway, drug, disease and sequencing data and other biology information.
[0050] As shown in FIGs. 5, 6 and 7, the analyte data can be grouped according to pathway, disease, drug, sequence or certain biology functions or categories. The data can be presented in tabular forms and maps, such as a pathway map showing which analytes belong to the same pathway or group. All analyte data obtained from experiments, databases, or other resources will be placed into groups, such as the same pathway or the same biology function association group, and will be listed, for example, in a defined order in a barcode. Each analyte will be represented as one bar in the barcode. For example, as shown in FIG. 5, an analyte can be represented in a barcode in (1) color or (2) gray, with the color or gray shades representing a value of the analyte, such as the degree of expression or the analyte concentration. The analyte value can also be represented in (3) discrete colors, such as scoring. Different levels of analyte values will be shown in different colors. A set of supervised clustering analyses can be performed for further data reduction, guided by pathway, biological function and genetic information. The clustering analysis can run on similar barcode data across multiple pathways, diseases and biological groups to perform biologically supervised 3D clustering analysis. Furthermore, artificial intelligence classification methods can be applied to barcode images or data sets to perform fingerprint analysis and biology measurement for further data reduction to uncover insights and patterns.
[0051] In another aspect, to help explain the challenges that the unique system of the invention intends to address, let us briefly describe a typical microarray bioassay. Human beings have about 22,000 genes. Commercial companies sell a device with information about these 22,000 genes. This device is able to detect how much each gene is present in a biological sample such as liver tissue. Let us say a researcher wants to compare the gene expression difference between liver cancer cells and normal cells. He could buy such a device from a vendor and then measure the expression level of the 22,000 genes. Then he will compare the gene expression differences between cancer and normal cells. For demonstration purposes, let us assume that he used a statistical tool and found that three genes are significantly different between tumor and normal cells: BRCA1, FA 1, and MLH1. Among the three genes, BRCA1 and FA 1 are higher in tumor than in normal cells, and MLH1 is lower. With these three genes, this researcher has to do two things. First, he has to find existing knowledge about these three genes. There are quite a few web sites he could use, such as NCBI (www.ncbi.nlm.nih.gov) or GeneCards (www.genecards.org). For three genes, this should be trivial, but it is another story entirely for 1,000 genes, which is not uncommon. Second, he has to understand the biological context and meaning of this finding. For this part, no commercial tools, to our knowledge, exist to help.
[0052] An exemplary computer application system of the invention includes a Windows client program or a Web App, a MySQL, SQL or Oracle database, and a set of web services. Such a system provides integrated and comprehensive biomedical annotation and application based data searching and packing method and rules of bioassay data. In a typical statistical analysis of bioassay data, such as microarray and tissue array, hundreds or even thousands of genes can be identified to be significant for biomedical research goals. Given such lists of interested genes, the system disclosed herein will search and identify major and important facts and knowledge about these genes, present the findings in a user friendly and effective way, and perform novel data analyses by integrating statistical tools and biomedical knowledge.
[0053] For example, FIG. 1 depicts a basic architecture of an exemplary system according to the invention. This exemplary system includes two major components: an Application (App) server and a client program. The server can run on a Linux or Microsoft platform, include a MySQL, SQL or Oracle relational database, and provide a number of web services. The web services can be written in C#, Java or another web-service-capable language and deployed either in Tomcat (http://tomcat.apache.org/) as SOAP (Simple Object Access Protocol) services or in a customized web service architecture framework, which can be accessible to a variety of computer platforms including Microsoft Windows, Linux, Mac OS X and mobile portable systems. For example, image analysis programs written in Microsoft C# and running on Microsoft Windows machines are able to remotely invoke such web services or SOAP services on the Web App Server and request application-based defined services.
[0054] A major function of the Web App Server is to take a gene list request from the client program, search public biomedical databases on the Internet with application-based rules, filters and algorithms, integrate and pack the search results in a special, clear format, and send the obtained gene annotations back to the client program. For each gene with one or more gene symbols, one web service can find its gene ontology annotations, which include a gene's possible biological function, biological process, and cellular component. This web service can search QuickGO's web site (http://www.ebi.ac.uk/QuickGO/GSearch?q=brca1), parse, clean up, and save the search results into the Web App Server database. For convenience, the gene BRCA1 is used as the example here.
[0055] The second web service can search NCBI's gene database (http://www.ncbi.nlm.nih.gov/gene/?term=brca1) and find the gene ID for a gene symbol. The gene ID is an identifier unique to each gene.
[0056] From the gene ID, the third web service can search KEGG's pathway database (http://www.genome.jp/dbget-bin/www_bget?hsa:672) and other pathway databases, parse, clean up, and save the search results into the Web App Server database.
[0057] The fourth web service can search OMIM (http://www.omim.org/search?index=entry&sort=score+desc%2C+prefix_sort+desc&start=1&limit=10&search=brca1), HGMD and other disease databases, parse, clean up and save the search results into the Web App Server database.
[0058] The fifth web service can search the DrugBank database (http://www.drugbank.ca/search?utf8=%E2%9C%93&query=breast+cancer&commit=Search), parse, clean up and save the results into the Web App Server database.
[0059] The sixth web service can search NCBI's SNP database (http://www.ncbi.nlm.nih.gov/snp/?term=brca1&SITE=NcbiHome&submit=Go), parse, clean up and save the results into the Web App Server database.
[0060] The seventh web service goes through each KEGG pathway HTML page and checks whether it contains any of the genes in the gene list. If not, the page is ignored. If so, the service can cache the HTML content in the Web App Server database.
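The page-filtering step of the seventh web service can be sketched as follows. This is only an illustration under stated assumptions: the pages are assumed to be already downloaded into a dictionary keyed by a pathway identifier, and the function name `pages_containing_genes` is hypothetical rather than part of the disclosed system.

```python
import re
from typing import Dict, Iterable


def pages_containing_genes(
    pages: Dict[str, str], gene_list: Iterable[str]
) -> Dict[str, str]:
    """Keep only the pathway pages whose HTML mentions a listed gene."""
    genes = [g for g in gene_list if g]
    if not genes:
        return {}
    # Whole-word, case-insensitive match so "MLH1" does not match "MLH10".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(g) for g in genes) + r")\b",
        re.IGNORECASE,
    )
    # Pages without any listed gene are ignored; the rest would be cached.
    return {pid: html for pid, html in pages.items() if pattern.search(html)}
```

The returned dictionary holds the pages that would be cached; everything else is dropped, matching the ignore-or-cache rule described above.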
[0061] The search results can be cached or temporarily saved in the Web App Server database. For a new request and gene list from the client program, the Web App Server can check whether the biological annotations are already available. If so, it can retrieve the information directly from the Web App Server database. Otherwise, the web services can be invoked to search for and save the annotations.
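The cache-then-search behavior described in paragraph [0061] can be sketched as follows. This is a minimal illustration, not the disclosed implementation: an in-memory dictionary stands in for the Web App Server database, and `fetch_annotations` is a hypothetical placeholder for the search web services.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the Web App Server database (assumption:
# a real deployment would use a relational database as described).
AnnotationCache = Dict[str, Dict[str, List[str]]]


def annotate_gene_list(
    genes: List[str],
    cache: AnnotationCache,
    fetch_annotations: Callable[[str], Dict[str, List[str]]],
) -> AnnotationCache:
    """Return annotations for each gene, searching only on a cache miss."""
    results: AnnotationCache = {}
    for gene in genes:
        key = gene.upper()  # normalise gene symbols before lookup
        if key not in cache:
            # Cache miss: invoke the (hypothetical) web services and save.
            cache[key] = fetch_annotations(key)
        results[key] = cache[key]
    return results
```

A second request for a gene that is already cached returns immediately without calling `fetch_annotations` again.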
[0062] The second major component of the contextual bioassay data analysis system of the invention is the client computer program. This program can be written in Microsoft C# or another programming language and deployed on the Microsoft Windows platform, or written in HTML/JavaScript and deployed on any platform with a modern web browser. It includes several major parts.
[0063] FIG. 8 schematically illustrates an exemplary server database scheme.
[0064] The first part provides a summary view for a list of genes. In typical biological assay data, the measured item is usually called an analyte, and it can represent a gene, for example. This part, a computer graphical user interface (GUI), takes a list of analytes found to be statistically significant in a preliminary statistical data analysis, sends the list to the Web App Server via a standard Internet protocol such as SOAP, retrieves the corresponding major biological annotations, and then presents researchers with a summary table. In this table, each row represents a gene or analyte, and each column can hold the top entries of the corresponding annotation, for example, the top 5 KEGG pathways.
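The summary table described in paragraph [0064] can be sketched as follows. This is an assumption-laden illustration: the annotation dictionary layout, the column names, and the function `summary_table` are hypothetical and only show how a top-N cell per annotation type could be assembled.

```python
from typing import Dict, List

# analyte -> annotation type -> ranked entries (hypothetical layout)
Annotations = Dict[str, Dict[str, List[str]]]


def summary_table(
    annotations: Annotations, columns: List[str], top_n: int = 5
) -> List[List[str]]:
    """Build rows of [analyte, cell, cell, ...] with at most top_n entries per cell."""
    rows = [["Analyte"] + columns]  # header row
    for analyte, by_type in annotations.items():
        # Missing annotation types produce an empty cell.
        cells = ["; ".join(by_type.get(col, [])[:top_n]) for col in columns]
        rows.append([analyte] + cells)
    return rows
```

Each returned row can be rendered directly as one line of the GUI table, with one row per analyte and one column per annotation type.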
[0065] For a large number of genes, this program and GUI can dramatically reduce the time researchers spend finding such annotations. The process is also much less error-prone. The tabular presentation can give researchers a quick biological view of a given gene list and enhance their research productivity.
[0066] The second part of the client program component provides an in-depth, detailed and specific view of a gene list to identify possible contextual relationships between genes. For convenience of discussion, KEGG biological pathways are used for demonstration here. The idea and the computer program are applicable to other annotations as well.
[0067] For a given list of analytes (e.g., genes), a list of all relevant pathways is first compiled, that is, a non-redundant union of their pathways. Second, a computer program dynamically creates a table. In this table, each row represents a gene and each column represents a pathway. If a gene appears in a pathway, the corresponding cell has a checkmark; otherwise it is blank, as illustrated in FIG. 9.
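The two steps of paragraph [0067], the non-redundant union and the checkmark table, can be sketched as follows. The input mapping from gene to pathway names and the function name `membership_table` are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def membership_table(
    gene_pathways: Dict[str, Set[str]]
) -> Tuple[List[str], List[List[str]]]:
    """Return (pathway columns, rows) where each row is [gene, mark, mark, ...]."""
    # Non-redundant union of all pathways, sorted for a stable column order.
    columns = sorted(set().union(*gene_pathways.values()))
    rows = []
    for gene, pathways in gene_pathways.items():
        # Checkmark if the gene appears in the pathway, blank otherwise.
        marks = ["✓" if p in pathways else "" for p in columns]
        rows.append([gene] + marks)
    return columns, rows
```

Sorting the columns is a design choice of this sketch; any fixed ordering works as long as every row uses the same one.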
[0068] For researchers, this table provides a quick view of possible contextual relationships between genes. For example, in this table, Analyte 1 appears in many pathways, and Analyte 8 in only one. This is a good indication that Analyte 8 is more specific, and it may be biologically more interesting if the researcher is looking for a high-specificity disease biomarker. Analytes 6 and 7 share two pathways, which may indicate that they are linked. Researchers can make other meaningful inferences, allowing for productive bioassay data analysis.
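The two inferences mentioned in paragraph [0068], specificity and shared pathways, can also be computed directly from the same gene-to-pathway mapping. The sketch below is illustrative only; the function names and the `min_shared` threshold are assumptions, not part of the disclosure.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pathway_counts(gene_pathways: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Analytes ordered from fewest to most pathways (fewest = most specific)."""
    return sorted(
        ((g, len(p)) for g, p in gene_pathways.items()),
        key=lambda item: (item[1], item[0]),
    )


def shared_pathways(
    gene_pathways: Dict[str, Set[str]], min_shared: int = 2
) -> Dict[Tuple[str, str], Set[str]]:
    """Pairs of analytes that share at least min_shared pathways."""
    pairs = {}
    for a, b in combinations(sorted(gene_pathways), 2):
        common = gene_pathways[a] & gene_pathways[b]
        if len(common) >= min_shared:
            pairs[(a, b)] = common
    return pairs
```

An analyte at the top of `pathway_counts` is a candidate for a specific biomarker, while pairs returned by `shared_pathways` are candidates for linked analytes; both remain hints for the researcher rather than conclusions.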
[0069] The third part of the client program sends the Web App Server a list of genes and requests the HTML contents and images of the biological pathways that contain these genes. In this example, the three genes (BRCA1, FA 1, MLH1) happen to be in the same Fanconi anemia pathway. The program downloads the image map of this pathway from KEGG (http://www.genome.jp/kegg/), including its HTML source. The image map is shown in FIG. 10.
[0070] After that, this program colors BRCA1 and FA 1 red and MLH1 blue in the pathway map according to the flowchart shown in FIG. 11.
[0071] In a typical HTML file of a KEGG pathway map, the visible part is the image map, and a gene is usually represented by a rectangle. All gene symbols and their corresponding rectangle coordinates are given in the hidden source section.
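Extracting gene rectangles from such an HTML source can be sketched with Python's standard `html.parser` module. The sketch assumes that each gene is encoded as an `<area shape="rect" coords="x1,y1,x2,y2" title="...">` element whose `title` contains the gene symbol; that markup layout, and the names `AreaCollector` and `gene_rectangles`, are assumptions for illustration and should be checked against the actual page source.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]


class AreaCollector(HTMLParser):
    """Collect (title, rectangle) pairs from <area shape="rect"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.areas: List[Tuple[str, Rect]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "area":
            return
        a = dict(attrs)
        if (a.get("shape") or "").lower() not in ("rect", "rectangle"):
            return
        try:
            x1, y1, x2, y2 = (int(v) for v in (a.get("coords") or "").split(","))
        except ValueError:
            return  # skip malformed coordinate strings
        self.areas.append((a.get("title") or "", (x1, y1, x2, y2)))


def gene_rectangles(html: str, genes: List[str]) -> Dict[str, List[Rect]]:
    """Map each listed gene to the rectangles whose title mentions it."""
    parser = AreaCollector()
    parser.feed(html)
    found: Dict[str, List[Rect]] = {g: [] for g in genes}
    for title, rect in parser.areas:
        # Split the title into symbol-like tokens for whole-word matching.
        tokens = set(re.findall(r"[A-Z0-9_-]+", title.upper()))
        for g in genes:
            if g.upper() in tokens:
                found[g].append(rect)
    return found
```

The rectangles returned for each gene are the regions a drawing step would later recolor on the pathway image.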
[0072] This program finds all genes from the gene list in the pathway map and highlights each gene name, which is labeled in a rectangular text box (FIG. 10). If the pathway map comes from a public database and is downloaded in web HTML format, the program can use the map as-is and change only the color of the gene name label, shaded (FIG. 12) at a level representing the expression, intensity, concentration and/or other experimental measurement of the gene. The shades can be gray or colored, or use distinct scoring colors, to display the experimental values visually in the image. After that, this program produces a modified image and presents it to researchers (FIG. 5 and FIG. 16). The system of the invention greatly helps researchers make sense of their microarray experiment results. In this case, the researcher immediately sees and can propose a reasonable theory: overexpression of BRCA1 and FA 1 in FIG. 12 causes reduced expression of MLH1. He can then either perform further database searches or run experiments to verify his theory. Over the past decade, biological pathways and other knowledge databases have become increasingly available to researchers, and this contextual bioassay data analysis tool can be a great help to them. In particular, the majority of diseases and biological processes involve multiple genes, and this tool makes it easier for researchers to better understand these diseases, processes and bioassay data. If the researcher in this example has more than three genes, these genes could appear in different pathways. In that case, the proposed tool can present the pathways individually or in a larger image map.
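The shading rule in paragraph [0072] can be sketched as a mapping from a signed measurement to an RGB color. The sketch assumes the measurement is a signed expression change (for example a log ratio), uses red for higher and blue for lower expression as in the example above, and clamps at a hypothetical `max_abs` limit; none of these choices is prescribed by the disclosure.

```python
from typing import Tuple


def expression_color(log_ratio: float, max_abs: float = 3.0) -> Tuple[int, int, int]:
    """Map a signed expression change to an RGB shade.

    Positive values shade towards red, negative towards blue, and zero is
    white; values beyond +/- max_abs are clamped.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    level = max(-1.0, min(1.0, log_ratio / max_abs))
    fade = round(255 * (1.0 - abs(level)))  # 255 = white, 0 = full colour
    if level >= 0:
        return (255, fade, fade)
    return (fade, fade, 255)
```

A gray-scale variant would return `(fade, fade, fade)` instead, which corresponds to the gray shading option mentioned in the text.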
[0073] With such a request, the search rule and algorithm set can be extended to more application-related databases and data sources. All that is needed is to create another set of search criteria, resource links, and rules in the application-based table. New data can be linked to the client program in a manner similar to the pathway method described in the previous section. For example, other pathway databases can be employed, such as Science magazine's biological signal pathway maps and WikiPathways maps, as well as other biomedical knowledge such as biological function, biological process, biological component, and disease network. The knowledge and its representation do not need to be static image maps as in the above example. They can be a dynamically generated (via computer programs or other devices) plot or a network with connected nodes.
[0074] In one aspect, the invention generally relates to a system for annotation of bioassay data comprising dynamically integrated functions of search, retrieval, organization, recording, analysis, classification, annotation, display and storage of biological, medical and healthcare data, wherein a user of the system has desktop access via a personal or instrument computer to comprehensive and contextual biological information and analytical programs pertaining to an analysis via online-based functionalities comprising one or more of data retrieval, comparison, clustering, display, artificial intelligence analysis, and annotation.
[0075] In yet another aspect, the invention generally relates to a method for parsing and constructing a disease database for known human diseases as provided in OMIM and other public disease databases, including: retrieving disease information from OMIM and other disease information data sources via an automated process; defining a database schema or structure for storing disease information; parsing and loading disease information data files into the above database schema; and performing corresponding indexing.
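The define-schema, parse-and-load, and index steps of paragraph [0075] can be sketched with Python's built-in `sqlite3` module. Everything specific here is an assumption for illustration: SQLite stands in for the relational database, the two-table schema is a simplification, and the tab-separated input format is hypothetical rather than the actual OMIM file layout.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical, simplified schema; real disease records carry many more fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS disease (
    disease_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS disease_gene (
    disease_id  TEXT NOT NULL REFERENCES disease(disease_id),
    gene_symbol TEXT NOT NULL,
    PRIMARY KEY (disease_id, gene_symbol)
);
CREATE INDEX IF NOT EXISTS idx_disease_gene_symbol
    ON disease_gene (gene_symbol);
"""


def parse_record(line: str) -> Tuple[str, str, List[str]]:
    """Parse one tab-separated line: id, name, comma-separated gene symbols."""
    disease_id, name, genes = line.rstrip("\n").split("\t")
    return disease_id, name, [g.strip().upper() for g in genes.split(",") if g.strip()]


def load_diseases(conn: sqlite3.Connection, lines: Iterable[str]) -> None:
    """Create the schema and index, then parse and load each record."""
    conn.executescript(SCHEMA)
    for line in lines:
        disease_id, name, genes = parse_record(line)
        conn.execute("INSERT OR REPLACE INTO disease VALUES (?, ?)", (disease_id, name))
        conn.executemany(
            "INSERT OR IGNORE INTO disease_gene VALUES (?, ?)",
            [(disease_id, g) for g in genes],
        )
    conn.commit()


def diseases_for_gene(conn: sqlite3.Connection, symbol: str) -> List[str]:
    """Look up disease names for a gene symbol via the indexed column."""
    rows = conn.execute(
        "SELECT d.name FROM disease d JOIN disease_gene dg USING (disease_id) "
        "WHERE dg.gene_symbol = ? ORDER BY d.name",
        (symbol.upper(),),
    )
    return [r[0] for r in rows]
```

The same pattern (schema, loader, index on the gene symbol) would apply to the drug and SNP databases described in the following paragraphs, with different columns.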
[0076] In yet another aspect, the invention generally relates to a method for parsing and constructing a drug database for known human drugs as provided in DrugBank and other drug databases, including: retrieving drug information from DrugBank and other drug information data sources via an automated process; defining a database schema or structure for storing drug information; parsing and loading drug information data files into the above database schema; and performing corresponding indexing.
[0077] In yet another aspect, the invention generally relates to a method for parsing and constructing a SNP and sequence database as provided in the dbSNP database and other sequence databases, including: retrieving SNP and sequence information from data sources via an automated process; defining a database schema or structure for storing SNP and sequence information; parsing and loading SNP and sequence data files into the above database schema; and performing corresponding indexing.
[0078] In yet another aspect, the invention generally relates to a method for a user's computing device or instrument computer to submit a list of genes from a client program to a web server and receive the corresponding search results, including: packing the list of genes in a defined format; sending the list of genes to a web server using a defined protocol; performing comprehensive searches; receiving the search results using a defined protocol; and unpacking the search results.
[0079] In yet another aspect, the invention generally relates to a method for a web app server to perform searches of biomedical knowledge for a list of genes on the web server, comprising: unpacking the list of genes; calling the corresponding web service to perform the specific search for each gene; processing, combining and packing the search results; and sending the search results back to the client program.
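The pack, send, search, and unpack steps of paragraphs [0078] and [0079] can be sketched as three functions. JSON is used here only as an assumed "defined format" (the disclosure also mentions SOAP), the network transport is omitted, and all function names and the `services` mapping are hypothetical.

```python
import json
from typing import Callable, Dict, List

SearchService = Callable[[str], List[str]]


def pack_gene_list(genes: List[str]) -> str:
    """Client side: pack the gene list into the (assumed) JSON request format."""
    return json.dumps({"genes": genes})


def handle_request(request: str, services: Dict[str, SearchService]) -> str:
    """Server side: unpack, call each service per gene, combine and pack."""
    genes = json.loads(request)["genes"]
    results = {
        gene: {name: service(gene) for name, service in services.items()}
        for gene in genes
    }
    return json.dumps({"results": results})


def unpack_results(response: str) -> Dict[str, Dict[str, List[str]]]:
    """Client side: unpack the search results."""
    return json.loads(response)["results"]
```

In a deployment, the string returned by `pack_gene_list` would travel over the chosen protocol to the server, and the string returned by `handle_request` would travel back to the client before `unpack_results` is called.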
[0080] In yet another aspect, the invention generally relates to a method for searching a database for a given gene via web service and returning the results in a defined format, wherein the database is a pathway database, a disease database, a drug database, a SNP and sequence database, a human genome database, or a GO (gene ontology) information database, for example.
[0081] In yet another aspect, the invention generally relates to a method for user interfacing (UI) for a summary view of multiple analytes in the client computer program or the client computer's web browser interface, wherein for each analyte, a list of the top entries of the corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, human SNPs and sequences is made in tabular form. (FIG. 17)
[0082] In yet another aspect, the invention generally relates to a method for user interfacing for a view of a single analyte in the client computer program or the client computer's web browser interface, wherein a list of the top entries (e.g., 5) of the corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs and human SNPs of the analyte is made in tabular form. (FIG. 18)
[0083] In yet another aspect, the invention generally relates to a method for user interfacing for a pathway map of multiple analytes, wherein in the pathway map, all nodes with the genes in the list are marked by colors that match the experimentally determined bioassay results (e.g., with red marking for gene up-regulation and green for gene down-regulation, and with the brightness of the color shade or a different scoring color matching the experimentally determined level in a quantitative way) (FIG. 12).
[0084] In yet another aspect, the invention generally relates to a method for user interfacing for a biology group or category view, such as a pathway view, drug list view, disease view, or SNP and sequencing view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding biological criterion or category, such as pathway, disease, drug or sequencing, and wherein clicking the pathway column brings up the corresponding detail map or link of all relevant genes in that group, such as a pathway map with gene symbols highlighted.
[0085] In yet another aspect, the invention generally relates to a method for user interfacing for a pathway view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding pathway and wherein clicking the pathway column brings up the corresponding pathway map with highlight of gene symbols. (FIG. 9)
[0086] In yet another aspect, the invention generally relates to a method for user interfacing for a drug view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing drugs, and wherein the UI shows a check mark for a cell if the corresponding analyte is the target of the corresponding drug and wherein clicking the drug column brings up a corresponding set of web links to certain defined public drug information databases (e.g., DrugBank). (FIG. 13)
[0087] In yet another aspect, the invention generally relates to a method for user interfacing for a disease view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing diseases, and wherein the UI shows a check mark for a cell if the corresponding analyte is involved in the corresponding disease and wherein clicking the disease column brings up the corresponding disease database web page. (FIG. 14)
[0088] In yet another aspect, the invention generally relates to a method for user interfacing for a SNP and sequence view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing SNPs and sequences, and wherein the UI shows a check mark for a cell if the corresponding analyte is next to the SNP on the human genome and wherein clicking the SNP column brings up the corresponding genome web page. (FIG. 15)
[0089] In yet another aspect, the invention generally relates to a method for user interfacing for a sample view with integrated biological content results covering pathway, drug, disease, and SNP and sequencing views of multiple relevant analytes, wherein the UI is a table with rows representing samples and integrated result data columns representing different biology groups, such as different pathways, and wherein the UI shows a set of columns for each biological group (e.g., shown as (2) in FIG. 16). In this set of columns, one column, (3), lists the gene symbols of the pathway-relevant analytes in a defined order. One column, (4), lists a barcode with one bar for each relevant analyte in the pathway. The shade or color of each bar represents the intensity or biological experimental value for the analyte. One column, (5), lists the result of an integrated data reduction value, such as a fingerprint analysis result, a classification score, or a biological qualification measurement such as the level of toxicity or the phase of disease.
[0090] In yet another aspect, the invention generally relates to a method for a data structure or format which groups multiple data results of different analytes into a barcode image format, shown as (4) in FIG. 16. This data presentation makes it possible to adapt a fast image analysis algorithm to achieve intelligent, integrated data analysis across a large amount of relevant information quickly and in a biologically meaningful way.
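The barcode format of paragraph [0090] can be sketched as a list of gray levels, one per analyte, in the defined order of the biological group. The linear scaling, the darker-means-higher convention, the white fill for missing analytes, and the function name `barcode` are all assumptions chosen for illustration.

```python
from typing import Dict, List


def barcode(
    sample: Dict[str, float], ordered_analytes: List[str], max_value: float
) -> List[int]:
    """Return one gray level (0-255) per analyte, in the defined order.

    Darker bars (lower numbers) mean higher measured values; analytes
    missing from the sample are drawn as white (255).
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    bars = []
    for analyte in ordered_analytes:
        value = sample.get(analyte)
        if value is None:
            bars.append(255)
            continue
        # Clamp to [0, 1] so out-of-range measurements stay displayable.
        fraction = max(0.0, min(1.0, value / max_value))
        bars.append(round(255 * (1.0 - fraction)))
    return bars
```

Because every sample yields a vector of the same length and order for a given group, the barcodes of different samples can be compared element by element or treated as small images.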
[0091] In yet another aspect, the invention generally relates to a method for data grouping in biological categories (e.g., shown as (2) in FIG. 16). The method can group the sample data, (1) in FIG. 16, into different groups with relevant analyte measurements, and then assess the data collectively according to biological group as a barcode, as a piece of digital image data, or as a fingerprint. The analysis is then run on these collective data among the biology groups, for example, a group of different pathways, to assess the collective impact among different biology groups or pathways. Such a method constructs a three-dimensional analysis, like a 3-D clustering.
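The grouping step of paragraph [0091] arranges the data along three axes: sample, biological group, and analyte. The sketch below shows only that arrangement plus a per-group Euclidean distance between two samples, which is one possible building block for a clustering method; it is not the disclosed 3-D clustering itself, and the names and the zero fill for missing analytes are assumptions.

```python
import math
from typing import Dict, List

Sample = Dict[str, float]        # analyte -> measurement
Groups = Dict[str, List[str]]    # biological group -> ordered analytes


def group_sample(sample: Sample, groups: Groups) -> Dict[str, List[float]]:
    """Split one sample's measurements into per-group vectors (0.0 if missing)."""
    return {
        name: [sample.get(a, 0.0) for a in analytes]
        for name, analytes in groups.items()
    }


def groupwise_distance(a: Sample, b: Sample, groups: Groups) -> Dict[str, float]:
    """Euclidean distance between two samples, computed per biological group."""
    ga, gb = group_sample(a, groups), group_sample(b, groups)
    return {name: math.dist(ga[name], gb[name]) for name in groups}
```

Keeping one distance per group, instead of a single overall distance, preserves the information about which pathways or other groups drive the difference between two samples.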
[0092] In yet another aspect, the invention generally relates to a process method for plugging in an application-based algorithm to run a specific integrated analysis, for example, to differentiate types of cancers, estimate disease phases or stages, or measure drug toxicity levels, shown as (5) in FIG. 16.
[0093] In yet another aspect, the invention generally relates to a process method for comparing data online to analyze experimental data intelligently. For example, the method gathers data on many analytes for different cancers from the public domain, integrates the data into barcodes or an integrated data set, uses it to train an artificial intelligence function of the computer program, and then applies that intelligence to classify the experimental data.
[0094] In this specification and the appended claims, the singular forms "a," "an," and "the" include plural reference, unless the context clearly dictates otherwise.
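The train-then-classify process of paragraph [0093] can be illustrated with a nearest-centroid classifier over barcode vectors. This technique is swapped in purely as a simple example; the disclosure does not name a specific learning algorithm, and the function names and label strings are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def train_centroids(
    examples: Sequence[Tuple[Sequence[float], str]]
) -> Dict[str, List[float]]:
    """Average the barcode vectors of each labelled class."""
    grouped = defaultdict(list)
    for vector, label in examples:
        grouped[label].append(vector)
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(vector: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest to the given vector."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))
```

Here the labelled examples would come from publicly available data, and `classify` would then be applied to the barcode of a new experimental sample.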
[0095] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure, the preferred methods and materials are now described. Methods recited herein may be carried out in any order that is logically possible, in addition to a particular order disclosed.
Incorporation by Reference
[0096] References and citations to other documents, such as patents, patent applications, patent publications, journals, books, papers, web contents, have been made in this disclosure. All such documents are hereby incorporated herein by reference in their entirety for all purposes. Any material, or portion thereof, that is said to be incorporated by reference herein, but which conflicts with existing definitions, statements, or other disclosure material explicitly set forth herein is only incorporated to the extent that no conflict arises between that incorporated material and the present disclosure material. In the event of a conflict, the conflict is to be resolved in favor of the present disclosure as the preferred disclosure.
Equivalents
[0097] The representative examples are intended to help illustrate the invention, and are not intended to, nor should they be construed to, limit the scope of the invention. Indeed, various modifications of the invention and many further embodiments thereof, in addition to those shown and described herein, will become apparent to those skilled in the art from the full contents of this document, including the examples and the references to the scientific and patent literature included herein. The examples contain important additional information, exemplification and guidance that can be adapted to the practice of this invention in its various embodiments and equivalents thereof.
What is claimed is:

Claims

1. A system for bioassay data analysis comprising dynamically linking and integrating functions of search, retrieval, organization, recording, analysis, classification, annotation, display and storage of biological, medical and healthcare data, wherein a user of the system has desktop or computing device access via a personal or instrument computer to comprehensive and contextual biological information and analytical programs pertaining to an analysis via online-based functionalities comprising one or more of data retrieval, comparison, clustering, display, artificial intelligence analysis, and annotation.
2. The system of Claim 1, wherein the user's desktop or a computing device, or instrument computer is dynamically linked to and has direct access to the content of one or more remote databases.
3. The system of Claim 2, wherein the user's desktop or a computing device, or instrument computer dynamically retrieves content of one or more remote databases.
4. The system of Claim 3, wherein the user's desktop or a computing device, or instrument computer dynamically integrates local data content and content of one or more remote databases.
5. The system of Claim 1, wherein the local data content and remote data content are packaged to reflect biological properties, functions and profiles of analytes.
6. The system of Claim 1, wherein the local data content and remote data content are presented in tabular, map, color or grey barcodes reflecting biological properties, functions and profiles of analytes.
7. The system of Claim 6, wherein the local data content and remote data content are presented using color and/or gray barcodes.
8. The system of Claim 1, wherein the local data content and remote data content are analyzed using a 3-D image analysis tool.
9. The system of Claim 1, wherein the local data content and remote data content are analyzed using an artificial intelligence tool.
10. The system of Claim 1, wherein the bioassay data are analyzed using current pathway information dynamically obtained from remote database content.
11. A computer linked to an instrument control having post analysis software and through a web server connection to a plurality of remote information resources and databases on genome and biology via the Internet.
12. A method for integrating experimental data of multiple analytes from different technologies and platforms using gene symbols or IDs for advanced data analysis comprising clustering, classification and statistical analysis.
13. A method for using gene symbols or IDs to group experimental data into pathways, diseases, drugs, SNPs and sequences and/or further biology groups using gene symbols or IDs as a factor in grouping the experimental data.
14. A method for presenting experimental data from a bioassay sample as a barcode-like image, wherein each bar represents one analyte.
15. The method of Claim 14, wherein the barcode represents a specific pathway, disease, drug, SNP and sequence with defined order.
16. The method of Claim 15, wherein the barcodes are shaded, gray or colored, and/or having scoring colors representing the analyte experiment value, comprising bioassay experimental expression, bio content concentration, and gene regulation.
17. A method of data reduction and integrated analysis comprising applying clustering to biology grouped barcode data among multiple biology groups to perform 3D clustering analysis.
18. The method of Claim 15, comprising using known data remotely obtained from online resources and databases to run classification to train the computer for an artificial intelligence algorithm or rule set.
19. A method for classifying experimental data by the algorithm obtained from training according to Claim 18.
20. A method for parsing and constructing a disease database for known human diseases as provided in OMIM and other public disease databases, comprising:
retrieving disease information from OMIM and other disease information data sources via an automated process;
defining a database schema or structure for storing disease information;
parsing and loading disease information data files into the above database schema; and
performing corresponding indexing.
21. A method for parsing and constructing a drug database for known human drugs as provided in DrugBank and other drug databases, comprising:
retrieving drug information from DrugBank and other drug information data sources via an automated process;
defining a database schema or structure for storing drug information;
parsing and loading drug information data files into the above database schema; and
performing corresponding indexing.
22. A method for parsing and constructing a SNP and sequence database as provided in the dbSNP database and other sequence databases, comprising:
retrieving SNP and sequence information from data sources via an automated process;
defining a database schema or structure for storing SNP and sequence information;
parsing and loading SNP and sequence data files into the above database schema; and
performing corresponding indexing.
23. A method for a user's computing device or instrument computer to submit a list of genes from a client program to a web server and receive the corresponding search results, comprising:
packing the list of genes in a defined format;
sending the list of genes to a web server using a defined protocol;
performing comprehensive searches;
receiving the search results using a defined protocol; and
unpacking the search results.
24. A method for a web app server to perform searches of biomedical knowledge for a list of genes on the web server, comprising:
unpacking the list of genes;
calling the corresponding web service to perform the specific search for each gene;
processing, combining and packing the search results; and
sending the search results back to the client program.
25. A method for searching a database for a given gene via web service and returning the results in a defined format.
26. The method of Claim 8, wherein the database is a pathway database, a disease database, a drug database, a SNP and sequence database, a human genome database, or a GO (gene ontology) information database.
27. A method for user interfacing (UI) for a summary view of multiple analytes in the client computer program, wherein for each analyte, a list of a user defined number of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs and human SNPs and sequence is made in a tabular form.
28. A method for user interfacing for a view of single analyte in the client computer program, wherein a list of a user defined number of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs and human SNPs of the analyte is made in a tabular form.
29. A method for user interfacing for a pathway map of multiple analytes, wherein in the pathway map, all nodes with the genes in the list are marked by colors that match the experimentally determined bioassay results.
30. A method for user interfacing for a pathway view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding pathway and wherein clicking the pathway column brings up the corresponding pathway map with highlight of gene symbols.
31. A method for user interfacing for a drug view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing drugs, and wherein the UI shows a check mark for a cell if the corresponding analyte is the target of the corresponding drug and wherein clicking the drug column brings up the corresponding DrugBank web page of the drug.
32. A method for user interfacing for a disease view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing diseases, and wherein the UI shows a check mark for a cell if the corresponding analyte is involved in the corresponding disease and wherein clicking the disease column brings up the corresponding disease database web page.
33. A method for user interfacing for a SNP view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing SNPs, and wherein the UI shows a check mark for a cell if the corresponding analyte is next to the SNP on human genome and wherein clicking the SNP column brings up the corresponding genome web page.
PCT/US2014/029958 2013-03-15 2014-03-15 Systems and apparatus for integrated and comprehensive biomedical annotation of bioassay data WO2014145234A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361799906P 2013-03-15 2013-03-15
US61/799,906 2013-03-15

Publications (2)

Publication Number Publication Date
WO2014145234A2 true WO2014145234A2 (en) 2014-09-18
WO2014145234A3 WO2014145234A3 (en) 2014-11-27

Family

ID=51538436

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/029958 WO2014145234A2 (en) 2013-03-15 2014-03-15 Systems and apparatus for integrated and comprehensive biomedical annotation of bioassay data

Country Status (1)

Country Link
WO (1) WO2014145234A2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080124753A1 (en) * 2006-11-15 2008-05-29 Lee Seunghun Paul SpiceMatrix Technology for Taste Compound Identification
RU2345416C1 (en) * 2007-05-31 2009-01-27 НАСЫПНАЯ Галина Анатольевна Method of synthesis of self-trained analytical question-answer system with extraction of knowledge from texts
WO2011079846A2 (en) * 2009-12-30 2011-07-07 Rigshospitalet Mrna classification of thyroid follicular neoplasia

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANALYTI ANASTASIA ET AL.: 'Integrating Clinical and Genomic Information through the PrognoChip Mediator.' INTERNATIONAL SYMPOSIUM ON MEDICAL DATA ANALYSIS - ISMDA 2006, *
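BELYSHEV D. V. ET AL.: 'Ispolzovanie tekhnology shtrikh-kodirovaniya v meditsinskikh informatsionnykh sistemakh [Use of barcoding technology in medical information systems]. Programmnye sistemy: teoriya i prilozheniya [Program Systems: Theory and Applications].' PERESLAVL-ZALESSKY 2009, *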

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10395759B2 (en) 2015-05-18 2019-08-27 Regeneron Pharmaceuticals, Inc. Methods and systems for copy number variant detection
US11568957B2 (en) 2015-05-18 2023-01-31 Regeneron Pharmaceuticals Inc. Methods and systems for copy number variant detection
CN110838338A (en) * 2018-08-15 2020-02-25 上海美吉生物医药科技有限公司 System, method, storage medium, and electronic device for creating biological analysis item
CN110838338B (en) * 2018-08-15 2023-09-29 上海美吉生物医药科技有限公司 Biological analysis item establishment system, biological analysis item establishment method, storage medium, and electronic device
CN111724873A (en) * 2020-06-18 2020-09-29 北京嘉和海森健康科技有限公司 Data processing method and device
CN111724873B (en) * 2020-06-18 2024-01-09 北京嘉和海森健康科技有限公司 Data processing method and device
CN113075308A (en) * 2021-03-09 2021-07-06 商丘医学高等专科学校 Screening system and method for individualized lutein-containing anti-cancer drug
CN113075308B (en) * 2021-03-09 2023-02-28 商丘医学高等专科学校 Screening system and method for individualized lutein-containing anti-cancer drug

Also Published As

Publication number Publication date
WO2014145234A3 (en) 2014-11-27

Similar Documents

Publication Publication Date Title
Subramanian et al. GSEA-P: a desktop application for Gene Set Enrichment Analysis
Brown et al. Automated workflows for accurate mass-based putative metabolite identification in LC/MS-derived metabolomic datasets
Zhu et al. Targeted exploration and analysis of large cross-platform human transcriptomic compendia
Liberzon A description of the molecular signatures database (MSigDB) web site
Zhang et al. GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies
Wild The exposome: from concept to utility
Shen et al. EADB: an estrogenic activity database for assessing potential endocrine activity
Longnecker et al. Environmental metabolomics: Databases and tools for data analysis
Zhou et al. Using the Wash U Epigenome Browser to examine genome‐wide sequencing data
Moreno et al. BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology
JP2009520278A (en) Systems and methods for scientific information knowledge management
US20160019335A1 (en) Method, apparatus and computer program product for metabolomics analysis
Xie et al. MOBCdb: a comprehensive database integrating multi-omics data on breast cancer for precision medicine
Canny et al. PubChem promiscuity: a web resource for gathering compound promiscuity data from PubChem
Carvalho et al. Analyzing shotgun proteomic data with PatternLab for proteomics
Ara et al. Metabolonote: a wiki-based database for managing hierarchical metadata of metabolome analyses
Fahy et al. Bioinformatics for lipidomics
Mias et al. MathIOmica: an integrative platform for dynamic omics
WO2014145234A2 (en) Systems and apparatus for integrated and comprehensive biomedical annotation of bioassay data
Abugessaisa et al. The FANTOM5 computation ecosystem: genomic information hub for promoters and active enhancers
Kirov et al. Functional annotation of differentially regulated gene set using WebGestalt: a gene set predictive of response to ipilimumab in tumor biopsies
Sun et al. WebGIVI: a web-based gene enrichment analysis and visualization tool
Conroy et al. LIPID MAPS: update to databases and tools for the lipidomics community
Musa et al. Systems pharmacogenomic landscape of drug similarities from LINCS data: drug association networks
Xie et al. Getting started with LINCS datasets and tools

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14762756

Country of ref document: EP

Kind code of ref document: A2

122 EP: PCT application non-entry in European phase

Ref document number: 14762756

Country of ref document: EP

Kind code of ref document: A2