WO2014145234A2

WO2014145234A2 - Systems and apparatus for integrated and comprehensive biomedical annotation of bioassay data

Info

Publication number: WO2014145234A2
Application number: PCT/US2014/029958
Authority: WO
Inventors: Minzi Y. RUAN; Yuhong Yang; Jason J. Ruan; Shenzhi YU; Shaoquan JI
Original assignee: Vigenetech, Inc.
Priority date: 2013-03-15
Filing date: 2014-03-15
Publication date: 2014-09-18
Also published as: WO2014145234A3

Abstract

The invention provides unique systems and apparatus for linking and integrating vast online available information and resources to a researcher conducting specific biological experiment or analysis and to the experimental data collection the researcher has, for example, by using the biological annotation of bioassay data as a basic link, packing the available information in application based methods, presenting the data in an intuitive way, and analyzing the current data set with integration with available online information to better and more fully analyze the data and significance thereof.

Description

SYSTEMS AND APPARATUS FOR INTEGRATED AND COMPREHENSIVE BIOMEDICAL ANNOTATION OF BIOASSAY DATA

Priority Claims and Related Patent Applications

[0001] This application claims the benefit of priority from U.S. Provisional Application Serial No. 61/799,906, filed on March 15, 2013, the entire content of which is incorporated herein by reference.

Technical Fields of the Invention

[0001] The invention generally relates to systems and apparatus for biological data analysis and annotation. More particularly, the invention relates to novel systems and apparatus for linking and integrating comprehensive functions of searching, recording, storage, organizing, classification, data reduction, retrieval, analysis and imaging of biological, medical and healthcare data.

Background of the Invention

[0002] Biological assays are biological testing tools for measuring the presence and/or concentration of biologically relevant markers or pharmaceutical substances in a patient or research sample or specimen. Detailed information on the biological markers or drug substances can be obtained, which can guide biological research, drug development, as well as disease prevention, diagnosis and prognosis.

[0003] High-throughput technologies generate enormous amounts of data, especially in genomics and biology. Many government-supported projects and databases hold vast information, and the volume and variety of available information increase rapidly every day. This vast collection of data is very useful but is difficult for a researcher to utilize. Currently, one must put in significant effort and time to gain access to and utilize the information at a variety of sources. Additionally, such information is not in an integrated format and is difficult to use. It would be a major technological advancement if a user could have an easy method to connect to and access the vast online resources and databases. Unfortunately, at present, bioassay data analyses are disconnected from the various online information and resources. Data review and analyses are typically done without direct reference to other relevant and pertinent information, which makes it very hard to put an individual data point in the context of as much of the currently available information and resources as possible.

[0004] Biological assays generate analog or digital raw data, images, signals, spectra, or graphs from which useful patterns, trends, results and conclusions may be extracted. The assays may fall in a variety of categories including genes, genetic mutations, gene expression and regulation, biological pathways as well as molecular interactions. Typical biological assays include three stages: (1) selection and preparation of a biological sample (e.g., cells or tissues), (2) addition of biological or chemical agents or probes (e.g., proteins and organic molecules) to the sample, and (3) capture of the resulting responses or effects on the tested sample.

[0005] In the past, one biomarker was usually tested at a time for a sample to generate one data point. During the past decade, thanks to the widespread application of robotics and exponentially increasing computational power, tens to thousands of biomarkers can be tested for one sample at the same time to provide a large amount of information about the sample, which may be used for research and medical purposes.

[0006] Typically, analysis of bioassay data includes two steps. The first (Step 1) is to distinguish significant and noteworthy factors from insignificant and inconsequential ones, mainly by statistical methods. The second (Step 2) is to understand, decode and analyze the collected and filtered information in a relevant biological context.

[0007] Many commercial application systems or in-house built systems have been developed for bioassay data analysis. Most of them, however, suffer from the same problem: they are strong for the first step, but weak or non-existent for the second. As a result, biomedical researchers or drug developers have to spend a large amount of time performing the second step. For example, for bioassays using microarray technology, hundreds or even thousands of genes can be identified as significant by statistical methods (Step 1). After that, in Step 2, researchers have to spend a large amount of time (from weeks to months) compiling these genes' biological functions, possible actions in human diseases, and roles in biological pathways. Once such a list is available, they have to use their domain knowledge to make biomedical sense of the collected data. For Step 2, conventional manual annotation is not only time consuming; it usually helps scientists see only leaves or trees, but not the integrated forest. To our knowledge, there are no available tools that help researchers understand the collected data in a biological context.

[0008] Thus, there remains a need for novel systems and apparatus for connected and integrated and comprehensive searching, retrieval, organizing, classification, analysis, storage and presentation of biological, medical and healthcare data.

Summary of the Invention

[0009] The invention provides unique systems and apparatus for linking and integrating vast online available information and resources to a researcher conducting specific biological experiment or analysis and to the experimental data collection the researcher has, for example, by using the biological annotation of bioassay data as a basic link, packing the available information in application based methods, presenting the data in an intuitive way, and analyzing the current data set with integration with available online information to better and more fully analyze the data and significance thereof.

[0010] In one aspect, the invention generally relates to a system for annotation of bioassay data comprising dynamically integrated functions of search, retrieval, organization, recording, analysis, classification, annotation, display and storage of biological, medical and healthcare data, wherein a user of the system has desktop access via a personal or instrument computer to comprehensive and contextual biological information and analytical programs pertaining to an analysis via online-based functionalities comprising one or more of data retrieval, comparison, clustering, display, artificial intelligence analysis, and annotation.

[0011] In certain embodiments, the user's desktop or instrument computer is dynamically linked to and has direct access to the content of one or more remote databases.

[0012] In certain embodiments, the user's desktop or instrument computer dynamically retrieves content of one or more remote databases.

[0013] In certain embodiments, the user's desktop or instrument computer dynamically integrates local data content and content of one or more remote databases.

[0014] In certain embodiments, the local data content and remote data content are packaged to reflect biological properties, functions and profiles of analytes.

[0015] In certain embodiments, the local data content and remote data content are presented in tabular, map, color or grey barcodes reflecting biological properties, functions and profiles of analytes.

[0016] In certain embodiments, the local data content and remote data content are presented using color and/or gray barcodes.

[0017] In certain embodiments, the local data content and remote data content are analyzed using a 3-D image analysis tool.

[0018] In certain embodiments, the local data content and remote data content are analyzed using an artificial intelligence tool.

[0019] In certain embodiments, the bioassay data are analyzed using current pathway information dynamically obtained from remote database content.

[0020] In another aspect, the invention generally relates to a computer linked to an instrument control having post analysis software and through a web server connection to a plurality of remote information resources and databases on genome and biology via the Internet.

[0021] In yet another aspect, the invention generally relates to a method for integrating experimental data of multiple analytes from different technologies and platforms using gene symbols or IDs for advanced data analysis comprising clustering, classification and statistical analysis.

[0022] In yet another aspect, the invention generally relates to a method for using gene symbols or IDs to group experimental data into pathways, diseases, drugs, SNPs and sequences and/or further biology groups using gene symbols or IDs as a factor in grouping the experimental data.

[0023] In yet another aspect, the invention generally relates to a method for presenting experimental data from a bioassay sample into a barcode like image, wherein each bar represents one analyte.

[0024] In certain embodiments, the barcode represents a specific pathway, disease, drug, SNP and sequence with defined order.

[0025] In certain embodiments, the barcodes are shaded, gray or colored, and/or having scoring colors representing the analyte experiment value, comprising bioassay experimental expression, bio content concentration, and gene regulation.

[0026] In yet another aspect, the invention generally relates to a method of data reduction and integrated analysis comprising applying clustering to biology grouped barcode data among multiple biology groups to perform 3D clustering analysis.

[0027] In certain embodiments, the method includes using known data remotely obtained from online resources and databases to run classification to train the computer for an artificial intelligence algorithm or rule set.

[0028] In certain embodiments, the method performs classifying experimental data by the algorithm obtained from training.

[0029] In yet another aspect, the invention generally relates to a method for parsing and constructing a disease database for known human diseases as provided in OMIM and other public disease database, including: retrieving disease information from OMIM and other disease information data sources via an automated process; defining a database schema or structure for storing disease information; parsing and loading disease information data files into the above database schema; and performing corresponding indexing.

[0030] In yet another aspect, the invention generally relates to a method for parsing and constructing a drug database for known human drugs as provided in DrugBank and other drug database, including: retrieving drug information from DrugBank and other drug information data sources via an automated process; defining a database schema or structure for storing drug information; parsing and loading drug information data files into the above database schema; and performing corresponding indexing.

[0031] In yet another aspect, the invention generally relates to a method for parsing and constructing a SNP and sequence database as provided in dbSNP database and other sequence database, including: retrieving SNP and sequence information from data sources via an automated process; defining a database schema or structure for storing SNP and sequence information; parsing and loading SNP and sequence data files into the above database schema; and performing corresponding indexing.
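By way of non-limiting illustration, the steps common to the disease, drug, and SNP and sequence database methods above (defining a schema, parsing and loading data files, and indexing) may be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in for the relational database; the tab-separated record layout (identifier, name, gene symbol), the table and column names, and the function names are assumptions made for illustration only and are not the file formats of OMIM or any other source.

```python
import sqlite3


def build_disease_db(lines):
    """Define a schema, parse and load records, then index by gene symbol.

    Each input line is assumed (hypothetically) to be tab-separated:
    disease_id <TAB> disease name <TAB> gene symbol.
    """
    conn = sqlite3.connect(":memory:")
    # Step 1: define a database schema for storing disease information.
    conn.execute(
        "CREATE TABLE disease (disease_id TEXT, name TEXT, gene_symbol TEXT)"
    )
    # Step 2: parse the data file, skipping blank lines and comments.
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        disease_id, name, gene_symbol = line.split("\t")
        rows.append((disease_id, name, gene_symbol.upper()))
    # Step 3: load the parsed records into the schema.
    conn.executemany("INSERT INTO disease VALUES (?, ?, ?)", rows)
    # Step 4: index the column used for gene-based look-ups.
    conn.execute("CREATE INDEX idx_disease_gene ON disease (gene_symbol)")
    conn.commit()
    return conn


def diseases_for_gene(conn, gene_symbol):
    """Return the disease names linked to a gene symbol, sorted by name."""
    cur = conn.execute(
        "SELECT name FROM disease WHERE gene_symbol = ? ORDER BY name",
        (gene_symbol.upper(),),
    )
    return [row[0] for row in cur]
```

The same pattern would apply to a drug table or a SNP and sequence table by changing the column definitions and the parser.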

[0032] In yet another aspect, the invention generally relates to a method for a user's computing device or instrument computer submitting a list of genes from a client program to a web server and receiving the corresponding search results, including: packing the list of genes in a defined format; sending the list of genes to a web server using a defined protocol; performing comprehensive searches; receiving the search results using a defined protocol; and unpacking the search results.

[0033] In yet another aspect, the invention generally relates to a method for a web app server of performing searches of biomedical knowledge for a list of genes on the web server, comprising: unpacking the list of genes; calling corresponding web service to perform the specific search for each gene; processing, combining and packing the search results; and sending the search results back to the client program.
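As a minimal sketch of the client and server steps recited in the two preceding paragraphs, the following example packs a gene list, unpacks it on the server side, calls one search function per annotation type for each gene, and packs the combined results for return. JSON is used here only as an assumed "defined format"; the function names and the shape of the `services` mapping are hypothetical, and the network transport is omitted.

```python
import json


def pack_gene_list(genes):
    """Client side: pack a gene list in a defined format (JSON assumed here)."""
    # Normalize case and drop duplicates while preserving the input order.
    unique = list(dict.fromkeys(g.upper() for g in genes))
    return json.dumps({"genes": unique})


def handle_request(payload, services):
    """Server side: unpack the list, search each gene, combine and pack results.

    `services` maps an annotation name (e.g. "pathway") to a callable that
    takes a gene symbol and returns a JSON-serializable search result.
    """
    genes = json.loads(payload)["genes"]
    results = {
        gene: {name: search(gene) for name, search in services.items()}
        for gene in genes
    }
    return json.dumps({"results": results})


def unpack_results(payload):
    """Client side: unpack the search results returned by the server."""
    return json.loads(payload)["results"]
```

In use, the string returned by `pack_gene_list` would be sent to the server, and the string returned by `handle_request` would be sent back to the client and passed to `unpack_results`.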

[0034] In yet another aspect, the invention generally relates to a method for searching a database for a given gene via web service and returning the results in a defined format, wherein the database is a pathway database, a disease database, a drug database, a SNP and sequence database, a human genome database, or a gene ontology (GO) information database, for example.

[0035] In yet another aspect, the invention generally relates to a method for user interfacing (UI) for a summary view of multiple analytes in the client computer program or a web browser, wherein for each analyte, a list of top 5 (which can be any user selected value, e.g., 10) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, human SNPs and sequence is made in a tabular form.

[0036] In yet another aspect, the invention generally relates to a method for user interfacing for a view of single analyte in the client computer program, wherein a list of top 5 (which can be any user selected value, e.g., 10) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, human SNPs and sequence of the analyte is made in a tabular form.

[0037] In yet another aspect, the invention generally relates to a method for user interfacing for a pathway map or other biology group map or form of multiple analytes, wherein in the pathway map, all nodes with the genes in the list are marked by colors that match the experimentally determined bioassay results. Various markings may be used, such as red marking for gene up regulation and green for gene down regulation, with the brightness of the color matching the experimentally determined level in a quantitative way.
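For illustration only, one possible way to derive such a marking color from a measured value is sketched below. The sketch assumes the bioassay result is available as a signed ratio (positive for up regulation, negative for down regulation) and that `max_abs` is the magnitude shown at full brightness; both the input scale and the function name are assumptions rather than part of the disclosed system.

```python
def regulation_color(log_ratio, max_abs=2.0):
    """Map a signed regulation value to an (R, G, B) marking color.

    Positive values are drawn in red, negative values in green, and the
    brightness grows linearly with the magnitude up to `max_abs`.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    # Clamp the magnitude so the brightness stays within 0..255.
    strength = min(abs(log_ratio) / max_abs, 1.0)
    level = int(255 * strength)
    if log_ratio > 0:
        return (level, 0, 0)  # up regulation: red
    if log_ratio < 0:
        return (0, level, 0)  # down regulation: green
    return (0, 0, 0)  # no change: unmarked (black)
```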

[0038] In yet another aspect, the invention generally relates to a method for user interfacing (UI) for a pathway view or other biology group structure view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding pathway and wherein clicking the pathway column brings up the corresponding pathway map with highlight of gene symbols.

[0039] In yet another aspect, the invention generally relates to a method for user interfacing for a drug view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing drugs, and wherein the UI shows a check mark for a cell if the corresponding analyte is the target of the corresponding drug and wherein clicking the drug column brings up the corresponding DrugBank and other public information resources web page of the drug.

[0040] In yet another aspect, the invention generally relates to a method for user interfacing for a disease view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing diseases, and wherein the UI shows a check mark for a cell if the corresponding analyte is involved in the corresponding disease and wherein clicking the disease column brings up the corresponding OMIM, HGMD and other disease resources web pages.

[0041] In yet another aspect, the invention generally relates to a method and user interfacing for a SNP view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing SNPs, and wherein the UI shows a check mark for a cell if the corresponding analyte is next to the SNP on human genome and wherein clicking the SNP column brings up the corresponding genome web page.

Detailed Description of the Invention

[0042] The invention provides unique systems and apparatus for linking vast online available information and resources to a researcher conducting specific biological experiment or analysis and to the experimental data collection the researcher has. The systems and apparatus of the invention use the biological annotation of bioassay data as a basic link, pack the available information in application based methods, present the data in an intuitive way, and analyze the current data set with integration with available online information, in order to reveal hidden and fundamental biology content and significance. The invention provides unique systems and apparatus for biological annotation of bioassay data, which provide for integrated, comprehensive and contextual recording, storage, organizing, annotation, classification, retrieval, analysis and imaging of biological, medical and healthcare data.

[0043] As used herein, the term "biological marker" or "biomarker", refers to an indicator of a biological state. It is a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention. In cell biology, for example, a biomarker is a molecule that allows for the detection and isolation of a particular cell type. In medicine, for example, a biomarker can be a traceable substance that is introduced into an organism as a means to examine organ function or other aspects of health. In the context of bioassays, a biomarker being measured or analyzed is also referred to as an

"analyte".

[0044] The system of the invention allows one to make a dynamic link directly between a biologist's or a bench-top instrument's computer and all relevant information, databases and resources in the public domain. Such information, databases and resources are available before and after an experiment and data analysis. The information can be packed in various ways to give the researcher an overview as well as guidance regarding his experiment and analysis.

[0045] The information can be presented so as to make it easier to see both the details and the big picture in the larger context, and the collective information rather than just an individual data point or result, which enables one to group data based on pathways, diseases, drugs and genetics. The information and data can be packaged and presented in any suitable form, for example, in tabular, map, color or grey barcodes with the group-based biological functions and profiles. Biology-guided data clustering based on color or gray barcodes can be processed by advanced image analysis and artificial intelligence tools to achieve higher-level data reduction and reveal insight that is otherwise hard to deduce. Thus, the system allows a user to set up and perform the analysis on the user's bench-top or desktop computer, where he can have at his disposal all relevant, available information via online access. For example, this allows one to perform clustering analysis and use current pathway information to guide the classification. The system can also accommodate different operating systems and platforms, as they will be running the same API, which lets each individual computer run much more powerful programs without concern for the computer's OS and capacity.

[0046] As schematically illustrated in FIG. 1, the system is built to connect (1) a program running on an individual PC to publicly available data on the Internet, or (2) a user with a set of data, via a web browser, to biology information searched on the Internet.

[0047] The Application Web Server will contain a database for packed biology information as well as algorithms to form packed information dynamically for specific applications (FIG. 2). The user PC has a database to store the packed biology information associated with the user's own experiment and data set. Public domain databases, such as GO, Pathway or OMIM, will be connected for the application with Web App Server support.

[0048] An exemplary main work flow is schematically depicted in FIG. 3.

[0049] FIG. 4 shows an exemplary data system for gene annotation linking to DNA, protein or other types of biological assay. The GO database represents the public online Gene Ontology annotation database. The data flow allows assay annotation to be built on other biological databases and on pathway, drug, disease and sequencing data and other biology information.

[0050] As shown in FIGs. 5, 6 and 7, the analyte data can be grouped according to pathway, disease, drug, sequence or certain biological functions or categories. The data can be presented in tabular forms and maps, such as a pathway map, to show which analytes belong to the same pathway or group. All analyte data obtained from experiments, databases, or other resources will be placed into groups, such as the same pathway or the same biological function association group, and will be listed, for example, in a defined order in a barcode. Each analyte will be represented as one bar in the barcode. For example, as shown in FIG. 5, an analyte can be represented in a barcode in (1) color or (2) gray, with the color or gray shade representing a value of the analyte, such as the expression level or analyte concentration. The analyte value can also be represented in (3) discrete colors, such as scoring colors. Different levels of analyte values will be shown in different colors. A set of supervised clustering analyses can be performed for further data reduction, guided by pathway, biological function and genetic information. The clustering analysis can run on similar barcode data across multiple pathways, diseases and biological groups to perform biologically supervised 3D clustering analysis. Furthermore, artificial intelligence classification methods can be applied to barcode images or data sets to perform fingerprint analysis and biology measurement for further data reduction to uncover insights and patterns.
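As a simple, non-limiting sketch of the gray barcode described above, the following example lists the analytes of one group in a defined order and converts each analyte value into one gray level per bar. The linear scaling between `vmin` and `vmax`, the use of `None` for an analyte without a measurement, and the function name are assumptions made for illustration.

```python
def gray_barcode(values, order, vmin, vmax):
    """Return one gray level (0-255) per analyte, in the defined order.

    `values` maps analyte name to its measured value (e.g. expression level);
    `order` is the defined order of the analytes in the group (e.g. a pathway).
    Analytes without a measurement yield None so the bar can be left blank.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    bars = []
    for analyte in order:
        value = values.get(analyte)
        if value is None:
            bars.append(None)
            continue
        # Clamp to the display range, then scale linearly to 0..255.
        clamped = min(max(value, vmin), vmax)
        bars.append(int(255 * (clamped - vmin) / (vmax - vmin)))
    return bars
```

Because every sample of the same group yields a list of equal length and order, such lists can be stacked row by row and passed to a clustering or classification routine.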

[0051] In another aspect, and to use an example to help explain the challenges the unique system of the invention intends to address, let us briefly explain a typical microarray bioassay. We human beings have about 22,000 genes. Commercial companies sell a device with information about these 22,000 genes. This device is able to detect how much of each gene is present in a biological sample such as liver tissue. Let us say a researcher wants to compare the gene expression difference between liver cancer cells and normal cells. He could buy such a device from a vendor and then measure the expression level of the 22,000 genes. Then he will compare the gene expression differences between cancer and normal cells. For demonstration purposes, let us assume that he used a statistical tool and found that three genes are significantly different between tumor and normal cells: BRCA1, FA 1, and MLH1. Among the three genes, BRCA1 and FA 1 are higher in tumor than in normal cells, and MLH1 is lower. With these three genes, this researcher has to do two things. First, he has to find what is already known about these three genes. There are quite a few web sites he could use, such as NCBI (www.ncbi.nlm.nih.gov) or Genecard (www.genecards.org). For three genes, this should be trivial, but it is another story entirely for 1,000 genes, which is not uncommon. Second, he has to understand the biological context and meaning of this finding. For this part, no commercial tools, to our knowledge, exist to help.

[0052] An exemplary computer application system of the invention includes a Windows client program or a Web App, a MySQL, SQL or Oracle database, and a set of web services. Such a system provides integrated and comprehensive biomedical annotation of bioassay data, together with application-based data searching and packing methods and rules. In a typical statistical analysis of bioassay data, such as microarray and tissue array data, hundreds or even thousands of genes can be identified as significant for biomedical research goals. Given such lists of genes of interest, the system disclosed herein will search for and identify major and important facts and knowledge about these genes, present the findings in a user-friendly and effective way, and perform novel data analyses by integrating statistical tools and biomedical knowledge.

[0053] For example, FIG. 1 depicts a basic architecture of an exemplary system according to the invention. This exemplary system includes two major components: an Application (or App) server and a client program. The server can run on a Linux or Microsoft platform, includes a MySQL, SQL or Oracle relational database, and provides a number of web services. The web services can be written in C#, Java or another web-service-capable language and deployed either in Tomcat (http://tomcat.apache.org/) as SOAP (Simple Object Access Protocol) services or in a customized web service architecture framework, which can be accessible to a variety of computer platforms including Microsoft Windows, Linux, Mac OS X and mobile portable systems. For example, image analysis programs written in Microsoft C# and running on Microsoft Windows machines are able to remotely invoke such web services or SOAP services on such a Web App Server and request application-based defined services.

[0054] A major function of the Web App Server is to take a gene list request from the client program, search public biomedical databases on the Internet with application-based rules, filters and algorithms, integrate and pack the search results in a special clear format, and send the obtained annotations of genes back to the client program. For each gene with one or more gene symbols, one web service can find its gene ontology annotations, which include a gene's possible biological function, biological process, and cellular component. This web service can search QuickGO's web site (http://www.ebi.ac.uk/QuickGO/GSearch?q=brca1), parse, clean up, and save the search results into the Web App Server database. For convenience, we will use the gene BRCA1 as an example here.

[0055] The second web service can search NCBI's gene database (http://www.ncbi.nlm.nih.gov/gene/?term=brca1) and find a gene symbol's gene ID. A gene ID is an identifier unique to each gene.

[0056] From the gene ID, the third web service can search KEGG's pathway database (http://www.genome.jp/dbget-bin/www_bget?hsa:672) and other pathway databases, parse, clean up, and save the search results into the Web App Server database.

[0057] The fourth web service can search OMIM (http://www.omim.org/search?index=entry&sort=score+desc%2C+prefix_sort+desc&start=1&limit=10&search=brca1), HGMD and other disease databases, parse, clean up and save the search results into the Web App Server database.

[0058] The fifth web service can search the DrugBank database (http://www.drugbank.ca/search?utf8=%E2%9C%93&query=breast+cancer&commit=Search), parse, clean up and save the results into the Web App Server database.

[0059] The sixth web service can search NCBI's SNP database (http://www.ncbi.nlm.nih.gov/snp/?term=brca1&SITE=NcbiHome&submit=Go), parse, clean up and save the results into the Web App Server database.

[0060] The seventh web service goes through each KEGG pathway HTML page and checks whether it contains genes in the gene list. If not, the page is ignored. If so, it can cache the HTML content in the Web App Server database.

[0061] The search results can be cached or temporarily saved in the Web App Server database. For a new request and gene list from the client program, the Web App Server can check whether the biological annotations are already available. If so, it can directly retrieve such information from the Web App Server database. Otherwise, web services can be invoked to search for and save the annotations.
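The check-then-search behavior described above may be sketched, purely by way of example, as follows. A Python dictionary stands in for the Web App Server database, and `fetch` stands in for whichever web service performs the actual search; both, along with the function name, are assumptions for illustration.

```python
def annotate(gene, cache, fetch):
    """Return the annotations for a gene, searching only on a cache miss.

    `cache` is a dict standing in for the server database; `fetch` is a
    callable standing in for the web services that search remote sources.
    """
    key = gene.upper()  # treat gene symbols case-insensitively
    if key not in cache:
        # Not yet available: invoke the search and save the result.
        cache[key] = fetch(key)
    # Already available (or just saved): retrieve directly.
    return cache[key]
```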

[0062] The second major component of the contextual bioassay data analysis system of the invention is the client computer program. This program can be written in Microsoft C# or another programming language and deployed on the Microsoft Windows platform, or written in HTML/JavaScript and deployed on any platform with a modern web browser, and it includes several major parts.

[0063] FIG. 8 schematically illustrates an exemplary server database scheme.

[0064] The first part provides a summary view for a list of genes. In typical biological assay data, the sample is usually called an analyte, and it can represent a gene, for example. This part, or computer graphical user interface (GUI), takes a list of analytes that are found to be statistically significant in preliminary statistical data analysis, sends the list to the Web App Server via an Internet standard protocol such as SOAP, retrieves the corresponding major biological annotations, and then presents researchers with a summary table. In this table, each row represents a gene or analyte, and each column can represent the top entries of the corresponding annotation, for example, the top 5 KEGG pathways.

[0065] For a significant number of genes, this program and GUI can dramatically reduce a researcher's time in finding such annotations. It is also much less error-prone. The tabular presentation can provide researchers with a quick biological view of a given gene list and enhance their research productivity.
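A minimal sketch of how such a summary table could be assembled is given below. It assumes the annotations returned for each analyte are already ranked lists keyed by category name; that data shape, the category names, and the function name are hypothetical.

```python
def summary_table(annotations, categories, top_n=5):
    """Build one row per analyte holding the top entries of each category.

    `annotations` maps analyte -> {category: ranked list of entries}.
    `top_n` is the user-selected number of entries to show (5 by default).
    """
    table = []
    for analyte, by_category in annotations.items():
        row = {"analyte": analyte}
        for category in categories:
            # A missing category produces an empty cell rather than an error.
            row[category] = list(by_category.get(category, []))[:top_n]
        table.append(row)
    return table
```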

[0066] The second part of the client program component provides an in-depth, detailed and specific view of a gene list to identify possible contextual relationships between genes. For convenience of discussion, we demonstrate here using KEGG biological pathways. The idea and computer program are applicable to other annotations as well.

[0067] For a given list of analytes (e.g., genes), we first compile a list of all relevant pathways. Basically, we take a non-redundant union of their pathways. Second, we write a computer program to dynamically create a table. In this table, each row represents a gene and each column represents a pathway. If a gene appears in a pathway, the corresponding cell has a checkmark; otherwise it is blank, as illustrated in FIG. 9.

[0068] For researchers, this table provides a quick view of possible contextual relationships between genes. For example, in this table, Analyte 1 appears in many pathways, and Analyte 8 in only one. This is a good indication that Analyte 8 is more specific, and it may be biologically more interesting if the researcher is looking for a high-specificity disease biomarker. Analytes 6 and 7 share two pathways, which may indicate that they are linked. Researchers can make other meaningful inferences, allowing for productive bioassay data analysis.
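The table construction and the shared-pathway comparison described in the two preceding paragraphs can be sketched as follows. The input shape (a mapping from each analyte to the list of pathways it appears in) and the function names are assumptions for illustration; a boolean stands in for the checkmark.

```python
def pathway_table(gene_pathways):
    """Return (columns, rows) for the gene-by-pathway checkmark table.

    `columns` is the non-redundant, sorted union of all pathways; `rows`
    maps each gene to a list of booleans, True where a checkmark belongs.
    """
    columns = sorted({p for paths in gene_pathways.values() for p in paths})
    rows = {
        gene: [p in set(paths) for p in columns]
        for gene, paths in gene_pathways.items()
    }
    return columns, rows


def shared_pathways(gene_pathways, gene_a, gene_b):
    """Return the pathways in which both genes appear, sorted by name."""
    return sorted(set(gene_pathways[gene_a]) & set(gene_pathways[gene_b]))
```

Counting the True values in a row gives the number of pathways an analyte appears in, which corresponds to the specificity comparison made above.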

[0069] The third part of the client program sends the Web App Server a list of genes and requests the HTML contents and images of possible biological pathways that contain these genes. In this example case, these three genes (BRCA1, FA 1, MLH1) happen to be in the same Fanconi anemia pathway. This program downloads the image map of this pathway from KEGG (http://www.genome.jp/kegg/), including its HTML source. The image map is shown in FIG. 10.

[0070] After that, this program colors BRCA1 and FA 1 red and MLH1 blue in the pathway map according to the flowchart shown in FIG. 11.

[0071] In a typical HTML file of a KEGG pathway map, the visible part is the image map, and a gene is usually represented by a rectangle. All gene symbols and their corresponding rectangle coordinates are given in the hidden source section.
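As an illustration of reading gene rectangles from such an HTML source, the sketch below uses the Python standard library HTML parser. It assumes that each gene is an `<area shape="rect">` element whose `title` attribute contains the gene symbol; that attribute layout and the sample markup are assumptions made for this example, and the markup of an actual pathway page may differ.

```python
import re
from html.parser import HTMLParser


class AreaCollector(HTMLParser):
    """Collect (title, coords) pairs from rectangular <area> elements."""

    def __init__(self):
        super().__init__()
        self.rects = []

    def handle_starttag(self, tag, attrs):
        if tag != "area":
            return
        attr = dict(attrs)
        if (attr.get("shape") or "").lower() not in ("rect", "rectangle"):
            return
        try:
            coords = tuple(int(v) for v in (attr.get("coords") or "").split(","))
        except ValueError:
            return  # skip malformed coordinate lists
        if len(coords) == 4:
            self.rects.append((attr.get("title") or "", coords))


def find_gene_rects(html, genes):
    """Map each requested gene symbol to the rectangles that mention it."""
    parser = AreaCollector()
    parser.feed(html)
    parser.close()
    wanted = {g.upper() for g in genes}
    hits = {}
    for title, coords in parser.rects:
        # Compare whole tokens so that a symbol does not match a longer one.
        tokens = set(re.split(r"[^A-Za-z0-9]+", title.upper()))
        for gene in wanted & tokens:
            hits.setdefault(gene, []).append(coords)
    return hits


# Hypothetical image-map fragment, for illustration only.
SAMPLE = """
<img src="map.png" usemap="#mapdata">
<map name="mapdata">
  <area shape="rect" coords="10,20,56,37" title="672 (BRCA1)" href="#">
  <area shape="rect" coords="80,20,126,37" title="4292 (MLH1)" href="#">
  <area shape="circle" coords="5,5,3" title="compound" href="#">
</map>
"""
gene_rects = find_gene_rects(SAMPLE, ["BRCA1", "MLH1"])
print(gene_rects)
```

The returned rectangles are the regions that a later step would recolor in the pathway image.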

[0072] The program finds all genes of the gene list in the pathway map and highlights each gene name, which is labeled in a rectangular text box (FIG. 10). If the pathway map is from a public database and downloaded in web HTML format, the program can use the map as is and only change the color of the gene name label, shaded (FIG. 12) at a level representing the expression, intensity, concentration and/or other experimental measurement of the gene. The shades can be gray or colored, or use distinct scoring colors, to display the experimental values visually in the image. After that, the program produces a modified image and presents it to the researcher (FIG. 5 and FIG. 16). The system of the invention greatly helps researchers make sense of their microarray experiment results. In this case, the researcher immediately sees and can propose a reasonable theory: overexpression of BRCA1 and FA 1 in FIG. 12 causes reduced expression of MLH1. The researcher can then either perform further database searches or run experiments to verify the theory. Over the past decade, biological pathways and other knowledge databases have become increasingly available to researchers, and this contextual bioassay data analysis tool can be a great help to them. In particular, the majority of diseases and biological processes involve multiple genes, and this tool makes it easier for researchers to better understand these diseases, processes and bioassay data. If the researcher in this example has more than three genes, these genes could appear in different pathways. In that case, the proposed tool can present the pathways individually or in a bigger image map.
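One way to turn a measurement into a label shade is sketched below. The red and blue convention follows the example above; the linear scaling, the white midpoint, the function name and the sample values are assumptions chosen for illustration.

```python
def shade_for_value(value, max_abs):
    """Map a signed measurement to an (R, G, B) shade.

    Positive values shade toward red, negative values toward blue, and
    the magnitude relative to max_abs sets how saturated the shade is.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    strength = min(abs(value) / max_abs, 1.0)
    fade = round(255 * (1.0 - strength))  # 255 = white, 0 = full color
    if value > 0:
        return (255, fade, fade)
    if value < 0:
        return (fade, fade, 255)
    return (255, 255, 255)


# Hypothetical relative expression values, for illustration only.
measurements = {"BRCA1": 2.0, "MLH1": -2.0}
colors = {gene: shade_for_value(v, max_abs=2.0) for gene, v in measurements.items()}
print(colors)
```

A gray scale or a set of discrete scoring colors could be substituted by changing only the return values of this function.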

[0073] With such a request, the search rule-algorithm set can be extended to more application-related databases and data sources. The only requirement is to create another set of search criteria, resource links and rules in the application-based table. New data can then be linked to the client program in the same way as the pathway method described in the previous section. For example, other pathway databases can be employed, such as Science magazine's biological signal pathway maps and WikiPathways maps, as well as other biomedical knowledge such as biological function, biological process, biological component, and disease network. The knowledge and its representation do not need to be static image maps as in the above example. They can be a dynamically generated (via computer programs or other devices) plot or a network with connected nodes.

[0074] In one aspect, the invention generally relates to a system for annotation of bioassay data comprising dynamically integrated functions of search, retrieval, organization, recording, analysis, classification, annotation, display and storage of biological, medical and healthcare data, wherein a user of the system has desktop access via a personal or instrument computer to comprehensive and contextual biological information and analytical programs pertaining to an analysis via online-based functionalities comprising one or more of data retrieval, comparison, clustering, display, artificial intelligence analysis, and annotation.

[0075] In yet another aspect, the invention generally relates to a method for parsing and constructing a disease database for known human diseases as provided in OMIM and other public disease database, including: retrieving disease information from OMIM and other disease information data sources via an automated process; defining a database schema or structure for storing disease information; parsing and loading disease information data files into the above database schema; and performing corresponding indexing.
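The schema definition, parsing, loading and indexing steps can be sketched with an embedded SQLite database. The table layout, the tab-delimited input format and the record contents below are hypothetical and are not the format of any particular disease data source; the retrieval step is omitted and the input is assumed to be already downloaded.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS disease (
    disease_id  TEXT PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS gene_disease (
    gene_symbol TEXT NOT NULL,
    disease_id  TEXT NOT NULL REFERENCES disease(disease_id),
    PRIMARY KEY (gene_symbol, disease_id)
);
"""


def load_diseases(conn, tsv_text):
    """Create the schema, parse tab-delimited records, load and index them."""
    conn.executescript(SCHEMA)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        conn.execute(
            "INSERT OR REPLACE INTO disease VALUES (?, ?)",
            (row["disease_id"], row["name"]),
        )
        for gene in (g.strip() for g in row["genes"].split(",")):
            if gene:
                conn.execute(
                    "INSERT OR IGNORE INTO gene_disease VALUES (?, ?)",
                    (gene, row["disease_id"]),
                )
    # The primary key already covers lookups by gene symbol; this index
    # supports the reverse lookup from a disease to its genes.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_gene_disease_disease "
        "ON gene_disease(disease_id)"
    )
    conn.commit()


def diseases_for_gene(conn, gene):
    """Return the names of the diseases linked to a gene symbol."""
    cur = conn.execute(
        "SELECT d.name FROM disease d "
        "JOIN gene_disease gd ON gd.disease_id = d.disease_id "
        "WHERE gd.gene_symbol = ? ORDER BY d.name",
        (gene,),
    )
    return [r[0] for r in cur]


# Hypothetical input file content, for illustration only.
SAMPLE_TSV = (
    "disease_id\tname\tgenes\n"
    "D0001\tExample disease A\tBRCA1, MLH1\n"
    "D0002\tExample disease B\tBRCA1\n"
)
conn = sqlite3.connect(":memory:")
load_diseases(conn, SAMPLE_TSV)
print(diseases_for_gene(conn, "BRCA1"))
```

The same pattern would apply to drug or SNP and sequence data by changing the table definitions and the parser.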

[0076] In yet another aspect, the invention generally relates to a method for parsing and constructing a drug database for known human drugs as provided in DrugBank and other drug database, including: retrieving drug information from DrugBank and other drug information data sources via an automated process; defining a database schema or structure for storing drug information; parsing and loading drug information data files into the above database schema; and performing corresponding indexing.

[0077] In yet another aspect, the invention generally relates to a method for parsing and constructing a SNP and sequence database as provided in dbSNP database and other sequence database, including: retrieving SNP and sequence information from data sources via an automated process; defining a database schema or structure for storing SNP and sequence information; parsing and loading SNP and sequence data files into the above database schema; and performing corresponding indexing.

[0078] In yet another aspect, the invention generally relates to a method for a user's computing device or instrument computer to submit a list of genes from a client program to a web server and receive the corresponding search results, including: packing the list of genes in a defined format; sending the list of genes to a web server using a defined protocol; performing comprehensive searches; receiving the search results using a defined protocol; and unpacking the search results.
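The client-side packing and unpacking steps can be sketched as follows. The source does not specify the defined format or protocol, so JSON is assumed here purely for illustration, and the transport is passed in as a function so that the sketch runs without a network. All names and the message layout are hypothetical.

```python
import json


def pack_gene_list(genes):
    """Pack a gene list as UTF-8 JSON, normalized and without duplicates."""
    unique = list(dict.fromkeys(g.strip().upper() for g in genes if g.strip()))
    return json.dumps({"version": 1, "genes": unique}).encode("utf-8")


def unpack_results(payload):
    """Unpack a JSON reply into {gene: annotations}."""
    doc = json.loads(payload.decode("utf-8"))
    return {item["gene"]: item["annotations"] for item in doc["results"]}


def submit(genes, transport):
    """Pack, send through the given transport, and unpack the reply."""
    return unpack_results(transport(pack_gene_list(genes)))


def fake_transport(request_bytes):
    """Stand-in for the web server: replies with empty annotations."""
    genes = json.loads(request_bytes.decode("utf-8"))["genes"]
    reply = {"results": [{"gene": g, "annotations": {"pathways": []}} for g in genes]}
    return json.dumps(reply).encode("utf-8")


results = submit(["brca1", " MLH1 ", "BRCA1"], fake_transport)
print(results)
```

In a deployment, `fake_transport` would be replaced by a function that posts the packed bytes to the web server and returns the response body.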

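[0079] In yet another aspect, the invention generally relates to a method for a web app server of performing searches of biomedical knowledge for a list of genes on the web server, including: unpacking the list of genes; calling the corresponding web service to perform the specific search for each gene; processing, combining and packing the search results; and sending the search results back to the client program.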
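A server-side handler of this kind can be sketched as follows: it unpacks the gene list, calls one search function per knowledge source for each gene, combines the results, and packs them for return to the client. The JSON message layout and the service names are assumptions made for illustration, and the in-memory lookups stand in for real web services.

```python
import json


def handle_request(payload, services):
    """Unpack a gene list, search every service per gene, pack the results."""
    genes = json.loads(payload.decode("utf-8"))["genes"]
    results = []
    for gene in genes:
        # One entry per knowledge source, e.g. pathway, disease, drug.
        annotations = {name: search(gene) for name, search in services.items()}
        results.append({"gene": gene, "annotations": annotations})
    return json.dumps({"results": results}).encode("utf-8")


# In-memory stand-ins for the web services, for illustration only.
PATHWAYS = {
    "BRCA1": ["Fanconi anemia pathway"],
    "MLH1": ["Fanconi anemia pathway"],
}
services = {
    "pathways": lambda gene: PATHWAYS.get(gene, []),
    "drugs": lambda gene: [],  # hypothetical service with no entries
}

request = json.dumps({"genes": ["BRCA1", "MLH1"]}).encode("utf-8")
reply = json.loads(handle_request(request, services).decode("utf-8"))
print(reply)
```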

[0080] In yet another aspect, the invention generally relates to a method for searching a database for a given gene via web service and returning the results in a defined format, wherein the database is a pathway database, a disease database, a drug database, a SNP and sequence database, a human genome database, or a GO (gene ontology) information database, for example.

[0081] In yet another aspect, the invention generally relates to a method for user interfacing (UI) for a summary view of multiple analytes in the client computer program or client computer's web browser interface, wherein, for each analyte, a list of the top entries (up to a defined number) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs, and human SNPs and sequences is made in a tabular form. (FIG. 17)

[0082] In yet another aspect, the invention generally relates to a method for user interfacing for a view of a single analyte in the client computer program or client computer's web browser interface, wherein a list of the top entries (e.g., 5) of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs and human SNPs of the analyte is made in a tabular form. (FIG. 18)

[0083] In yet another aspect, the invention generally relates to a method for user interfacing for a pathway map of multiple analytes, wherein in the pathway map, all nodes with the genes in the list are marked by colors that match the experimentally determined bioassay results (e.g., with red marking for gene up-regulation and green for gene down-regulation, and with the brightness of the color shade or a different scoring color matching the experimentally determined level in a quantitative way). (FIG. 12)

[0084] In yet another aspect, the invention generally relates to a method for user interfacing for a biology group or category view, such as a pathway view, drug list view, disease view, or SNP and sequencing view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding biology criterion or category, such as pathway, disease, drug or sequencing, and wherein clicking the pathway column brings up the corresponding detail map or link of all relevant genes in such group, such as a pathway map with highlighted gene symbols.

[0085] In yet another aspect, the invention generally relates to a method for user interfacing for a pathway view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding pathway and wherein clicking the pathway column brings up the corresponding pathway map with highlight of gene symbols. (FIG. 9)

[0086] In yet another aspect, the invention generally relates to a method for user interfacing for a drug view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing drugs, and wherein the UI shows a check mark for a cell if the corresponding analyte is the target of the corresponding drug and wherein clicking the drug column brings up a corresponding set of web links to a defined public drug information database (e.g., DrugBank). (FIG. 13)

[0087] In yet another aspect, the invention generally relates to a method for user interfacing for a disease view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing diseases, and wherein the UI shows a check mark for a cell if the corresponding analyte is involved in the corresponding disease and wherein clicking the disease column brings up the corresponding disease database web page. (FIG. 14)

[0088] In yet another aspect, the invention generally relates to a method for user interfacing for a SNP and sequence view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing SNPs and sequences, and wherein the UI shows a check mark for a cell if the corresponding analyte is next to the SNP on the human genome and wherein clicking the SNP column brings up the corresponding genome web page. (FIG. 15)

[0089] In yet another aspect, the invention generally relates to a method for user interfacing for a sample view with integrated biology content results, covering pathway, drug, disease, and SNP and sequencing views of multiple relevant analytes, wherein the UI is a table with rows representing samples and integrated result data columns representing different biology groups, such as different pathways, and wherein the UI shows a set of columns for each biological group (e.g., shown as (2) in FIG. 16). In this set of columns, one column, (3), lists the gene symbols of the pathway-relevant analytes in a defined order. One column, (4), lists a barcode, which has one bar for each relevant analyte in this pathway. The shade or color of each bar represents the intensity or biological experiment value for that analyte. One column, (5), lists the result of an integrated data reduction value, such as a fingerprint analysis result, a classification score, or a biological qualification measurement, such as the level of toxicity or the phase of disease.
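The barcode column can be illustrated with a small sketch that produces one gray level per analyte in the defined gene order. The linear gray scale, the convention that higher values are darker, the handling of missing analytes, and all names and values are assumptions made for this example.

```python
def make_barcode(sample_values, gene_order, lo, hi):
    """Return one gray level (0 = black, 255 = white) per gene in gene_order.

    Values are scaled linearly between lo and hi and clipped to that
    range; higher values give darker bars. Genes that were not measured
    in the sample are reported as None.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    bars = []
    for gene in gene_order:
        value = sample_values.get(gene)
        if value is None:
            bars.append(None)
            continue
        fraction = min(max((value - lo) / (hi - lo), 0.0), 1.0)
        bars.append(round(255 * (1.0 - fraction)))
    return bars


# Hypothetical measurements and gene order, for illustration only.
sample = {"BRCA1": 10.0, "MLH1": 0.0}
order = ["BRCA1", "MLH1", "GENE_X"]
barcode = make_barcode(sample, order, lo=0.0, hi=10.0)
print(barcode)
```

Because every sample uses the same gene order within a biology group, the barcodes of different samples can be compared position by position.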

[0090] In yet another aspect, the invention generally relates to a method for a data structure or format that groups multiple data results of different analytes into a barcode image format, shown as (4) in FIG. 16. This data presentation makes it possible to adapt fast image analysis algorithms to achieve intelligent, integrated data analysis across a large amount of relevant information quickly and in a biologically meaningful way.

[0091] In yet another aspect, the invention generally relates to a method for data grouping in biological categories (e.g., shown as (2) in FIG. 16). The method groups the sample data, (1) in FIG. 16, into different groups with relevant analyte measurements, then assesses the data collectively according to the biological group as a barcode, as a piece of digital image data, or as a fingerprint. The analysis is then run on these collective data among the biology groups, for example a group of different pathways, to assess the collective impact among different biology groups or pathways. Such a method constructs a three-dimensional analysis, such as 3-D clustering.
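The grouping step can be sketched as building a nested structure indexed by sample, biology group and analyte, which gives the three dimensions referred to above. The sketch then computes a per-group Euclidean distance between two samples as one possible building block for clustering; the source does not name a distance measure, so this choice, the zero fill for missing measurements, and all names and values are assumptions.

```python
import math


def group_by_pathway(samples, pathway_genes):
    """Build {sample: {pathway: [values in the pathway's gene order]}}.

    Missing measurements are filled with 0.0, a simplification made
    for this sketch.
    """
    return {
        sample: {
            pathway: [values.get(gene, 0.0) for gene in genes]
            for pathway, genes in pathway_genes.items()
        }
        for sample, values in samples.items()
    }


def group_distance(cube, sample_a, sample_b):
    """Euclidean distance between two samples, computed per pathway."""
    return {
        pathway: math.dist(cube[sample_a][pathway], cube[sample_b][pathway])
        for pathway in cube[sample_a]
    }


# Hypothetical samples and pathway membership, for illustration only.
samples = {
    "S1": {"G1": 3.0, "G2": 0.0},
    "S2": {"G1": 0.0, "G2": 4.0},
}
pathway_genes = {"Pathway P": ["G1", "G2"], "Pathway Q": ["G2"]}
cube = group_by_pathway(samples, pathway_genes)
distances = group_distance(cube, "S1", "S2")
print(distances)
```

A clustering routine could then operate on these per-group distances instead of on the raw analyte values.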

[0092] In yet another aspect, the invention generally relates to a process for plugging in an application-based algorithm to run a specific integrated analysis, for example to differentiate types of cancers, estimate disease phases or stages, or measure drug toxicity levels, shown as (5) in FIG. 16.
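One way to make algorithms pluggable is a simple registry keyed by application name, sketched below. The registry mechanism, the algorithm name and the threshold rule are hypothetical and serve only to illustrate the plug-in idea; they are not taken from the source.

```python
from typing import Callable, Dict, List

Algorithm = Callable[[List[float]], str]
_REGISTRY: Dict[str, Algorithm] = {}


def register(name: str):
    """Decorator that registers an analysis algorithm under a name."""
    def decorator(fn: Algorithm) -> Algorithm:
        _REGISTRY[name] = fn
        return fn
    return decorator


def run_analysis(name: str, barcode: List[float]) -> str:
    """Run the algorithm registered under name on one barcode."""
    try:
        algorithm = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"no algorithm registered as {name!r}") from None
    return algorithm(barcode)


@register("toxicity_level")
def toxicity_level(barcode: List[float]) -> str:
    # Hypothetical rule: grade by the mean of values scaled to 0..1.
    mean = sum(barcode) / len(barcode)
    if mean >= 0.66:
        return "high"
    if mean >= 0.33:
        return "medium"
    return "low"


print(run_analysis("toxicity_level", [0.9, 0.8, 0.7]))
```

New analyses are added by registering another function; the calling code does not change.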

[0093] In yet another aspect, the invention generally relates to a process for comparing data online to analyze experimental data intelligently. For example, data on many analytes for different cancers are gathered from the public domain and integrated into barcodes or an integrated data set, which is then used to train the artificial intelligence function of the computer program; the trained function is then applied to classify the experimental data.

[0094] In this specification and the appended claims, the singular forms "a," "an," and "the" include plural reference, unless the context clearly dictates otherwise.
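The train-then-classify process of paragraph [0093] can be illustrated with a nearest-centroid classifier over barcodes. The source does not name a learning algorithm; nearest centroid is substituted here only because it is short and self-contained, and the labels and values are hypothetical.

```python
import math
from collections import defaultdict


def train_centroids(labeled_barcodes):
    """Average the barcodes of each label into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for label, bars in labeled_barcodes:
        if label not in sums:
            sums[label] = [0.0] * len(bars)
        sums[label] = [s + b for s, b in zip(sums[label], bars)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, bars):
    """Return the label whose centroid is closest to the barcode."""
    return min(centroids, key=lambda label: math.dist(centroids[label], bars))


# Hypothetical reference barcodes, for illustration only.
training = [
    ("type A", [1.0, 0.0]),
    ("type A", [0.8, 0.2]),
    ("type B", [0.0, 1.0]),
]
model = train_centroids(training)
print(classify(model, [0.9, 0.1]))
```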

[0095] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure, the preferred methods and materials are now described. Methods recited herein may be carried out in any order that is logically possible, in addition to a particular order disclosed.

Incorporation by Reference

[0096] References and citations to other documents, such as patents, patent applications, patent publications, journals, books, papers, web contents, have been made in this disclosure. All such documents are hereby incorporated herein by reference in their entirety for all purposes. Any material, or portion thereof, that is said to be incorporated by reference herein, but which conflicts with existing definitions, statements, or other disclosure material explicitly set forth herein is only incorporated to the extent that no conflict arises between that incorporated material and the present disclosure material. In the event of a conflict, the conflict is to be resolved in favor of the present disclosure as the preferred disclosure.

Equivalents

[0097] The representative examples are intended to help illustrate the invention, and are not intended to, nor should they be construed to, limit the scope of the invention. Indeed, various modifications of the invention and many further embodiments thereof, in addition to those shown and described herein, will become apparent to those skilled in the art from the full contents of this document, including the examples and the references to the scientific and patent literature included herein. The examples contain important additional information, exemplification and guidance that can be adapted to the practice of this invention in its various embodiments and equivalents thereof.

What is claimed is:

Claims

1. A system for bioassay data analysis comprising dynamically linking and integrating functions of search, retrieval, organization, recording, analysis, classification, annotation, display and storage of biological, medical and healthcare data, wherein a user of the system has desktop or computing device access via a personal or instrument computer to comprehensive and contextual biological information and analytical programs pertaining to an analysis via online-based functionalities comprising one or more of data retrieval, comparison, clustering, display, artificial intelligence analysis, and annotation.

2. The system of Claim 1, wherein the user's desktop or a computing device, or instrument computer is dynamically linked to and has direct access to the content of one or more remote databases.

3. The system of Claim 2, wherein the user's desktop or a computing device, or instrument computer dynamically retrieves content of one or more remote databases.

4. The system of Claim 3, wherein the user's desktop or a computing device, or instrument computer dynamically integrates local data content and content of one or more remote databases.

5. The system of Claim 1, wherein the local data content and remote data content are packaged to reflect biological properties, functions and profiles of analytes.

6. The system of Claim 1, wherein the local data content and remote data content are presented in tabular form, as maps, or as color or gray barcodes reflecting biological properties, functions and profiles of analytes.

7. The system of Claim 6, wherein the local data content and remote data content are presented using color and/or gray barcodes.

8. The system of Claim 1, wherein the local data content and remote data content are analyzed using a 3-D image analysis tool.

9. The system of Claim 1, wherein the local data content and remote data content are analyzed using an artificial intelligence tool.

10. The system of Claim 1, wherein the bioassay data are analyzed using current pathway information dynamically obtained from remote database content.

11. A computer linked to an instrument control having post analysis software and through a web server connection to a plurality of remote information resources and databases on genome and biology via the Internet.

12. A method for integrating experimental data of multiple analytes from different technologies and platforms using gene symbols or IDs for advanced data analysis comprising clustering, classification and statistical analysis.

13. A method for using gene symbols or IDs to group experimental data into pathways, diseases, drugs, SNPs and sequences and/or further biology groups using gene symbols or IDs as a factor in grouping the experimental data.

14. A method for presenting experimental data from a bioassay sample as a barcode-like image, wherein each bar represents one analyte.

15. The method of Claim 14, wherein the barcode represents a specific pathway, disease, drug, SNP and sequence with defined order.

16. The method of Claim 15, wherein the barcodes are shaded, gray or colored, and/or having scoring colors representing the analyte experiment value, comprising bioassay experimental expression, bio content concentration, and gene regulation.

17. A method of data reduction and integrated analysis comprising applying clustering to biology grouped barcode data among multiple biology groups to perform 3D clustering analysis.

18. The method of Claim 15, comprising using known data remotely obtained from online resources and databases to run classification to train the computer for an artificial intelligence algorithm or rule set.

19. A method for classifying experimental data by the algorithm obtained from training according to Claim 18.

20. A method for parsing and constructing a disease database for known human diseases as provided in OMIM and other public disease database, comprising:

retrieving disease information from OMIM and other disease information data sources via an automated process;

defining a database schema or structure for storing disease information;

parsing and loading disease information data files into the above database schema; and

performing corresponding indexing.

21. A method for parsing and constructing a drug database for known human drugs as provided in DrugBank and other drug database, comprising:

retrieving drug information from DrugBank and other drug information data sources via an automated process;

defining a database schema or structure for storing drug information;

parsing and loading drug information data files into the above database schema; and

performing corresponding indexing.

22. A method for parsing and constructing a SNP and sequence database as provided in dbSNP database and other sequence database, comprising:

retrieving SNP and sequence information from data sources via an automated process;

defining a database schema or structure for storing SNP and sequence information;

parsing and loading SNP and sequence data files into the above database schema; and

performing corresponding indexing.

23. A method for user's computing device or instrument computer submitting a list of genes from a client program to a web server and receiving the corresponding search results, comprising:

packing the list of genes in a defined format;

sending the list of genes to a web server using a defined protocol;

performing comprehensive searches;

receiving the search results using a defined protocol; and

unpacking the search results.

24. A method for a web app server of performing searches of biomedical knowledge for a list of genes on the web server, comprising:

unpacking the list of genes;

calling corresponding web service to perform the specific search for each gene;

processing, combining and packing the search results; and

sending the search results back to the client program.

25. A method for searching a database for a given gene via web service and returning the results in a defined format.

26. The method of Claim 8, wherein the database is a pathway database, a disease database, a drug database, a SNP and sequence database, a human genome database, or a GO (gene ontology) information database.

27. A method for user interfacing (UI) for a summary view of multiple analytes in the client computer program, wherein for each analyte, a list of a user defined number of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs and human SNPs and sequence is made in a tabular form.

28. A method for user interfacing for a view of single analyte in the client computer program, wherein a list of a user defined number of corresponding gene symbols, biological pathways, biological functions, biological processes, cellular components, human diseases, human drugs and human SNPs of the analyte is made in a tabular form.

29. A method for user interfacing for a pathway map of multiple analytes, wherein in the pathway map, all nodes with the genes in the list are marked by colors that match the experimentally determined bioassay results.

30. A method for user interfacing for a pathway view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing pathways, and wherein the UI shows a check mark for a cell if the corresponding analyte belongs to the corresponding pathway and wherein clicking the pathway column brings up the corresponding pathway map with highlight of gene symbols.

31. A method for user interfacing for a drug view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing drugs, and wherein the UI shows a check mark for a cell if the corresponding analyte is the target of the corresponding drug and wherein clicking the drug column brings up the corresponding DrugBank web page of the drug.

32. A method for user interfacing for a disease view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing diseases, and wherein the UI shows a check mark for a cell if the corresponding analyte is involved in the corresponding disease and wherein clicking the disease column brings up the corresponding disease database web page.

33. A method for user interfacing for a SNP view of multiple analytes, wherein the UI is a table with rows representing analytes and columns representing SNPs, and wherein the UI shows a check mark for a cell if the corresponding analyte is next to the SNP on human genome and wherein clicking the SNP column brings up the corresponding genome web page.