CN1886658A - Systems and methods for detecting biological features - Google Patents

Systems and methods for detecting biological features

Publication number
CN1886658A
Authority
CN
China
Prior art keywords
model
test organism
score
cell
cell component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200480034992
Other languages
Chinese (zh)
Inventor
格兰达·G·安德森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PATHWORK INFORMATICS Inc
Original Assignee
PATHWORK INFORMATICS Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PATHWORK INFORMATICS Inc
Publication of CN1886658A

Abstract

A computer having a memory stores instructions for receiving data. The data comprises one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or a test biological specimen from an organism of the species. The memory further stores instructions for computing a model in a plurality of models, wherein the model is characterized by a model score that represents the likelihood of a biological feature in the test organism or the test biological specimen. Computation of the model comprises determining the model score using one or more characteristics for one or more cellular constituents in the plurality of cellular constituents. The memory also stores instructions for repeating the instructions for computing one or more times, thereby computing the plurality of models. The memory also stores instructions for communicating computed model scores.

Description

Systems and methods for detecting biological features
Cross-reference to related applications
This application claims benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Application No. 60/577,416, filed on June 5, 2004, which is hereby incorporated by reference in its entirety. This application also claims benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Application No. 60/507,381, filed on September 29, 2003, which is hereby incorporated by reference in its entirety. This application also claims benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Patent Application No. 60/507,445, filed on September 29, 2003, which is hereby incorporated by reference in its entirety. This application is a continuation-in-part of U.S. Patent Application No. 10/861,216, filed on June 4, 2004, which is hereby incorporated by reference in its entirety. This application is also a continuation-in-part of U.S. Patent Application No. 10/861,177, filed on June 4, 2004, which is hereby incorporated by reference in its entirety.
1. Field of the invention
The present invention relates to systems and methods for characterizing biological features, such as disease, in a biological specimen.
2. Background of the invention
The first step in the rational treatment of disease is to assess the patient against a classification of diseases. The results are used to determine which type of disease the patient has and to predict the patient's response to various therapies. The effectiveness of this process depends on the quality of the classification. At least in the case of cancer, the advent of microarray methods for analyzing the DNA, RNA, or proteins of tumor cells has begun to refine and improve the classification of cancer cells. See, for example, Golub et al., 1999, Science 286, p. 531.
In addition, van 't Veer et al., 2002, Nature 415, p. 530, illustrated how such "molecular profiling" is improving cancer classification. Van 't Veer et al. demonstrated that the results of gene expression profiling of breast tumors, performed after surgical removal of the tumor, can be used to predict which patients will develop clinical metastasis (spread of the tumor to other sites, where secondary tumors develop). Treatment for an individual breast cancer patient is selected according to various criteria, such as the extent of tumor spread (including tumor size), whether cancer cells have spread to the axillary lymph nodes and how many nodes are involved, and whether distant clinical metastases are present. In women with no evidence of metastasis, the main treatment is removal of the tumor followed by radiotherapy. Unfortunately, some of these patients later develop clinical metastasis. There is therefore a need to identify those women who require further ("adjuvant") treatment after surgery to address microscopic deposits of cancer cells that may have spread from the primary tumor. See, for example, Caldas and Aparicio, 2002, Nature 415, p. 484; and Goldhirsch et al., 1998, J. Natl. Cancer Inst. 90, p. 1601.
Adjuvant therapy uses pharmaceutical agents, such as estrogen modulators or cytotoxic drugs, that can reach cancer cells through the bloodstream. Such treatment commonly has toxic side effects. Identifying the women who may need it usually relies on various clinical and histopathological indicators (for example, the patient's age, the degree to which the cancer cells resemble their normal counterparts ("tumor grade"), and whether the cancer cells express estrogen receptors). Even taken together, however, these indicators are poor predictors. As a result, in order to save a number of lives that is large in absolute terms but small as a proportion, most patients treated by surgery and radiotherapy go on to receive unnecessary and toxic adjuvant treatment.
The work of van 't Veer et al. (2002, Nature 415, p. 530) and others has begun to feed into classification schemes that attempt to assign a patient's biological specimen (such as a tumor) to one of several biological sample classes (such as breast cancer that requires adjuvant therapy and breast cancer that does not). Many clinical trials sponsored by companies and organizations such as the Avon Foundation, Millennium Pharmaceuticals, the European Organization for Research and Treatment of Cancer, and the National Cancer Institute have identified and validated such classification schemes. See, for example, Branca, 2003, Science 300, p. 238.
For breast cancer, many biological classification schemes are available. For example, Ramaswamy et al., 2003, Nature Genetics 33, p. 49, provide a gene expression signature that distinguishes primary adenocarcinomas from metastatic adenocarcinomas. Su et al., 2001, Cancer Research 61, p. 7388, describe the use of large-scale RNA profiling and supervised machine-learning algorithms to construct a first-generation molecular classification scheme that identifies carcinomas of the prostate, breast, lung, ovary, colorectum, kidney, pancreas, bladder/ureter, and gastroesophagus. The molecular classification scheme of Su et al. is intended for diagnosing metastatic cancers in which the origin of the primary tumor has not been determined. Wilson et al., 2002, American Journal of Pathology 161, provide an expression signature of HER2/neu-positive tissue that is associated with poor survival in node-positive breast cancer patients. Richer et al., 2002, The Journal of Biological Chemistry 277, p. 5209, provide genetic signatures for human breast cancer cells that overexpress progesterone receptor A (PR-A) and for those that overexpress progesterone receptor B (PR-B). As described by Richer et al. (2002), an excess of one or the other PR isoform may result in tumors whose prognosis and hormone-response characteristics differ from those of tumors with equimolar levels of the two PR isoforms. Gruvberger et al., 2001, Cancer Research 61, p. 5979, provide a molecular classification based on DNA microarray data that can distinguish tumors according to estrogen receptor status.
The biological classification schemes described above are only examples of the many that exist for breast cancer. Moreover, breast cancer represents only one of many biological classifications of interest. Other representative biological classifications include, more broadly, the diagnosis of cancer and, more broadly still, the diagnosis of disease. One problem shared by these schemes is that each requires specialized input (for example, microarray data in a particular format). Characterizing a biological specimen therefore requires working with the inputs and outputs specific to each scheme. Because of this obstacle, medical care professionals typically use, at most, only a limited number of these schemes.
Accordingly, given the above background, what is needed in the art are improved methods by which biological classification schemes can be used to classify specimens.
Discussion or citation of a reference herein shall not be construed as an admission that the reference is prior art to the present invention.
3. Summary of the invention
A first embodiment of the present invention provides a computer having a central processing unit and a memory coupled to the central processing unit. The memory stores instructions for receiving data, wherein the data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The memory further stores instructions for computing a model in a plurality of models, wherein the model is characterized by a model score that represents the presence or absence of a biological feature in the test organism or the test biological specimen. Computing the model comprises determining the model score using one or more characteristics of one or more cellular constituents in the plurality of cellular constituents. The memory also stores instructions for repeating the instructions for computing one or more times, thereby computing the plurality of models. The memory also stores instructions for communicating each computed model score.
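The receive, compute, repeat, and communicate sequence of this embodiment can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the constituent names (`gene_a`, and so on), the model names, and in particular the averaging rule in `mean_abundance`, which merely stands in for whatever scoring rule a given model actually uses.

```python
from typing import Callable, Dict, List

# Hypothetical representation: one measured characteristic (for example,
# an abundance) per cellular constituent, keyed by constituent name.
Measurements = Dict[str, float]
ScoreFn = Callable[[Measurements], float]


def mean_abundance(constituents: List[str]) -> ScoreFn:
    """Illustrative scoring rule only: average the named constituents."""
    def score(data: Measurements) -> float:
        values = [data[c] for c in constituents]
        return sum(values) / len(values)
    return score


def compute_model_scores(data: Measurements,
                         models: Dict[str, ScoreFn]) -> Dict[str, float]:
    """Compute one model, then repeat for each remaining model."""
    scores: Dict[str, float] = {}
    for name, score_fn in models.items():
        scores[name] = score_fn(data)
    return scores


def communicate(scores: Dict[str, float]) -> List[str]:
    """Stand-in for the communication step: one report line per score."""
    return [f"{name}: {value:.2f}" for name, value in sorted(scores.items())]


# Received data (hypothetical values).
data = {"gene_a": 2.0, "gene_b": 4.0, "gene_c": 1.0}
models = {
    "feature_1": mean_abundance(["gene_a", "gene_b"]),
    "feature_2": mean_abundance(["gene_c"]),
}
scores = compute_model_scores(data, models)
report = communicate(scores)
```

Keeping each model behind a common scoring interface is one way to let a single set of measurements feed many models, which is the obstacle identified in the background section.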
In some embodiments, two or more model scores are communicated by the instructions for communicating, and each of the two or more model scores corresponds to a different model in the plurality of models. In some embodiments, five or more model scores are communicated by the instructions for communicating, and each of the five or more model scores corresponds to a different model in the plurality of models.
In some embodiments, the instructions for receiving data comprise instructions for receiving the data from a remote computer over a wide area network such as the Internet. In some embodiments, the instructions for communicating comprise instructions for transmitting each model score to a remote computer over a wide area network such as the Internet.
In some embodiments, when the model score falls within a first range of values, the test organism or test biological specimen is deemed to have the biological feature represented by a model in the plurality of models; when the model score falls within a second range of values, the test organism or test biological specimen is deemed not to have the biological feature represented by that model. In some embodiments, the biological feature is a disease, such as cancer (for example, breast cancer, lung cancer, prostate cancer, colorectal cancer, ovarian cancer, bladder cancer, gastric cancer, or rectal cancer).
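The two-range interpretation of a model score can be sketched as follows. This is an illustration under stated assumptions: the half-open interval convention, the numeric bounds, and the choice to return `None` for a score that falls in neither range are not taken from the specification.

```python
from typing import Optional, Tuple

# Assumed convention: inclusive lower bound, exclusive upper bound.
Range = Tuple[float, float]


def interpret_score(score: float, present: Range, absent: Range) -> Optional[bool]:
    """Map a model score to a call about the biological feature.

    Returns True when the score lies in the first ("present") range,
    False when it lies in the second ("absent") range, and None when
    it lies in neither range.
    """
    if present[0] <= score < present[1]:
        return True
    if absent[0] <= score < absent[1]:
        return False
    return None


# Hypothetical ranges for a single model.
PRESENT: Range = (0.7, 1.01)
ABSENT: Range = (0.0, 0.3)

call_high = interpret_score(0.85, PRESENT, ABSENT)
call_low = interpret_score(0.10, PRESENT, ABSENT)
call_mid = interpret_score(0.50, PRESENT, ABSENT)
```

Leaving a gap between the two ranges lets a caller treat intermediate scores as indeterminate instead of forcing a call either way.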
In some embodiments, the plurality of models comprises a first model characterized by a first model score and a second model characterized by a second model score, and the identity of the cellular constituents whose characteristics are used to compute the first model score differs from the identity of the cellular constituents whose characteristics are used to compute the second model score.
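A short sketch may help show how two models can draw on different cellular constituents from the same measured data. The `Model` class, the constituent names, and the weighted-sum scoring rule are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class Model:
    """Hypothetical model: a named set of weighted cellular constituents."""
    name: str
    weights: Dict[str, float]  # constituent name -> weight (assumed rule)

    def constituents(self) -> FrozenSet[str]:
        """Identity of the constituents this model reads."""
        return frozenset(self.weights)

    def score(self, data: Dict[str, float]) -> float:
        """Weighted sum over only this model's own constituents."""
        return sum(w * data[c] for c, w in self.weights.items())


# One set of measurements shared by both models (hypothetical values).
data = {"gene_a": 2.0, "gene_b": 4.0, "gene_c": 1.0}

first = Model("first", {"gene_a": 0.5, "gene_b": 0.25})
second = Model("second", {"gene_b": 1.0, "gene_c": -1.0})

first_score = first.score(data)
second_score = second.score(data)
differ = first.constituents() != second.constituents()
```

The two models overlap on `gene_b` but are not identical in the constituents they use, so `differ` is true even though both read from the same data.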
In some embodiments, a characteristic in the one or more characteristics of the one or more cellular constituents used to determine the model score of a model in the plurality of models comprises the abundance of the one or more cellular constituents in the test organism of the species or in the test biological specimen from an organism of the species. In some instances, the species is human. In some instances, the test biological specimen is a biopsy or another form of sample from a tumor, blood, bone, breast, lung, prostate, colorectum, ovary, bladder, stomach, or rectum.
In some embodiments, the one or more characteristics comprise cellular constituent abundance, and the data comprise cellular constituent abundances for at least 100, at least 500, at least 5,000, or between 1,000 and 20,000 cellular constituents in the test organism of the species or in the test biological specimen from an organism of the species. In some embodiments, the cellular constituents in the plurality of cellular constituents are mRNA, cRNA, or cDNA.
In some embodiments of the present invention, a cellular constituent in the one or more cellular constituents is a nucleic acid or a ribonucleic acid, and a characteristic in the one or more characteristics of the cellular constituent is obtained by measuring the transcriptional state of all or a portion of the cellular constituent in the test organism or the test biological specimen. In some embodiments, a cellular constituent in the one or more cellular constituents is a protein, and a characteristic in the one or more characteristics of the cellular constituent is obtained by measuring the translational state of the cellular constituent in the test organism or the test biological specimen. In some embodiments, a characteristic in the one or more characteristics of a cellular constituent in the plurality of cellular constituents is determined by isotope-coded affinity tagging of a sample obtained from the test organism or the test biological specimen, followed by tandem mass spectrometry analysis of the cellular constituent. In some embodiments, a characteristic in the one or more characteristics of a cellular constituent in the plurality of cellular constituents is determined by measuring the activity or post-translational modification of the cellular constituent in a sample from the test organism or in the test biological specimen.
In some embodiments, the biological feature is sensitivity to a drug. In some embodiments, the plurality of models whose model scores are computed by the instructions for computing collectively represent the presence or absence of two or more biological features. In some embodiments, each biological feature in the two or more biological features is a cancer origin. In some embodiments, the two or more biological features comprise a first disease and a second disease.
In some embodiments, the plurality of models whose model scores are computed by the instructions for computing collectively represent the presence or absence of five or more biological features. In some instances, each of the five or more biological features represents a different cancer origin. In some instances, the five or more biological features comprise a first disease and a second disease.
In some embodiments, the plurality of models whose model scores are computed by the instructions for computing collectively represent the presence or absence of between 2 and 20 biological features. In some embodiments, each biological feature in the 2 to 20 biological features is a cancer origin. In some embodiments, the 2 to 20 biological features comprise a first disease and a second disease.
Another aspect of the present invention comprises a computer having a central processing unit and a memory coupled to the central processing unit. The memory stores instructions for receiving data. The data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The memory further stores instructions for computing a plurality of models. Each model in the plurality of models is characterized by a model score that represents the presence or absence of a biological feature in the test organism or the test biological specimen. Computing an individual model in the plurality of models comprises determining the model score associated with the individual model using one or more characteristics of one or more cellular constituents in the plurality of cellular constituents. The memory also stores instructions for communicating each model score computed by the instructions for computing.
Another aspect of the present invention comprises a computer program product for use in conjunction with a computer system. The computer program product comprises a computer-readable storage medium and a computer program mechanism embedded therein. The computer program mechanism comprises instructions for receiving data. The data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The computer program mechanism further comprises instructions for computing a model in a plurality of models. The model is characterized by a model score that represents the presence or absence of a biological feature in the test organism or the test biological specimen, and computing the model comprises determining the model score using one or more characteristics of one or more cellular constituents in the plurality of cellular constituents. The computer program mechanism also comprises instructions for repeating the instructions for computing one or more times, thereby computing the plurality of models. Further, the computer program mechanism comprises instructions for communicating each model score computed by the instructions for computing.
Another aspect of the present invention provides a computer program product for use in conjunction with a computer system. The computer program product comprises a computer-readable storage medium and a computer program mechanism embedded therein. The computer program mechanism comprises instructions for receiving data. The data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The computer program mechanism further comprises instructions for computing a plurality of models. Each model in the plurality of models is characterized by a model score that represents the presence or absence of a biological feature in the test organism or the test biological specimen, and computing an individual model in the plurality of models comprises determining the model score associated with the individual model using one or more characteristics of one or more cellular constituents in the plurality of cellular constituents. The computer program mechanism also comprises instructions for communicating each model score computed by the instructions for computing.
Another aspect of the present invention comprises a method in which data are obtained. The data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The method further comprises computing a model in a plurality of models. The model is characterized by a model score that represents the presence or absence of a biological feature in the test organism or the test biological specimen. Computing the model comprises determining the model score using one or more characteristics of one or more cellular constituents in the plurality of cellular constituents. The method further comprises repeating the computing one or more times, thereby computing the plurality of models. The method also comprises communicating each model score computed.
Yet another aspect of the present invention comprises receiving data. The data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. A plurality of models is computed. Each model in the plurality of models is characterized by a model score that represents the presence or absence of a biological feature in the test organism or the test biological specimen, and computing an individual model in the plurality of models comprises determining the model score associated with the individual model using one or more characteristics of one or more cellular constituents in the plurality of cellular constituents. Each model score computed is then communicated.
Yet another aspect of the present invention provides a computer having a central processing unit and a memory coupled to the central processing unit. The memory stores instructions for sending data. The data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The memory also stores instructions for receiving a plurality of model scores. Each model score corresponds to a model in a plurality of models. Each model in the plurality of models is characterized by a model score that represents the presence or absence of a biological feature in the test organism or the test biological specimen, and computing the model comprises determining the model score using one or more characteristics of one or more cellular constituents in the plurality of cellular constituents.
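The sending side of this exchange can be sketched as a pair of message helpers. The specification does not define a wire format, so the JSON layout, the field names (`specimen_id`, `abundances`, `model_scores`), and the sample values below are assumptions made purely for illustration; no network call is made.

```python
import json
from typing import Dict


def build_request(specimen_id: str, abundances: Dict[str, float]) -> str:
    """Serialize measured characteristics for transmission (assumed format)."""
    message = {"specimen_id": specimen_id, "abundances": abundances}
    return json.dumps(message, sort_keys=True)


def parse_response(payload: str) -> Dict[str, float]:
    """Extract one score per model from a reply (assumed format)."""
    message = json.loads(payload)
    return {name: float(score) for name, score in message["model_scores"].items()}


# Outgoing data for a hypothetical specimen.
request = build_request("specimen-001", {"gene_a": 2.0})

# A reply such as a scoring computer might return (hypothetical values).
response = '{"model_scores": {"feature_1": 0.91, "feature_2": 0.12}}'
scores = parse_response(response)
```

In a deployment the request string would be transmitted to the computer that holds the models, and the parsed scores would be the plurality of model scores received in return.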
Another aspect of the present invention provides a computer having a central processing unit and a memory coupled to the central processing unit. The memory stores instructions for receiving data, wherein the data comprise one or more aspects of the biological state of each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The memory further stores instructions for computing a model in a plurality of models. The instructions for computing produce a model characterization for the model, and the model characterization indicates whether the test organism of the species or the test biological specimen from an organism of the species is a member of a biological sample class. The instructions for computing the model comprise characterizing the model using one or more aspects of the biological state of one or more cellular constituents in the plurality of cellular constituents. The memory also stores instructions for repeating the instructions for computing one or more times, thereby computing the plurality of models. The memory also stores instructions for communicating each model characterization computed by the instructions for computing. In some embodiments, the instructions for receiving data comprise instructions for receiving the data from a remote computer over a wide area network such as the Internet. In some embodiments, the biological sample class is a disease, such as cancer.
Another aspect of the present invention provides a computer having a central processing unit and a memory coupled to the central processing unit. The memory stores instructions for receiving data. The data comprise one or more aspects of the biological state of each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The memory further stores instructions for computing a plurality of models. The computing produces a model characterization for each model in the plurality of models, and the model characterization indicates whether the test organism of the species or the test biological specimen from an organism of the species is a member of a biological sample class. The computing comprises characterizing each model in the plurality of models using one or more aspects of the biological state of one or more cellular constituents in the plurality of cellular constituents. The memory also stores instructions for communicating each model characterization computed by the instructions for computing.
Yet another aspect of the present invention provides a computer program product for use in conjunction with a computer system. The computer program product comprises a computer-readable storage medium and a computer program mechanism embedded therein. The computer program mechanism comprises instructions for receiving data. The data comprise one or more aspects of the biological state of each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The computer program mechanism further comprises instructions for computing a model in a plurality of models. The computing produces a model characterization for the model, and the model characterization indicates whether the test organism of the species or the test biological specimen from an organism of the species is a member of a biological sample class. Computing the model comprises characterizing the model using one or more aspects of the biological state of one or more cellular constituents in the plurality of cellular constituents. The computer program mechanism also comprises instructions for repeating the instructions for computing one or more times, thereby computing the plurality of models. The computer program mechanism also comprises instructions for communicating each model characterization computed by the instructions for computing.
Yet another aspect of the present invention comprises a computer program product for use in conjunction with a computer system. The computer program product comprises a computer-readable storage medium and a computer program mechanism embedded therein. The computer program mechanism comprises instructions for receiving data. The data comprise one or more aspects of the biological state of each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. The computer program mechanism further comprises instructions for computing a plurality of models. The computing produces a model characterization for each model in the plurality of models, and the model characterization indicates whether the test organism of the species or the test biological specimen from an organism of the species is a member of a biological sample class. The computing comprises characterizing each model in the plurality of models using one or more aspects of the biological state of one or more cellular constituents in the plurality of cellular constituents. The computer program mechanism also comprises instructions for communicating each model characterization computed by the instructions for computing.
Another aspect of the present invention provides a method that comprises receiving data. The data comprise one or more aspects of the biological state of each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. A model in a plurality of models is computed. The computing produces a model characterization for the model, and the model characterization indicates whether the test organism of the species or the test biological specimen from an organism of the species is a member of a biological sample class. Computing the model comprises characterizing the model using one or more aspects of the biological state of one or more cellular constituents in the plurality of cellular constituents. The computing is repeated one or more times, thereby computing the plurality of models. Each model characterization computed is then communicated.
Yet another aspect of the present invention comprises receiving data. The data comprise one or more aspects of the biological state of each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or in a test biological specimen from an organism of the species. A plurality of models is computed. The computing produces a model characterization for each model in the plurality of models, and the model characterization indicates whether the test organism of the species or the test biological specimen from an organism of the species is a member of a biological sample class. The computing comprises characterizing each model in the plurality of models using one or more aspects of the biological state of one or more cellular constituents in the plurality of cellular constituents. Each model characterization computed is communicated.
4. Brief description of the drawings
Fig. 1 illustrates a computer system for classifying a biological specimen in accordance with one embodiment of the present invention.
Fig. 2 illustrates processing steps for classifying a specimen using a plurality of classifiers in accordance with one embodiment of the present invention.
Fig. 3 illustrates a data structure that stores a plurality of models (classifiers) in accordance with one embodiment of the present invention.
Like reference numerals refer to corresponding parts throughout the several views of the drawings.
5. Detailed description of the invention
Fig. 1 shows a system 10 that is operated in accordance with one embodiment of the invention. Fig. 3 shows data structures for storing data used in the present invention. Fig. 2 shows processing steps for testing a plurality of models in accordance with one embodiment of the invention. Using the processing steps shown in Fig. 2, such models can determine whether a specimen has one or more biological features. These figures are referenced throughout this section in order to disclose the advantages and features of the invention. Representative biological features are disclosed in Section 5.4 below.
System 10 comprises at least one computer 20 (Fig. 1). Computer 20 comprises standard components, including a central processing unit 22; memory 24 for storing program modules and data structures; user input/output device 26; a network interface card 28 for connecting computer 20 to other computers in system 10 or to other computers over a communication network (not shown); and one or more buses 33 that interconnect these components. User input/output device 26 comprises one or more user input/output components such as a mouse 36, a display 38, and a keyboard 34. Computer 20 further comprises a disk 32 controlled by disk controller 30. Together, memory 24 and disk 32 store the program modules and data structures used in the present invention.
Memory 24 comprises a number of modules and data structures used in accordance with the present invention. It will be appreciated that, at any given time during operation of the system, a portion of the modules and/or data structures stored in memory 24 is held in random access memory while another portion of the modules and/or data structures is stored in nonvolatile storage 32. In a typical embodiment, memory 24 comprises an operating system 50. Operating system 50 comprises procedures for handling various basic system services and for performing hardware-dependent tasks. Memory 24 further comprises a file system (not shown) for file management. In some embodiments, this file system is a component of operating system 50.
Now that an overview of an exemplary computer system in accordance with the present invention has been given, an overview of exemplary data structures used by one embodiment of the invention is presented in Section 5.1 below. Section 5.2 then describes detailed processing steps that use these exemplary data structures to test a plurality of models. Section 5.3 provides examples of results obtained with the invention.
5.1. Exemplary data structures
Exemplary data structures used in one embodiment of the present invention are shown in Fig. 1. Model testing application 52 uses runtime database 120. Runtime database 120 is modeled so that it comprises runtime analysis schema 300 and runtime model schema 200. These schemas describe the structure of the many different types of tables in runtime database 120. In preferred embodiments, database 120 is any form of data storage, including but not limited to a flat file, a relational database (SQL), or an OLAP database (MDX and/or variants thereof). In some specific embodiments, database 120 is a hierarchical OLAP cube. In some specific embodiments, database 120 is not stored as a cube but has a star schema with dimension tables that define the hierarchy. Furthermore, in some embodiments, database 120 has a hierarchy that is not explicitly broken out in the underlying database or database schema (for example, the dimension tables are not hierarchically arranged). In some embodiments, database 120 is a database in Oracle, MS Access 95/97/2000 or later, Informix, Sybase, Interbase, IBM DB2, Paradox, dBase, SQL Anywhere, Ingres, MsSQL, MS SQL Server, ANSI Level 2, or PostgreSQL format. In some embodiments, runtime database 120 comprises runtime model schema 200 and runtime analysis schema 300.
One fundamental table type specified by runtime model schema 200 is model 202. The goal of a model 202 is to attempt to determine the likelihood that a biological specimen (e.g., a tumor) has a biological feature (e.g., breast cancer, lung cancer, etc.). Accordingly, each model 202 is associated with one biological feature. In the present invention, a biological feature is any distinguishable phenotype exhibited by one or more biological specimens. For example, in one application of the invention, each biological feature refers to an origin or primary tumor type. It is estimated that nearly 4 percent of cancer patients have metastatic tumors for which the origin of the primary tumor cannot be determined. See, e.g., Hillen, 2000, Postgrad. Med. J. 76, p. 690. Sometimes the primary site of a metastatic tumor remains unknown even after pathological analysis. Predicting the primary tumor site of some of these cancers is therefore an important clinical objective. For tumors of unknown primary origin, typical biological sample classes include prostate cancer, breast cancer, colorectal cancer, lung cancer (adenocarcinoma and squamous cell carcinoma), liver cancer, gastroesophageal cancer, pancreatic cancer, ovarian cancer, kidney cancer, and bladder/ureter cancer, which together account for about 70 percent of all cancer-related deaths in the United States. See, e.g., Greenlee et al., 2001, CA Cancer J. Clin. 51, p. 15. Section 5.4 below describes other examples of biological sample classes of the present invention.
To illustrate how a model 202 is used to determine the likelihood that a biological specimen is a member of a biological sample class, consider the case in which a particular model 202 represents the likelihood that a biological sample has lung cancer. Suppose further that this lung cancer model is applied to a biological specimen and that the test result indicates that the specimen has a high likelihood of having lung cancer. In some embodiments, each model 202 in runtime database 120 includes a unique model identifier 110 that distinguishes it from every other model. In addition, each model 202 specifies one or more calculations 204 (also called tests). In some embodiments, a model 202 specifies between 2 and 1,000 calculations. In more preferred embodiments, each model 202 specifies between 3 and 500 calculations, between 3 and 100 calculations, or between 3 and 50 calculations.
Each calculation 204 of a model 202 specifies the identity of certain cellular constituents. For example, in one case, each individual calculation 204 specifies a first cellular constituent and a second cellular constituent. To illustrate, consider a model 202 that comprises four calculations 204, as set out in Table 1:
Table 1: Exemplary calculations 204
Calculation number | First cellular constituent | Second cellular constituent
1 | Gene AAA | Gene DDD
2 | Gene CCC | Gene DDD
3 | Gene NNN | Gene MMM
4 | Gene XXX | Gene YYY
Thus, calculation 1 specifies first cellular constituent AAA and second cellular constituent DDD, and so on.
In addition to specifying calculations 204, each model 202 specifies a calculation algorithm 212 to be used for each calculation 204 in the model. Calculation algorithm 212 specifies the operational relationship between the cellular constituent abundance values when a calculation 204 in model 202 is computed. The abundance values are taken from the biological specimen that is being classified with model 202.
One example of a calculation algorithm 212 is a ratio in which the numerator is determined by the abundance of the first cellular constituent in the biological specimen and the denominator is determined by the abundance of the second cellular constituent in the biological specimen. In this case, when a calculation 204 is computed in accordance with calculation algorithm 212, the calculation algorithm 212 specifies that the ratio of the two cellular constituent abundance values is taken, while the calculation 204 specifies the actual identity of the cellular constituents of the test specimen that are used. This calculation algorithm 212 is applied to every calculation 204 in the exemplary model 202. In calculation 1 of Table 1, the exemplary calculation algorithm 212 specifies taking the ratio of gene AAA to gene DDD; in calculation 2, it specifies taking the ratio of gene CCC to gene DDD; and so on.
The present invention encompasses many calculation algorithms 212 besides the ratio of a first cellular constituent to a second cellular constituent. For example, in some embodiments, calculation algorithm 212 can specify that the abundance value of the first cellular constituent be multiplied by the abundance value of the second cellular constituent (A × B). Indeed, calculation algorithm 212 can specify that the product of the abundance values of the first two cellular constituents be multiplied by the abundance value of a third cellular constituent (A × B × C). Alternatively, calculation algorithm 212 can specify that the product of the abundance values of the first two cellular constituents be divided by the abundance value of a third cellular constituent [(A × B)/C]. As these examples show, a calculation algorithm is any mathematical operation, or combination of mathematical operations (e.g., multiplication, division, logarithm, etc.), on any combination of cellular constituents. Calculation algorithm 212 does not indicate the actual identity of the cellular constituents used to compute any given calculation 204. Conversely, a calculation 204 specifies a set of cellular constituents but does not indicate the operational relationship between them that is used to compute the calculation. Applying a calculation algorithm 212 to a calculation 204 allows the calculation 204 to be computed in accordance with the methods of the present invention.
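The separation between a calculation 204 (which constituents) and a calculation algorithm 212 (which arithmetic) can be sketched as follows. This is a minimal illustration only; the function names and the dictionary of abundance values are assumptions introduced here and are not part of the disclosure.

```python
from math import prod
from typing import Callable, Dict, Sequence

# A calculation algorithm knows only the arithmetic; it never names a gene.
CalculationAlgorithm = Callable[[Sequence[float]], float]


def ratio(values: Sequence[float]) -> float:
    """A / B: abundance of the first constituent over the second."""
    first, second = values
    return first / second


def product(values: Sequence[float]) -> float:
    """A x B (x C ...): product of all supplied abundance values."""
    return prod(values)


def product_over_third(values: Sequence[float]) -> float:
    """(A x B) / C."""
    a, b, c = values
    return (a * b) / c


def compute_calculation(
    constituents: Sequence[str],
    algorithm: CalculationAlgorithm,
    abundances: Dict[str, float],
) -> float:
    """Apply an algorithm to the constituents named by one calculation.

    `constituents` plays the role of a calculation 204 and `algorithm`
    the role of a calculation algorithm 212 (hypothetical representation).
    """
    return algorithm([abundances[name] for name in constituents])


# Illustrative abundance values only.
specimen = {"AAA": 1000.0, "CCC": 50.0, "DDD": 100.0}
print(compute_calculation(("AAA", "DDD"), ratio, specimen))
print(compute_calculation(("CCC", "DDD"), ratio, specimen))
```

Because the algorithm is passed in separately, the same list of constituent pairs can be evaluated under a different operation without changing the calculations themselves.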
In some embodiments, each individual calculation 204 includes a model identifier 110 that specifies the model 202 to which the calculation belongs. In addition, each calculation includes thresholds 114. For example, in some embodiments, each calculation 204 includes a low threshold and a high threshold. In such embodiments, the calculation algorithm 212 of model 202 is used to compute each calculation 204 of the model. A computed calculation 204 is characterized as negative when its value is below the low threshold, as positive when its value is above the high threshold, and as indeterminate when its value lies between the low threshold and the high threshold. For more information on how such thresholds and models are computed, and for more detailed embodiments of their use in the present invention, see copending U.S. Patent Application Serial No. 60/507,381 to Anderson, entitled "Systems and Methods for Analyzing Gene Expression Data For Clinical Diagnostics," and the U.S. patent application, serial number to be determined, to Moraleda and Anderson, filed June 4, 2004, entitled "Systems and Methods for Analyzing Gene Expression Data For Clinical Diagnostics."
To illustrate a calculation (test) that uses high and low thresholds, consider calculation 1 of Table 1 in the case where the abundance of gene AAA ([AAA]) in the biological specimen is 1,000 and the abundance of DDD ([DDD]) is 100. Further, calculation 1 specifies a low threshold of 0.8 and a high threshold of 5. The calculation algorithm 212 of the model 202 that includes calculation 1 specifies taking the ratio of the first gene to the second gene. When this calculation algorithm 212 is applied to calculation 204, the computed value (the ratio [AAA]/[DDD]) is 10 (1,000/100). Because this ratio is greater than the high threshold, calculation 204 is characterized as "positive."
In another example, the value of [AAA] in the biological specimen is 70 and the value of [DDD] in the biological specimen is 100. Calculation 1 again specifies a low threshold of 0.8 and a high threshold of 5. In this case, the ratio [AAA]/[DDD] is 0.7 (70/100). Because this ratio is below the low threshold, the calculation is characterized as "negative."
In yet another example, the value of [AAA] in the biological specimen is 120 and the value of [DDD] in the biological specimen is 100. Calculation 1 again specifies a low threshold of 0.8 and a high threshold of 5. In this case, the ratio [AAA]/[DDD] is 1.2 (120/100). Because this ratio is greater than the low threshold but less than the high threshold, the calculation is characterized as "indeterminate."
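The three worked examples can be reproduced with a short sketch. The function name is hypothetical, and the treatment of a value exactly equal to a threshold (here, indeterminate) is an assumption, since the text does not address that case.

```python
def characterize(value: float, low: float, high: float) -> str:
    """Characterize one computed calculation against its two thresholds.

    Assumption: a value exactly equal to either threshold is treated as
    indeterminate; the text only covers strictly below, above, or between.
    """
    if value < low:
        return "negative"
    if value > high:
        return "positive"
    return "indeterminate"


LOW, HIGH = 0.8, 5.0  # thresholds specified by calculation 1 in the text

# ([AAA], [DDD]) pairs taken from the three examples in the text.
for aaa, ddd in [(1000, 100), (70, 100), (120, 100)]:
    value = aaa / ddd
    print(aaa, ddd, value, characterize(value, LOW, HIGH))
```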
In addition to calculation algorithm 212, each model 202 includes an aggregation algorithm 214 that specifies how the calculations 204 of a given model 202 are combined in order to characterize (compute) the model. One example of an aggregation algorithm 214 is a voting scheme in which model 202 is characterized as having a high probability or likelihood if, when computed, a majority of the calculations in the model are positive rather than negative. For example, consider the case in which calculation algorithm 212 is applied to the calculations of Table 1 and calculations 1 and 2 are positive, calculation 3 is indeterminate, and calculation 4 is negative. With these results, an organism tested with the model consisting of the calculations of Table 1 would be characterized as likely to have the biological feature associated with the model.
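A voting scheme of this kind can be sketched as below. The rule used here (positives must outnumber negatives, with indeterminate results ignored) is an assumption chosen to be consistent with the Table 1 example; the text does not spell out how indeterminate calculations are counted, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def aggregate_by_vote(results: Iterable[str]) -> bool:
    """Return True when the model is characterized as likely.

    Assumed rule: positive calculations must outnumber negative ones;
    indeterminate calculations do not vote.
    """
    counts = Counter(results)
    return counts["positive"] > counts["negative"]


# Calculations 1 to 4 of Table 1 as characterized in the text.
table_1_results = ["positive", "positive", "indeterminate", "negative"]
print(aggregate_by_vote(table_1_results))
```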
Each model 202 optionally includes model preconditions 116. A model precondition 116 specifies a requirement that must be satisfied before calculation algorithm 212 is applied to the calculations 204 of the model. One example of a model precondition 116 is a requirement that the calculations 204 of another predetermined model 202 be computed before the calculations 204 of the model 202 associated with the precondition 116 are computed. For example, consider the case in which one model 202 is for lung cancer and another model 202 is for lung adenocarcinoma. The lung cancer model is used to determine whether a particular tumor is positive for lung cancer. In this case, the lung adenocarcinoma model 202 can have a precondition 116 requiring that the lung cancer model be run before the lung adenocarcinoma model is run. Precondition 116 can further require that the lung cancer model test positive before the lung adenocarcinoma model is run.
In addition to the model 202 table type, runtime model schema 200 specifies other tables that are arranged hierarchically. At the top of this hierarchy is procedure type 220. Each procedure type 220 specifies a calculation algorithm 212 and an aggregation algorithm 214. Each procedure type 220 also optionally includes a procedure identifier 221.
One or more models 202 can be associated with a procedure type 220. When a model 202 is associated with a procedure type 220, the model uses the calculation algorithm 212 and the aggregation algorithm 214 specified by that procedure type 220. In one embodiment, a model 202 includes the procedure identifier 221 of the procedure type 220 that the model uses. In this embodiment, the model 202 need not include explicit information about the calculation algorithm 212 and aggregation algorithm 214 that it will use, because this information can be obtained from the procedure type 220 designated by the procedure identifier field 221 in the model 202.
As discussed above and shown in Fig. 1, each model 202 comprises one or more calculations 204. In practice, in some embodiments, each calculation 204 is stored in another form of table found in runtime model schema 200. Each calculation 204 specifies the abundance values (not shown) of one or more cellular constituents. In addition, each calculation 204 can optionally include a model identifier 110 that indicates the model 202 with which the calculation 204 is associated. For example, model identifier 110 can indicate that calculation 204-1 is associated with model 202-1. Each calculation 204 can also have a calculation identifier 112 and thresholds 114. When each calculation 204 includes a model identifier 110, the models 202 of runtime database 120 need not explicitly describe the calculations 204 that are part of the model. If the calculations 204 of a given model 202 are needed, they can be identified by searching the calculations 204 in runtime database 120 for those whose model identifier 110 matches the given model.
As discussed above and shown in Fig. 1, each model 202 comprises one or more model preconditions 224. In practice, each model precondition 224 is another form of data structure found in runtime model schema 200. Each precondition 224 specifies a precondition 116 that must be satisfied before the model associated with the precondition is run. In addition, each model precondition 224 can optionally include the model identifier 110 of the model 202 with which the precondition is associated. For example, model identifier 110 can indicate that precondition 224-1 is associated with model 202-1. When each precondition 224 includes a model identifier 110, the models 202 of runtime database 120 need not explicitly describe the preconditions 224 that are part of the model. In that case, to determine the preconditions 224 for a given model 202, the preconditions in runtime database 120 can be searched for those whose model identifier 110 matches the given model.
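The lookup by model identifier 110 described above can be pictured with a small in-memory sketch. The record layout and field names below are hypothetical; the disclosure states only that a calculation can carry a model identifier 110, a calculation identifier 112, and thresholds 114. All threshold values other than those of calculation 1 (0.8 and 5, taken from the worked example) are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Calculation:
    """Hypothetical row of a calculations table."""
    calculation_id: int            # calculation identifier 112
    model_id: int                  # model identifier 110
    constituents: Tuple[str, ...]  # constituents named by the calculation
    low_threshold: float           # thresholds 114
    high_threshold: float


def calculations_for_model(rows: List[Calculation], model_id: int) -> List[Calculation]:
    """Find a model's calculations by matching the model identifier.

    The model record itself never has to list its calculations.
    """
    return [row for row in rows if row.model_id == model_id]


# Illustrative rows only; model 2 and its thresholds are made up.
rows = [
    Calculation(1, 1, ("AAA", "DDD"), 0.8, 5.0),
    Calculation(2, 1, ("CCC", "DDD"), 0.8, 5.0),
    Calculation(3, 2, ("NNN", "MMM"), 0.5, 2.0),
]
print([row.calculation_id for row in calculations_for_model(rows, 1)])
```

The same pattern would apply to preconditions 224 that carry a model identifier 110.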
5.2. Exemplary process
Section 5.1 introduced exemplary data structures of one embodiment of the present invention. This section describes how these novel data structures are used to test a plurality of models 202. The results of such calculations are described in Section 5.3.
Step 402
In step 402, cellular constituent characteristic data are obtained. Typically, the data take the form of a cellular constituent abundance data file submitted by a remote clinician. In some instances, when the data file is submitted, computer 20 receives the file through network interface card 28. In a typical embodiment, a remote computer transfers the data to computer 20 over a wide area network (WAN) such as the Internet.
The cellular constituent characteristic data file typically comprises multiple aspects (also called characteristics) of the biological state of each cellular constituent in a plurality of cellular constituents. For example, in some embodiments, the cellular constituent characteristic file comprises abundance data for a number of cellular constituents of a given biological specimen or organism. The cellular constituent abundance data file can comprise data for more than 100 cellular constituents of a given biological specimen. Indeed, the file can comprise data for more than 500, more than 1,000, more than 10,000, or more than 15,000 cellular constituents of a given biological specimen. In some embodiments, the cellular constituent abundance data file comprises data for a plurality of biological specimens. In such embodiments, the data file clearly indicates which biological specimen each cellular constituent abundance level in the file is associated with.
In some embodiments, the cellular constituent characteristic data file is formatted for Affymetrix (Santa Clara, California) GeneChip probe arrays (e.g., Affymetrix chip files with a CHP extension that are generated using Affymetrix MAS 4.0 software and U95A or U133 gene chips), Agilent (Palo Alto, California) DNA microarrays, Amersham (Little Chalfont, England) CodeLink microarrays, the Imaging Research (St. Catharines, Canada) ArrayVision file format, the Axon (Union City, California) GenePix file format, the BioDiscovery (Marina del Rey, California) ImaGene file format, the Rosetta (Kirkland, Washington) gene expression markup language (GEML) file format, Incyte (Palo Alto, California) GEM microarrays, or Molecular Dynamics (Sunnyvale, California) cDNA microarrays.
In some embodiments, the cellular constituent characteristic file comprises processed microarray images for a biological specimen. For example, in one such embodiment, the file comprises cellular constituent abundance information for each cellular constituent represented on the array, optional background signal information, and optional associated annotation information that describes the probe used for the respective cellular constituent. In some embodiments, the cellular constituent abundance measurements are measurements of the transcriptional state, as described in Section 5.5 below.
In some embodiments of the present invention, aspects (characteristics) of the biological state other than the transcriptional state, such as the translational state, the activity state, or mixed aspects of the biological state, are represented in the cellular constituent characteristic file. See, e.g., Section 5.6 below. For example, in some embodiments, the cellular constituent characteristic file comprises protein levels for various proteins in the biological specimen under study. In some specific embodiments, the cellular constituent characteristic file comprises the amounts or concentrations of cellular constituents in tissues of the biological specimen under study, the activity levels of cellular constituents in one or more tissues of the biological specimen, or the state of modification (e.g., phosphorylation) of one or more cellular constituents of the biological specimen.
In one aspect of the invention, the expression level of a gene in a biological specimen is determined by measuring the amount of at least one cellular constituent that corresponds to the gene in one or more cells of the biological specimen under study. In one embodiment, the amount of the at least one cellular constituent that is measured comprises the abundance of at least one RNA species present in one or more cells of the biological specimen. Such abundance can be measured by a method comprising contacting a gene transcript array with RNA from one or more cells of the organism, or with cDNA derived therefrom. A gene transcript array comprises a surface with attached nucleic acids or nucleic acid mimics. The nucleic acids or nucleic acid mimics are capable of hybridizing with the RNA species or with cDNA derived from the RNA species. In one embodiment, the abundance of the RNA is measured by contacting a gene transcript array with RNA from one or more cells of the organism under study, or with nucleic acid derived from that RNA, where the gene transcript array comprises a positionally addressable surface with attached nucleic acids or nucleic acid mimics that are capable of hybridizing with the RNA species or with nucleic acid derived from the RNA species.
In some embodiments, the cellular constituent characteristic file comprises gene expression data for a plurality of genes (or for cellular constituents that correspond to the plurality of genes). In one embodiment, the plurality of genes comprises at least 5 genes. In other embodiments, the plurality of genes comprises at least 100 genes, at least 1,000 genes, at least 20,000 genes, or more than 30,000 genes. In some embodiments, the plurality of genes comprises between 5,000 and 20,000 genes.
In some instances of step 402, the abundance data are preprocessed. In some embodiments, this preprocessing comprises a normalization in which every cellular constituent characteristic value of a given biological specimen is divided by the median of the cellular constituent abundance values measured for that specimen. In some embodiments, every cellular constituent abundance value of a given biological specimen or organism is divided by the average of the 25th and 75th percentiles of the cellular constituent abundance values measured for that specimen.
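Both normalizations can be sketched with the Python standard library. The function names are hypothetical, and the percentile interpolation rule (the "inclusive" method of `statistics.quantiles`) is an assumption, since the text does not say how the 25th and 75th percentiles are computed.

```python
from statistics import median, quantiles
from typing import Dict


def normalize_by_median(abundances: Dict[str, float]) -> Dict[str, float]:
    """Divide every abundance value by the specimen's median abundance.

    Assumes the median is nonzero.
    """
    scale = median(abundances.values())
    return {name: value / scale for name, value in abundances.items()}


def normalize_by_quartile_mean(abundances: Dict[str, float]) -> Dict[str, float]:
    """Divide every value by the mean of the 25th and 75th percentiles.

    Assumption: percentiles use the 'inclusive' interpolation method.
    """
    q1, _, q3 = quantiles(abundances.values(), n=4, method="inclusive")
    scale = (q1 + q3) / 2
    return {name: value / scale for name, value in abundances.items()}


# Illustrative abundance values only.
specimen = {"g1": 10.0, "g2": 20.0, "g3": 30.0, "g4": 40.0, "g5": 50.0}
print(normalize_by_median(specimen))
print(normalize_by_quartile_mean(specimen))
```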
When the source of the cellular constituent abundance measurements is a microarray, negative cellular constituent abundance values can be obtained when the mismatch probe measurement is greater than the perfect match probe measurement. This typically occurs when the primary gene (representing a cellular constituent) is expressed at low levels. In some typical cases, 30 percent of the abundance values in a given cellular constituent abundance file are negative. In some instances of the preprocessing of the present invention, all cellular constituent abundance values that are equal to or less than 0 are replaced with a fixed value. When the source of the cellular constituent abundance measurements is Affymetrix GeneChip MAS 5.0, in some embodiments negative cellular constituent abundance values can be replaced with a fixed value such as 20 or 100. More generally, in some embodiments, all cellular constituent abundance values that are equal to or less than 0 are replaced with a fixed value that is between 0.001 and 0.5 (e.g., 0.1 or 0.01) of the median cellular constituent abundance value in the given biological specimen. In some embodiments, all such cellular constituent abundance values are replaced with a transformation of the value that varies between the median and 0 in inverse proportion to the absolute value of the abundance value being replaced. In some embodiments, all cellular constituent abundance values that are less than 0 are replaced with a value determined by a function of the magnitude of the original negative value. In some instances, this function is a sigmoid function.
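The two fixed-value replacement schemes can be sketched as follows. The function names are hypothetical. Computing the median over all values of the specimen, including the nonpositive ones, is an assumption; if that median is itself nonpositive, the second function would not produce a positive replacement value.

```python
from statistics import median
from typing import List


def replace_nonpositive_fixed(values: List[float], fixed: float = 20.0) -> List[float]:
    """Replace every value <= 0 with one fixed value (e.g., 20 or 100)."""
    return [value if value > 0 else fixed for value in values]


def replace_nonpositive_median_fraction(
    values: List[float], fraction: float = 0.1
) -> List[float]:
    """Replace every value <= 0 with a fixed fraction of the median.

    The fraction is restricted to the 0.001 to 0.5 range given in the text.
    Assumption: the median is taken over all values of the specimen.
    """
    if not 0.001 <= fraction <= 0.5:
        raise ValueError("fraction must be between 0.001 and 0.5")
    replacement = fraction * median(values)
    return [value if value > 0 else replacement for value in values]


# Illustrative abundance values only.
raw = [-5.0, 0.0, 100.0, 200.0, 300.0]
print(replace_nonpositive_fixed(raw, fixed=20.0))
print(replace_nonpositive_median_fraction(raw, fraction=0.5))
```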
In some embodiments, a web page on computer 20, or a web page accessible to computer 20, facilitates step 402. The web page allows a remote user to select which models to run and facilitates transfer of the cellular constituent data file from the remote site to computer 20. In some embodiments, the web page can convey any of the following information:
The address of the laboratory requesting computation of one or more models;
The identity of the one or more models (suite) to be run using the cellular constituent characteristic data file;
A unique sample identifier that identifies the submitted sample;
An identifier of the microarray format used to measure the cellular constituent characteristics;
An identifier of the patient represented by the cellular constituent characteristic data file;
A description of the biological specimen from which the cellular constituent characteristics that form the cellular constituent characteristic file were obtained; and/or
The identity of the physician or other health care professional requesting that models be run on the biological specimen.
In some embodiments, instead of or in addition to a web-based interface, a software module (not shown) is run on the remote submitting computer. This software module allows a remote physician to upload the requisite data to computer 20 using file transfer protocol, Internet Protocol, or another type of file sharing technique. In some embodiments, all communication with computer 20 in step 402 (and step 424) is encrypted using cryptographic algorithms known in the art, such as secret key cryptography, hashes, message digests, and/or public key algorithms. Such techniques are disclosed in, for example, Kaufman, Network Security, 1995, Prentice-Hall, New Jersey; and Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition, John Wiley & Sons, Inc., each of which is incorporated herein by reference in its entirety.
Steps 404 and 406
In step 404, a determination is made as to which models 202 are to be run (computed). For example, in some cases, the models 202 in runtime database 120 are divided into several suites of models. In one embodiment, there is one suite of models for testing cancers of unknown primary origin, another suite of models specifically designed to test for lung cancer, and so on. Each suite of models 202 comprises one or more models. Thus, in some instances, step 404 comprises determining which suite of models 202 the user has requested. In step 406, a model is selected from the set of models chosen in step 404.
Step 408
Step 408 is optional. In some embodiments, step 408 is not run, and all of the models specified by the remote user in step 402 (e.g., all of the models in the selected suite) are run. In optional step 408, a determination is made as to whether the model preconditions 116 of the model 202 selected in step 406 have been satisfied. For example, in some embodiments, a model precondition 116 can specify that a model 202 indicative of a broader biological sample class (e.g., a more general phenotype) must be run before the model 202 selected in the last instance of step 406, which is indicative of a narrower biological sample class. For instance, the model precondition 116 of a first model 202 that is indicative of a specific form of lung cancer can require that a second model 202, indicative of lung cancer generally, test positive before the first model is run. In addition, the second model 202 can include a model precondition 116 requiring that a third model, indicative of cancer, test positive before the second model is run. In some embodiments, a model precondition 116 comprises a requirement that another model in the plurality of models be identified as negative, positive, or indeterminate before the selected model is tested. Some additional examples of how preconditions 116 can be used to arrange models 202 hierarchically follow.
In a first example, the precondition of model B requires that model A have a particular result before model B is run. It may be that model A is run but does not yield the particular result required by model B. In that case, model B is not run. If, however, model A is run and yields the particular result that model B requires, model B is run. This example can be expressed as:
If (A = result), then B can be run.
In another example, precondition 116 of model C requires that, before model C is run, either model A have a particular result or model B have a particular result. This example can be expressed as:
If ((A = first result) or (B = second result)), then C can be run.
For instance, model C can require that, before model C is run, either model A be run and test positive for cancer or model B be run and test positive for lung cancer. Alternatively, precondition 116 of model C can require that both model A and model B yield particular results:
If ((A = first result) and (B = second result)), then C can be run.
In yet another example, precondition 116 of model D requires that model C have a particular result before model D is run. In turn, precondition 116 of model C requires that, before model C is run, model A have a first result and model B have a second result. This example can be expressed as:
If ((A = first result) and (B = second result)), then C can be run.
If (C = third result), then D can be run.
These examples illustrate the advantages of providing model preconditions 116. Because of the novel preconditions 116 of the present invention, models 202 can be arranged hierarchically such that particular models 202 are run before other models 202. Typically, the models 202 that are run first are designed to classify a biological specimen into a broader biological sample class (e.g., a broad phenotype). Once the biological sample has been broadly classified, further models 202 can be run to subdivide the preliminary classification into narrower biological sample classes (e.g., more specific biological sample classes).
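The precondition expressions above can be sketched as composable predicates over the results of models that have already been run. Everything here (the helper names, the string results, the dictionary keyed by model name) is a hypothetical representation chosen for illustration.

```python
from typing import Callable, Dict

Results = Dict[str, str]                 # model name -> result of a model already run
Precondition = Callable[[Results], bool]


def requires(model: str, result: str) -> Precondition:
    """True only if `model` has been run and produced `result`."""
    return lambda results: results.get(model) == result


def any_of(*conditions: Precondition) -> Precondition:
    """Logical 'or' over preconditions."""
    return lambda results: any(cond(results) for cond in conditions)


def all_of(*conditions: Precondition) -> Precondition:
    """Logical 'and' over preconditions."""
    return lambda results: all(cond(results) for cond in conditions)


def can_run(model: str, preconditions: Dict[str, Precondition], results: Results) -> bool:
    """A model with no registered precondition can always be run."""
    condition = preconditions.get(model)
    return condition is None or condition(results)


# The last example in the text: C needs A and B, and D needs C.
preconditions = {
    "C": all_of(requires("A", "positive"), requires("B", "positive")),
    "D": requires("C", "positive"),
}

results = {"A": "positive", "B": "positive"}
print(can_run("C", preconditions, results))  # A and B both satisfied
print(can_run("D", preconditions, results))  # C has not been run yet
```

Because a model that has not been run has no entry in the results, a precondition on it fails automatically, which is what yields the broad-to-narrow ordering described in the text.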
When the model precondition 116 of the model 202 selected in step 406 is satisfied (408-Yes), process control passes to step 410. When the model precondition 116 of the model 202 is not satisfied (408-No), process control returns to step 406, and another model 202 is selected from the series of models identified in step 404.
Step 410
In step 410, a calculation 204 in the model is selected. A calculation 204 specifies two or more cellular constituents whose characteristics (some aspect of the biological state of the cellular constituent) are to be examined in the biological specimen under study. For example, a calculation 204 can specify the cellular constituent abundance values of gene AAA and gene BBB. In some embodiments, the calculation specifies at least one cellular constituent that is up-regulated or down-regulated in specimens that have the biological feature represented by the model 202 selected in the last instance of step 406, relative to biological specimens that do not have the biological feature represented by model 202 and/or that have a different biological feature.
Cellular constituents that are up-regulated or down-regulated in specimens having a given biological feature, relative to specimens having other biological features, can be determined by routine experimentation or obtained from published references. For example, Su et al., 2001, Cancer Research 61, p. 7388, provide the names of genes that (i) are up-regulated in specific primary tumor types and (ii) are predictive of those tumor types. Su et al. identified the cellular constituents listed in Table 2 as up-regulated in prostate tumors.
Table 2: Cellular constituents up-regulated in prostate tumors (Su et al.)
Number | Accession | Name | Description
1 | NM_003656 | CAMK1 | Calcium/calmodulin-dependent protein kinase I
2 | Hs.12784 | KIAA0293 | KIAA0293 protein
3 | NM_001648 | KLK3 | Kallikrein 3 (prostate specific antigen)
4 | NM_005551 | KLK2 | Kallikrein 2, prostatic
5 | None | TRG@ | T cell receptor gamma locus
6 | NM_006562 | LBX1 | Transcription factor similar to D. melanogaster homeodomain protein lady bird late
7 | NM_016026 | LOC51109 | CGI-82 protein
8 | NM_001099 | ACPP | Acid phosphatase, prostate
9 | NM_005551 | KLK2 | Kallikrein 2, prostatic
10 | None | None | Antigen|TIGR==HG2261-HT2352
11 | NM_012449 | STEAP | Six transmembrane epithelial antigen of the prostate
12 | NM_001099 | ACPP | Acid phosphatase, prostate
13 | NM_004522 | KIF5C | Kinesin family member 5C
14 | None | None | Antigen|TIGR==HG2261-HT2351
15 | NM_001634 | AMD1 | S-adenosylmethionine decarboxylase 1
16 | NM_001634 | AMD1 | S-adenosylmethionine decarboxylase 1
17 | None | None | Antigen|TIGR==HG2261-HT2351
18 | NM_006457 | LIM | LIM protein (similar to rat protein kinase C-binding enigma)
19 | NM_001648 | KLK3 | Kallikrein 3 (PSA)
In some embodiments, once the abundances of a plurality of cellular constituents have been measured, a cellular constituent is considered up-regulated in a biological specimen having a given biological feature when its abundance in that specimen is greater than the abundance of at least 60%, at least 70%, at least 80%, or at least 90% of the cellular constituents in the specimen. In some embodiments, a cellular constituent is considered up-regulated in specimens having a biological feature, relative to specimens not having the feature, when its average abundance in specimens having the feature is higher than its abundance in specimens not having the feature. In some embodiments, once the abundances of a plurality of cellular constituents have been measured, a cellular constituent is considered down-regulated in a biological specimen having a given biological feature when its abundance in that specimen is less than the abundance of at least 40%, at least 30%, at least 20%, or at least 10% of the cellular constituents in the specimen. In some embodiments, a cellular constituent is considered down-regulated in biological specimens having a biological feature, relative to biological specimens or organisms not having the feature, when its average abundance in specimens having the feature is lower than its abundance in specimens not having the feature.
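Two of the criteria above can be illustrated with a short Python sketch: the rank-based test for up-regulation within one specimen, and the comparison of average abundance between specimens with and without the feature. The function names, the default 60% cutoff, and the convention of excluding the constituent itself from the comparison are assumptions made for illustration only.

```python
from statistics import mean
from typing import Dict, Sequence


def fraction_exceeded(abundances: Dict[str, float], constituent: str) -> float:
    """Fraction of the other measured constituents with a lower abundance."""
    target = abundances[constituent]
    others = [v for k, v in abundances.items() if k != constituent]
    if not others:
        raise ValueError("need at least two measured constituents")
    return sum(v < target for v in others) / len(others)


def up_regulated_by_rank(abundances: Dict[str, float], constituent: str,
                         cutoff: float = 0.60) -> bool:
    """Rank criterion: abundance exceeds that of at least `cutoff` of the others."""
    return fraction_exceeded(abundances, constituent) >= cutoff


def direction_by_mean(with_feature: Sequence[float],
                      without_feature: Sequence[float]) -> str:
    """Mean criterion: compare average abundance with vs. without the feature."""
    a, b = mean(with_feature), mean(without_feature)
    if a > b:
        return "up"
    if a < b:
        return "down"
    return "unchanged"
```

For example, a constituent whose abundance exceeds four of five other constituents has `fraction_exceeded` equal to 0.8, so it passes the 60% cutoff but not a 90% cutoff.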
In some embodiments, the cellular constituents specified in a calculation 204 are each a nucleic acid or a ribonucleic acid, and the abundances of these cellular constituents in the biological specimen are obtained by measuring the transcriptional state of all or a portion of the first cellular constituent and the second cellular constituent in the specimen. In some embodiments, the cellular constituents specified in a calculation 204 are each independently all or a fragment of an mRNA, a cRNA, or a cDNA. In some embodiments, the cellular constituents specified in a calculation 204 are each a protein, and the abundances of these cellular constituents are obtained by measuring the translational state of all or a portion of the cellular constituents. In some embodiments, the abundance of a cellular constituent specified in a calculation 204 is determined by measuring an activity or a post-translational modification of the cellular constituent.
Step 412
In step 412, the cellular constituent characteristic values specified by the calculation 204 selected in the last instance of step 410 are obtained from the cellular constituent characteristic data submitted in step 402. Thus, in the example in which calculation 204 specifies gene AAA and gene BBB, the abundance values (or some other characteristic specified by the calculation) of the cellular constituents for gene AAA and gene BBB are obtained from the cellular constituent abundance file.
Step 414
In step 414, the calculation 204 selected in the last instance of step 410 is computed in accordance with the calculation algorithm 212 specified by the model. For example, the calculation algorithm can specify taking the ratio of the abundance value of the first cellular constituent specified by an exemplary calculation 204 to the abundance value of the second cellular constituent specified by that calculation. Other examples of computing calculations 204 in accordance with calculation algorithm 212 are described in Section 5.1 above. Those examples describe how, once a calculation 204 has been computed, it is characterized based on the computed value relative to a threshold for the calculation. For example, if the computed value of a calculation 204 is greater than the minimum value for that calculation, the calculation 204 is positive.
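A minimal Python sketch of such a ratio calculation and its threshold-based characterization follows. The two-threshold scheme (`positive_min`, `negative_max`) and the "indeterminate" band between them are assumptions for illustration; the text above only states that a value above the minimum makes the calculation positive.

```python
from typing import Dict


def ratio_calculation(abundances: Dict[str, float],
                      first: str, second: str) -> float:
    """Ratio of the abundance of the first constituent to that of the second."""
    denominator = abundances[second]
    if denominator == 0:
        raise ValueError(f"abundance of {second} is zero; ratio undefined")
    return abundances[first] / denominator


def characterize(value: float, positive_min: float, negative_max: float) -> str:
    """Map a computed value to a result code using hypothetical thresholds."""
    if negative_max > positive_min:
        raise ValueError("negative_max must not exceed positive_min")
    if value > positive_min:
        return "positive"
    if value < negative_max:
        return "negative"
    return "indeterminate"
```

For instance, with abundances of 6.0 for gene AAA and 2.0 for gene BBB the ratio is 3.0, which is characterized as positive when `positive_min` is 2.0.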
Step 416
In step 416, the result of the most recently computed calculation 204 is stored. In some embodiments, storing comprises storing a model identifier, which identifies the model 202 for which calculation 204 was run; a model version identifier, which indicates which version of model 202 was run; an expression data file identifier, which identifies the cellular constituent characteristic data file that supplied the cellular constituent characteristic values used to compute calculation 204; a calculation identifier 112 (Fig. 1) associated with calculation 204; and a calculation result code (e.g., "highly likely", "unlikely", etc.).
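The stored fields listed above map naturally onto a simple record type. The following Python sketch uses an in-memory list as a stand-in for the results table; the class, field, and function names are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CalculationResult:
    model_id: str            # which model the calculation belongs to
    model_version: str       # which version of the model was run
    expression_file_id: str  # data file that supplied the characteristic values
    calculation_id: str      # identifier of the calculation itself
    result_code: str         # e.g. "positive", "negative", "indeterminate"


# Hypothetical in-memory stand-in for the stored table of calculation results.
results_table: List[CalculationResult] = []


def store_result(record: CalculationResult) -> None:
    """Append one computed calculation result to the table."""
    results_table.append(record)


def result_codes_for(model_id: str) -> List[str]:
    """Collect the result codes of all stored rows for one model."""
    return [r.result_code for r in results_table if r.model_id == model_id]
```

Keeping the model identifier on every row is what later allows all results for one model to be collected and combined.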
Step 418
In step 418, a determination is made as to whether all calculations 204 in model 202 have been computed in accordance with the model's calculation algorithm 212. If not (418-No), process control returns to step 410, where another calculation (test) 204 is selected from model 202 and computed. If so (418-Yes), process control passes to step 420.
Step 420
In step 420, all calculations (tests) 204 performed for the model selected in the last instance of step 406 are aggregated in accordance with the aggregation algorithm 214 specified by model 202. This aggregation yields the model characterization of the model. The model characterization indicates whether the test organism of the species, or the test biological specimen from an organism of the species, is a member of a biological sample class.
In one embodiment, the result code is collected from each row of table 318 whose model identifier matches the model identifier of the model 202 selected in the last instance of step 406. For example, consider the case in which model 202 comprises five calculations 204. In step 414, each calculation 204 is computed and its result is stored. When a threshold is associated with each calculation 204, the result of the calculation can serve as an indication of whether the calculation is positive, negative, or indeterminate.
Consider the case in which model 202 comprises five calculations (tests) 204. There will be five rows in calculation result table 318, one for each of the five calculations 204. Each of these five rows will contain a result code. In this use case, each result code is positive, negative, or indeterminate. In addition, the aggregation algorithm associated with model 202 specifies how these five result codes are combined to characterize model 202. For example, the aggregation algorithm can specify that the five result codes be combined by a voting scheme in which model 202 is considered positive (is characterized as positive) if more of the model's calculations are positive than are negative.
One example of aggregation algorithm 214 is a voting scheme in which model 202 is positive when more of the model's computed calculations are positive than are negative. For example, consider the case in which calculation algorithm 212 is applied to the calculations of Table 1, and calculations 1 and 2 are positive, calculation 3 is indeterminate, and calculation 4 is negative. With these results, the model made up of the calculations of Table 1 is characterized as positive. In some embodiments of the present invention, however, a weighting scheme can be used in which each positive calculation in the model is given a different weight than each negative calculation in the model. For example, each positive calculation in the model is given a weight of 3.0 and each negative calculation in the model is given a weight of 1.0. Under this weighting scheme, the model is characterized as positive even when it consists of one positive calculation and two negative calculations.
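The voting and weighted-voting schemes just described can be sketched in a few lines of Python. The function name and the handling of the cases the text leaves open (a tie is reported as "indeterminate", and a negative majority as "negative") are assumptions for illustration.

```python
from typing import Iterable


def aggregate_by_vote(result_codes: Iterable[str],
                      positive_weight: float = 1.0,
                      negative_weight: float = 1.0) -> str:
    """Combine per-calculation result codes into one model characterization.

    Indeterminate calculations carry no weight. Tie handling is an assumption.
    """
    codes = list(result_codes)
    positive_total = positive_weight * codes.count("positive")
    negative_total = negative_weight * codes.count("negative")
    if positive_total > negative_total:
        return "positive"
    if negative_total > positive_total:
        return "negative"
    return "indeterminate"
```

With equal weights, two positive, one indeterminate, and one negative calculation give a positive model. With weights of 3.0 and 1.0, one positive and two negative calculations also give a positive model, because 3.0 exceeds 2.0.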
In preferred embodiments, each characterized model yields a likelihood that a given biological specimen or organism has the biological feature represented by the model. This likelihood is the model score of the computed model. In other words, each characterized model yields a model characterization (e.g., a model score) that indicates whether the test organism of a species, or the test biological specimen from an organism of the species, is a member of a biological sample class. In some embodiments, the higher the model score, the more likely it is that the biological specimen or organism whose cellular constituent values were used to compute the model (i) has the biological feature represented by the model or (ii) is a member of the biological sample class represented by the model. In some embodiments, the model determines whether a given biological specimen or organism is highly likely, likely, indeterminate, unlikely, or highly unlikely to have the biological feature associated with the model or to be a member of the biological sample class represented by the test. In some embodiments, the biological feature represented by the model is sensitivity and/or resistance to a combination therapy. In some embodiments, the biological feature represented by the model is the metastatic potential of a specific disease and/or the likelihood that the disease will recur in the organism. In some embodiments, the biological feature represented by the model is cancer and/or any of the exemplary biological features mentioned in Section 5.4. In embodiments that record the likelihood of disease recurrence, the model can be scored as "sensitive", "low risk", "high risk", and so on. In embodiments that record the metastatic potential of a disease, the model can be scored as "malignant", "indeterminate", "non-malignant", and so on. In embodiments that assess the malignancy of a disease, the model can be scored as "malignant", "indeterminate", "chronic", and so on.
Step 422 and 424
In step 422, a determination is made as to whether all models in the series of models to be run (computed) on the given cellular constituent abundance file have been run. If not (422-No), process control returns to step 406 and another model 202 is selected. If all models have been run, the results are reported (step 424). In some embodiments, the reported result is the characterization of each model in the plurality of models.
In typical embodiments, the reported result is the characterization of each model 202 in the series of models that was run. Each individual model 202 that was run is characterized according to that model's own aggregation algorithm 214. In typical embodiments, the results are reported to the remote user computer that submitted the original cellular constituent abundance file. Exemplary reports produced in step 424 are described in Section 5.3.
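The overall control flow of steps 406 through 424 can be summarized in a compact Python sketch. It is a simplification under stated assumptions: the `Model` class and `run_suite` function are hypothetical names, each calculation is a function that returns a result code directly, and the loop stops once no remaining model has a satisfied precondition (the text does not spell out this termination rule).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Abundances = Dict[str, float]  # constituent name -> measured abundance
Results = Dict[str, str]       # model name -> model characterization


@dataclass
class Model:
    name: str
    calculations: List[Callable[[Abundances], str]]  # each returns a result code
    aggregate: Callable[[List[str]], str]            # stands in for algorithm 214
    precondition: Optional[Callable[[Results], bool]] = None  # precondition 116


def run_suite(models: List[Model], abundances: Abundances) -> Results:
    """Run every model whose precondition is, or becomes, satisfied."""
    results: Results = {}
    pending = list(models)
    progressed = True
    while pending and progressed:        # step 422: models left to try?
        progressed = False
        for model in list(pending):      # step 406: select a model
            if model.precondition and not model.precondition(results):
                continue                 # 408-No: select another model
            # Steps 410-418: compute every calculation in the model.
            codes = [calc(abundances) for calc in model.calculations]
            results[model.name] = model.aggregate(codes)  # step 420
            pending.remove(model)
            progressed = True
    return results                       # step 424: report the characterizations
```

A model whose precondition is never satisfied simply does not appear in the returned results, mirroring the case in which a dependent model is not run.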
5.3. Exemplary results
In some embodiments, the report provided in step 424 is sent from computer 20 to the remote computer that generated the cellular constituent characteristic data file in step 402 of Fig. 4. In some embodiments, the report has a header that provides the following information:
The address of the laboratory that requested computation of the one or more models;
A unique sequence identifier for the request;
A unique sample identifier that identifies the submitted sample;
An identifier of the microarray format used to measure the cellular constituent characteristic data;
The date on which the cellular constituent characteristic data file was submitted to computer 20 in step 402;
The date on which the report of step 424 was generated;
An identifier of the patient represented by the cellular constituent characteristic data file;
A description of the biological specimen from which the cellular constituent characteristic data forming the cellular constituent characteristic file were obtained; and/or
The identity of the physician or other health care professional who requested that the models be run on the biological specimen.
Tables 3 and 4 below together are an example of a report for a prostate model suite. Each row in Tables 3 and 4 represents a different model. In Table 3, each reported model has a clinical test name, which indicates what the model tests for; a reference to one or more studies (or other forms of clinical investigation) that provide the scientific basis for the cellular constituents selected for the model; the model result; and a clinical description of the model result. The models in Table 3 indicate (i) how likely the patient is to have a recurrence of prostate cancer or (ii) the patient's sensitivity to a specific form of therapy. Table 4 differs from Table 3 in that each row (model) of Table 4 represents a confirmatory test that determines whether the patient has prostate cancer.
Table 3. Prostate cancer suite / clinical tests
Clinical test | Reference | Result | Description
Androgen ablation resistance | Holzbeierlein-Gerald2004 | Sensitive | Expression profile inconsistent with androgen ablation resistance
Likelihood of recurrence | LaTulippe-Gerald2002 | Low risk | Expression profile consistent with low risk of recurrence
Likelihood of recurrence | Singh-Sellers2002 | Low risk | Expression profile consistent with low risk of recurrence
Likelihood of recurrence | Febbo-Sellers2003 | Low risk | Expression profile consistent with low risk of recurrence
Likelihood of recurrence | Henshall-Sutherland2003 | Low risk | Expression profile consistent with low risk of recurrence
Likelihood of recurrence | Lapointe-Pollack2004 | Low risk | Expression profile consistent with low risk of recurrence
Table 4. Prostate cancer suite / confirmatory tests
Confirmatory test | Reference | Result | Description
Benign vs. malignant | Ernst-Grone2002 | Malignant | Expression profile consistent with malignant cells
Benign vs. malignant | Welsh-Hampton2001 | Indeterminate | Expression profile indeterminate with regard to malignancy
Benign vs. malignant | Magee-Milbrandt2001 | Malignant | Expression profile consistent with malignant cells
Site of origin: prostate | Su-Hampton2001 | Prostate | Expression profile consistent with primary prostate cancer
Table 5 and 6 has been described the chemosensitivity model of another example of the present invention, and it is to find in the another kind of Report Type that sends in step 424.
The report of table 5. chemosensitivity (Chemosensivity) model
Chemosensitivity detects Reference The result Describe
Vinca alkaloids: camptothecine vinca alkaloids: Irinotecan vinca alkaloids: vincristine vinca alkaloids: vincaleukoblastinum taxane: taxol taxane: docetaxel antibiotic: actinomycin D antibiotic: bleomycin antibiotic: mitomycin C anthracycline antibiotic: Doxorubicin anthracycline antibiotic: daunorubicin antimetabolite: methotrexate (MTX) antimetabolite: 5 FU 5 fluorouracil antimetabolite: cytarabine antimetabolite: gemcitabine antimetabolite: 6-thioguanine antimetabolite: Ismipur PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 PathWork2004 The responsive resistance resistance of the responsive resistance resistance of the responsive resistance resistance resistance resistance resistance responsive resistance of resistance resistance Gene expression and the consistent gene expression of camptothecine sensitiveness and the consistent gene expression of Irinotecan sensitiveness and the consistent gene expression of vincristine resistance and the consistent gene expression of vincaleukoblastinum resistance and the consistent gene expression of taxol resistance and the consistent gene expression of docetaxel sensitiveness and the consistent gene expression of actinomycin D resistance and the consistent gene expression of bleomycin resistance and the consistent gene expression of mitomycin C resistance and the consistent gene expression of Doxorubicin resistance and the consistent gene expression of daunorubicin resistance and the consistent gene expression of methotrexate (MTX) resistance and the consistent gene expression of 5 FU 5 fluorouracil sensitiveness and the consistent gene expression of cytarabine resistance and gemcitabine sensitivity genes are expressed with the consistent gene expression of 6-thioguanine resistance and Ismipur resistance consistent
Table 6. Chemosensitivity model report
Chemosensitivity test | Reference | Result | Description
DNA alkylator: cisplatin | PathWork2004 | Sensitive | Gene expression consistent with cisplatin sensitivity
Interferon: interferon-α | PathWork2004 | Resistant | Gene expression consistent with interferon-α resistance
Interferon: interferon-β | PathWork2004 | Resistant | Gene expression consistent with interferon-β resistance
Interferon: interferon-γ | PathWork2004 | Resistant | Gene expression consistent with interferon-γ resistance
Other: STI 571 | PathWork2004 | Resistant | Gene expression consistent with STI 571 resistance
Other: L-asparaginase | PathWork2004 | Resistant | Gene expression consistent with L-asparaginase resistance
Table 7 and 8 has been described the colorectum model at another example of the present invention, and it is to find in the another kind of Report Type that sends in step 424.
The report of table 7. colorectum model
Clinical detection Reference The result Describe
Chemosensitivity: 5FU chemosensitivity: 5FU/RTX chemosensitivity: 5FU/CPT chemosensitivity: cis-platinum metastatic potential metastatic potential Takeshi- Fukushima2001 Farrugia- Jackman2003 Mariadason- Augenlicht2003 Huerta- Heber2003 Li- Furukawa2004 Hedge- Quakenbush2001 The responsive uncertain low-risk low-risk of resistance Expression figure and the consistent figure of expression of 5FU resistance cancer and 5FU/RTX sensitiveness cancer unanimously expression figure and the 5FU/CPT sensitiveness cancer low-risk that unanimously uncertain expression figure and metastatic tumor are schemed in expression with regard to cisplatin sensitivity unanimously the low-risk of expression figure and metastatic tumor is consistent
Table 8. Colorectal model report / confirmatory tests
Confirmatory test | Reference | Result | Description
Benign vs. malignant | Yamamoto-Imai2002 | Malignant | Expression profile consistent with malignant tumor
Benign vs. malignant | Zou-Meltzer2002 | Indeterminate | Expression profile indeterminate with regard to malignancy
Adenoma vs. carcinoma | Lin-Nakamura2002 | Carcinoma | Expression profile consistent with carcinoma
Adenoma vs. carcinoma | Notterman-Levine2001 | Carcinoma | Expression profile consistent with carcinoma
Site of origin: colorectal | Su-Hampton2001 | Colorectal | Expression profile consistent with primary colorectal cancer
Table 9 has been described at the position, source of a set of model of another embodiment of the invention, and this model is to find in the another kind of Report Type that sends in step 424.
The position report of table 9. source
The position, source The PATHWORK index Predictive value
Low High
Colorectum lung stomach liver kidney mammary gland ovary bladder pancreas prostate +32 +12 -42 -42 -88 -88 -88 -88 -100 -100 ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆
5.4. Exemplary biological features
The present invention can be used to develop models that determine whether a biological specimen has any one of a plurality of biological features. In other words, the invention can be used to develop models that indicate whether a test organism of a species, or a test biological specimen from an organism of the species, is a member of a biological sample class. A wide array of biological features (e.g., biological sample classes) is contemplated. In one embodiment, two biological features are, respectively, (i) a wild-type state and (ii) a disease state. In another embodiment, two biological features are, respectively, (i) a first disease state and (ii) a second disease state. In yet another embodiment, two biological features are, respectively, (i) a drug-responsive state and (ii) a drug non-responsive state. In such cases, a first model 202 tests for the absence or presence of the first biological feature and a second model 202 tests for the absence or presence of the second biological feature. The invention is not limited to testing a specimen for the absence or presence of only two biological features. Indeed, with the methods, computers, and computer program products of the present invention, any number of biological features can be tested (e.g., two or more biological features, 3 to 10 biological features, 5 to 20 biological features, more than 25 biological features, etc.). Typically, a different model 202 is then used to test for the presence or absence of each biological feature (e.g., to determine whether the specimen is a member of a biological sample class characterized by having the feature, or a member of a biological sample class characterized by lacking the feature). In some embodiments, multiple models are tested for the absence or presence of the same biological feature. In other words, multiple models are tested to determine whether a biological specimen is a member of a particular biological sample class. Exemplary biological features are described in this section. An organism having a given biological feature can be considered a member of the corresponding biological sample class.
5.4.1 Breast cancer
Pusztai et al. Several different neoadjuvant chemotherapy regimens are used to treat breast cancer. Not all regimens are equally effective for all patients, and it is currently not possible to select the most effective regimen for a particular individual. One accepted surrogate for prolonged recurrence-free survival after chemotherapy for breast cancer is complete pathologic response (pCR) to neoadjuvant therapy. Pusztai et al. (ASCO 2003 abstract 1) report the discovery of a gene expression profile that predicts pCR after neoadjuvant weekly paclitaxel followed by sequential FAC chemotherapy (T/FAC). Predictive markers were generated from fine needle aspirates of 24 early-stage breast cancers (Pusztai et al.). Six of the 24 patients achieved pCR (25%). In the report of Pusztai et al., the RNA profile of each sample was measured on cDNA microarrays containing 30,000 human transcripts. Genes differentially expressed between the pCR and residual disease (RD) groups were selected by signal-to-noise ratio. Several supervised learning methods were evaluated to determine the best class prediction algorithm, and leave-one-out cross-validation was used to determine the optimal number of genes needed for outcome prediction. A support vector machine using five genes (three ESTs, nuclear factor 1/A, and histone acetyltransferase) yielded the greatest estimated accuracy. The set of predictive markers was then tested on independent cases receiving T/FAC neoadjuvant therapy. Pusztai et al. reported results for the 21 patients included in the validation test. The overall accuracy of response prediction based on the gene expression profile was 81%. The overall specificity was 93%. The sensitivity was 50% (three of the six pCR cases were misclassified as RD). Pusztai et al. found that patients predicted to have a pCR to preoperative T/FAC chemotherapy had a 75% probability of actually experiencing pCR, whereas unselected patients have a probability of only 25-30%. The findings of Pusztai et al. can be used to build a model 202 that can then be used to help physicians select the patients most likely to benefit from T/FAC neoadjuvant chemotherapy.
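The validation figures reported for Pusztai et al. can be checked for internal consistency with a short Python sketch. Treating pCR as the positive class, the text gives 21 patients, of whom six had pCR and three of those were misclassified. The remaining counts used below (14 of 15 RD cases correctly predicted, one RD case predicted as pCR) are not stated in the text; they are inferred here as the only whole-number counts that reproduce the reported percentages.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard metrics from a 2x2 confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
    }


# tp and fn come from the text (3 of 6 pCR cases misclassified);
# tn and fp are inferred assumptions, not reported counts.
metrics = classification_metrics(tp=3, fn=3, tn=14, fp=1)
```

Under these assumed counts the accuracy is 17/21 (about 81%), the specificity is 14/15 (about 93%), the sensitivity is 3/6 (50%), and the positive predictive value is 3/4 (75%), matching the figures quoted above.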
Cobleigh et al. Breast cancer patients with 10 or more positive nodes have a very poor prognosis, yet some survive long term. Cobleigh et al. (ASCO 2003 abstract 3415) sought to identify predictors of distant disease-free survival (DDFS) in this high-risk group of patients. Patients with invasive breast cancer and 10 or more positive nodes who were diagnosed between 1979 and 1999 were identified. RNA was extracted from three 10-micron sections, and the expression of 7 reference genes and 185 cancer-related genes was quantified by RT-PCR. The genes were selected on the basis of the published literature and the results of microarray experiments. A total of 79 patients were studied. Of these, 54% received hormonal therapy and 80% received chemotherapy. The median follow-up was 15.1 years. As of August 2002, 77% of the patients had experienced distant recurrence or had died of breast cancer. Univariate Cox survival analysis of the clinical variables showed that the number of involved nodes was significantly associated with DDFS (p=0.02). Cobleigh et al. applied a multivariate model that included age, tumor size, involved nodes, tumor grade, adjuvant hormonal therapy, and chemotherapy; it accounted for 13% of the variance in DDFS time. Univariate Cox survival analysis of the 185 cancer-related genes showed that a number of genes were associated with DDFS (5 with p<0.01; 16 with p<0.05). For the HER2 adaptor Grb7 and the macrophage marker CD68, higher expression was associated with shorter DDFS (p<0.01). For TP53BP2 (tumor protein p53-binding protein 2), PR, and Bcl2, higher expression was associated with longer DDFS (p<0.01). A multivariate model comprising five genes accounted for 45% of the variance in DDFS time. Multivariate analysis also showed that gene expression remained a significant predictor after controlling for the clinical variables. The findings of Cobleigh et al. can be used to build a model 202 that can then be used to help determine which patients are likely to experience DDFS and which are not.
van 't Veer. Breast cancer patients at the same stage of disease can have markedly different treatment responses and overall outcomes. For example, predictors of metastasis (poor outcome), such as lymph node status and histological grade, fail to classify breast tumors accurately according to their clinical behavior. To overcome this shortcoming, van 't Veer (2002, Nature 415, 530-535) applied DNA microarray analysis to the primary breast tumors of 117 patients and used supervised classification to identify a gene expression profile that strongly predicts a short interval to distant metastasis (a "poor prognosis" signature) in patients whose regional lymph nodes were negative for tumor cells at diagnosis (lymph node negative). In addition, van 't Veer established a signature that identifies tumors of BRCA1 carriers. The findings of van 't Veer can be used to build a model 202 that can then be used to help determine a patient's prognosis.
Other references. A representative sample of other breast cancer studies that can be used to build models 202 that test for breast cancer includes, but is not limited to: Soule et al., ASCO 2003 abstract 3466; Ikeda et al., ASCO 2003 abstract 34; Schneider et al., 2003, British Journal of Cancer 88, p. 96; Long et al., ASCO 2003 abstract 3410; and Chang et al., 2002, PeerView Press, abstract 1700, "Gene Expression Profiles for Docetaxel Chemosensitivity".
5.4.2 Lung cancer
Rosell-Costa et al. ERCC1 mRNA levels correlate with DNA repair capacity (DRC) and with clinical resistance to cisplatin. Changes in enzymatic activity and in the expression of the ribonucleotide reductase (RR) M1 or M2 subunit genes have been observed during DNA repair following gemcitabine damage. Rosell-Costa et al. (ASCO 2003 abstract 2590) assessed ERCC1 and RRM1 mRNA levels by quantitative PCR on RNA isolated from tumor biopsy samples of 100 stage IV non-small cell lung cancer (NSCLC) patients. These 100 patients came from a trial of 570 patients randomized to gem/cis versus gem/cis/vrb versus gem/vrb followed by vrb/ifos (Alberola et al., ASCO 2001 abstract 1229). ERCC1 and RRM1 data were obtained for 81 patients. The overall response rate, time to progression (TTP) and median survival (MS) of these 81 patients were similar to those of all 570 patients. ERCC1 and RRM1 levels were strongly correlated (P=0.00001). Significant differences according to ERCC1 and RRM1 levels were found in the gem/cis arm but not in the other arms. In the gem/cis arm, TTP was 8.3 months for patients with low ERCC1 and 5.1 months for patients with high ERCC1 (P=0.07); 8.3 months for patients with low RRM1 and 2.7 months for patients with high RRM1 (P=0.01); and 10 months for patients with low ERCC1 and RRM1 and 4.1 months for patients with high ERCC1 and RRM1 (P=0.009). MS was 13.7 months for patients with low ERCC1 and 9.5 months for patients with high ERCC1 (P=0.19); 13.7 months for patients with low RRM1 and 3.6 months for patients with high RRM1 (P=0.009); and not reached for patients with low ERCC1 and RRM1 versus 6.8 months for patients with high ERCC1 and RRM1 (P=0.004). Patients with low ERCC1 and RRM1 levels (indicating low DRC) are ideal candidates for gem/cis, whereas patients with high levels have poorer outcomes. Therefore, measurements including the ratio of ERCC1 and RRM1 can be used to build a model 202 that determines which treatment to give a lung cancer patient.
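As a minimal sketch of how a model 202 of this kind might turn the two measurements into a model score, the Python below grades a patient by whether the ERCC1 and RRM1 mRNA levels fall below cutoffs. The function name, the three-level score and the use of cohort medians as cutoffs are illustrative assumptions; the abstract summarized above does not state how low and high levels were defined.

```python
from statistics import median


def gem_cis_candidate_score(ercc1, rrm1, ercc1_cutoff, rrm1_cutoff):
    """Return a hypothetical model score in {0.0, 0.5, 1.0}.

    1.0: both ERCC1 and RRM1 are below their cutoffs (low DRC)
    0.0: both are at or above their cutoffs
    0.5: one low, one high
    The cutoffs are assumptions supplied by the caller.
    """
    low_ercc1 = ercc1 < ercc1_cutoff
    low_rrm1 = rrm1 < rrm1_cutoff
    return (int(low_ercc1) + int(low_rrm1)) / 2.0


# Made-up mRNA levels for a small cohort, used only to derive example cutoffs.
cohort_ercc1 = [1.0, 2.0, 3.0, 4.0, 5.0]
cohort_rrm1 = [0.5, 1.5, 2.5, 3.5, 4.5]
ercc1_cut = median(cohort_ercc1)
rrm1_cut = median(cohort_rrm1)

score = gem_cis_candidate_score(1.2, 0.8, ercc1_cut, rrm1_cut)
```

A score near 1.0 would flag the patient as a candidate for gem/cis under these assumptions; a real model would need cutoffs validated on clinical data.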
Hayes et al. Despite the high incidence of lung cancer, stratifying patients by prognosis and therapeutic response remains difficult. Initial gene expression array studies of lung cancer suggested the existence of previously unrecognized adenocarcinoma subclasses. These studies have not been reproduced, and the relationship between subclass and clinical outcome remains incompletely understood. To compare the subclasses across three large case series, Hayes et al. (ASCO 2003 abstract 2526) analyzed the gene expression arrays as a pooled data set comprising 366 tumor and normal tissue samples. The pooled expression data set was rescaled, and gene filtering was used to select a subset of genes that were expressed consistently in replicate pairs but variably across all samples. Hierarchical clustering was performed on the pooled data set, and the resulting classes were compared with the classes proposed by the authors of the original manuscripts. For direct comparison with the original classifications, a classifier was constructed and applied to samples from the pool of 366 tumors. At each step of the analysis, the agreement between the validation classes and the originally published classes was statistically significant. In a further validation step, the gene lists describing the originally published subtypes were compared across classification schemes. The gene lists used to describe the adenocarcinoma subclasses also overlapped to a statistically significant degree. Finally, survival curves confirmed that one adenocarcinoma subtype consistently had reduced survival. The analysis of Hayes et al. helps to establish reproducible adenocarcinoma subtypes that can be described by mRNA expression profiling. Therefore, the results of Hayes et al. can be used to build a model 202 that can be used to identify adenocarcinoma subtypes.
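The gene-filtering step described above (consistent within replicate pairs, variable across all samples) can be sketched as follows. This is an illustrative reading, not the procedure of Hayes et al.: the two thresholds, the use of mean absolute difference for replicate agreement, and the use of population standard deviation for spread are all assumptions.

```python
from statistics import pstdev


def filter_genes(expression, replicate_pairs, max_pair_diff, min_spread):
    """Select genes that agree within replicate pairs but vary across samples.

    expression: dict mapping gene name -> list of values, one per sample
    replicate_pairs: non-empty list of (i, j) sample indices that are replicates
    max_pair_diff: largest allowed mean absolute difference within pairs (assumed)
    min_spread: smallest required standard deviation across samples (assumed)
    """
    kept = []
    for gene, values in expression.items():
        diffs = [abs(values[i] - values[j]) for i, j in replicate_pairs]
        mean_diff = sum(diffs) / len(diffs)
        if mean_diff <= max_pair_diff and pstdev(values) >= min_spread:
            kept.append(gene)
    return kept


# Made-up data: samples 0/1 and 2/3 are replicate pairs.
example = {
    "g1": [1.0, 1.1, 5.0, 5.1],  # reproducible and variable: kept
    "g2": [1.0, 4.0, 2.0, 5.0],  # replicates disagree: dropped
    "g3": [2.0, 2.0, 2.1, 2.1],  # reproducible but nearly constant: dropped
}
selected = filter_genes(example, [(0, 1), (2, 3)], max_pair_diff=0.5, min_spread=1.0)
```

The selected genes would then be passed to a clustering routine of the user's choice.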
5.4.3 prostate cancer
Li et al. Taxotere has shown antitumor activity against solid tumors, including prostate cancer. However, the molecular mechanism of Taxotere action has not been fully elucidated. To establish the molecular mechanism of Taxotere action in hormone-insensitive (PC3) and hormone-sensitive (LNCaP) prostate cancer cells, comprehensive gene expression profiles were obtained with the Affymetrix Human Genome U133A array. See Li et al., ASCO 2003 abstract 1677. Total RNA from untreated cells and from cells treated with 2 nM Taxotere for 6, 36 and 72 hours was subjected to microarray analysis, and the data were analyzed with Microarray Suite, Data Mining, Cluster and TreeView, and Onto-Express software. Changes in gene expression were observed as early as 6 hours of treatment, and the longer the treatment, the more genes were altered. In addition, Taxotere had different effects on the gene expression profiles of LNCaP and PC3 cells. After 6, 36 and 72 hours, a total of 166, 365 and 1785 genes, respectively, showed a more than 2-fold change in PC3 cells, compared with 57, 823 and 964 genes in LNCaP cells. Li et al. found no effect on the androgen receptor, although upregulation of some genes involved in steroid-independent AR activation (IGFBP2, FGF13, EGF8, etc.) was observed in LNCaP cells. Cluster analysis showed downregulation, in both cell lines, of genes responsible for cell proliferation and cell cycle (cyclins and CDKs, Ki-67, etc.), signal transduction (IMPA2, ERBB2IP, etc.), transcription factors (HMG-2, NFYB, TRIP13, PIR, etc.) and tumorigenesis (STK15, CHK1, survivin, etc.). In contrast, Taxotere upregulated genes related to induction of apoptosis (GADD45A, Fas/Apo-1, etc.), cell cycle arrest (p21CIP1, p27KIP1, etc.) and tumor suppression. From these results Li et al. concluded that Taxotere alters a large number of genes, many of which may be involved in the molecular mechanism by which Taxotere affects prostate cancer cells. This information may be mined further to devise strategies for optimizing the therapeutic effect of Taxotere in the treatment of metastatic prostate cancer.
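The "more than 2-fold change" counts reported above correspond to a simple ratio test per gene. The sketch below shows one way to compute such a count from treated and untreated expression values; the function name and the handling of missing or non-positive values are assumptions, and the numbers are made up.

```python
def count_changed_genes(treated, untreated, fold=2.0):
    """Count genes whose treated/untreated ratio is more than `fold` up or down.

    treated, untreated: dicts mapping gene name -> expression value
    Genes missing from `treated`, or with a non-positive value in either
    condition, are skipped because no meaningful ratio can be formed.
    """
    changed = 0
    for gene, base in untreated.items():
        value = treated.get(gene)
        if value is None or base <= 0 or value <= 0:
            continue
        ratio = value / base
        if ratio > fold or ratio < 1.0 / fold:
            changed += 1
    return changed


untreated = {"a": 10.0, "b": 10.0, "c": 10.0, "d": 10.0}
treated_6h = {"a": 25.0, "b": 4.0, "c": 15.0, "d": 20.0}
n_changed = count_changed_genes(treated_6h, untreated)
```

Here gene "a" (2.5-fold up) and gene "b" (2.5-fold down) are counted; gene "d" is exactly 2-fold and is not, because the test is strict.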
Using the results described in this section, a model 202 can be developed that divides patients into groups that respond to different degrees to Taxotere and related treatment regimens (for example, a first biological feature is high responsiveness to Taxotere, a second biological feature is no response to Taxotere, and so on). In another approach, a biological feature can be established based in part on Cox-2 expression, which serves as a predictor of survival in stage D2 prostate cancer.
5.4.4 colorectal cancer
Kwon et al. To identify a set of genes associated with the development of colorectal carcinogenesis, Kwon et al. (ASCO 2003 abstract 1104) used a cDNA microarray containing 4608 genes to analyze the gene expression profiles of colorectal cancer cells from 12 tumors together with the corresponding noncancerous colonic epithelium. Kwon et al. classified samples and genes by two-way cluster analysis and identified genes differentially expressed between cancerous and noncancerous tissue. Changes in the expression of selected genes were confirmed by reverse transcriptase PCR (RT-PCR). Gene expression profiles were evaluated with respect to lymph node metastasis using a supervised learning technique. Expression changes were observed for 122 genes in more than 75% of the tumors, namely 77 upregulated genes and 45 downregulated genes. The most frequently altered genes belonged to the following functional categories: signal transduction (19%), metabolism (17%), cell structure/motility (14%), cell cycle (13%) and gene/protein expression (13%). RT-PCR analysis of randomly selected genes gave results consistent with the cDNA microarray findings. Using a cross-validation loop, Kwon et al. were able to predict lymph node metastasis in 10 of the 12 patients. The results of Kwon et al. can be used to build a model 202 that determines whether a patient has colorectal cancer. In addition, the results of Kwon et al. can further be used to identify subclasses of colorectal cancer.
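A cross-validation loop of the kind mentioned above can be sketched with leave-one-out validation. The abstract does not say which supervised learner Kwon et al. used, so a nearest-centroid classifier is swapped in here purely for illustration; the function names and data are likewise assumptions.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_correct(xs, ys):
    """Hold out each sample in turn, train on the rest, count correct predictions."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct


# Made-up two-gene expression profiles for six tumors.
profiles = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
node_status = ["neg", "neg", "neg", "pos", "pos", "pos"]
n_correct = leave_one_out_correct(profiles, node_status)
```

For an unbiased estimate, any gene selection would also have to be repeated inside the loop on the training samples only; that step is omitted here for brevity.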
Other studies that can be used to build colorectal cancer models 202 (including models that characterize a biological specimen as having colorectal cancer and other possible models that predict colorectal cancer subgroups) include, but are not limited to: Nasir et al., 2002, In Vivo 16, p. 501, which summarizes studies finding that increased COX-2 expression is associated with tumor induction and progression; Longley et al., 2003, Clin. Colorectal Cancer 2, p. 223; McDermott et al., 2002, Ann Oncol. 13, p. 235; and Longley et al., 2002, Pharmacogenomics J. 2, p. 209.
5.4.5 ovarian cancer
Spentzos et al. To identify expression profiles associated with clinical outcome in epithelial ovarian cancer (EOC), Spentzos et al. (ASCO 2003 abstract 1800) evaluated 38 tumor samples from EOC patients who received first-line platinum/taxane-based therapy. RNA probes were reverse transcribed, fluorescently labeled and hybridized to oligonucleotide arrays containing 12,675 human genes and expressed sequence tags. The expression data were analyzed to derive predictive signatures of chemosensitivity, disease-free survival (DFS) and overall survival (OS). Genes were ranked with a Bayesian model according to their probability of being differentially expressed in tumors that differ in chemosensitivity and survival. Each signature comprised the genes most likely to be differentially expressed between tumor subgroups with different outcomes. Spentzos et al. found one set of genes overexpressed in chemoresistant tumors and another set overexpressed in chemosensitive tumors. Spentzos et al. found 45 genes overexpressed in tumors associated with short DFS and 18 genes overexpressed in tumors associated with long DFS. These genes divided the patient cohort into two groups with median DFS of 7.5 and 30.5 months (p<0.00001). Spentzos et al. found 20 genes overexpressed in tumors with short OS and 29 genes overexpressed in tumors with long OS (median OS of 22 and 40 months, p=0.00008). The overexpressed genes identified by Spentzos et al. can be used to build a model 202 that classifies biological specimens into categories such as chemoresistant ovarian cancer, chemosensitive ovarian cancer, short-DFS ovarian cancer, long-DFS ovarian cancer, short-OS ovarian cancer and long-OS ovarian cancer.
Other studies that can be used to build ovarian cancer models 202 include, but are not limited to: Presneau et al., 2003, Oncogene 13, p. 1568; and Takano et al., ASCO 2003 abstract 1856.
5.4.6 bladder cancer
Wulfing et al. Cox-2, an inducible enzyme involved in arachidonic acid metabolism, has been shown to be frequently overexpressed in various human cancers. Recent studies indicate that Cox-2 expression has prognostic value in patients who receive radiotherapy or chemotherapy for certain tumor entities. In bladder cancer, Cox-2 expression has not yet been well correlated with survival data. To address this question, Wulfing et al. (ASCO 2003 abstract 1621) studied 157 consecutive patients who underwent radical cystectomy for invasive bladder cancer. Of these, 61 patients received cisplatin-containing chemotherapy, either as adjuvant treatment or for metastatic disease. Standard immunohistochemistry with a monoclonal Cox-2 antibody was performed on paraffin-embedded tissue blocks. The semiquantitative results were correlated with clinical and pathological data, long-term survival (3-177 months) and chemotherapy details. Twenty-six cases (16.6%) were Cox-2 negative. Of the positive cases (n=131, 83.4%), 59 (37.6%) showed low Cox-2 expression, 53 (33.8%) showed moderate Cox-2 expression and 19 (12.1%) showed strong Cox-2 expression. Expression was independent of TNM stage and histological grade. Cox-2 expression correlated significantly with the histological type of the tumor (urothelial versus squamous cell carcinoma, P=0.01). Across all cases studied, Kaplan-Meier analysis showed no statistically significant correlation with overall survival or DFS. However, a subgroup analysis of the patients who had received cisplatin-containing chemotherapy found that Cox-2 expression correlated significantly with poorer overall survival time (P=0.03). According to Wulfing et al., immunohistochemical overexpression of Cox-2 is a very common event in bladder cancer. Patients who receive chemotherapy appear to have worse survival when Cox-2 is overexpressed in the tumor. Wulfing et al. therefore suggest that Cox-2 expression can provide additional prognostic information for bladder cancer patients treated with cisplatin-based chemotherapy, and can serve as a basis for risk-adapted aggressive therapy of individual patients or for targeted therapy with selective Cox-2 inhibitors. The results of Wulfing et al. can be used to build a model 202 that divides bladder cancer patients into different treatment groups.
5.4.7 gastric cancer
Terashima et al. To detect chemoresistance-related genes in human gastric cancer, Terashima et al. (ASCO 2003 abstract 1161) studied gene expression profiles with DNA microarrays and compared the results with in vitro drug sensitivity. Fresh tumor tissue was obtained from a total of 16 gastric cancer patients, and gene expression profiles were then measured with the GeneChip Human U95Av2 array (Affymetrix, Santa Clara, California), which contains 12,000 human genes and EST sequences. The results were compared with in vitro drug sensitivity determined by ATP assay. The drugs studied were cisplatin (CDDP), doxorubicin (DOX), mitomycin C (MMC), etoposide (ETP), irinotecan (CPT, as SN-38), 5-fluorouracil (5-FU), doxifluridine (5'-DFUR), paclitaxel (TXL) and docetaxel (TXT). Each drug was added for 72 hours at its Cmax concentration. Drug sensitivity was expressed as the ratio (T/C%) of the ATP content of the drug-treated group to that of the control group. The Pearson correlation coefficient between relative gene expression and T/C% was estimated, and cluster analysis was performed with the genes selected by correlation. These analyses showed that 51 genes for CDDP, 34 genes for DOX, 26 genes for MMC, 52 genes for ETP, 51 genes for CPT, 85 genes for 5-FU, 42 genes for 5'-DFUR, 11 genes for TXL and 32 genes for TXT were upregulated in drug-resistant tumors. Most of these genes were related to cell growth, cell cycle regulation, apoptosis, heat shock proteins or the ubiquitin-proteasome pathway. However, some genes, such as ribosomal proteins, CD44 and elongation factor alpha, were specifically upregulated in tumors resistant to each of the drugs. The upregulated genes identified by Terashima et al. can be used to build a model 202 that not only diagnoses gastric cancer patients but also indicates whether a patient has a drug-resistant gastric tumor and, if so, which type of drug resistance the tumor has.
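The correlation-based gene selection described above can be sketched as follows. The sketch assumes that a higher T/C% (more ATP retained after drug exposure) means a more resistant tumor, and that genes are selected when their Pearson correlation with T/C% meets a user-chosen threshold; the threshold, function names and data are illustrative and are not taken from Terashima et al.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant input; treated here as no correlation
    return cov / sqrt(var_x * var_y)


def select_resistance_genes(expression, tc_percent, min_r):
    """Return genes whose expression correlates with T/C% at r >= min_r.

    expression: dict mapping gene name -> expression values, one per tumor
    tc_percent: T/C% for the same tumors, in the same order
    """
    return [gene for gene, values in expression.items()
            if pearson(values, tc_percent) >= min_r]


# Made-up values for four tumors and one drug.
tc = [10.0, 20.0, 30.0, 40.0]
genes = {"up": [1.0, 2.0, 3.0, 4.0], "down": [4.0, 3.0, 2.0, 1.0], "flat": [2.0, 2.0, 2.0, 2.0]}
resistant_genes = select_resistance_genes(genes, tc, min_r=0.9)
```

Running the selection once per drug would give one gene list per drug, which could then feed a per-drug resistance model.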
Other references that can be used to build gastric cancer models 202 include, but are not limited to: Kim et al., ASCO 2003 abstract 560; Arch-Ferrer et al., ASCO 2003 abstract 1101; Hobday, ASCO 2003 abstract 1078; Song et al., ASCO 2003 abstract 1056 (overexpression of the Rb gene is an independent prognostic factor predicting recurrence-free survival); and Leichman et al., ASCO 2003 abstract 1054 (thymidylate synthase expression as a predictor of chemotherapy benefit in esophageal/gastric cancer).
5.4.8 rectal cancer
Lenz et al. Local recurrence is an important clinical problem for rectal cancer patients. Lenz et al. (ASCO 2003 abstract 1185) therefore sought to establish a genetic profile that predicts pelvic recurrence in rectal cancer patients treated with adjuvant chemoradiation therapy. A total of 73 patients with locally advanced rectal cancer (UICC stage II and III) were treated from 1991 to 2000: 25 women and 48 men, with a mean age of 52.1 years. Histological staging classified 22 patients as stage T2 and 51 as stage T3. Thirty-five patients were lymph node negative, and 38 patients had one or more lymph node metastases. All patients underwent cancer resection followed by 5-FU plus pelvic radiation therapy. RNA was extracted from formalin-fixed, paraffin-embedded, laser capture microdissected tissue. Lenz et al. used quantitative RT-PCR (TaqMan) to measure, in tumor tissue and adjacent normal tissue, the mRNA levels of genes related to the 5-FU pathway (TS, DPD), angiogenesis (VEGF) and DNA repair (ERCC1, RAD51). Lenz et al. found that local tumor recurrence correlated significantly with higher mRNA levels of ERCC1 and TS in adjacent normal tissue, indicating that the expression levels of target genes in the 5-FU pathway, DNA repair and angiogenesis can be used to identify patients at risk of pelvic recurrence. The results of Lenz et al. can be used to build a model 202 that identifies patients at risk of pelvic recurrence.
5.4.9 other exemplary biological features
Other representative biological features include, but are not limited to: acne, acromegaly, acute cholecystitis, Addison's disease, mullerianosis, adult growth hormone deficiency, adult soft tissue sarcoma, alcohol dependence, allergic rhinitis, allergic reactions, alopecia, degenerative brain disorder, amniocentesis, anemia in heart failure, anemia, angina pectoris, ankylosing spondylitis, anxiety disorders, adenoma ovarii testiculare, arrhythmia, arthritis, arthritis-related eye problems, asthma, atherosclerosis, atopic eczema, atrophic vaginitis, attention deficit disorder, attention disorders, autoimmune diseases, balanoposthitis, alopecia, Bartholin's abscess, birth defects, bleeding disorders, bone cancer, brain and spinal cord tumors, brain stem glioma, brain tumors, breast cancer, breast cancer risk, breast disease, cancer, kidney cancer, cardiomyopathy, carotid artery disease, carotid endarterectomy, carpal tunnel syndrome, cerebral palsy, cervical cancer, chancroid, chickenpox, childhood nephrotic syndrome, chlamydia, chronic diarrhea, chronic heart failure, claudication, angina, colon or rectal cancer, colorectal cancer, common cold, condyloma (genital warts), congenital goiter, congestive heart failure, conjunctivitis, corneal disease, corneal ulcer, coronary heart disease, cryptosporidiosis, hypercortisolism, cystic fibrosis, cystitis, cystoscopy or ureteroscopy, De Quervain's disease, dementia, depression, mania, polyuria, diabetes insipidus, diabetes, diabetic retinopathy, Down syndrome, adolescent dysmenorrhea, dyspareunia, ear allergies, ear infections, eating disorders, eczema, emphysema, endocarditis, endometrial cancer, endometriosis, enuresis in children, epididymitis, epilepsy, episiotomy, erectile dysfunction, eye cancer, fatal abstraction, fecal incontinence, female sexual dysfunction, fetal abnormalities, fetal alcohol syndrome, fibromyalgia, influenza, folliculitis, fungal infections, Gardnerella vaginalis disease, genital candidiasis, genital herpes, gestational diabetes, glaucoma, glomerular disease, gonorrhea, gout and pseudogout, growth disorders, gum disease, hair follicle disease, halitosis, Hamburger disease, hemophilia, hepatitis, hepatitis B, HCC, herpes infection, human placental lactogen, hyperparathyroidism, hypertension, hyperthyroidism, hypoglycemia, hypogonadism, hypospadias, hypothyroidism, hysterectomy, impotence, infertility, inflammatory bowel disease, indirect inguinal hernia, hereditary heart murmur, intraocular melanoma, IBS, Kaposi's sarcoma, leukemia, liver cancer, lung cancer, tuberculosis, malaria, manic-depressive psychosis, measles, memory loss, meningitis in children, menorrhalgia, mesothelioma, microalbumin, migraine, intermenstrual pain, oral cancer, ataxia, mumps, Nabothian cysts, narcolepsy, nasal allergies, nasal cavity and paranasal sinus cancer, neuroblastoma, neurofibromatosis, neurological disorders, neonatal jaundice, obesity, obsessive-compulsive disorder, orchitis or epididymitis, orofacial myofunctional disorders, osteoarthritis, osteoporosis, osteoporosis, osteosarcoma, ovarian cancer, ovarian cysts, pancreatic cancer, paraphimosis, Parkinson's disease, partial epilepsy, pelvic inflammatory disease, peptic ulcer, PPCM, induration of the penis, PCOS, pre-eclampsia, pregnanediol, premenstrual syndrome, priapism, prolactinoma, prostate cancer, psoriasis, rheumatic fever, salivary gland cancer, SARS, sexually transmitted diseases, sexually transmitted enteric infections, sexually transmitted infections, Sheehan's syndrome, sinusitis, skin cancer, sleep disorders, smallpox, dysgeusia, snoring, social phobia, spina bifida, gastric cancer, syphilis, testicular cancer, thyroid cancer, thyroid disease, tonsillitis, dental disease, trichomoniasis, tuberculosis, tumors, type II diabetes, ulcerative colitis, urinary tract infections, urological cancer, uterine leiomyoma, vaginal cancer, vaginal cysts, vulvodynia and vulvovaginitis.
5.5 measurement of transcriptional state
This section provides some exemplary methods for measuring the expression level of genes, which are one type of cellular constituent. Those skilled in the art will appreciate that the invention is not limited to the specific methods described below for measuring the expression level of genes in each organism in a plurality of organisms.
5.5.1 transcript measurement using microarrays
The techniques described in this section include the provision of polynucleotide probe arrays that can be used to determine the expression levels of a plurality of genes simultaneously. These techniques can also be used to design and make such polynucleotide probe arrays.
The expression of nucleotide sequences in genes can be measured by any high-throughput technique. However measured, the result is either the absolute or the relative amount of transcripts or response data, including but not limited to abundance values or abundance ratios. Preferably, the expression profile is measured by hybridization to transcript arrays, as described in this section. In one embodiment, "transcript arrays" or "profiling arrays" are used. Transcript arrays can be used to analyze the expression profile of a cell sample, and especially to measure the expression profile of a cell sample of a particular tissue type or developmental state, or of a cell sample exposed to a drug of interest.
In one embodiment, an expression profile is obtained by hybridizing detectably labeled polynucleotides that represent the nucleotide sequences of the mRNA transcripts present in a cell (for example, fluorescently labeled cDNA synthesized from total cell mRNA) to a microarray. A microarray is an array of positionally addressable binding (for example, hybridization) sites on a support that represent many of the nucleotide sequences in the genome of a cell or organism, preferably most or almost all of the genes. Each of these binding sites consists of polynucleotide probes bound to a predetermined region of the support. Microarrays can be made in a number of ways, several of which are described below. However produced, microarrays share certain characteristics. The arrays are reproducible, so that multiple copies of a given array can be made and easily compared with one another. Preferably, the microarrays are made from materials that are stable under binding (for example, nucleic acid hybridization) conditions. Microarrays are preferably small, for example between 1 cm² and 25 cm², preferably between 1 cm² and 3 cm². However, larger and smaller arrays are also contemplated and may be preferable, for example, for simultaneously evaluating a very large or a very small number of different probes.
Preferably, a given binding site or unique set of binding sites on the microarray will specifically bind (for example, hybridize) to a nucleotide sequence in a single gene from a cell or tissue (for example, to an exon of a specific mRNA or of a specific cDNA derived from it).
The microarrays used can include one or more test probes, each of which has a polynucleotide sequence that is complementary to a subsequence of the RNA or DNA to be detected. Each probe typically has a different nucleic acid sequence, and the position of each probe on the solid surface of the array is usually known. Indeed, the microarrays are preferably addressable arrays, more preferably positionally addressable arrays. Each probe of the array is preferably located at a known, predetermined position on the solid support, so that the identity (for example, the sequence) of each probe can be determined from its position on the array (for example, on the support or surface). In some embodiments, the arrays are ordered arrays.
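The idea of a positionally addressable array, in which a probe's identity follows from its position, can be illustrated with a small data structure. This is only a conceptual sketch; the class name, the row/column addressing and the example sequences are assumptions made for illustration.

```python
class AddressableArray:
    """Toy positionally addressable array: each (row, col) holds one known probe."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self._probes = {}

    def place(self, row, col, sequence):
        """Attach a probe sequence to a predetermined position on the support."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise IndexError("position is off the support")
        if (row, col) in self._probes:
            raise ValueError("position already holds a probe")
        self._probes[(row, col)] = sequence

    def probe_at(self, row, col):
        """Look up the identity (sequence) of a probe from its position."""
        return self._probes[(row, col)]


array = AddressableArray(rows=2, cols=2)
array.place(0, 0, "ACGTACGTACGTACGTACGT")
array.place(0, 1, "TTGACCATGGCATTGACCAT")
```

A hybridization signal read at position (0, 1) can then be attributed to the second probe without any further identification step.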
Preferably, the density of probes on a microarray or a set of microarrays is 100 different (for example, non-identical) probes per square centimeter or higher. More preferably, a microarray used in the methods of the invention will have at least 550 probes, at least 1,000 probes, at least 1,500 probes, at least 2,000 probes, at least 8,000 probes or at least 15,000 probes per square centimeter, or more. The microarrays used in the invention therefore preferably contain at least 25,000, at least 50,000, at least 100,000, at least 150,000, at least 200,000, at least 250,000, at least 500,000 or at least 550,000 different (for example, non-identical) probes.
In one embodiment, the microarray is an array (for example, a matrix) in which each position represents a discrete binding site for a nucleotide sequence of a transcript encoded by a gene (for example, for an exon of an mRNA or of a cDNA derived from it). The collection of binding sites on a microarray contains sets of binding sites for a plurality of genes. For example, in various embodiments, the microarrays of the invention can comprise binding sites for products encoded by fewer than 50% of the genes in the genome of an organism. Alternatively, the microarrays of the invention can have binding sites for products encoded by at least 50%, at least 75%, at least 85%, at least 90%, at least 95%, at least 99% or 100% of the genes in the genome of an organism. In other embodiments, the microarrays of the invention can have binding sites for products encoded by fewer than 50%, at least 50%, at least 75%, at least 85%, at least 90%, at least 95%, at least 99% or 100% of the genes expressed by a cell of an organism. The binding site can be a DNA or DNA analog to which a particular RNA can specifically hybridize. The DNA or DNA analog can be, for example, a synthetic oligomer or a gene fragment, for example an oligomer or gene fragment corresponding to an exon.
In some embodiments of the present invention, a gene or an exon of a gene is represented in the profiling arrays by a set of binding sites comprising probes with different polynucleotides that are complementary to different sequence segments of the gene or exon. Such polynucleotides are preferably 15 to 200 bases in length, more preferably 20 to 100 bases, and most preferably 40 to 60 bases. Each probe sequence can also contain a linker sequence in addition to the sequence that is complementary to its target sequence. As used herein, a linker sequence is a sequence between the sequence that is complementary to the target sequence and the surface of the support. For example, in preferred embodiments, the profiling arrays of the invention comprise one probe specific to each target gene or exon. However, if desired, the profiling arrays can contain at least 2, 5, 10, 100 or 1000 or more probes specific to some target genes or exons. For example, the array can contain probes that tile the sequence of the longest mRNA isoform of a gene at single-base steps.
In specific embodiments of the invention, when an exon has alternative splice variants, a set of polynucleotide probes of successive overlapping sequences (that is, tiled sequences) across the genomic region containing the longest variant of the exon can be included in the exon profiling arrays. The set of polynucleotide probes can comprise successive overlapping sequences at steps of a predetermined base interval, for example at steps of 1, 5 or 10 bases, that span, or are tiled across, the mRNA containing the longest variant. Such a set of probes can therefore be used to scan the genomic region containing all variants of an exon to determine the expressed variant or variants of the exon. In addition, a set of polynucleotide probes comprising exon-specific probes and/or variant junction probes can be included in the exon profiling arrays. As used herein, a variant junction probe is a probe specific to the junction region between a particular exon variant and the neighboring exon. In some cases, the probe set contains variant junction probes that specifically hybridize to each of the different splice junction sequences of the exon. In other cases, the probe set contains exon-specific probes that specifically hybridize to the sequences common to all the different variants of the exon, and/or variant junction probes that specifically hybridize to the different splice junction sequences of the exon.
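Tiling at a predetermined base interval can be sketched as a sliding window over the target sequence. The function below is an illustrative sketch: its name and parameters are assumptions, and it returns the target segments in the sense orientation, leaving out the step of taking the complement to obtain the actual probe sequences.

```python
def tile_probes(sequence, probe_length, step):
    """Return (start, segment) pairs for overlapping windows across `sequence`.

    probe_length: number of bases in each window
    step: base interval between successive window starts (for example 1, 5 or 10)
    Windows that would run past the end of the sequence are not emitted.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = len(sequence) - probe_length
    return [(start, sequence[start:start + probe_length])
            for start in range(0, last_start + 1, step)]


tiles = tile_probes("ACGTACGTAC", probe_length=4, step=2)
```

With a step smaller than the probe length, successive windows overlap, which is what allows a tiled probe set to scan a region for the expressed variant.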
In some cases, an exon is represented in the exon profiling arrays by a probe comprising a polynucleotide that is complementary to the full-length exon. In such cases, the exon is represented by a single binding site on the profiling arrays. In some preferred cases, an exon is represented by one or more binding sites on the profiling arrays, each binding site comprising a probe with a polynucleotide sequence that is complementary to an RNA fragment that is a substantial portion of the target exon. Such probes are normally 15 to 600 bases in length, preferably 20 to 200 bases, more preferably 30 to 100 bases, and most preferably 40 to 80 bases. The average length of an exon is about 200 bases (see, for example, Lewin, Genes V, Oxford University Press, Oxford, 1994). A probe of 40 to 80 bases binds the target exon more specifically than a shorter probe, thereby increasing the specificity of the probe for the target exon. For certain genes, one or more target exons can be shorter than 40 to 80 bases. In such cases, if probes with sequences longer than the target exon are used, it may be desirable to design probes that comprise the entire target exon flanked by sequences from the adjacent constitutively spliced exon or exons, so that the probe sequence is complementary to the corresponding sequence segment in the mRNA. Using flanking sequences from adjacent constitutively spliced exons rather than genomic flanking sequences, that is, intron sequences, permits a hybridization stringency comparable to that of other probes of the same length. The flanking sequences used are preferably from the adjacent constitutively spliced exon or exons that are not involved in any alternative pathway. More preferably, the flanking sequences used do not comprise a significant portion of the sequence of the adjacent exon or exons, so that cross-hybridization is minimized. In some embodiments, when a target exon that is shorter than the desired probe length is involved in alternative splicing, probes containing flanking sequences from the different alternatively spliced mRNAs are designed, so that the expression level of the exon in the different alternatively spliced mRNAs can be measured.
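The design of a probe for a target exon that is shorter than the desired probe length can be sketched as padding the exon with bases from the adjacent exons as they appear in the spliced mRNA. The function below is a hypothetical illustration: its name, the even split of the padding between the two sides, and the example sequences are assumptions, and it returns the mRNA segment to which the probe would be complementary rather than the probe itself.

```python
def flanked_probe_target(upstream_exon, target_exon, downstream_exon, probe_length):
    """Return the whole target exon padded with adjacent exon sequence.

    Padding is split as evenly as possible between the 3' end of the upstream
    exon and the 5' end of the downstream exon. If the adjacent exons are too
    short to supply all the padding, the result is shorter than probe_length.
    """
    extra = probe_length - len(target_exon)
    if extra <= 0:
        raise ValueError("target exon is not shorter than the probe length")
    left = min(extra // 2, len(upstream_exon))
    right = min(extra - left, len(downstream_exon))
    left = min(extra - right, len(upstream_exon))  # hand any shortfall back to the upstream side
    left_flank = upstream_exon[len(upstream_exon) - left:]
    return left_flank + target_exon + downstream_exon[:right]


segment = flanked_probe_target("AAAAACCCCC", "GGGG", "TTTTTTTTTT", probe_length=10)
```

Keeping the flanks short relative to the adjacent exons follows the preference stated above for limiting cross-hybridization; a practical design tool would check that explicitly.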
In some instances, when alternative splicing pathways and/or exon duplication in separate genes are to be distinguished, the DNA array or set of arrays can also comprise probes that are complementary to sequences spanning the junction region of two adjacent exons. Preferably, such probes comprise sequences from the two exons that do not substantially overlap with the probes for each individual exon, so that cross-hybridization is minimized. Probes that comprise sequences from more than one exon are useful for distinguishing alternative splicing pathways and/or the expression of duplicated exons in separate genes if the exons occur in one or more alternatively spliced mRNAs and/or one or more separate genes that contain the duplicated exons, but not in other alternatively spliced mRNAs and/or other genes that contain the duplicated exons. Alternatively, for duplicated exons in separate genes, if the exons from the different genes differ substantially in sequence homology, different probes are preferably included so that the exons from the different genes can be distinguished.
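The use of junction-spanning probes to tell splice variants apart can be sketched by listing the exon-exon junction sequences of each variant and keeping those that occur in only one variant. The function names, the fixed number of bases taken from each side and the toy exon sequences below are assumptions for illustration.

```python
def junction_target(exon_a, exon_b, bases_per_side):
    """Return the mRNA segment spanning the junction of two adjacent exons."""
    if bases_per_side <= 0:
        raise ValueError("bases_per_side must be positive")
    if len(exon_a) < bases_per_side or len(exon_b) < bases_per_side:
        raise ValueError("exon shorter than the requested number of bases")
    return exon_a[-bases_per_side:] + exon_b[:bases_per_side]


def variant_specific_junctions(variants, bases_per_side):
    """Map each variant name to the junction segments found in no other variant.

    variants: dict mapping variant name -> ordered list of exon sequences
    """
    junctions = {
        name: {junction_target(a, b, bases_per_side) for a, b in zip(exons, exons[1:])}
        for name, exons in variants.items()
    }
    unique = {}
    for name, own in junctions.items():
        others = set()
        for other_name, other in junctions.items():
            if other_name != name:
                others |= other
        unique[name] = sorted(own - others)
    return unique


e1, e2, e3 = "AAAAAA", "CCCCCC", "GGGGGG"
specific = variant_specific_junctions({"full": [e1, e2, e3], "skip": [e1, e3]}, 3)
```

In this toy case the exon 1 to exon 3 junction occurs only in the exon-skipping variant, so a probe complementary to it would report that variant specifically.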
Those skilled in the art will appreciate that above-mentioned any detecting probe method can unite the different arrays that are used for identical preface type analysis array and/or are used for identical preface type analysis sequence series, thereby can more accurately determine a plurality of expression of gene figure.Those skilled in the art can understand, and different detecting probe methods also can be used for the different sequential analysis of level of accuracy.For example, contain the preface type analysis array or the array series of the little probe series of each extron, can be used under given conditions determine that related gene and/or RNA shear approach.Contain the array of big probe series of extron interested or array series and then can be used to more accurately to determine extron expression figure under this specified conditions.Can more effectively use in other DNA array strategy of different probe method is also included within.
Preferably, the microarrays used in the invention have binding sites (i.e., probes) for sets of exons of one or more genes relevant to the action of a drug of interest or in a biological pathway of interest. A "gene," as used herein, is a portion of DNA that is transcribed by RNA polymerase, and may include a 5' untranslated region ("UTR"), introns, exons and a 3' UTR. The number of genes in a genome can be estimated from the number of mRNAs expressed by the cell or organism, or by extrapolation from a well-characterized portion of the genome. When the genome of the organism of interest has been sequenced, the number of ORFs can be determined and mRNA coding regions identified by analysis of the DNA sequence. For example, the genome of Saccharomyces cerevisiae has been completely sequenced and is reported to have approximately 6,275 ORFs encoding sequences longer than 99 amino acid residues. Analysis of these ORFs indicates that 5,885 ORFs are likely to encode protein products (Goffeau et al., 1996, Science 274:546-567). In contrast, the human genome is estimated to contain approximately 30,000 to 130,000 genes (see Crollius et al., 2000, Nature Genetics 25:235-238; Ewing et al., 2000, Nature Genetics 25:232-234). Genome sequences for other organisms, including but not limited to Drosophila, C. elegans, plants such as rice and Arabidopsis, and mammals such as mouse and human, are also completed or nearly completed. Thus, in preferred embodiments of the invention, an array set comprising probes for all known or predicted exons in the genome of an organism is provided. As a non-limiting example, the invention provides an array set comprising one or two probes for each known or predicted exon in the human genome.
It will be appreciated that when cDNA complementary to the RNA of a cell is made and hybridized to a microarray under suitable hybridization conditions, the level of hybridization to the site in the array corresponding to an exon of any particular gene will reflect the prevalence in the cell of the mRNA or mRNAs containing that exon as transcribed from the gene. For example, when detectably labeled (e.g., with a fluorophore) cDNA complementary to the total cellular mRNA is hybridized to a microarray, the site on the array corresponding to an exon of a gene (i.e., a site capable of specifically binding one or more expression products of the gene) that is not transcribed, or that is removed during RNA splicing in the cell, will give little or no signal (e.g., fluorescent signal), whereas an exon of a gene whose encoded mRNA is prevalent will give a relatively strong signal. The relative abundance of the different mRNAs produced from the same gene by alternative splicing is then determined from the signal intensity pattern across the whole set of exons monitored for the gene.
In one embodiment, a two-color method is used to hybridize cDNAs from cell samples in two different conditions to the binding sites of the microarray. In the case of drug responses, one cell sample is exposed to a drug and another cell sample of the same type is not exposed to the drug. In the case of pathway responses, one cell is exposed to a pathway perturbation and another cell of the same type is not exposed to the pathway perturbation. The cDNA derived from each of the two cell types is differently labeled (e.g., with Cy3 and Cy5) so that the two can be distinguished. In one embodiment, for example, cDNA from a cell treated with a drug (or exposed to a pathway perturbation) is synthesized using a fluorescein-labeled dNTP, and cDNA from a second cell, not exposed to the drug, is synthesized using a rhodamine-labeled dNTP. When the two cDNAs are mixed and hybridized to the microarray, the relative signal intensity from each cDNA set is determined for each site on the array, and any relative difference in the abundance of a particular exon is detected.
In the example described above, the cDNA from the drug-treated (or pathway-perturbed) cell will fluoresce green when the fluorophore is stimulated, and the cDNA from the untreated cell will fluoresce red. As a result, when the drug treatment has no effect, either directly or indirectly, on the transcription and/or post-transcriptional splicing of a particular gene in a cell, the exon expression patterns will be indistinguishable in the two cells and, upon reverse transcription, red-labeled and green-labeled cDNA will be equally prevalent. When hybridized to the microarray, the binding site or sites for that species of RNA will emit wavelengths characteristic of both fluorophores. In contrast, when the drug-exposed cell is treated with a drug that, directly or indirectly, changes the transcription and/or post-transcriptional splicing of a particular gene in the cell, the exon expression pattern, as represented by the ratio of green to red fluorescence at each exon binding site, will change. When the drug increases the prevalence of an mRNA, the ratios for each exon expressed in that mRNA will increase, whereas when the drug decreases the prevalence of an mRNA, the ratios for each exon expressed in that mRNA will decrease.
The use of a two-color fluorescence labeling and detection scheme to define alterations in gene expression has been described in connection with the detection of mRNAs, e.g., in Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470, which is incorporated herein by reference in its entirety for all purposes. The scheme is equally applicable to the labeling and detection of exons. An advantage of using cDNA labeled with two different fluorophores is that a direct and internally controlled comparison of the mRNA or exon expression levels corresponding to each arrayed gene in two cell states can be made, and variations due to minor differences in experimental conditions (e.g., hybridization conditions) will not affect subsequent analyses. However, it will be recognized that it is also possible to use cDNA from a single cell and compare, for example, the absolute amount of a particular exon in, e.g., a drug-treated or pathway-perturbed cell and an untreated cell. Furthermore, labeling with more than two colors is also contemplated in the present invention. In some embodiments of the invention, at least 5, 10, 20, or 100 dyes of different colors can be used for labeling. Such labeling permits simultaneous hybridization of the distinguishably labeled cDNA populations to the same array, and thus measurement, and optionally comparison, of the expression levels of mRNA molecules derived from more than two samples. Dyes that can be used include, but are not limited to: fluorescein and its derivatives, rhodamine and its derivatives, Texas Red, 5'-carboxy-fluorescein ("FAM"), 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein ("JOE"), N,N,N',N'-tetramethyl-6-carboxy-rhodamine ("TAMRA"), 6-carboxy-X-rhodamine ("ROX"), HEX, TET, IRD40 and IRD41; cyanine dyes, including but not limited to Cy3, Cy3.5 and Cy5; BODIPY dyes, including but not limited to BODIPY-FL, BODIPY-TR, BODIPY-TMR, BODIPY-630/650 and BODIPY-650/670; ALEXA dyes, including but not limited to ALEXA-488, ALEXA-532, ALEXA-546, ALEXA-568 and ALEXA-594; and other fluorescent dyes known to those skilled in the art.
In some embodiments of the invention, hybridization data are measured at a plurality of different hybridization times so that the evolution of hybridization levels toward equilibrium can be determined. In such embodiments, hybridization levels are most preferably measured at hybridization times spanning the range from zero to beyond the time required for the labeled polynucleotides to sample the bound polynucleotides (i.e., the probe or probes), so that the mixture is close to equilibrium and duplexes are at concentrations that depend on affinity and abundance rather than diffusion. However, the hybridization times are preferably short enough that irreversible binding interactions between the labeled polynucleotides and the probes and/or the surface do not occur, or are at least limited. For example, in embodiments in which polynucleotide arrays are used to probe a complex mixture of fragmented polynucleotides, typical hybridization times are approximately 0-72 hours. Appropriate hybridization times for other embodiments will depend on the particular polynucleotide sequences and probes used, and can be determined by those skilled in the art (see, e.g., Sambrook et al., Eds., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York).
In one embodiment, hybridization levels at different hybridization times are measured separately on different, identical microarrays. For each such measurement, at the hybridization time at which the hybridization level is to be measured, the microarray is washed briefly, preferably at room temperature in an aqueous solution of high to moderate salt concentration (e.g., 0.5 to 3 M salt), under conditions that retain all bound or hybridized polynucleotides while removing all unbound polynucleotides. The detectable label on the hybridized polynucleotide molecules remaining at each probe is then measured by a method appropriate to the particular labeling method used. The resulting hybridization levels are then combined to form a hybridization curve. In another embodiment, hybridization levels are measured in real time using a single microarray. In this embodiment, the microarray is allowed to hybridize to the sample without interruption, and the microarray is interrogated at each hybridization time point in a non-invasive manner. In still another embodiment, a single array can be hybridized for a short time, washed and measured, then returned to the same sample, hybridized for a further period, washed and measured again, to obtain the hybridization time curve.
Preferably, at least two hybridization levels are measured at two different hybridization times: a first at a hybridization time close to the time scale of cross-hybridization equilibrium, and a second at a hybridization time longer than the first. The time scale of cross-hybridization equilibrium depends, inter alia, on sample composition and probe sequence, and can be determined by one skilled in the art. In preferred embodiments, the first hybridization level is measured at between about 1 and 10 hours, and the second hybridization level is measured at about 2, 4, 6, 10, 12, 16, 18, 48 or 72 times the first hybridization time.
5.5.1.1. Preparing probes for microarrays
As noted above, the "probe" to which a particular polynucleotide molecule, such as an exon, specifically hybridizes according to the invention is a complementary polynucleotide sequence. Preferably, one or more probes are selected for each target exon. For example, when a minimum number of probes is to be used for the detection of an exon, the probes normally comprise nucleotide sequences longer than about 40 bases. Alternatively, when a large set of redundant probes is to be used for an exon, the probes normally comprise nucleotide sequences of about 40-60 bases. The probes can also comprise sequences complementary to full-length exons. The lengths of exons can range from fewer than 50 bases to more than 200 bases. Therefore, when a probe longer than the exon is to be used, it is preferable to augment the exon sequence with adjacent constitutively spliced exon sequences, so that the probe sequence is complementary to the continuous mRNA fragment that contains the target exon. This allows comparable hybridization stringency among the probes of an exon profiling array. It will be understood that each probe sequence may also comprise linker sequences in addition to the sequence that is complementary to its target sequence.
The probes may comprise DNA or DNA "mimics" (e.g., derivatives and analogs) corresponding to a portion of each exon of each gene in an organism's genome. In one embodiment, the probes of the microarray are complementary RNA or RNA mimics. DNA mimics are polymers composed of subunits capable of specific, Watson-Crick-like hybridization with DNA, or of specific hybridization with RNA. The nucleic acids can be modified at the base moiety, at the sugar moiety, or at the phosphate backbone. Exemplary DNA mimics include, e.g., phosphorothioates. DNA can be obtained, e.g., by polymerase chain reaction (PCR) amplification of exon segments from genomic DNA, cDNA (e.g., by RT-PCR), or cloned sequences. PCR primers are preferably chosen on the basis of the known sequence of the exons or cDNA so as to amplify unique fragments (i.e., fragments that do not share more than 10 bases of contiguous identical sequence with any other fragment on the microarray). Computer programs that are well known in the art are useful in the design of primers with the required specificity and optimal amplification properties, such as Oligo version 5.0 (National Biosciences). Each probe on the microarray will typically be between 20 and 600 bases long, and usually between 30 and 200 bases. PCR methods are well known in the art and are described, for example, in Innis et al., Eds., 1990, PCR Protocols: A Guide to Methods and Applications, Academic Press Inc., San Diego, CA. It will be apparent to one skilled in the art that controlled robotic systems are useful for isolating and amplifying nucleic acids.
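The uniqueness criterion for amplified fragments (no more than 10 bases of contiguous identical sequence shared with any other fragment on the microarray) can be checked with a simple k-mer comparison. The following Python sketch is illustrative only: the function names are hypothetical, and it compares fragments on the same strand only, which is a simplifying assumption.

```python
# Hypothetical sketch: flag fragments that share a contiguous identical run of
# more than `max_shared` bases with any other fragment (same strand only).


def shares_long_run(a: str, b: str, max_shared: int = 10) -> bool:
    """True if a and b share an identical run longer than max_shared bases."""
    k = max_shared + 1  # any shared run of k bases violates the criterion
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))


def unique_fragments(fragments: dict[str, str], max_shared: int = 10) -> list[str]:
    """Return names of fragments that pass the uniqueness criterion."""
    names = list(fragments)
    flagged = set()
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if shares_long_run(fragments[first], fragments[second], max_shared):
                flagged.update((first, second))
    return [name for name in names if name not in flagged]
```

The pairwise comparison is quadratic in the number of fragments; for a genome-scale array, a single shared k-mer index would be the more practical design.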
An alternative, preferred means for generating the polynucleotide probes of the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using N-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acid Res. 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:246-248). Synthetic sequences are typically between 15 and 600 bases in length, more typically between 20 and 100 bases, and most preferably between 40 and 70 bases. In some embodiments, synthetic nucleic acids include non-natural bases, such as, but not limited to, inosine. As noted above, nucleic acid analogs may be used as binding sites for hybridization. An example of a suitable nucleic acid analog is peptide nucleic acid (see, e.g., Egholm et al., 1993, Nature 363:566-568; and U.S. Pat. No. 5,539,083).
In another embodiment, the hybridization sites (i.e., the probes) are made from plasmid or phage clones of genes, cDNAs (e.g., expressed sequence tags), or inserts therefrom (Nguyen et al., 1995, Genomics 29:207-209).
5.5.1.2. Attaching nucleic acids to the solid surface
Preformed polynucleotide probes can be deposited on a support to form the array. Alternatively, polynucleotide probes can be synthesized directly on the support to form the array. The probes are attached to a solid support or surface, which may be made, e.g., from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material.
A preferred method for attaching nucleic acids to a surface is by printing on glass plates, as described generally in Schena et al., 1995, Science 270:467-470. This method is especially useful for preparing microarrays of cDNA (see also DeRisi et al., 1996, Nature Genetics 14:457-460; Shalon et al., 1996, Genome Res. 6:639-645; and Schena et al., 1995, Proc. Natl. Acad. Sci. U.S.A. 93:10539-11286).
A second preferred method for making microarrays is by making high-density polynucleotide arrays. Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface, using photolithographic techniques for synthesis in situ (see Fodor et al., 1991, Science 251:767-773; Lockhart et al., 1996, Nature Biotechnology 14:1675; U.S. Pat. Nos. 5,578,832, 5,556,752 and 5,510,270), or other methods for rapid synthesis and deposition of defined oligonucleotides (Blanchard et al., Biosensors & Bioelectronics 11:687-690). When these methods are used, oligonucleotides of known sequence (e.g., 60-mers) are synthesized directly on a surface such as a derivatized glass slide. The array produced can be redundant, with several polynucleotide molecules per exon.
Other methods for making microarrays, e.g., by masking (Maskos and Southern, 1992, Nucl. Acids. Res. 20:1679-1684), may also be used. In principle, and as noted above, any type of array, for example dot blots on a nylon hybridization membrane (see Sambrook et al., supra), could be used. However, as will be recognized by those skilled in the art, smaller arrays will frequently be preferred because hybridization volumes will be smaller.
In a particularly preferred embodiment, microarrays of the invention are manufactured by means of an ink jet printing device for oligonucleotide synthesis, e.g., using the methods and systems described by Blanchard in International Patent Publication No. WO 98/41531, published September 24, 1998; Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J. K. Setlow, Ed., Plenum Press, New York, pages 111-123; and U.S. Pat. No. 6,028,189 to Blanchard. Specifically, the polynucleotide probes in such microarrays are preferably synthesized in arrays, e.g., on a glass slide, by serially depositing individual nucleotide bases in "microdroplets" of a high surface tension solvent such as propylene carbonate. The microdroplets have small volumes (e.g., 100 pL or less, more preferably 50 pL or less) and are separated from each other on the microarray (e.g., by hydrophobic domains) to form circular surface tension wells that define the locations of the array elements (i.e., the different probes). Polynucleotide probes are normally attached to the surface covalently at the 3' end of the polynucleotide. Alternatively, polynucleotide probes can be attached to the surface covalently at the 5' end of the polynucleotide (see, for example, Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, Setlow, Ed., Plenum Press, New York, pages 111-123).
5.5.1.3. Target polynucleotide molecules
Target polynucleotides that may be analyzed by the methods and compositions of the invention include RNA molecules such as, but by no means limited to, messenger RNA (mRNA) molecules, ribosomal RNA (rRNA) molecules, cRNA molecules (i.e., RNA molecules prepared from cDNA molecules that are transcribed in vivo) and fragments thereof. Target polynucleotides that may also be analyzed by the methods and compositions of the invention include, but are not limited to, DNA molecules such as genomic DNA molecules, cDNA molecules, and fragments thereof, including oligonucleotides, ESTs, STSs, etc.
The target polynucleotides may be from any source. For example, the target polynucleotide molecules may be naturally occurring nucleic acid molecules, such as genomic or extragenomic DNA molecules isolated from an organism, or RNA molecules, such as mRNA molecules, isolated from an organism. Alternatively, the polynucleotide molecules may be synthesized, including, e.g., nucleic acid molecules synthesized enzymatically in vivo or in vitro, such as cDNA molecules, polynucleotide molecules synthesized by PCR, RNA molecules synthesized by in vitro transcription, etc. The sample of target polynucleotides can comprise, e.g., molecules of DNA, RNA, or copolymers of DNA and RNA. In preferred embodiments, the target polynucleotides of the invention will correspond to particular genes or to particular gene transcripts (e.g., to particular mRNA sequences expressed in cells or to particular cDNA sequences derived from such mRNA sequences). However, in many embodiments, particularly those in which the polynucleotide molecules are derived from mammalian cells, the target polynucleotides may correspond to particular fragments of a gene transcript. For example, the target polynucleotides may correspond to different exons of the same gene, e.g., so that different splice variants of the gene may be detected and/or analyzed.
In preferred embodiments, the target polynucleotides to be analyzed are prepared in vitro from nucleic acids extracted from cells. For example, in one embodiment, RNA is extracted from cells (e.g., total cellular RNA, poly(A)+ messenger RNA, or a fraction thereof) and messenger RNA is purified from the total extracted RNA. Methods for preparing total and poly(A)+ RNA are well known in the art and are described generally, e.g., in Sambrook et al., supra. In one embodiment, RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation and oligo-dT purification (Chirgwin et al., 1979, Biochemistry 18:5294-5299). In another embodiment, RNA is extracted from cells using guanidinium thiocyanate lysis followed by purification on RNeasy columns (Qiagen). cDNA is then synthesized from the purified mRNA using, e.g., oligo-dT or random primers. In preferred embodiments, the target polynucleotides are cRNA prepared from purified messenger RNA extracted from cells. As used herein, cRNA is defined as RNA complementary to the source RNA. The extracted RNAs are amplified using a process in which double-stranded cDNAs are synthesized from the RNAs using a primer linked to an RNA polymerase promoter in an orientation capable of directing transcription of antisense RNA. Antisense RNAs or cRNAs are then transcribed from the second strand of the double-stranded cDNAs using an RNA polymerase (see, e.g., U.S. Pat. Nos. 5,891,636, 5,716,785, 5,545,522 and 6,132,997; see also U.S. Pat. No. 6,271,002, and U.S. Provisional Patent Application Ser. No. 60/253,641, filed November 28, 2002, by Ziman et al.). Either oligo-dT primers (U.S. Pat. Nos. 5,545,522 and 6,132,997) or random primers (U.S. Provisional Patent Application Ser. No. 60/253,641, filed November 28, 2002, by Ziman et al.) that contain an RNA polymerase promoter or its complement can be used. Preferably, the target polynucleotides are short and/or fragmented polynucleotide molecules that are representative of the original nucleic acid population of the cell.
Preferably, the target polynucleotides to be analyzed by the methods and compositions of the invention are detectably labeled. For example, cDNA can be labeled directly, e.g., with nucleotide analogs, or indirectly, e.g., by making a second, labeled cDNA strand using the first strand as a template. Alternatively, the double-stranded cDNA can be transcribed into cRNA and labeled.
Preferably, the detectable label is a fluorescent label, e.g., one introduced by incorporation of nucleotide analogs. Other labels suitable for use in the present invention include, but are not limited to: biotin, iminobiotin, antigens, cofactors, dinitrophenol, lipoic acid, olefinic compounds, detectable polypeptides, electron-rich molecules, enzymes capable of generating a detectable signal by action upon a substrate, and radioactive isotopes. Preferred radioactive isotopes include 32P, 35S, 14C, 15N and 125I. Fluorescent molecules suitable for the present invention include, but are not limited to: fluorescein and its derivatives, rhodamine and its derivatives, Texas Red, 5'-carboxy-fluorescein ("FAM"), 2',7'-dimethoxy-4',5'-dichloro-6-carboxy-fluorescein ("JOE"), N,N,N',N'-tetramethyl-6-carboxy-rhodamine ("TAMRA"), 6-carboxy-X-rhodamine ("ROX"), HEX, TET, IRD40 and IRD41. Fluorescent molecules suitable for the invention further include: cyanine dyes, including but not limited to Cy3, Cy3.5 and Cy5; BODIPY dyes, including but not limited to BODIPY-FL, BODIPY-TR, BODIPY-TMR, BODIPY-630/650 and BODIPY-650/670; ALEXA dyes, including but not limited to ALEXA-488, ALEXA-532, ALEXA-546, ALEXA-568 and ALEXA-594; and other fluorescent dyes known to those skilled in the art. Electron-rich indicator molecules suitable for the present invention include, but are not limited to: ferritin, hemocyanin and colloidal gold. Alternatively, in less preferred embodiments, the target polynucleotides may be labeled by specifically complexing a first group to the polynucleotide. A second group, covalently linked to an indicator molecule and having affinity for the first group, can then be used to detect the target polynucleotide indirectly. In such an embodiment, compounds suitable for use as the first group include, but are not limited to, biotin and iminobiotin. Compounds suitable for use as the second group include, but are not limited to, avidin and streptavidin.
5.5.1.4. Hybridization to microarrays
As described above, nucleic acid hybridization and wash conditions are chosen so that the polynucleotide molecules to be analyzed by the invention (referred to herein as the "target polynucleotide molecules") specifically bind or specifically hybridize to the complementary polynucleotide sequences of the array, preferably to a specific array site at which their complementary DNA is located.
Arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded before contact with the target polynucleotide molecules. Arrays containing single-stranded probe DNA (e.g., synthetic oligodeoxyribonucleic acids) may need to be denatured before contact with the target polynucleotide molecules, e.g., to remove hairpins or dimers that form because of self-complementary sequences.
Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA or DNA) of the probe and target nucleic acids. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al. (supra) and in Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York. When the cDNA microarrays of Schena et al. are used, typical hybridization conditions are hybridization in 5×SSC plus 0.2% SDS at 65 °C for four hours, followed by washes at 25 °C in low-stringency wash buffer (1×SSC plus 0.2% SDS), followed by 10 minutes at 25 °C in higher-stringency wash buffer (0.1×SSC plus 0.2% SDS) (Schena et al., 1996, Proc. Natl. Acad. Sci. U.S.A. 93:10614). Useful hybridization conditions are also provided in, e.g., Tijessen, 1993, Hybridization With Nucleic Acid Probes, Elsevier Science Publishers B.V., and Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, CA.
Particularly preferred hybridization conditions for use with the screening and/or signaling chips of the present invention include hybridization at a temperature at or near the mean melting temperature of the probes (e.g., within 5 °C, more preferably within 2 °C) in 1 M NaCl, 50 mM MES buffer (pH 6.5), 0.5% sodium sarcosine and 30% formamide.
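The temperature criterion above reduces to a simple comparison once probe melting temperatures are available. The Python sketch below assumes the melting temperatures are supplied as measured or separately estimated values in °C; it does not compute them, and the function name is hypothetical.

```python
# Hypothetical sketch: check that a proposed hybridization temperature lies
# within a tolerance of the mean probe melting temperature (all values in °C).
from statistics import mean


def near_mean_tm(hyb_temp_c: float, probe_tms_c: list[float],
                 tolerance_c: float = 5.0) -> bool:
    """True if hyb_temp_c is within tolerance_c of the mean probe Tm."""
    return abs(hyb_temp_c - mean(probe_tms_c)) <= tolerance_c
```

Calling the function with `tolerance_c=2.0` applies the stricter, more preferred window.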
5.5.1.5. Signal detection and data analysis
It will be appreciated that when target sequences (e.g., cDNA or cRNA) complementary to the RNA of a cell are made and hybridized to a microarray under suitable hybridization conditions, the level of hybridization to the site in the array corresponding to an exon of any particular gene will reflect the prevalence in the cell of the mRNA or mRNAs containing that exon as transcribed from the gene. For example, when detectably labeled (e.g., with a fluorophore) cDNA complementary to the total cellular mRNA is hybridized to a microarray, the site on the array corresponding to an exon of a gene (i.e., a site capable of specifically binding one or more expression products of the gene) that is not transcribed, or that is removed during RNA splicing in the cell, will give little or no signal (e.g., fluorescent signal), whereas an exon of a gene whose encoded mRNA is prevalent will give a relatively strong signal. The relative abundance of the different mRNAs produced from the same gene by alternative splicing is then determined from the signal intensity pattern across the whole set of exons monitored for the gene.
In preferred embodiments, target sequences, e.g., cDNAs or cRNAs, from two different cells are hybridized to the binding sites of the microarray. In the case of drug responses, one cell sample is exposed to a drug and another cell sample of the same type is not exposed to the drug. In the case of pathway responses, one cell is exposed to a pathway perturbation and another cell of the same type is not exposed to the pathway perturbation. The cDNA derived from each of the two cell types is differently labeled so that the two can be distinguished. In one embodiment, for example, cDNA from a cell treated with a drug (or exposed to a pathway perturbation) is synthesized using a fluorescein-labeled dNTP, and cDNA from a second cell, not exposed to the drug, is synthesized using a rhodamine-labeled dNTP. When the two cDNAs are mixed and hybridized to the microarray, the relative signal intensity from each cDNA set is determined for each site on the array, and any relative difference in the abundance of a particular exon is detected.
In the example described above, the cDNA from the drug-treated (or pathway-perturbed) cell will fluoresce green when the fluorophore is stimulated, and the cDNA from the untreated cell will fluoresce red. As a result, when the drug treatment has no effect, either directly or indirectly, on the transcription and/or post-transcriptional splicing of a particular gene in a cell, the exon expression patterns will be indistinguishable in the two cells and, upon reverse transcription, red-labeled and green-labeled cDNA will be equally prevalent. When hybridized to the microarray, the binding site or sites for that species of RNA will emit wavelengths characteristic of both fluorophores. In contrast, when the drug-exposed cell is treated with a drug that, directly or indirectly, changes the transcription and/or post-transcriptional splicing of a particular gene in the cell, the exon expression pattern, as represented by the ratio of green to red fluorescence at each exon binding site, will change. When the drug increases the prevalence of an mRNA, the ratios for each exon expressed in that mRNA will increase, whereas when the drug decreases the prevalence of an mRNA, the ratios for each exon expressed in that mRNA will decrease.
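The green-to-red comparison described above can be sketched as a per-exon log ratio. The following Python example is an illustration under stated assumptions: the base-2 logarithm, the intensity floor, the two-fold call threshold, and all names are hypothetical choices rather than parameters given in this description.

```python
# Hypothetical sketch: per-exon log2 ratio of green (treated) to red
# (untreated) signal, with a simple up/down/unchanged call.
import math


def log2_ratios(green: dict[str, float], red: dict[str, float],
                floor: float = 1.0) -> dict[str, float]:
    """log2(green / red) for each exon site; intensities are floored to avoid
    division by zero and log of zero (the floor value is an assumption)."""
    return {
        exon: math.log2(max(g, floor) / max(red[exon], floor))
        for exon, g in green.items()
    }


def call_change(log_ratio: float, threshold: float = 1.0) -> str:
    """Label an exon as increased, decreased, or unchanged in the treated cell."""
    if log_ratio >= threshold:
        return "increased"
    if log_ratio <= -threshold:
        return "decreased"
    return "unchanged"
```

A ratio near zero corresponds to the case in which the treatment does not affect the gene, so that both fluorophores contribute equally at the site.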
The use of a two-color fluorescence labeling and detection scheme to identify changes in gene expression has been described in connection with the detection of mRNA, for example, in Schena et al., 1995, Science 270:467-470, which is incorporated herein by reference in its entirety for all purposes. The scheme is equally applicable to the labeling and detection of exons. An advantage of using target sequences, such as cDNA or cRNA, labeled with two different fluorophores is that the expression levels of the mRNA or exons corresponding to each arrayed gene can be compared directly, with an internal control, between two cell states, and variations due to minor differences in experimental conditions (for example, hybridization conditions) do not affect subsequent analyses. However, it is also possible to use cDNA from a single cell and compare, for example, the absolute amount of a particular exon in a drug-treated or pathway-perturbed cell and in an untreated cell.
When fluorescently labeled probes are used, the fluorescence emission at each site of the transcript array is preferably detected by scanning confocal laser microscopy. In one embodiment, a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used. Alternatively, a laser can be used that illuminates the specimen simultaneously at the wavelengths specific to the two fluorophores, so that the emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, Genome Res. 6:639-645). In a preferred embodiment, the array is scanned with a laser fluorescence scanner having a computer-controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed-gas laser, and the emitted light is split by wavelength and detected with two photomultiplier tubes. Such fluorescence laser scanning devices are described, for example, in Schena et al., 1996, Genome Res. 6:639-645. Alternatively, the fiber-optic bundle described by Ferguson et al. (1996, Nature Biotech. 14:1681-1684) can be used to monitor mRNA abundance levels at a large number of sites simultaneously.
In preferred embodiments, signals are recorded and analyzed by computer, for example using a 12-bit analog-to-digital board. In one embodiment, the scanned image is despeckled using a graphics program (for example, Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction can be made for "cross talk" (or overlap) between the channels for the two fluorophores. For any particular hybridization site on the transcript array, a ratio of the emissions of the two fluorophores can be calculated. This ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by drug administration, gene deletion, or any other tested event.
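The ratio calculation described above can be sketched as follows. This is a minimal illustration, assuming the gridding step has already produced one average intensity per site for each channel. The function name, the linear cross-talk model, the coefficient parameters, and the intensity floor are all assumptions made for illustration and are not taken from this disclosure.

```python
def emission_ratios(green, red, red_into_green=0.0, green_into_red=0.0, floor=1.0):
    """Return the green/red emission ratio for each hybridization site.

    green, red: per-site average intensities for the two fluorophore channels.
    red_into_green, green_into_red: hypothetical, experimentally determined
        fractions of one channel's signal that leak into the other channel.
    floor: hypothetical minimum intensity, used to avoid division by zero.
    """
    if len(green) != len(red):
        raise ValueError("both channels must cover the same sites")
    ratios = []
    for g, r in zip(green, red):
        # Subtract the assumed linear cross-talk contribution from each channel.
        g_corrected = max(g - red_into_green * r, floor)
        r_corrected = max(r - green_into_red * g, floor)
        ratios.append(g_corrected / r_corrected)
    return ratios


# Three sites: signal higher in green, equal in both, higher in red.
site_ratios = emission_ratios([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])
```

With green assigned to the treated sample, a ratio above 1 suggests greater abundance in the treated cell and a ratio below 1 suggests lower abundance, regardless of the absolute intensity at the site.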
According to the method of the invention, the relative abundance of an mRNA, and/or of an exon expressed in an mRNA, in two cells or cell lines is scored as perturbed (that is, the abundance differs between the two sources of mRNA tested) or as not perturbed (that is, the relative abundance is the same). As used herein, a difference between the two sources of RNA of at least 25% (for example, RNA that is 25% more abundant in one source than in the other), more usually 50%, and even more usually a factor of 2 (twice as abundant), 3 (three times as abundant), or 5 (five times as abundant) is scored as a perturbation. Present detection methods allow reliable detection of differences of 1.5-fold to 3-fold.
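A minimal sketch of this scoring rule is given below, assuming that abundances are positive numbers and that the threshold is expressed as a fold difference (1.25 for a 25% difference, 2.0 for a two-fold difference). The function name and default value are illustrative assumptions.

```python
def score_perturbation(abundance_a, abundance_b, min_fold=1.25):
    """Score a cellular constituent as 'perturbed' or 'unperturbed'.

    The fold difference is taken as larger/smaller, so the result does not
    depend on which source is more abundant. min_fold is a hypothetical
    parameter: 1.25 corresponds to a 25% difference, 2.0 to a two-fold one.
    """
    if abundance_a <= 0 or abundance_b <= 0:
        raise ValueError("abundances must be positive")
    fold = max(abundance_a, abundance_b) / min(abundance_a, abundance_b)
    return "perturbed" if fold >= min_fold else "unperturbed"


label_25_percent = score_perturbation(125.0, 100.0)               # exactly 25% higher
label_small = score_perturbation(110.0, 100.0)                    # only 10% higher
label_three_fold = score_perturbation(100.0, 300.0, min_fold=2.0)
```

Given the stated reliable detection range of 1.5-fold to 3-fold, a caller may prefer a threshold of 1.5 or higher over the 25% minimum.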
It is, however, also advantageous to determine the magnitude of the relative difference in abundance of an mRNA, and/or of an exon expressed in an mRNA, in two cells or cell lines. As noted above, this can be done by calculating the ratio of the emissions of the two fluorophores used for differential labeling, or by analogous methods that will be readily apparent to those skilled in the art.
5.5.2 Other methods of measuring the transcriptional state
The transcriptional state of a cell can be measured by other gene expression technologies known in the art. Several such technologies produce pools of restriction fragments of limited complexity for electrophoretic analysis, such as methods that combine double restriction enzyme digestion with phasing primers (see, for example, European Patent 534858 A1, filed September 24, 1992, by Zabeau et al.), or methods that select restriction fragments with sites closest to a defined mRNA end (see, for example, Prashar et al., 1996, Proc. Natl. Acad. Sci. USA 93:659-663). Other methods statistically sample cDNA pools, for example by sequencing sufficient bases (for example, 20-50 bases) in each of multiple cDNAs to identify each cDNA, or by sequencing short tags (for example, 9-10 bases) generated at known positions relative to a defined mRNA end (see, for example, Velculescu, 1995, Science 270:484-487).
The transcriptional state of a cell can also be measured by reverse transcription-polymerase chain reaction (RT-PCR). RT-PCR is a technique for detecting and quantifying mRNA, and it is sensitive enough to quantify the RNA of a single cell. See, for example, Pfaffl and Hageleit, 2001, Biotechnology Letters 23, 275-282; Tadesse et al., 2003, Mol Genet Genomics 269, p. 789-796; and Kabir and Shimizu, 2003, J. Biotech. 9, p. 105.
5.6 Measurement of other aspects of the biological state
In various embodiments of the present invention, aspects of the biological state other than the transcriptional state can be measured, such as the translational state, the activity state, or mixed aspects. Accordingly, in such embodiments, cellular constituent abundance data can include translational state measurements or even protein expression measurements. This section describes aspects of the biological state other than the transcriptional state.
5.6.1 Measurement of the translational state
The translational state can be measured by several methods. For example, whole-genome monitoring of protein (for example, the "proteome") can be carried out by constructing a microarray whose binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the genome of the cell. Preferably, antibodies are present for a substantial fraction of the encoded proteins, or at least for those proteins relevant to the action of a drug of interest. Methods for making monoclonal antibodies are well known (see, for example, Harlow and Lane, 1988, Antibodies: A Laboratory Manual, Cold Spring Harbor, New York, which is incorporated herein in its entirety for all purposes). In one embodiment, monoclonal antibodies are raised against synthetic peptide fragments designed on the basis of the genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted with the array and their binding is detected with assays known in the art.
Alternatively, proteins can be separated by two-dimensional gel electrophoresis systems. Two-dimensional gel electrophoresis is well known in the art and typically involves isoelectric focusing along a first dimension followed by SDS-PAGE electrophoresis along a second dimension. See, for example, Hames et al., 1990, Gel Electrophoresis of Proteins: A Practical Approach, IRL Press, New York; Shevchenko et al., 1996, Proc. Natl. Acad. Sci. USA 93:1440-1445; Sagliocco et al., 1996, Yeast 12:1519-1533; Lander, 1996, Science 274:536-539. The resulting electropherograms can be analyzed by numerous techniques, including mass spectrometric techniques, Western blotting and immunoblot analysis using polyclonal and monoclonal antibodies, and internal and N-terminal micro-sequencing. Using these techniques, it is possible to identify a substantial fraction of all the proteins produced under given physiological conditions, including in cells (for example, yeast) exposed to a drug, or in cells modified by, for example, deletion or overexpression of a specific gene.
5.6.2 Other types of cellular constituent abundance measurements
The methods of the present invention are applicable to any cellular constituent that can be monitored. For example, where protein activities can be measured, embodiments of the invention can use such measurements. Activity can be measured by any functional, biochemical, or physical means appropriate to the particular activity being characterized. Where the activity involves a chemical transformation, the cellular protein can be contacted with its natural substrate and the rate of transformation measured. Where the activity involves association into multimeric units, for example the association of an activated DNA-binding complex with DNA, the amount of associated protein or a secondary consequence of the association, such as the amount of mRNA transcribed, can be measured. Likewise, where only a functional activity is known, for example in cell cycle control, performance of the function can be observed. However they are known and measured, the changes in protein activity form the response data analyzed by the foregoing methods of the invention.
In some embodiments of the present invention, cellular constituent measurements are derived from cellular phenotypic techniques. One such technique uses cell respiration as a universal reporter. In one embodiment, 96-well microtiter plates are provided in which each well contains its own unique chemistry. Each unique chemistry is designed to test for a particular phenotype. Cells from the organism of interest are pipetted into each well. If the cells exhibit the appropriate phenotype, they respire and actively reduce a tetrazolium dye, forming a strong purple color. A weak phenotype results in a lighter color. No color means that the cells do not have the specific phenotype. Color changes can be recorded as often as several times per hour. More than 5,000 phenotypes can be tested in a single incubation. See, for example, Bochner et al., 2001, Genome Research 11, p. 1246.
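The color readout described above can be turned into per-well phenotype calls along the following lines. This is only a sketch: it assumes the color intensity of each well is available as a single absorbance-like number, and the cutoff values, function names, and well labels are hypothetical and would need to be calibrated for a real plate reader.

```python
def phenotype_from_color(intensity, weak_cutoff=0.2, strong_cutoff=0.8):
    """Map a well's color intensity to a phenotype call.

    intensity: assumed absorbance-like reading of the purple color.
    weak_cutoff, strong_cutoff: hypothetical calibration thresholds.
    """
    if intensity >= strong_cutoff:
        return "strong"   # strong purple: cells respire and reduce the dye
    if intensity >= weak_cutoff:
        return "weak"     # lighter color: weak phenotype
    return "absent"       # no color: phenotype not present


def score_plate(readings):
    """Return a phenotype call for every well in a {well: intensity} mapping."""
    return {well: phenotype_from_color(value) for well, value in readings.items()}


plate_calls = score_plate({"A1": 1.2, "A2": 0.4, "A3": 0.05})
```

Because color changes can be recorded several times per hour, the same function could be applied to each time point to follow how a phenotype develops during incubation.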
In some embodiments of the present invention, the cellular constituents that are measured are metabolites. Metabolites include, but are not limited to, amino acids, metals, soluble sugars, sugar phosphates, and complex carbohydrates. Such metabolites can be measured, for example at the whole-cell level, using methods such as pyrolysis mass spectrometry (Irwin, 1982, Analytical Pyrolysis: A Comprehensive Guide, Marcel Dekker, New York; Meuzelaar et al., 1982, Pyrolysis Mass Spectrometry of Recent and Fossil Biomaterials, Elsevier, Amsterdam), Fourier-transform infrared spectrometry (Griffiths and de Haseth, 1986, Fourier transform infrared spectrometry, John Wiley, New York; Helm et al., 1991, J. Gen. Microbiol. 137, 69-79; Naumann et al., 1991, Nature 351, 81-82; Naumann et al., 1991, in Modern techniques for rapid microbiological analysis, Nelson, W.H., ed., 43-96, VCH Publishers, New York), Raman spectrometry, gas chromatography-mass spectrometry (GC-MS) (Fiehn et al., 2000, Nature Biotechnology 18, 1157-1161), capillary electrophoresis (CE)/MS, high-performance liquid chromatography/mass spectrometry (HPLC/MS), and liquid chromatography (LC)-electrospray and capillary-LC-tandem-electrospray mass spectrometry. Such methods can be combined with established chemometric methods that use artificial neural networks and genetic programming to discriminate between closely related samples.
5.7 Use of analytic kits
In one embodiment, the methods of the present invention can be implemented by using kits for developing and using biological classifiers. Such kits contain microarrays, such as those described in the sections above. The microarrays contained in such kits comprise a solid phase, for example a surface, to which probes are hybridized or bound at known locations. Preferably, these probes consist of nucleic acids of known, different sequence, each nucleic acid being capable of hybridizing to an RNA species or to a cDNA species derived therefrom. In a particular embodiment, the probes contained in the kits of the invention are nucleic acids capable of hybridizing specifically to nucleic acid sequences derived from RNA species in cells collected from an organism of interest.
In preferred embodiments, a kit of the invention also contains one or more of the data structures and/or software modules described above and in Figs. 1-3 and/or 5, encoded on a computer-readable medium, and/or an access authorization to use the databases described above from a remote networked computer.
In another preferred embodiment, a kit of the invention contains software capable of being loaded into the memory of a computer system such as the one described above and illustrated in Fig. 1. The software contained in the kit is essentially identical to the software described above in conjunction with Fig. 1.
Alternative kits for implementing the analytic methods of the invention will be apparent to those skilled in the art and are intended to fall within the scope of the appended claims.
6. References cited
All references cited herein are incorporated herein by reference in their entirety and for all purposes, to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
The present invention can be implemented as a computer program product that comprises a computer program mechanism embedded in a computer-readable storage medium. For example, the computer program product can contain the program modules shown in Fig. 1 and/or the database schema shown in Figs. 2 and 3. These program modules can be stored on a CD-ROM, a magnetic disk storage product, or any other computer-readable data or program storage product. The software modules in the computer program product can also be distributed electronically, via the Internet or otherwise, by transmission of a computer data signal (in which the software modules are embedded) on a carrier wave.
As will be apparent to those skilled in the art, many modifications and variations of the disclosed technology can be made without departing from the spirit and scope of the claims. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims (139)

1. A computer comprising:
a central processing unit;
a memory coupled to the central processing unit, the memory storing:
(i) instructions for receiving data, wherein said data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or a test biological specimen from an organism of the species;
(ii) instructions for computing a model in a plurality of models, wherein said model is characterized by a model score that represents the likelihood of a biological feature in said test organism or said test biological specimen, and wherein the computing of said model comprises determining the model score using one or more characteristics for one or more cellular constituents in the plurality of cellular constituents;
(iii) instructions for repeating said instructions for computing one or more times, thereby computing said plurality of models; and
(iv) instructions for communicating each said model score computed by said instructions for computing.
2. The computer of claim 1, wherein two or more model scores are communicated by said instructions for communicating, and wherein each model score in said two or more model scores corresponds to a different model in said plurality of models.
3. The computer of claim 1, wherein five or more model scores are communicated by said instructions for communicating, and wherein each model score in said five or more model scores corresponds to a different model in said plurality of models.
4. The computer of claim 1, wherein said instructions for receiving data comprise instructions for receiving said data from a remote computer over a wide area network.
5. The computer of claim 4, wherein said wide area network is the Internet.
6. The computer of claim 1, wherein said instructions for communicating comprise instructions for transmitting each said model score to a remote computer over a wide area network.
7. The computer of claim 6, wherein said wide area network is the Internet.
8. The computer of claim 1, wherein, when said model score is in a first range of values, said test organism or test biological specimen is deemed to have the biological feature represented by a model in the plurality of models; and, when said model score is in a second range of values, said test organism or test biological specimen is deemed not to have the biological feature represented by that model.
9. The computer of claim 1, wherein said biological feature is a disease.
10. The computer of claim 9, wherein said disease is cancer.
11. The computer of claim 9, wherein said disease is breast cancer, lung cancer, prostate cancer, colorectal cancer, ovarian cancer, bladder cancer, gastric cancer, or rectal cancer.
12. The computer of claim 1, wherein said plurality of models comprises a first model characterized by a first model score and a second model characterized by a second model score; and wherein the identity of a cellular constituent whose one or more characteristics are used to compute said first model score is different from the identity of a cellular constituent whose one or more characteristics are used to compute said second model score.
13. The computer of claim 1, wherein a characteristic in said one or more characteristics of the one or more cellular constituents used to determine the model score of a model in said plurality of models comprises an abundance of said one or more cellular constituents in said test organism of said species or said test biological specimen from the organism of said species.
14. The computer of claim 1, wherein said species is human.
15. The computer of claim 1, wherein said test biological specimen is a biopsy or other form of sample from a tumor, blood, bone, a breast, a lung, a prostate, a colorectum, an ovary, a bladder, a stomach, or a rectum.
16. The computer of claim 1, wherein said one or more characteristics comprise cellular constituent abundance, and said data comprise cellular constituent abundances of at least 100 cellular constituents in said test organism of said species or said test biological specimen from said organism of said species.
17. The computer of claim 1, wherein said one or more characteristics comprise cellular constituent abundance, and said data comprise cellular constituent abundances of at least 500 cellular constituents in said test organism of said species or said test biological specimen from said organism of said species.
18. The computer of claim 1, wherein said one or more characteristics comprise cellular constituent abundance, and said data comprise cellular constituent abundances of at least 5,000 cellular constituents in said test organism of said species or said test biological specimen from said organism of said species.
19. The computer of claim 1, wherein said one or more characteristics comprise cellular constituent abundance, and said data comprise cellular constituent abundances of between 1,000 and 20,000 cellular constituents in said test organism of said species or said test biological specimen from said organism of said species.
20. The computer of claim 1, wherein a cellular constituent in said plurality of cellular constituents is mRNA, cRNA, or cDNA.
21. The computer of claim 1, wherein a cellular constituent in said one or more cellular constituents is a nucleic acid or a ribonucleic acid, and a characteristic in said one or more characteristics of said cellular constituent is obtained by measuring a transcriptional state of all or a portion of said cellular constituent in said test organism or said test biological specimen.
22. The computer of claim 1, wherein a cellular constituent in said one or more cellular constituents is a protein, and a characteristic in said one or more characteristics of said cellular constituent is obtained by measuring a translational state of said cellular constituent in said test organism or said test biological specimen.
23. The computer of claim 1, wherein a characteristic in the one or more characteristics of a cellular constituent in said plurality of cellular constituents is determined using isotope-coded affinity tagging followed by tandem mass spectrometry analysis of the cellular constituent, using a sample obtained from the test organism or the test biological specimen.
24. The computer of claim 1, wherein a characteristic in the one or more characteristics of a cellular constituent in said plurality of cellular constituents is determined by measuring an activity or a post-translational modification of the cellular constituent in a sample of the test organism or in the test biological specimen.
25. The computer of claim 1, wherein said biological feature is sensitivity to a drug.
26. The computer of claim 1, wherein the plurality of models for which model scores are computed by said instructions for computing collectively represent the likelihood of each of two or more biological features.
27. The computer of claim 26, wherein each biological feature in said two or more biological features is a cancer origin.
28. The computer of claim 26, wherein said two or more biological features comprise a first disease and a second disease.
29. The computer of claim 1, wherein the plurality of models for which model scores are computed by said instructions for computing collectively represent the likelihood of each of five or more biological features.
30. The computer of claim 29, wherein each biological feature in said five or more biological features is a cancer origin.
31. The computer of claim 29, wherein said five or more biological features comprise a first disease and a second disease.
32. The computer of claim 1, wherein the plurality of models for which model scores are computed by said instructions for computing collectively represent the likelihood of each of between 2 and 20 biological features.
33. The computer of claim 32, wherein each biological feature in said between 2 and 20 biological features is a cancer origin.
34. The computer of claim 32, wherein said between 2 and 20 biological features comprise a first disease and a second disease.
35. A computer comprising:
a central processing unit;
a memory coupled to the central processing unit, the memory storing:
(i) instructions for receiving data, wherein said data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or a test biological specimen from an organism of the species;
(ii) instructions for computing a plurality of models, wherein each model in said plurality of models is characterized by a model score that represents the likelihood of a biological feature in said test organism or said test biological specimen, and wherein computing an individual model in said plurality of models comprises determining the model score associated with the individual model using one or more characteristics for one or more cellular constituents in said plurality of cellular constituents; and
(iii) instructions for communicating each said model score computed by said instructions for computing.
36. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
(i) instructions for receiving data, wherein said data comprise one or more characteristics for each cellular constituent in a plurality of cellular constituents that have been measured in a test organism of a species or a test biological specimen from an organism of the species;
(ii) instructions for computing a model in a plurality of models, wherein said model is characterized by a model score that represents the likelihood of a biological feature in said test organism or said test biological specimen, and wherein the computing of said model comprises determining the model score using one or more characteristics for one or more cellular constituents in said plurality of cellular constituents;
(iii) instructions for repeating said instructions for computing one or more times, thereby computing a plurality of models; and
(iv) instructions for communicating each said model score computed by said instructions for computing.
37. The computer program product of claim 36, wherein two or more model scores are communicated by said instructions for communicating, and wherein each model score in said two or more model scores corresponds to a different model in said plurality of models.
38. The computer program product of claim 36, wherein five or more model scores are communicated by said instructions for communicating, and wherein each model score in said five or more model scores corresponds to a different model in said plurality of models.
39. The computer program product of claim 36, wherein, when said model score is in a first range of values, said test organism or test biological specimen is deemed to have the biological feature represented by a model in the plurality of models; and, when said model score is in a second range of values, said test organism or test biological specimen is deemed not to have the biological feature represented by that model.
40. The computer program product of claim 36, wherein said biological feature is a disease.
41. The computer program product of claim 40, wherein said disease is cancer.
42. The computer program product of claim 40, wherein said disease is breast cancer, lung cancer, prostate cancer, colorectal cancer, ovarian cancer, bladder cancer, gastric cancer, or rectal cancer.
43. The computer program product of claim 36, wherein said plurality of models comprises a first model characterized by a first model score and a second model characterized by a second model score; and wherein the identity of a cellular constituent whose one or more characteristics are used to compute said first model score is different from the identity of a cellular constituent whose one or more characteristics are used to compute said second model score.
44. The computer program product of claim 36, wherein a characteristic in said one or more characteristics of the one or more cellular constituents used to determine the model score of a model in said plurality of models comprises an abundance of said one or more cellular constituents in said test organism of said species or said test biological specimen from the organism of said species.
45. The computer program product of claim 36, wherein said species is human.
46. The computer program product of claim 36, wherein said test biological specimen is a biopsy or other form of sample from a tumor, blood, bone, a breast, a lung, a prostate, a colorectum, an ovary, a bladder, a stomach, or a rectum.
47. The computer program product of claim 36, wherein said one or more characteristics comprise cellular constituent abundance, and said data comprise cellular constituent abundances of at least 100 cellular constituents in said test organism of said species or said test biological specimen from said organism of said species.
48. The computer program product of claim 36, wherein said one or more characteristics comprise cellular constituent abundance, and said data comprise cellular constituent abundances of at least 500 cellular constituents in said test organism of said species or said test biological specimen from said organism of said species.
49. The computer program product of claim 36, wherein said one or more characteristics comprise cellular constituent abundance, and said data comprise cellular constituent abundances of at least 5,000 cellular constituents in said test organism of said species or said test biological specimen from said organism of said species.
50. The computer program product of claim 36, wherein said one or more characteristics comprise cellular constituent abundance, and said data comprise cellular constituent abundances of between 1,000 and 20,000 cellular constituents in said test organism of said species or said test biological specimen from said organism of said species.
51. The computer program product as claimed in claim 36, characterized in that a cell component in the plurality of cell components is mRNA, cRNA, or cDNA.
52. The computer program product as claimed in claim 36, characterized in that a cell component in the one or more cell components is a nucleic acid or a ribonucleic acid, and a feature in the one or more features of the cell component is obtained by measuring the transcriptional state of all or part of the cell component in the test organism or the test organism sample.
53. The computer program product as claimed in claim 36, characterized in that a cell component in the one or more cell components is a protein, and a feature in the one or more features of the cell component is obtained by measuring the translational state of the cell component in the test organism or the test organism sample.
54. The computer program product as claimed in claim 36, characterized in that a feature in the one or more features of a cell component in the plurality of cell components is determined by isotope-coded affinity tagging followed by tandem mass spectrometry analysis of the cell component, using a sample obtained from the test organism or the test organism sample.
55. The computer program product as claimed in claim 36, characterized in that a feature in the one or more features of a cell component in the plurality of cell components is determined by measuring the activity or post-translational modification of the cell component in a sample from the test organism or in the test organism sample.
56. The computer program product as claimed in claim 36, characterized in that the biological property is drug sensitivity.
57. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
(i) instructions for receiving data, wherein the data comprise one or more features of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species;
(ii) instructions for computing a plurality of models, wherein each model in the plurality of models is characterized by a model score that represents the likelihood of a biological property in the test organism or the test organism sample, and computing an individual model in the plurality of models comprises determining the model score associated with that individual model using one or more features of one or more cell components in the plurality of cell components; and
(iii) instructions for communicating each model score computed by the instructions for computing.
58. A method, comprising:
Receiving data, wherein the data comprise one or more features of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species;
Computing a model in a plurality of models, wherein the model is characterized by a model score that represents the likelihood of a biological property in the test organism or the test organism sample, and computing the model comprises determining the model score using one or more features of one or more cell components in the plurality of cell components;
Repeating the computing one or more times, thereby computing the plurality of models; and
Communicating each model score computed in the computing.
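The receive, compute, repeat, and communicate steps of claim 58 can be pictured with a minimal Python sketch. Everything specific here is an assumption made for illustration: the `Model` class, the logistic scoring function, the constituent names, and the weights are hypothetical, and the claim itself does not prescribe how a model score is computed.

```python
from dataclasses import dataclass
from math import exp
from typing import Dict, List

# Hypothetical representation: cell component ID -> measured abundance.
Measurements = Dict[str, float]


@dataclass
class Model:
    """One model in the plurality of models (illustrative linear form)."""
    biological_property: str      # property whose likelihood the score represents
    weights: Dict[str, float]     # cell components this model uses (assumed)
    bias: float = 0.0

    def score(self, data: Measurements) -> float:
        # Logistic score in (0, 1); an assumed scoring function, not the patent's.
        z = self.bias + sum(w * data.get(c, 0.0) for c, w in self.weights.items())
        return 1.0 / (1.0 + exp(-z))


def compute_model_scores(data: Measurements, models: List[Model]) -> Dict[str, float]:
    """Compute one model, then repeat for every remaining model."""
    scores: Dict[str, float] = {}
    for model in models:
        scores[model.biological_property] = model.score(data)
    return scores


def communicate(scores: Dict[str, float]) -> List[str]:
    """Format every computed model score for reporting."""
    return [f"{name}: {value:.3f}" for name, value in sorted(scores.items())]
```

Each model may draw on a different subset of cell components, which is why the weights are stored per model rather than shared.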
59. The method as claimed in claim 58, characterized in that two or more model scores are communicated by the communicating step, and each model score in the two or more model scores corresponds to a different model in the plurality of models.
60. The method as claimed in claim 58, characterized in that five or more model scores are communicated by the communicating step, and each model score in the five or more model scores corresponds to a different model in the plurality of models.
61. The method as claimed in claim 58, characterized in that when the model score is in a first range of values, the test organism or test organism sample is deemed to have the biological property represented by a model in the plurality of models; and when the model score is in a second range of values, the test organism or test organism sample is deemed not to have the biological property represented by that model.
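The two score ranges in claim 61 amount to a simple interpretation rule. The sketch below is only an illustration: the function name, the default range boundaries, and the "indeterminate" outcome for scores that fall in neither range are assumptions, since the claim does not state numeric ranges.

```python
from typing import Tuple


def interpret_score(
    score: float,
    first_range: Tuple[float, float] = (0.7, 1.0),   # assumed "has the property" range
    second_range: Tuple[float, float] = (0.0, 0.3),  # assumed "lacks the property" range
) -> str:
    """Map a model score to a call based on two value ranges."""
    low, high = first_range
    if low <= score <= high:
        return "present"
    low, high = second_range
    if low <= score <= high:
        return "absent"
    # Scores outside both ranges are left uncalled in this sketch.
    return "indeterminate"
```

For example, with the assumed defaults a score of 0.9 is read as the property being present and a score of 0.1 as absent.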
62. The method as claimed in claim 58, characterized in that the biological property is a disease.
63. The method as claimed in claim 62, characterized in that the disease is cancer.
64. The method as claimed in claim 62, characterized in that the disease is breast cancer, lung cancer, prostate cancer, colorectal cancer, ovarian cancer, bladder cancer, stomach cancer, or rectal cancer.
65. The method as claimed in claim 58, characterized in that the plurality of models comprises a first model characterized by a first model score and a second model characterized by a second model score; and the identity of a cell component whose one or more features are used to compute the first model score differs from the identity of a cell component whose one or more features are used to compute the second model score.
66. The method as claimed in claim 58, characterized in that a feature in the one or more features of the one or more cell components used to determine the model score of a model in the plurality of models comprises the abundance of the one or more cell components in the test organism of the species or in the test organism sample from an organism of the species.
67. The method as claimed in claim 58, characterized in that the species is human.
68. The method as claimed in claim 58, characterized in that the test organism sample is a biopsy or other form of sample from a tumor, blood, bone, breast, lung, prostate, colorectum, ovary, bladder, stomach, or rectum.
69. The method as claimed in claim 58, characterized in that the one or more features comprise cell component abundance, and the data comprise the cell component abundances of at least 100 cell components in the test organism of the species or in the test organism sample from the organism of the species.
70. The method as claimed in claim 58, characterized in that the one or more features comprise cell component abundance, and the data comprise the cell component abundances of at least 500 cell components in the test organism of the species or in the test organism sample from the organism of the species.
71. The method as claimed in claim 58, characterized in that the one or more features comprise cell component abundance, and the data comprise the cell component abundances of at least 5,000 cell components in the test organism of the species or in the test organism sample from the organism of the species.
72. The method as claimed in claim 58, characterized in that the one or more features comprise cell component abundance, and the data comprise the cell component abundances of between 1,000 and 20,000 cell components in the test organism of the species or in the test organism sample from the organism of the species.
73. The method as claimed in claim 58, characterized in that a cell component in the plurality of cell components is mRNA, cRNA, or cDNA.
74. The method as claimed in claim 58, characterized in that a cell component in the one or more cell components is a nucleic acid or a ribonucleic acid, and a feature in the one or more features of the cell component is obtained by measuring the transcriptional state of all or part of the cell component in the test organism or the test organism sample.
75. The method as claimed in claim 58, characterized in that a cell component in the one or more cell components is a protein, and a feature in the one or more features of the cell component is obtained by measuring the translational state of the cell component in the test organism or the test organism sample.
76. The method as claimed in claim 58, characterized in that a feature in the one or more features of a cell component in the plurality of cell components is determined by isotope-coded affinity tagging followed by tandem mass spectrometry analysis of the cell component, using a sample obtained from the test organism or the test organism sample.
77. The method as claimed in claim 58, characterized in that a feature in the one or more features of a cell component in the plurality of cell components is determined by measuring the activity or post-translational modification of the cell component in a sample from the test organism or in the test organism sample.
78. The method as claimed in claim 58, characterized in that the biological property is drug sensitivity.
79. The method as claimed in claim 58, characterized in that the plurality of models for which model scores are computed in the computing collectively represent the likelihood of each of two or more biological properties.
80. The method as claimed in claim 79, characterized in that each biological property in the two or more biological properties is a cancer origin.
81. The method as claimed in claim 79, characterized in that the two or more biological properties comprise a first disease and a second disease.
82. The method as claimed in claim 58, characterized in that the plurality of models for which model scores are computed in the computing collectively represent the likelihood of each of five or more biological properties.
83. The method as claimed in claim 82, characterized in that each biological property in the five or more biological properties is a cancer origin.
84. The method as claimed in claim 82, characterized in that the five or more biological properties comprise a first disease and a second disease.
85. The method as claimed in claim 58, characterized in that the plurality of models for which model scores are computed in the computing collectively represent the likelihood of each of 2-20 biological properties.
86. The method as claimed in claim 85, characterized in that each biological property in the 2-20 biological properties is a cancer origin.
87. The method as claimed in claim 85, characterized in that the 2-20 biological properties comprise a first disease and a second disease.
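Claims 79 to 87 describe several models that together report a likelihood for each of several biological properties, for instance candidate cancer origins. One way to read such a set of scores is to rank the properties by model score. The sketch below assumes one score per property; the function name and the example property labels are hypothetical, and ranking is an illustrative choice rather than something the claims require.

```python
from typing import Dict, List, Tuple


def rank_properties(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order biological properties from highest to lowest model score.

    `scores` maps a property label (e.g. a candidate cancer origin) to the
    model score that represents its likelihood.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three candidate cancer origins.
example_scores = {"breast": 0.2, "lung": 0.7, "colorectum": 0.1}
ranking = rank_properties(example_scores)
```

Here `ranking[0]` holds the property with the highest score, while the full list keeps every score available for communication.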
88. A method, comprising:
Receiving data, wherein the data comprise one or more features of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species;
Computing a plurality of models, wherein each model in the plurality of models is characterized by a model score that represents the likelihood of a biological property in the test organism or the test organism sample, and computing an individual model in the plurality of models comprises determining the model score associated with that individual model using one or more features of one or more cell components in the plurality of cell components; and
Communicating each model score computed in the computing.
89. A computer, comprising:
A central processing unit;
A memory coupled to the central processing unit, the memory storing:
(i) instructions for sending data, wherein the data comprise one or more features of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species; and
(ii) instructions for receiving a plurality of model scores, wherein each model score corresponds to a model in a plurality of models, each model in the plurality of models is characterized by a model score that represents the likelihood of a biological property in the test organism or the test organism sample, and computing the model comprises determining the model score using one or more features of one or more cell components in the plurality of cell components.
90. The computer as claimed in claim 89, characterized in that the plurality of model scores comprises two or more model scores, and each model score in the two or more model scores corresponds to a different model in the plurality of models.
91. The computer as claimed in claim 89, characterized in that the plurality of model scores comprises five or more model scores communicated by the communication instructions, and each model score in the five or more model scores corresponds to a different model in the plurality of models.
92. The computer as claimed in claim 89, characterized in that the instructions for sending data comprise instructions for sending the data from the remote computer to a mobile computer over a wide area network.
93. The computer as claimed in claim 92, characterized in that the wide area network is the Internet.
94. The computer as claimed in claim 89, characterized in that the instructions for receiving comprise instructions for receiving the plurality of model scores from a remote computer over a wide area network.
95. The computer as claimed in claim 94, characterized in that the wide area network is the Internet.
96. The computer as claimed in claim 89, characterized in that when the model score is in a first range of values, the test organism or test organism sample is deemed to have the biological property represented by a model in the plurality of models; and when the model score is in a second range of values, the test organism or test organism sample is deemed not to have the biological property represented by that model.
97. The computer as claimed in claim 89, characterized in that the biological property is a disease.
98. The computer as claimed in claim 97, characterized in that the disease is cancer.
99. The computer as claimed in claim 97, characterized in that the disease is breast cancer, lung cancer, prostate cancer, colorectal cancer, ovarian cancer, bladder cancer, stomach cancer, or rectal cancer.
100. The computer as claimed in claim 89, characterized in that the plurality of models comprises a first model characterized by a first model score and a second model characterized by a second model score; and the identity of a cell component whose one or more features are used to compute the first model score differs from the identity of a cell component whose one or more features are used to compute the second model score.
101. The computer as claimed in claim 89, characterized in that a feature in the one or more features of the one or more cell components used to determine the model score of a model in the plurality of models comprises the abundance of the one or more cell components in the test organism of the species or in the test organism sample from an organism of the species.
102. The computer as claimed in claim 89, characterized in that the species is human.
103. The computer as claimed in claim 89, characterized in that the test organism sample is a biopsy or other form of sample from a tumor, blood, bone, breast, lung, prostate, colorectum, ovary, bladder, stomach, or rectum.
104. The computer as claimed in claim 89, characterized in that the one or more features comprise cell component abundance, and the data comprise the cell component abundances of at least 100 cell components in the test organism of the species or in the test organism sample from the organism of the species.
105. The computer as claimed in claim 89, characterized in that the one or more features comprise cell component abundance, and the data comprise the cell component abundances of at least 500 cell components in the test organism of the species or in the test organism sample from the organism of the species.
106. The computer as claimed in claim 89, characterized in that the one or more features comprise cell component abundance, and the data comprise the cell component abundances of at least 5,000 cell components in the test organism of the species or in the test organism sample from the organism of the species.
107. The computer as claimed in claim 89, characterized in that the one or more features comprise cell component abundance, and the data comprise the cell component abundances of between 1,000 and 20,000 cell components in the test organism of the species or in the test organism sample from the organism of the species.
108. The computer as claimed in claim 89, characterized in that a cell component in the plurality of cell components is mRNA, cRNA, or cDNA.
109. The computer as claimed in claim 89, characterized in that a cell component in the one or more cell components is a nucleic acid or a ribonucleic acid, and a feature in the one or more features of the cell component is obtained by measuring the transcriptional state of all or part of the cell component in the test organism or the test organism sample.
110. The computer as claimed in claim 89, characterized in that a cell component in the one or more cell components is a protein, and a feature in the one or more features of the cell component is obtained by measuring the translational state of the cell component in the test organism or the test organism sample.
111. The computer as claimed in claim 89, characterized in that a feature in the one or more features of a cell component in the plurality of cell components is determined by isotope-coded affinity tagging followed by tandem mass spectrometry analysis of the cell component, using a sample obtained from the test organism or the test organism sample.
112. The computer as claimed in claim 89, characterized in that a feature in the one or more features of a cell component in the plurality of cell components is determined by measuring the activity or post-translational modification of the cell component in a sample from the test organism or in the test organism sample.
113. The computer as claimed in claim 89, characterized in that the biological property is drug sensitivity.
114. The computer as claimed in claim 89, characterized in that the plurality of models collectively represent the likelihood of each of two or more biological properties.
115. The computer as claimed in claim 114, characterized in that each biological property in the two or more biological properties is a cancer origin.
116. The computer as claimed in claim 114, characterized in that the two or more biological properties comprise a first disease and a second disease.
117. The computer as claimed in claim 89, characterized in that the plurality of models collectively represent the likelihood of each of five or more biological properties.
118. The computer as claimed in claim 117, characterized in that each biological property in the five or more biological properties is a cancer origin.
119. The computer as claimed in claim 117, characterized in that the five or more biological properties comprise a first disease and a second disease.
120. The computer as claimed in claim 89, characterized in that the plurality of models for which model scores are computed by the computing instructions collectively represent the likelihood of each of 2-20 biological properties.
121. The computer as claimed in claim 120, characterized in that each biological property in the 2-20 biological properties is a cancer origin.
122. The computer as claimed in claim 120, characterized in that the 2-20 biological properties comprise a first disease and a second disease.
123. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
(i) instructions for sending data, wherein the data comprise one or more features of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species; and
(ii) instructions for receiving a plurality of model scores, wherein each model score corresponds to a model in a plurality of models, each model in the plurality of models is characterized by a model score that represents the likelihood of a biological property in the test organism or the test organism sample, and computing the model comprises determining the model score using one or more features of one or more cell components in the plurality of cell components.
124. A method, comprising:
(i) sending data, wherein the data comprise one or more features of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species; and
(ii) receiving a plurality of model scores, wherein each model score corresponds to a model in a plurality of models, each model in the plurality of models is characterized by a model score that represents the likelihood of a biological property in the test organism or the test organism sample, and computing the model comprises determining the model score using one or more features of one or more cell components in the plurality of cell components.
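The send-and-receive side of claim 124 can be sketched from the client's point of view. The claims do not fix a wire format or protocol, so the JSON payload, the field names `measurements` and `model_scores`, and the `transport` callable standing in for the network link are all assumptions; `fake_remote_computer` is a stub that returns placeholder scores rather than real model output.

```python
import json
from typing import Callable, Dict


def request_model_scores(
    measurements: Dict[str, float],
    transport: Callable[[str], str],
) -> Dict[str, float]:
    """Send cell component data and receive one score per model.

    `transport` is a hypothetical function that delivers the request to a
    remote computer and returns its reply as a JSON string.
    """
    request = json.dumps({"measurements": measurements})
    reply = transport(request)
    scores = json.loads(reply)["model_scores"]
    return {name: float(value) for name, value in scores.items()}


def fake_remote_computer(request: str) -> str:
    """Stub remote end: checks that data arrived, returns placeholder scores."""
    data = json.loads(request)["measurements"]
    if not data:
        raise ValueError("no measurements received")
    return json.dumps({"model_scores": {"model_1": 0.8, "model_2": 0.1}})
```

Passing the transport in as a parameter keeps the sketch independent of any particular network library; a real deployment over a wide area network would supply its own transport.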
125. The method as claimed in claim 58, characterized in that the biological property comprises sensitivity or resistance to a treatment.
126. The method as claimed in claim 125, characterized in that the treatment is administration of a drug.
127. The method as claimed in claim 58, characterized in that the biological property comprises sensitivity or resistance to a combination treatment.
128. The method as claimed in claim 127, characterized in that the combination treatment is administration of a combination of drugs.
129. The method as claimed in claim 58, characterized in that the biological property comprises the metastatic potential or recurrence of a disease.
130. A computer, comprising:
A central processing unit;
A memory coupled to the central processing unit, the memory storing:
(i) instructions for receiving data, wherein the data comprise one or more aspects of the biological state of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species;
(ii) instructions for computing a model in a plurality of models, wherein the computing produces a model characterization for the model, the model characterization indicating whether the test organism of the species or the test organism sample from the organism of the species is a member of a biological sample class, and wherein computing the model comprises characterizing the model using one or more aspects of the biological state of one or more cell components in the plurality of cell components;
(iii) instructions for repeating the instructions for computing one or more times, thereby computing a plurality of models; and
Instructions for communicating each model characterization computed by the instructions for computing.
131. The computer as claimed in claim 130, characterized in that the instructions for receiving data comprise instructions for receiving the data from a remote computer over a wide area network.
132. The computer as claimed in claim 131, characterized in that the wide area network is the Internet.
133. The computer as claimed in claim 130, characterized in that the biological sample class is a disease.
134. The computer as claimed in claim 133, characterized in that the disease is cancer.
135. A computer, comprising:
A central processing unit;
A memory coupled to the central processing unit, the memory storing:
(i) instructions for receiving data, wherein the data comprise one or more aspects of the biological state of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species;
(ii) instructions for computing a plurality of models, wherein the computing produces a model characterization for each model in the plurality of models, the model characterization indicating whether the test organism of the species or the test organism sample from the organism of the species is a member of a biological sample class, and wherein the computing comprises characterizing each model using one or more aspects of the biological state of one or more cell components in the plurality of cell components; and
Instructions for communicating each model characterization computed by the instructions for computing.
136. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
(i) instructions for receiving data, wherein the data comprise one or more aspects of the biological state of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species;
(ii) instructions for computing a model in a plurality of models, wherein the computing produces a model characterization for the model, the model characterization indicating whether the test organism of the species or the test organism sample from the organism of the species is a member of a biological sample class, and wherein computing the model comprises characterizing the model using one or more aspects of the biological state of one or more cell components in the plurality of cell components;
(iii) instructions for repeating the instructions for computing one or more times, thereby computing a plurality of models; and
Instructions for communicating each model characterization computed by the instructions for computing.
137. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer-readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
(i) instructions for receiving data, wherein the data comprise one or more aspects of the biological state of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species;
(ii) instructions for computing a plurality of models, wherein the computing produces a model characterization for each model in the plurality of models, the model characterization indicating whether the test organism of the species or the test organism sample from the organism of the species is a member of a biological sample class, and wherein the computing comprises characterizing each model using one or more aspects of the biological state of one or more cell components in the plurality of cell components; and
Instructions for communicating each model characterization computed by the instructions for computing.
138. A method, comprising:
Receiving data, wherein the data comprise one or more aspects of the biological state of each cell component in a plurality of cell components measured in a test organism of a species or in a test organism sample from an organism of the species;
Computing a model in a plurality of models, wherein the computing produces a model characterization for the model, the model characterization indicating whether the test organism of the species or the test organism sample from the organism of the species is a member of a biological sample class, and wherein computing the model comprises characterizing the model using one or more aspects of the biological state of one or more cell components in the plurality of cell components;
Repeating the computing one or more times, thereby computing the plurality of models; and
Communicating each model characterization computed in the computing.
139. A method, comprising:
receiving data, wherein said data comprise one or more aspects of the biological state of each cellular constituent in a plurality of cellular constituents measured in a test organism of a species or in a test biological specimen from an organism of said species;
computing a plurality of models, wherein said computing produces a model characterization for each model in said plurality of models, the model characterization indicating whether the test organism of said species or the test biological specimen from said organism of said species is a member of a biological sample class, and wherein said computing comprises characterizing each said model in said plurality of models using one or more aspects of the biological state of one or more cellular constituents in said plurality of cellular constituents; and
communicating each said model characterization that is computed.
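The receive, compute, and communicate steps recited in the method claims above can be pictured with a short sketch. It is illustrative only and is not taken from the specification: the `Model` class, the weighted-sum scoring rule, the threshold, and the constituent and class names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed representation: one measured aspect of the biological state
# (for example, abundance) per cellular constituent.
Measurements = Dict[str, float]


@dataclass
class Model:
    """Hypothetical model that tests membership in one biological sample class."""
    sample_class: str            # label of the class this model tests for
    weights: Dict[str, float]    # constituents the model uses, with assumed weights
    threshold: float             # assumed decision cutoff

    def characterize(self, data: Measurements) -> bool:
        # Weighted sum over only the constituents this model uses; a missing
        # constituent raises KeyError rather than being silently ignored.
        score = sum(w * data[name] for name, w in self.weights.items())
        return score >= self.threshold


def compute_models(data: Measurements, models: List[Model]) -> Dict[str, bool]:
    """Repeat the computing step once per model and collect each characterization."""
    return {m.sample_class: m.characterize(data) for m in models}


def communicate(results: Dict[str, bool]) -> List[str]:
    """Turn each model characterization into a reportable line."""
    return [
        f"{sample_class}: {'member' if is_member else 'not a member'}"
        for sample_class, is_member in results.items()
    ]


# Receiving step: measurements for a test specimen (made-up values).
data = {"constituent_A": 2.0, "constituent_B": 0.5, "constituent_C": 1.0}
models = [
    Model("class_1", {"constituent_A": 1.0, "constituent_B": -1.0}, threshold=1.0),
    Model("class_2", {"constituent_C": 1.0}, threshold=2.0),
]
results = compute_models(data, models)
report = communicate(results)
```

Each model reads only the constituents it needs from the shared measurement data, so any number of models can be evaluated against a single set of measurements, and every resulting characterization is reported.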
CN 200480034992 2003-09-29 2004-09-29 Systems and methods for detecting biological features Pending CN1886658A (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US50738103P 2003-09-29 2003-09-29
US60/507,381 2003-09-29
US60/507,445 2003-09-29
US10/861,216 2004-06-04
US10/861,177 2004-06-04
US60/577,416 2004-06-05

Publications (1)

Publication Number Publication Date
CN1886658A true CN1886658A (en) 2006-12-27

Family

ID=37584078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200480034992 Pending CN1886658A (en) 2003-09-29 2004-09-29 Systems and methods for detecting biological features

Country Status (1)

Country Link
CN (1) CN1886658A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102422294A (en) * 2009-05-11 2012-04-18 皇家飞利浦电子股份有限公司 Device and method for comparing molecular signatures
CN102422294B (en) * 2009-05-11 2015-11-25 皇家飞利浦电子股份有限公司 For comparing equipment and the method for molecular label
CN104975063A (en) * 2014-04-01 2015-10-14 埃提斯生物技术(上海)有限公司 Screening method for anti-tumor medicine biomarker and application of anti-tumor medicine biomarker
CN104975063B (en) * 2014-04-01 2020-04-03 埃提斯生物技术(上海)有限公司 Screening method and application of antitumor drug biomarker
CN106372460A (en) * 2016-08-24 2017-02-01 成都旅美科技有限公司 Environment analysis-based biological distribution determination apparatus
CN106372460B (en) * 2016-08-24 2018-11-02 成都旅美科技有限公司 A kind of bio distribution determining device based on environmental analysis
CN110234749A (en) * 2017-02-02 2019-09-13 PhAST公司 Analyze and use the motility kinematics of microorganism
CN110234749B (en) * 2017-02-02 2023-06-30 PhAST公司 Analysis and use of motional kinematics of microorganisms
CN113742292A (en) * 2021-09-07 2021-12-03 六棱镜(杭州)科技有限公司 Multi-thread data retrieval and retrieved data access method based on AI technology
CN113742292B (en) * 2021-09-07 2023-11-10 六棱镜(杭州)科技有限公司 Multithread data retrieval and access method of retrieved data based on AI technology
WO2023245827A1 (en) * 2022-06-22 2023-12-28 中国食品药品检定研究院 Method for identifying tissue sources of mesenchymal stem cells in sample and use thereof

Similar Documents

Publication Publication Date Title
US8977506B2 (en) Systems and methods for detecting biological features
JP7368483B2 (en) An integrated machine learning framework for estimating homologous recombination defects
US8321137B2 (en) Knowledge-based storage of diagnostic models
Dyrskjøt et al. Identifying distinct classes of bladder carcinoma using microarrays
Venet et al. Most random gene expression signatures are significantly associated with breast cancer outcome
Hellwig et al. Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes
CN1764837A (en) Methods for predicting an individual's clinical treatment outcome from sampling a group of patients' biological profiles
US20130332083A1 (en) Gene Marker Sets And Methods For Classification Of Cancer Patients
KR20220003142A (en) Methods and processes for non-invasive assessment of genetic variations
JP2008536094A (en) Methods for predicting chemotherapy responsiveness in breast cancer patients
Yang et al. An assessment of prognostic immunity markers in breast cancer
JP5391279B2 (en) Method for constructing a panel of cancer cell lines for use in testing the efficacy of one or more pharmaceutical compositions
CN101068936A (en) Methods and systems for prognosis and treatment of solid tumors
US20100280987A1 (en) Methods and gene expression signature for assessing ras pathway activity
US20050069863A1 (en) Systems and methods for analyzing gene expression data for clinical diagnostics
CN1886658A (en) Systems and methods for detecting biological features
Lee et al. A novel immune prognostic index for stratification of high-risk patients with early breast cancer
Van Laar Design and multiseries validation of a web-based gene expression assay for predicting breast cancer recurrence and patient survival
Mondol et al. hist2RNA: An Efficient Deep Learning Model to Predict Gene Expression from Breast Cancer Histopathology Images. Cancers 2023, 15, 2569
Chen et al. Directly selecting differentially expressed genes for single-cell clustering analyses
Hong et al. Molecular biomarkers for personalized medicine
Popovici Computational biomarker discovery: methods and practice
Van Laar Optimisation of CDNA Microarray Tumour Profiling and Molecular Analysis of Epithelial Ovarian Cancer
Eggle Using whole-genome wide gene expression profiling for the establishment of RNA fingerprints: application to scientific questions in molecular biology, immunology and diagnostics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code Ref country code: HK; Ref legal event code: DE; Ref document number: 1093774; Country of ref document: HK

ASS Succession or assignment of patent right Owner name: CHUANDAO BIOLOGY CO., LTD.; Free format text: FORMER OWNER: PATHWORK INFORMATICS INC.; Effective date: 20071012

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right Effective date of registration: 20071012; Address after: California, USA; Applicant after: Mission biology Ltd; Address before: California, USA; Applicant before: Pathwork Informatics Inc.

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication Open date: 20061227