US20110106739A1 - Method for determining the presence of disease - Google Patents
Method for determining the presence of disease
- Publication number
- US20110106739A1 (application US 12/915,981)
- Authority
- US
- United States
- Prior art keywords
- gene
- disease
- genes
- expression
- levels
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/30—Unsupervised data analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- Whether or not and how much the gene transcription products or cDNAs or cRNAs thereof hybridize with the nucleic acid probes can be detected using a fluorescent substance or a dye or based on a hybridization-induced change in the amount of current flowing on the nucleic acid chip.
- the levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects may be obtained by a process including: collecting biological samples from healthy subjects by the same method as that performed to collect the biological sample from the subject; and measuring the levels of expression of transcription products of the object genes using the biological samples.
- a plurality of healthy subjects means a statistically sufficient number of healthy subjects, which may be 30 or more, preferably 40 or more healthy subjects.
- the determination formula may be prepared using discriminant analysis methods known per se.
- Discriminant analysis methods are statistical methods that provide criteria for deciding which of two groups newly obtained data belongs to, given previously presented data already known to be classified into those two groups. Examples of such discriminant analysis methods include a support vector machine (SVM), a linear discriminant analysis, a neural network, a k-nearest neighbor classifier, a decision tree, a random forest, and so on.
- SVM: support vector machine
- the term “patients having the target disease” refers to subjects that can be confirmed to have the target disease based on criteria other than those for the determination method of the invention.
- the patients are humans that can be confirmed to have the target disease by other means: cancer by tissue characterization, CT, MRI, a tumor marker method, or the like; an autoimmune disease by a blood test or the like; an infectious disease by a blood test or the like; a psychiatric disease or a nervous system disease by diagnostic brain imaging, genetic testing, inquiry, or the like; Crohn's disease by endoscopy, digestive tract imaging, or the like; or endometriosis by CT, MRI, endoscopy, or the like.
- the levels of the expression in each of the plurality of healthy subjects are also standardized so that values representing deviations for each of the plurality of healthy subjects are obtained.
- the genes whose expression levels are measured are first classified into at least two gene families using the classification system.
- the average for each classified gene family is then obtained with respect to each of the plurality of patients and the plurality of healthy subjects in the same manner as in the step of obtaining the average for the subject described above.
- the gene family is identified as a disease-determining gene family related to the target disease.
- the determination method of the invention is particularly suitable for use in determining the presence of such a disease as Crohn's disease, Huntington's disease, or endometriosis.
- examples of the disease-determining gene family include a G protein-related gene family, a blood coagulation-related gene family, an oxidative stress-related gene family, a phagocytosis-related gene family, and a fat oxidation-related gene family.
- Huntington's disease is a chronic progressive neurodegenerative disease whose main symptoms include involuntary movement (mainly choreic movement), mental manifestation, and dementia. When diagnosed, this disease must be discriminated from symptomatic chorea caused by cerebrovascular disorders such as cerebral bleeding, drug-induced chorea caused by antipsychotic drugs, and other diseases such as Wilson's disease. Therefore, the determination method of the invention may be performed on a subject suspected of having Huntington's disease, so that a reliable determination result can be obtained as an index of diagnosis.
- the determination method of the invention can be implemented by the program of the invention in cooperation with the computer 2 including a central processing unit, a storage unit, a reader for a recording medium such as a compact disc or a Floppy® disc, an input unit such as a keyboard, and an output unit such as a display.
- FIG. 2 shows a more specific example of the computer system for implementing the method.
- the RAM 110 c includes an SRAM, DRAM or the like.
- the RAM 110 c is used to read out the computer program stored in the RAM 110 c, ROM 110 b, and hard disk 110 d. When these computer programs are executed, the RAM 110 c is also used as a work area for the CPU 110 a.
- the transcription product expression level-measuring device 1 outputs, to the computer 2, data on the measured expression levels in the plurality of patients (hereinafter referred to as “measured patient expression level data”) and data on the measured expression levels in the plurality of healthy subjects (hereinafter referred to as “measured healthy subject expression level data”).
- the CPU 110 a receives the output measured patient expression level data and the output measured healthy subject expression level data, and stores the data into the RAM 110 c (step S 21).
- FIG. 5 shows the distribution of the average of the z-scores for the healthy subjects 1 and the Crohn's disease patients 1 with respect to each gene family selected as described above.
- the result is shown in FIG. 7A.
- the result shows that the conventional method identified the Crohn's disease patients and the healthy subjects at a sensitivity of 100% and a specificity of 100%.
- the 8,370 genes were classified into gene families (GO Terms) based on the classification of Gene Ontology, and the average of the z-scores for the Huntington's disease patients 1 (6 samples) obtained in the section (1-2) was calculated with respect to the genes within each GO Term.
- a t-test was performed using the averages obtained as described above for the healthy subjects and the Huntington's disease patients with respect to each GO Term, so that a significance probability (p-value) was obtained.
- Hierarchical clustering was performed using the z-scores for all genes contained in the extracted GO Terms, and synchronously varying gene clusters were selected.
- the clustering was performed using the software Cluster 3.0 (available from http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm), and the result was displayed using Java Tree View (available from http://sourceforge.net/projects/jtreeview/files/).
- the average of the z-scores for the genes contained in each cluster was used as a cluster score, and a t-test was performed on the healthy subjects 3 (7 samples) and the Huntington's disease patients 1 (6 samples). From the clusters for which the resulting p-value was 0.05 or less, the microtubule-related gene family, mitochondria-related gene family, and prostaglandin-related gene family were selected as Huntington's disease-determining gene families. Table 4 shows these gene families, genes belonging to each family, and the p-value for each family.
- the result is shown in FIG. 14A.
- the result shows that the conventional method using genes other than those belonging to Huntington's disease-determining gene families identified the Huntington's disease patients and the healthy subjects at a sensitivity of 100% and a specificity of 100%.
- the result is shown in FIG. 14B.
- the result shows that for samples different from those used in the identification of Huntington's disease-determining gene families, the sensitivity of the conventional determination method was reduced to 50%, although the specificity was 100%. It is therefore apparent that the conventional determination method using genes other than those belonging to Huntington's disease-determining gene families is more likely to misidentify Huntington's disease patients as healthy subjects than the determination method of the invention.
- the data on lesion tissues and normal tissues obtained from the GEO were produced by analysis using GeneChip® U133 plus2.0 (Affymetrix, Inc.), a DNA chip.
- the DNA chip has 54,675 probe sets, which include probe sets for the same gene.
- the 16,207 genes were classified into gene families (GO Terms) based on the classification of Gene Ontology, and the average of the z-scores for the lesion tissues 1 (9 samples) obtained in the section (1-2) was calculated with respect to the genes within each GO Term.
- the expression levels in the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples) with respect to each of the 39 genes in Table 5 were input to the SVM.
- the accuracy of determining whether each sample was positive or negative was evaluated using the SVM containing the input expression levels in the 17 samples.
- The result (FIG. 17B) shows that for samples different from those used in the identification of endometriosis-determining gene families, the sensitivity of the conventional determination method was reduced to 65% or less, although the specificity was 100%. It is therefore apparent that the conventional determination method is more likely to misidentify endometriosis patients as healthy subjects than the determination method of the invention.
- Genes other than those belonging to endometriosis-determining gene families were further identified so that an examination could be performed using such genes. Specifically, a t-test was performed to calculate the significance probability (p-value) between the expression levels in the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples), and each gene for which the resulting p-value was 0.05 or less was selected for use in the determination. As a result, ten genes were identified. Table 7 shows these genes and the p-value for each gene. FIG. 18 also shows the distribution of the level of expression of the transcription product of each gene in the normal tissues 1 and the lesion tissues 1.
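The t-test screening described in the entries above (comparing a per-sample score between two groups and keeping candidates at the 5% level) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the source does not say which t-test variant was applied, so a pooled-variance two-sample test is assumed; the score values are invented; and the 2.201 threshold is the two-sided 5% critical value for 11 degrees of freedom taken from a standard t table, standing in for an exact p-value (which a statistics library such as SciPy's `scipy.stats.ttest_ind` would normally supply).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Hypothetical cluster scores (average z-score per sample), sized like the
# 7 healthy and 6 patient samples mentioned in the text. Values are invented.
healthy_scores = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3]
patient_scores = [1.9, 2.4, 2.1, 1.7, 2.6, 2.2]

t, df = pooled_t(patient_scores, healthy_scores)

# Two-sided 5% critical value for df = 11 (standard t table); assumed stand-in
# for checking p <= 0.05 without a t-distribution implementation.
CRITICAL_T_DF11 = 2.201
significant = df == 11 and abs(t) > CRITICAL_T_DF11
```

A family or gene for which `significant` is true would be a candidate in this sketch; the actual selection in the source relies on the computed p-values reported in its tables.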
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Public Health (AREA)
- Bioinformatics & Computational Biology (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Databases & Information Systems (AREA)
- Genetics & Genomics (AREA)
- Epidemiology (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Primary Health Care (AREA)
- Pathology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides a method for determining the presence of a disease, comprising the steps of: measuring the levels of expression of transcription products of genes in a biological sample obtained from a subject suspected of having a target disease, wherein the genes comprise at least one gene belonging to each of at least two disease-determining gene families related to the target disease; obtaining values representing deviations by standardizing the levels of the expression based on the levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects; obtaining the average of the values representing deviations with respect to the genes belonging to each of the disease-determining gene families; and determining whether or not the subject has the target disease by using the average; as well as a computer program product for determining the presence of a disease.
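The standardization and averaging steps of the abstract can be sketched in a few lines of Python. This is a minimal illustration rather than the claimed implementation: the gene names, family labels, and expression values are invented, the "value representing a deviation" is taken to be a z-score (the term used in the figure descriptions), and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def z_scores(subject, healthy):
    """Standardize each gene's level against the healthy reference group.

    subject: {gene: level}; healthy: {gene: [levels in healthy subjects]}.
    """
    return {
        gene: (level - mean(healthy[gene])) / stdev(healthy[gene])
        for gene, level in subject.items()
    }


def family_averages(z, families):
    """Average the z-scores of the measured genes within each gene family."""
    return {
        family: mean([z[g] for g in genes if g in z])
        for family, genes in families.items()
    }


# Hypothetical data: three genes in two families, four healthy subjects.
healthy = {
    "geneA": [10.0, 12.0, 11.0, 11.0],
    "geneB": [5.0, 5.5, 4.5, 5.0],
    "geneC": [20.0, 22.0, 18.0, 20.0],
}
families = {"family1": ["geneA", "geneB"], "family2": ["geneC"]}
subject = {"geneA": 14.0, "geneB": 6.0, "geneC": 20.0}

z = z_scores(subject, healthy)
scores = family_averages(z, families)
```

The per-family averages in `scores` are the quantities the final determination step would consume, for example as inputs to a discriminant function trained on patients and healthy subjects.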
Description
- The invention relates to a method for determining whether or not a subject has a target disease. More specifically, the invention relates to a method capable of determining whether or not a subject has a target disease, based on the measured levels of expression of transcription products of certain genes in a biological sample collected from the subject.
- Exhaustive analysis of the levels of expression of a large number of genes or transcription products thereof makes it possible to find genes whose expression levels change in relation to certain diseases, and has therefore been expected to be applicable to determining the presence of such diseases. Accordingly, many studies have been carried out on methods of determining whether or not a subject has a certain disease based on such exhaustive analysis data.
- However, exhaustive analysis of the levels of expression of genes or transcription products thereof is hampered by the detection of a large number of false-positive genes, error in the measurement system, or poor reproducibility of gene expression, any of which makes it difficult to extract genes that show a truly significant change in expression level.
- To solve such a problem, various statistical techniques for analytical data have been studied and developed.
- For example, Japanese Patent Application Laid-Open (JP-A) No. 2005-323573 discloses a method of determining whether there is a significant difference in gene expression between two different conditions by multivariate analysis of data on gene expression levels obtained from a DNA microarray.
- U.S. Patent Application Publication No. 2009/0297494 discloses a method of diagnosing mental disorders based on the levels of expression of genes involved in regulation of intracellular glutathione level.
- The scope of the present invention is defined solely by the appended claims, and is not affected to any degree by the statements within this summary.
- The method and computer program of the invention make it possible to conveniently determine whether or not a subject suspected of having a target disease has the target disease, using a biological sample from the subject. The invention can also provide objective means for determining whether or not a subject has the target disease. The invention also makes it possible to provide an index to aid target disease diagnosis that is more stable and accurate than those of conventional methods.
- FIG. 1 is a diagram showing an example of an apparatus for determining the presence of a target disease, which is operated using the program of the invention;
- FIG. 2 is a diagram showing an example of a computer system that executes the program of the invention;
- FIG. 3 is a flow chart showing a specific operation according to the program of the invention;
- FIG. 4 is a flow chart showing a specific operation according to the program of the invention for identifying disease-determining gene families;
- FIG. 5 shows the distribution of the average of z-scores for healthy subjects and Crohn's disease patients calculated from the levels of expression of transcription products of genes belonging to a G protein-related gene family, a blood coagulation-related gene family, an oxidative stress-related gene family, a phagocytosis-related gene family, and a fat oxidation-related gene family;
- FIG. 6A shows the result of determination using averages of z-scores calculated from data on the levels of expression of gene transcription products in healthy subjects and Crohn's disease patients with respect to each of Crohn's disease-determining gene families, wherein the data are the same as those used in the identification of the gene families;
- FIG. 6B shows the result of determination using averages of z-scores calculated from data on the levels of expression of gene transcription products in healthy subjects and Crohn's disease patients with respect to each of Crohn's disease-determining gene families, wherein the data differ from those used in the identification of the gene families;
- FIG. 7A shows the result of determination using data on the levels of expression of gene transcription products in healthy subjects and Crohn's disease patients with respect to genes belonging to Crohn's disease-determining gene families, wherein the data are the same as those used in the identification of the gene families;
- FIG. 7B shows the result of determination using data on the levels of expression of gene transcription products in healthy subjects and Crohn's disease patients with respect to genes belonging to Crohn's disease-determining gene families, wherein the data differ from those used in the identification of the gene families;
- FIG. 8 shows the distributions of the levels of expression of genes which are identified as having a significant difference between healthy subjects and Crohn's disease patients from data on the levels of expression of gene transcription products in healthy subjects and Crohn's disease patients, which are the same as those used in the identification of Crohn's disease-determining gene families;
- FIG. 9A shows the result of determination using data on the levels of expression of gene transcription products in healthy subjects and Crohn's disease patients with respect to genes having a significant difference between healthy subjects and Crohn's disease patients, wherein the data are the same as those used in the identification of Crohn's disease-determining gene families;
- FIG. 9B shows the result of determination using data on the levels of expression of gene transcription products in healthy subjects and Crohn's disease patients with respect to genes having a significant difference between healthy subjects and Crohn's disease patients, wherein the data differ from those used in the identification of Crohn's disease-determining gene families;
- FIG. 10 shows the distribution of the average of z-scores for healthy subjects and Huntington's disease patients calculated from the levels of expression of transcription products of genes belonging to a microtubule-related gene family, a mitochondria-related gene family, and a prostaglandin-related gene family;
- FIG. 11A shows the result of determination using averages of z-scores calculated from data on the levels of expression of gene transcription products in healthy subjects and Huntington's disease patients with respect to each of Huntington's disease-determining gene families, wherein the data are the same as those used in the identification of the gene families;
- FIG. 11B shows the result of determination using averages of z-scores calculated from data on the levels of expression of gene transcription products in healthy subjects and Huntington's disease patients with respect to each of Huntington's disease-determining gene families, wherein the data differ from those used in the identification of the gene families;
- FIG. 12A shows the result of determination using data on the levels of expression of gene transcription products in healthy subjects and Huntington's disease patients with respect to genes belonging to Huntington's disease-determining gene families, wherein the data are the same as those used in the identification of the gene families;
- FIG. 12B shows the result of determination using data on the levels of expression of gene transcription products in healthy subjects and Huntington's disease patients with respect to genes belonging to Huntington's disease-determining gene families, wherein the data differ from those used in the identification of the gene families;
- FIG. 13 shows the distributions of the levels of expression of genes which are identified as having a significant difference between healthy subjects and Huntington's disease patients from data on the levels of expression of gene transcription products in healthy subjects and Huntington's disease patients, which are the same as those used in the identification of Huntington's disease-determining gene families;
- FIG. 14A shows the result of determination using data on the levels of expression of gene transcription products in healthy subjects and Huntington's disease patients with respect to genes having a significant difference between healthy subjects and Huntington's disease patients, wherein the data are the same as those used in the identification of Huntington's disease-determining gene families;
- FIG. 14B shows the result of determination using data on the levels of expression of gene transcription products in healthy subjects and Huntington's disease patients with respect to genes having a significant difference between healthy subjects and Huntington's disease patients, wherein the data differ from those used in the identification of Huntington's disease-determining gene families;
- FIG. 15 shows the distribution of the average of z-scores for normal tissues and endometriosis lesion tissues calculated from the levels of expression of transcription products of genes belonging to a cytokine synthesis process-related gene family, a cytokine-mediated signaling-related gene family, and an immunoglobulin-mediated immune response-related gene family;
- FIG. 16A shows the result of determination using averages of z-scores calculated from data on the levels of expression of gene transcription products in normal tissues and endometriosis lesion tissues with respect to each of endometriosis-determining gene families, wherein the data are the same as those used in the identification of the gene families;
- FIG. 16B shows the result of determination using averages of z-scores calculated from data on the levels of expression of gene transcription products in normal tissues and endometriosis lesion tissues with respect to each of endometriosis-determining gene families, wherein the data differ from those used in the identification of the gene families;
- FIG. 17A shows the result of determination using data on the levels of expression of gene transcription products in normal tissues and endometriosis lesion tissues with respect to genes belonging to endometriosis-determining gene families, wherein the data are the same as those used in the identification of the gene families;
- FIG. 17B shows the result of determination using data on the levels of expression of gene transcription products in normal tissues and endometriosis lesion tissues with respect to genes belonging to endometriosis-determining gene families, wherein the data differ from those used in the identification of the gene families;
- FIG. 18 shows the distributions of the levels of expression of genes which are identified as having a significant difference between normal tissues and endometriosis lesion tissues from data on the levels of expression of gene transcription products in normal tissues and endometriosis lesion tissues, which are the same as those used in the identification of endometriosis-determining gene families;
- FIG. 19A shows the result of determination using data on the levels of expression of gene transcription products in normal tissues and endometriosis lesion tissues with respect to genes having a significant difference between normal tissues and endometriosis lesion tissues, wherein the data are the same as those used in the identification of endometriosis-determining gene families; and
- FIG. 19B shows the result of determination using data on the levels of expression of gene transcription products in normal tissues and endometriosis lesion tissues with respect to genes having a significant difference between normal tissues and endometriosis lesion tissues, wherein the data differ from those used in the identification of endometriosis-determining gene families.
- Preferred embodiments of the invention are described below with reference to the drawings.
- The determination method of the invention first measures the levels of expression of transcription products of genes in a biological sample obtained from a subject suspected of having a target disease, wherein the genes comprise at least one gene belonging to each of at least two disease-determining gene families related to the target disease.
- The disease to be determined by the method of the invention (target disease) is typically, but is not limited to, a disease whose diagnosis has required advanced medical equipment such as a CT or MRI scanner, or a disease which lacks a specific symptom or a specific appearance and therefore is generally diagnosed by exclusion. Examples of such a disease include cancers (e.g., lung cancer, breast cancer, stomach cancer, colon cancer, cervical cancer, and melanoma), autoimmune diseases (e.g., rheumatism, systemic lupus erythematosus, Sjögren syndrome, Guillain-Barré syndrome, and ulcerative colitis), infectious diseases (e.g., malaria, Japanese encephalitis, cholera, typhoid, and dysentery), psychiatric diseases or nervous system diseases (e.g., schizophrenia, bipolar disorder, Alzheimer's disease, and Huntington's disease), and diseases of unknown origin (e.g., Crohn's disease and endometriosis).
- As used herein, the term “subject suspected of having a target disease” (hereinafter also simply referred to as “subject”) means a subject that potentially has a target disease such as that described above and is to be determined to have or not to have the disease by the determination method of the invention.
- The biological sample may be any sample which can be collected from an organism and from which transcription products of genes can be extracted. The blood (including whole blood, plasma, or serum), saliva, urine, hair, or the like of the subject may be used as the biological sample.
- As used herein, the term “disease-determining gene families related to the target disease” means gene families whose relationship with the target disease is medically, biologically, or statistically clear. As long as such relationship is clear, any disease-determining gene families may be used in the determination method of the invention. In the determination method of the invention, gene families identified by the procedure described below may be used as the disease-determining gene families related to the target disease.
- As used herein, the term “transcription products of genes” refers to products obtained by the transcription of the genes, which are intended to include ribonucleic acid (RNA), specifically, messenger RNA (mRNA).
- As used herein, the term “the levels of expression of transcription products of genes” refers to the amounts of gene transcription products in the biological sample or the amounts of substances that reflect the amounts of the gene transcription products in the biological sample. Therefore, the determination method of the invention may measure the amounts of gene transcription products (mRNAs) or the amounts of complementary deoxyribonucleic acids (cDNAs) or complementary ribonucleic acids (cRNAs) derived from mRNAs. In general, the amount of mRNA in a biological sample is very small. Therefore, the amount of cDNA or cRNA derived therefrom by reverse transcription or in vitro transcription (IVT) is preferably measured.
- The gene transcription products may be extracted from the biological sample by an RNA extraction method known in the art. For example, an RNA extract may be obtained by a process including centrifuging the biological sample to precipitate RNA-containing cells, physically or enzymatically destroying the cells, and removing the cell debris. The RNA extraction may also be performed using a commercially available RNA extraction kit or the like.
- A treatment for removing a contaminant from the gene transcription product extract obtained as described above may also be performed. Such a contaminant, which is typically globin mRNA when the biological sample is blood, is derived from the biological sample and is preferably absent when the levels of expression of the gene transcription products are measured.
- The resulting gene transcription product extract is measured for the levels of expression of transcription products of genes comprising at least one gene belonging to each of at least two disease-determining gene families whose relationship with the target disease is known.
- While the levels of expression of the gene transcription products may be measured by any known methods, they are preferably measured by quantitative PCR methods or methods using a nucleic acid chip, so that expression of transcription products of a large number of genes can be analyzed.
- When the levels of expression of the gene transcription products are measured using a nucleic acid chip, a typical process may include: bringing cDNAs or cRNAs, which are prepared from the gene transcription product extract or the gene transcription products, into contact with about 20 to 25 mer nucleic acid probes fixed on a substrate; and measuring the change in fluorescence, coloring, current, or any other index to determine the presence or absence of hybridization, so that the levels of expression of the target gene transcription products can be determined.
- At least one nucleic acid probe may be used for one gene transcription product, and two or more probes may be used depending on the length of the gene transcription product. The probe sequence may be appropriately determined by a person skilled in the art according to the sequence of the gene transcription product to be measured.
- For example, GeneChip System available from Affymetrix, Inc. may be used in the method of measuring the levels of expression of the gene transcription products using a nucleic acid chip.
- When a nucleic acid chip is used, the gene transcription products or cDNAs or cRNAs thereof may be fragmented so that the hybridization with the nucleic acid probes can be facilitated. The fragmentation may be performed by methods known in the art, such as methods using nuclease such as ribonuclease or deoxyribonuclease.
- The amounts of the gene transcription products or cDNAs or cRNAs thereof to be in contact with the nucleic acid probes on the nucleic acid chip may generally be from about 5 to about 20 μg. The contact conditions are generally 45° C. for about 16 hours.
- Whether or not and how much the gene transcription products or cDNAs or cRNAs thereof hybridize with the nucleic acid probes can be detected using a fluorescent substance or a dye or based on a hybridization-induced change in the amount of current flowing on the nucleic acid chip.
- When the hybridization is measured by the detection of a fluorescent substance or a dye, the gene transcription products or cDNAs or cRNAs thereof are preferably labeled with a marker for the detection of the fluorescent substance or the dye. Such a marker may be one generally used in the art. In general, a biotinylated nucleotide or biotinylated ribonucleotide may be mixed in as a nucleotide or ribonucleotide substrate in the synthesis of cDNAs or cRNAs so that biotin-labeled cDNAs or cRNAs can be obtained. The biotin-labeled cDNAs or cRNAs can be coupled to avidin or streptavidin, which is a binding partner to biotin, on the nucleic acid chip. The binding of avidin or streptavidin to an appropriate fluorescent substance or dye makes it possible to detect the hybridization. Examples of the fluorescent substance include fluorescein isothiocyanate (FITC), green fluorescent protein (GFP), luciferin, and phycoerythrin. In general, a phycoerythrin-streptavidin conjugate is commercially available and therefore conveniently used.
- Alternatively, a labeled antibody to avidin or streptavidin may also be brought into contact with avidin or streptavidin so that the fluorescent substance or dye of the labeled antibody can be detected.
- The levels of expression of the gene transcription products obtained in this step may be any type of values that can relatively indicate the amount of each gene transcription product in the biological sample. When the measurement is performed using the nucleic acid chip, the levels of expression may be signals obtained from the nucleic acid chip, which are based on the intensity of fluorescence, the intensity of coloring, the amount of current, or the like.
- Such signals may be measured using a nucleic acid chip analyzer.
- The measured levels of expression are then standardized based on the levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects so that values representing deviations are obtained.
- As used herein, the term “transcription products of the corresponding genes” means transcription products of the same genes as those whose expression levels in the subject are measured.
- The levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects may be obtained by a process including: collecting biological samples from healthy subjects by the same method as that performed to collect the biological sample from the subject; and measuring the levels of expression of transcription products of the object genes using the biological samples.
- As used herein, the term “healthy subject” refers to a subject that can be confirmed not to have the target disease, based on criteria other than those for the determination method of the invention. For example, the healthy subject may be a subject that can be confirmed not to have cancer (as the target disease) by tissue characterization, CT, MRI, tumor marker method, or the like, an autoimmune disease (ditto) by blood test or the like, an infectious disease (ditto) by blood test or the like, a psychiatric disease or a nervous system disease (ditto) by diagnostic brain imaging, genetic testing, inquiry, interview sheet method, or the like, Crohn's disease (ditto) by endoscopy, digestive tract imaging, or the like, or endometriosis (ditto) by CT, MRI, endoscopy, or the like.
- As used herein, the term “a plurality of healthy subjects” means a statistically sufficient number of healthy subjects, which may be 30 or more, preferably 40 or more healthy subjects.
- As used herein, the phrase “standardizing (or standardized) based on the levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects” means that values representing deviations are calculated from the following formula: a value representing a deviation={(the level of expression of a transcription product of a gene in a subject)−(the average of the levels of expression of the transcription product of the corresponding gene in a plurality of healthy subjects)}/(the standard deviation of the levels of expression of the transcription product of the corresponding gene in the plurality of healthy subjects).
- The value representing a deviation is also known as a z-score, which indicates how much the level of expression of the transcription product of the gene in the subject deviates from the level of expression of the transcription product of the gene in the plurality of healthy subjects.
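- The standardization formula above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `deviation_value` and the sample signals are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form is intended.

```python
from statistics import mean, stdev

def deviation_value(subject_level, healthy_levels):
    """Return the z-score of one gene's expression level in the subject,
    relative to the same gene's levels in a group of healthy subjects."""
    mu = mean(healthy_levels)
    # Assumption: sample standard deviation (n - 1 denominator).
    sigma = stdev(healthy_levels)
    if sigma == 0:
        raise ValueError("healthy levels have zero variance; z-score undefined")
    return (subject_level - mu) / sigma

# Hypothetical chip signals for one gene in five healthy subjects.
# (The text calls for 30 or more healthy subjects; five keeps the example short.)
healthy = [8.0, 10.0, 12.0, 9.0, 11.0]
z = deviation_value(13.0, healthy)
print(f"value representing a deviation: {z:.3f}")
```

- A positive value means the subject's level lies above the healthy average, and a negative value means it lies below.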
- Alternatively, in the determination method of the invention, the level of expression of a transcription product of a gene in a subject may be divided by the average of the levels of expression of the transcription product of the corresponding gene in a plurality of healthy subjects in order to obtain the ratio of the expression level in the subject to the expression level in the healthy subjects, and the next step may be performed using the value representing the expression level ratio in place of the value representing a deviation.
- The value representing the expression level ratio indicates how large the level of expression of the transcription product of the gene in the subject is relative to the average of the levels of expression of the transcription product of the corresponding gene in the plurality of healthy subjects. A value above 1 indicates a level higher than the healthy average, and a value below 1 indicates a lower level.
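- The ratio-based alternative can be sketched in the same way. The function name `expression_ratio` and the numbers are hypothetical and serve only to illustrate the division described above.

```python
from statistics import mean

def expression_ratio(subject_level, healthy_levels):
    """Return the subject's expression level divided by the healthy average."""
    mu = mean(healthy_levels)
    if mu == 0:
        raise ValueError("healthy average is zero; ratio undefined")
    return subject_level / mu

# Hypothetical signals: healthy average is 10.0, subject level is 15.0.
ratio = expression_ratio(15.0, [8.0, 10.0, 12.0, 9.0, 11.0])
print(ratio)
```

- Unlike the value representing a deviation, the ratio does not account for how widely the levels vary among the healthy subjects.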
- Subsequently, the average of values representing deviations with respect to the gene belonging to each of the selected disease-determining gene families is obtained.
- When a value representing a deviation is obtained for only one gene belonging to the gene family for which an average is to be obtained, the term “average” as used herein means a value representing a deviation for the one gene, and when values representing deviations are obtained for two or more genes, the term “average” as used herein means the average of these values representing deviations.
- The average is obtained for at least two gene families selected from disease-determining gene families whose relationship with the target disease is known. The number of the selected gene families is preferably as large as possible.
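- The averaging step can be sketched as below, assuming the values representing deviations are already available per gene. The gene symbols, the gene-to-family assignment, and the function name `family_averages` are hypothetical; only the family names are taken from this description.

```python
from statistics import mean

def family_averages(deviations, families):
    """deviations: gene -> value representing a deviation.
    families: family name -> list of genes belonging to that family.
    Returns family name -> average over the measured genes of that family."""
    result = {}
    for family, genes in families.items():
        values = [deviations[g] for g in genes if g in deviations]
        if values:
            # With a single measured gene, the "average" is that gene's value.
            result[family] = mean(values)
    return result

deviations = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -0.5}
families = {
    "blood coagulation": ["GENE_A", "GENE_B"],
    "phagocytosis, engulfment": ["GENE_C"],
}
averages = family_averages(deviations, families)
print(averages)
```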
- Whether or not the subject has the target disease is determined using the average obtained as described above.
- The determination may be made by inputting the average obtained as described above from the subject to a determination formula, which is obtained based on: averages previously obtained in the same manner as in the respective steps described above using biological samples collected from healthy subjects; and averages previously obtained in the same manner as in the respective steps described above using biological samples collected from patients having the target disease.
- The determination formula may be prepared using discriminant analysis methods known per se. Discriminant analysis methods are statistical methods which can provide criteria for determining which of two different groups newly obtained data belongs to, provided that previously presented pieces of data are known to be classified into the two different groups. Examples of such discriminant analysis methods include a support vector machine (SVM), a linear discriminant analysis, a neural network, a k-nearest neighbor classifier, a decision tree, a random forest, and so on. Among these discriminant analysis methods, an SVM, which is also implemented in the statistical analysis software GeneSpring, is preferably used in the preparation of the determination formula.
- The averages obtained from the healthy subjects and the averages obtained from the target disease patients may be input in advance so that a determination formula can be prepared using an SVM. The average determined from the biological sample collected from the subject may then be input to the SVM with which the determination formula is prepared, so that it can be determined whether or not the subject has the target disease.
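- The train-then-determine workflow can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid rule, not an SVM, so that it stays self-contained; it only shows how previously obtained family averages from the two groups yield a rule to which a new subject's averages are input. All names and numbers are hypothetical.

```python
import math
from statistics import mean

def fit_centroids(healthy_rows, patient_rows):
    """Each row holds the family averages of one person, in a fixed family order.
    Returns the per-group mean vector (a stand-in for the determination formula)."""
    def centroid(rows):
        return [mean(column) for column in zip(*rows)]
    return {"healthy": centroid(healthy_rows), "patient": centroid(patient_rows)}

def determine(model, row):
    """Assign the row to the group whose centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], row))

# Hypothetical averages for two gene families.
healthy_rows = [[0.1, -0.2], [-0.3, 0.1], [0.2, 0.1]]
patient_rows = [[1.8, -1.5], [2.2, -1.1], [1.6, -1.9]]
model = fit_centroids(healthy_rows, patient_rows)

print(determine(model, [1.5, -1.2]))
print(determine(model, [0.2, 0.0]))
```

- In practice an SVM or another of the discriminant analysis methods listed above would take the place of `fit_centroids` and `determine`; the inputs and outputs stay the same.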
- As described above, the determination method of the invention is performed using “disease-determining gene families related to the target disease.” For example, such gene families may be gene families statistically related to the target disease. For example, the gene families statistically related to the target disease may be identified by a procedure including the following steps of:
- (a) measuring the levels of expression of transcription products of genes in a biological sample obtained from each of a plurality of patients having the target disease and a plurality of healthy subjects;
- (b) standardizing the levels of the expression in each of the plurality of patients based on the levels of expression of the transcription products of the corresponding genes in the plurality of healthy subjects to obtain values representing deviations for each of the plurality of patients;
- standardizing the levels of the expression in each of the plurality of healthy subjects to obtain values representing deviations for each of the plurality of healthy subjects;
- (c) classifying the genes, whose expression levels are measured, into at least two gene families using a classification system based on the function of molecules encoded by the genes;
- obtaining, as an average for each gene family, the average of values representing deviations for the gene belonging to each of the gene families with respect to each of the plurality of patients and the plurality of healthy subjects;
- (d) obtaining a significance probability between the average for each gene family with respect to the plurality of patients and the average for each corresponding gene family with respect to the plurality of healthy subjects; and
- (e) identifying the gene family as a disease-determining gene family related to the target disease, when the significance probability for the gene family is 0.05 or less.
- The first step is to measure the levels of expression of gene transcription products in a biological sample obtained from each of a plurality of patients having the target disease and a plurality of healthy subjects.
- As used herein, the term “patients having the target disease” (hereinafter also simply referred to as “patients”) refers to subjects that can be confirmed to have the target disease based on criteria other than those for the determination method of the invention. For example, the patients are humans that can be confirmed to have cancer (as the target disease) by tissue characterization, CT, MRI, tumor marker method, or the like, an autoimmune disease (ditto) by blood test or the like, an infectious disease (ditto) by blood test or the like, a psychiatric disease or a nervous system disease (ditto) by diagnostic brain imaging, genetic testing, inquiry, or the like, Crohn's disease (ditto) by endoscopy, digestive tract imaging, or the like, or endometriosis (ditto) by CT, MRI, endoscopy, or the like.
- As used herein, the term “a plurality of patients” means a statistically sufficient number of patients, which may be 30 or more, preferably 40 or more patients. The terms “healthy subject” and “a plurality of healthy subjects” have the same meanings as defined above.
- This step may include extracting the gene transcription products and measuring the levels of expression of the transcription products, which may be performed in the same manner as in the respective steps of the above determination method of the invention using the biological sample obtained from each of the plurality of patients having the target disease and the plurality of healthy subjects.
- The levels of the expression in each of the plurality of patients are standardized based on the levels of expression of the transcription products of the corresponding genes in the plurality of healthy subjects, so that values representing deviations for each of the plurality of patients are obtained.
- As used herein, the phrase “the levels of the expression in each of the plurality of patients are standardized based on the levels of expression of the transcription products of the corresponding genes in the plurality of healthy subjects” means that values representing deviations for all of the plurality of patients are calculated from the following formula: a value representing a deviation for a patient={(the level of expression of a transcription product of a gene in each patient)−(the average of the levels of expression of the transcription product of the corresponding gene in a plurality of healthy subjects)}/(the standard deviation of the levels of expression of the transcription product of the corresponding gene in the plurality of healthy subjects).
- The levels of the expression in each of the plurality of healthy subjects are also standardized so that values representing deviations for each of the plurality of healthy subjects are obtained.
- In this case, “standardized (standardizing)” has the same meaning as commonly used in the field of statistics.
- Specifically, values representing deviations for all of the plurality of healthy subjects may be obtained using the following formula: a value representing a deviation for a healthy subject={(the level of expression of a transcription product of a gene in each healthy subject)−(the average of the levels of expression of the transcription product of the gene in a plurality of healthy subjects)}/(the standard deviation of the levels of expression of the transcription product of the gene in the plurality of healthy subjects).
- The ratio of the expression level in each of the plurality of patients to the average for the healthy subjects and the ratio of the expression level in each of the healthy subjects to the average for the healthy subjects may be calculated in the same manner as in the calculation of the value representing the ratio of the expression level in the subject to the expression level in the healthy subjects, and these expression level ratios may be used in place of the value representing a deviation for each of the plurality of patients and the value representing a deviation for each of the healthy subjects.
- Subsequently, the genes, whose expression levels are measured, are classified into at least two gene families using a classification system based on the function of molecules encoded by the genes, and the average of values representing deviations for the gene belonging to each of the gene families is obtained as an average for each gene family with respect to each of the plurality of patients and the plurality of healthy subjects.
- As used herein, the term “classification system based on the function of molecules encoded by the genes” means a database in which genes are classified according to the function of molecules encoded by the genes. Known databases may be used, examples of which include Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), MetaCyc, GenMAPP, BioCarta, KeyMolnet, and Online Mendelian Inheritance in Man (OMIM). In particular, Gene Ontology is preferably used, in which gene families are defined with terms called “GO Terms.”
- These databases are available from the URLs shown in Table 1 below.
- TABLE 1

  Databases   URL
  GO          http://www.geneontology.org/index.shtml
  KEGG        http://www.kegg.jp/kegg/brite.html
  MetaCyc     http://metacyc.org/META/class-tree?object=Gene-Ontology-Terms
  GenMAPP     http://www.genmapp.org/
  BioCarta    http://www.biocarta.com/genes/allPathways.asp
  KeyMolnet   http://www.immd.co.jp/keymolnet/index.html
  OMIM        http://www.ncbi.nlm.nih.gov/omim/

- In this step, the genes, whose expression levels are measured, are first classified into at least two gene families using the classification system. The average for each classified gene family is then obtained with respect to each of the plurality of patients and the plurality of healthy subjects in the same manner as in the step of obtaining the average for the subject described above.
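- The classification step can be sketched as follows, assuming an annotation that maps each measured gene to one or more family identifiers has already been exported from one of the databases. The gene symbols and their assignment to GO terms are invented for illustration; the two GO identifiers are the ones this description gives for "blood coagulation" and "response to oxidative stress".

```python
from collections import defaultdict
from statistics import mean

def group_by_family(annotation):
    """annotation: gene -> iterable of family identifiers (e.g. GO terms).
    Returns family identifier -> list of genes; a gene may appear in several."""
    families = defaultdict(list)
    for gene, terms in annotation.items():
        for term in terms:
            families[term].append(gene)
    return dict(families)

def cohort_family_averages(cohort, families):
    """cohort: subject id -> {gene: value representing a deviation}.
    Returns subject id -> {family: average over that family's measured genes}."""
    return {
        subject: {
            family: mean(dev[g] for g in genes if g in dev)
            for family, genes in families.items()
            if any(g in dev for g in genes)
        }
        for subject, dev in cohort.items()
    }

annotation = {  # hypothetical gene-to-term assignment
    "GENE_A": ["GO:0007596"],
    "GENE_B": ["GO:0007596", "GO:0006979"],
    "GENE_C": ["GO:0006979"],
}
families = group_by_family(annotation)
cohort = {"patient_1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 3.0}}
per_subject = cohort_family_averages(cohort, families)
print(per_subject)
```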
- Subsequently, a significance probability is obtained between the average for each gene family with respect to the plurality of patients and the average for each corresponding gene family with respect to the plurality of healthy subjects.
- As used herein, the term “corresponding gene family” means the same gene family as the gene family for which the average is obtained with respect to the plurality of patients.
- A t-test may be used to determine the significance probability (hereinafter also referred to as “p-value”) between the average for each gene family with respect to the plurality of patients and the average for each corresponding gene family with respect to the plurality of healthy subjects.
- When the resulting p-value for the gene family is 0.05 or less, the gene family is identified as a disease-determining gene family related to the target disease.
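- The significance test and the 0.05 threshold can be sketched with the standard library alone. The text says only that a t-test may be used; the choice of Welch's unequal-variance form here is an assumption, and the p-value is obtained by numerically integrating the Student's t density rather than by calling a statistics package. Function names and data are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

def welch_t_test(a, b):
    """Return (t statistic, two-sided p-value); assumes nonzero variance."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, two_sided_p(t, df)

def is_disease_determining(patient_avgs, healthy_avgs, alpha=0.05):
    """Identify the family when the p-value is alpha or less."""
    _, p = welch_t_test(patient_avgs, healthy_avgs)
    return p <= alpha

# Hypothetical family averages, one value per person.
patients = [1.9, 2.1, 1.7, 2.4, 2.0]
healthy = [0.1, -0.2, 0.0, 0.3, -0.1]
print(is_disease_determining(patients, healthy))
```

- When many gene families are tested at once, some will fall below 0.05 by chance; the procedure described here applies the threshold to each family separately.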
- In the determination method of the invention, at least two selected from the gene families identified by the above procedure are used as disease-determining gene families related to the target disease. The number of the selected disease-determining gene families is preferably as large as possible.
- In the determination method of the invention, the levels of expression of the gene transcription products are not directly used, but values representing deviations are obtained from the expression levels and then used to determine the average for the disease-determining gene family, and the resulting average is used, so that a subject having the target disease can be clearly and stably distinguished from healthy subjects.
- For example, the determination method of the invention is particularly suitable for use in determining the presence of such a disease as Crohn's disease, Huntington's disease, or endometriosis.
- Crohn's disease is a disease of unknown etiology, which has a granulomatous, inflammatory lesion associated with an ulcer or fibrosis and can affect the whole of the digestive tract from the oral cavity to the anus. Currently, at least 20,000 people in Japan suffer from this disease. Common symptoms of this disease include abdominal pain, diarrhea, weight loss, fever, and anal lesions. While a confirmed diagnosis of Crohn's disease is made by endoscopy, it is believed that early detection of this disease can be achieved by screening with a less invasive test such as a blood test. The determination method of the invention may be performed on a subject suspected of having Crohn's disease, so that a reliable determination result can be obtained as an index of diagnosis.
- When the determination method of the invention is used to determine the presence of Crohn's disease, examples of the disease-determining gene family include a G protein-related gene family, a blood coagulation-related gene family, an oxidative stress-related gene family, a phagocytosis-related gene family, and a fat oxidation-related gene family.
- According to the GO Terms, the above five gene families are categorized as “heterotrimeric G-protein complex” (GO:0005834), “blood coagulation” (GO:0007596), “response to oxidative stress” (GO:0006979), “phagocytosis, engulfment” (GO:0006911), and “fatty acid oxidation” (GO:0019395), respectively.
- Huntington's disease is a chronic progressive neurodegenerative disease whose main symptoms include involuntary movement (mainly choreic movement), psychiatric symptoms, and dementia. In diagnosis, this disease must be distinguished from symptomatic chorea caused by cerebrovascular disorders such as cerebral hemorrhage, drug-induced chorea caused by antipsychotic drugs, and other diseases such as Wilson's disease. Therefore, the determination method of the invention may be performed on a subject suspected of having Huntington's disease, so that a reliable determination result can be obtained as an index of diagnosis.
- When the determination method of the invention is used to determine the presence of Huntington's disease, examples of the disease-determining gene family include a microtubule-related gene family, a mitochondria-related gene family, and a prostaglandin-related gene family.
- According to the GO terms, the three gene families are categorized as “microtubule” (GO:0005874), “mitochondrion” (GO:0005739), and “signal transduction” (GO:0007165), respectively.
- Endometriosis is a disease in which endometria or endometrial-like tissues grow in the uterine cavity or outside the uterine body. Main symptoms of endometriosis are menstrual colic and dysmenorrhea. Therefore, endometriosis is difficult to distinguish from dysmenorrhea. Thus, the determination method of the invention may be performed on a subject suspected of having endometriosis, so that a reliable determination result can be obtained as an index of diagnosis.
- When the determination method of the invention is used to determine the presence of endometriosis, examples of the disease-determining gene family include a cytokine synthesis process-related gene family, a cytokine-mediated signaling-related gene family, and an immunoglobulin-mediated immune response-related gene family.
- According to the GO terms, the three gene families are categorized as “cytokine biosynthetic process” (GO:0042089), “cytokine-mediated signaling pathway” (GO:0019221), and “immunoglobulin mediated immune response” (GO:0016064), respectively.
- When the determination method of the invention is used, a patient with the target disease is preferably determined to be “positive” at a sensitivity of 80% or more, more preferably 85% or more, even more preferably 90% or more. When the determination method of the invention is used, a healthy subject is preferably determined to be “negative” at a specificity of 80% or more, more preferably 85% or more, even more preferably 90% or more.
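- Sensitivity and specificity as used above can be computed from a set of determinations whose true status is known. The sketch below uses the usual definitions (true positives over all patients, true negatives over all healthy subjects); the function name and the ten example results are hypothetical.

```python
def sensitivity_specificity(truth, predicted):
    """truth: True if the person actually has the target disease.
    predicted: True if the method determined the person to be positive."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)          # patients found positive
    fn = sum(1 for t, p in pairs if t and not p)      # patients missed
    tn = sum(1 for t, p in pairs if not t and not p)  # healthy found negative
    fp = sum(1 for t, p in pairs if not t and p)      # healthy flagged positive
    return tp / (tp + fn), tn / (tn + fp)

# Five patients followed by five healthy subjects (hypothetical results).
truth = [True] * 5 + [False] * 5
predicted = [True, True, True, True, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```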
- The determination method of the invention, which shows such high sensitivity and specificity, can stably provide a high-accuracy index to aid in diagnosing the target disease.
- Another embodiment of the invention is directed to a program that enables a computer to execute the method of the invention for determining the presence of a disease. Specifically, the program of the invention includes a program for determining the presence of a disease, which enables a computer to function as:
- receiving means for receiving data on the levels of expression of transcription products of genes in a biological sample obtained from a subject suspected of having a target disease, wherein the genes comprise at least one gene belonging to each of at least two disease-determining gene families related to the target disease;
- deviation obtaining means for obtaining values representing deviations by standardizing the levels of the expression based on the levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects;
- average obtaining means for obtaining the average of values representing deviations with respect to the gene belonging to each of the disease-determining gene families;
- determination means for determining, using the average, whether or not the subject has the target disease; and
- output means for outputting the result of the determination by the determination means.
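- The five means listed above can be read as stages of one function. The following sketch shows that structure only; the function name `run_determination`, the gene and family names, and the threshold rule passed as `decide` are hypothetical, and the threshold rule merely stands in for a determination formula prepared by discriminant analysis.

```python
from statistics import mean, stdev

def run_determination(subject_levels, healthy_levels, families, decide):
    """subject_levels: gene -> level (the data taken in by the receiving means).
    healthy_levels: gene -> list of levels in healthy subjects.
    families: family -> list of genes.  decide: callable on the family averages."""
    # Deviation obtaining means: standardize against the healthy subjects.
    deviations = {
        gene: (level - mean(healthy_levels[gene])) / stdev(healthy_levels[gene])
        for gene, level in subject_levels.items()
    }
    # Average obtaining means: one average per disease-determining gene family.
    averages = {
        family: mean(deviations[g] for g in genes)
        for family, genes in families.items()
    }
    # Determination means, then output means (here: a returned record).
    return {"averages": averages, "positive": decide(averages)}

healthy_levels = {"GENE_A": [8.0, 10.0, 12.0], "GENE_B": [4.0, 5.0, 6.0]}
subject_levels = {"GENE_A": 14.0, "GENE_B": 7.0}
families = {"family_1": ["GENE_A"], "family_2": ["GENE_B"]}

# Hypothetical stand-in rule: positive when every family average is far from 0.
result = run_determination(
    subject_levels, healthy_levels, families,
    decide=lambda avgs: all(abs(v) >= 1.5 for v in avgs.values()),
)
print(result)
```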
- The program of the invention may also enable a computer to function as disease-determining genes-identifying means. Specifically, the program of the invention includes a program for determining the presence of a disease, which further enables a computer to function as:
- receiving means for receiving the levels of expression of transcription products of genes in a biological sample obtained from each of a plurality of patients having the target disease and a plurality of healthy subjects;
- deviation obtaining means for obtaining values representing deviations for each of the plurality of patients by standardizing the levels of the expression in each of the plurality of patients based on the levels of expression of the transcription products of the corresponding genes in the plurality of healthy subjects and for obtaining values representing deviations for each of the plurality of healthy subjects by standardizing the levels of the expression in each of the plurality of healthy subjects;
- average obtaining means for classifying the genes, whose expression levels are measured, into at least two gene families using a classification system based on the function of molecules encoded by the genes and for obtaining, as an average for each gene family, the average of values representing deviations for the gene belonging to each of the gene families with respect to each of the plurality of patients and the plurality of healthy subjects;
- significance probability obtaining means for obtaining a significance probability between the average for each gene family with respect to the plurality of patients and the average for each corresponding gene family with respect to the plurality of healthy subjects; and
- gene family identifying means for identifying the gene family as a disease-determining gene family related to the target disease when the significance probability for the gene family is 0.05 or less.
-
FIG. 1 shows an example of an apparatus for determining the presence of a target disease, in which the program of the invention is used. The apparatus includes a gene transcription product expression level-measuringdevice 1, acomputer 2, and acable 3 connecting them together. Data on the expression levels measured by the gene transcription product expression level-measuringdevice 1, such as signals based on the intensity of fluorescence, the amount of current, or the like can be sent to thecomputer 2 through thecable 3. Alternatively, the gene transcription product expression level-measuringdevice 1 may be unconnected with thecomputer 2. In this case, the expression level data may be input to the computer to run the program described above. - The
computer 2 obtains the values representing deviations from the resulting expression levels, obtains the average of the resulting values representing deviations for each of at least two gene families, and determines whether or not the subject has the target disease based on the average. - The determination method of the invention can be implemented by the program of the invention in cooperation with the
computer 2 including a central processing unit, a storage unit, a reader for a recording medium such as a compact disc or a Floppy® disc, an input unit such as a keyboard, and an output unit such as a display.FIG. 2 shows a more specific example of the computer system for implementing the method. - The
computer 2 shown inFIG. 2 mainly includes amain unit 110, adisplay 120, and aninput unit 130. Themain unit 110 mainly includes aCPU 110 a, aROM 110 b, aRAM 110 c, ahard disk 110 d, areadout device 110 e, an input-output interface 110 f, and animage output interface 110 g. TheCPU 110 a,ROM 110 b,RAM 110 c,hard disk 110 d,readout device 110 e, input-output interface 110 f, andimage output interface 110 g are connected to one another through abus 110 h to allow data communication. - The
CPU 110 a can execute the computer program stored in theROM 110 b and the computer program loaded on theRAM 110 c. - The
ROM 110 b includes a mask ROM, PROM, EPROM, EEPROM, or the like. The ROM 110 b stores the computer program to be executed by the CPU 110 a and the data to be used for the execution.
- The RAM 110 c includes an SRAM, DRAM, or the like. The RAM 110 c is used to read out the computer program stored in the RAM 110 c, ROM 110 b, and hard disk 110 d. When these computer programs are executed, the RAM 110 c is also used as a work area for the CPU 110 a.
- Various computer programs to be executed by the CPU 110 a, such as an operating system and application programs, and data to be used for the execution of the computer program are stored on the hard disk 110 d. In an embodiment of the invention, the data stored on the hard disk 110 d also include data on the levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects (hereinafter referred to as “stored expression level data”), data on disease-determining gene families (hereinafter referred to as “disease-determining gene family data”), and a determination formula for determining whether or not the subject has the target disease. The determination formula is obtained using the discriminant analysis method based on averages previously determined with biological samples collected from healthy subjects and averages previously determined with biological samples collected from patients having the target disease. An application program 140 a as described below is also installed on the hard disk 110 d.
- The readout device 110 e includes a flexible disk drive, a CD-ROM drive, a DVD-ROM drive, or the like and can read out the computer program or data stored on a transportable recording medium 140. An application program 140 a that enables the computer to execute the method of this embodiment is also stored on the transportable recording medium 140. The CPU 110 a can read out the application program 140 a according to the invention from the transportable recording medium 140, and the application program 140 a can be installed on the hard disk 110 d.
- The application program 140 a may be provided not only from the transportable recording medium 140 but also from external equipment communicably connected to the main unit 110 of the computer through a telecommunication line (regardless of whether it is wire-line or wireless). For example, the application program 140 a may be stored on the hard disk of a server computer on the Internet, and the CPU 110 a may access the server computer to download the application program and install it on the hard disk 110 d.
- An operating system to provide a graphical user interface environment, such as Windows® manufactured and sold by Microsoft Corporation in the United States, is installed on the hard disk 110 d. A description will be given below, provided that the application program 140 a according to this embodiment runs on the operating system.
- For example, the input-output interface 110 f includes a serial interface such as USB, IEEE 1394, or RS-232C, a parallel interface such as SCSI, IDE, or IEEE 1284, and an analog interface including a D/A converter, an A/D converter, or the like. The transcription product expression level-measuring device 1 is connected to the input-output interface 110 f through the cable 3 so that the expression level data determined in the transcription product expression level-measuring device 1 can be input to the main unit 110 of the computer. The input unit 130 including a keyboard and a mouse is also connected to the input-output interface 110 f so that the user can input data to the main unit 110 of the computer using the input unit 130.
- The image output interface 110 g is connected to the display 120 including an LCD, CRT, or the like so that an image signal corresponding to the image data sent from the CPU 110 a can be output on the display 120. The display 120 outputs an image (on the screen) according to the image signal input.
- FIG. 3 is a flow chart more specifically showing how the program of the invention runs on the computer 2.
- First, when the levels of expression of transcription products of genes are measured in the gene transcription product expression level-measuring device 1, the transcription product expression level-measuring device 1 outputs the data on the measured expression levels (hereinafter referred to as “measured expression level data”) to the computer 2. The CPU 110 a receives the output measured expression level data and stores the data into the RAM 110 c (step S11).
- Subsequently, the CPU 110 a reads out the stored expression level data, which has previously been stored on the hard disk 110 d, and obtains data showing values representing deviations (hereinafter referred to as “deviation data”) based on the input measured expression level data and the stored expression level data (step S12).
- Subsequently, the CPU 110 a reads out the disease-determining gene family data, which has previously been stored on the hard disk 110 d, and determines whether or not the genes for the deviation data belong to the disease-determining gene families, so that the deviation data obtained is classified according to disease-determining gene family (step S13).
- Subsequently, the CPU 110 a uses the deviation data classified according to disease-determining gene family to obtain data showing the average of values representing deviations for each of the disease-determining gene families (hereinafter referred to as “average data”) (step S14).
- Subsequently, the CPU 110 a reads out the determination formula, which has previously been stored on the hard disk 110 d, and applies the average data to the determination formula to determine whether or not the subject has the target disease (step S15).
- Subsequently, the CPU 110 a stores the result of determining whether or not the subject has the target disease into the RAM 110 c and displays the result on the display 120 of the computer through the image output interface 110 g (step S16).
- While, in this embodiment, the CPU 110 a obtains the measured expression level data from the transcription product expression level-measuring device 1 through the input-output interface 110 f, any other configuration may also be used. For example, the levels of expression of gene transcription products may be determined in a transcription product expression level-measuring device independent of the computer 2, and the operator may use the input unit 130 to input the measured expression level data to the computer 2.
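Steps S12 to S15 of FIG. 3 can be sketched in Python as follows. This is a minimal illustration and not the application program 140 a itself. The function name `determine`, the dictionary layouts, the linear form of the determination formula, and the coefficient values are all assumptions made for the example; the text states only that the formula is obtained by discriminant analysis. The use of the sample standard deviation for the deviation value is likewise assumed.

```python
from statistics import mean, stdev


def determine(measured, stored_healthy, disease_families, coefficients, intercept):
    """Return (has_disease, average_data) for one subject.

    measured:         {gene: expression level in the subject}
    stored_healthy:   {gene: [expression levels in healthy subjects]}
    disease_families: {family name: [genes in that family]}
    coefficients, intercept: hypothetical linear determination formula
    """
    # Step S12: deviation data relative to the stored healthy-subject data.
    deviation = {
        gene: (level - mean(stored_healthy[gene])) / stdev(stored_healthy[gene])
        for gene, level in measured.items()
        if gene in stored_healthy
    }
    # Steps S13 and S14: classify by disease-determining gene family, then average.
    average_data = {}
    for family, genes in disease_families.items():
        values = [deviation[g] for g in genes if g in deviation]
        average_data[family] = mean(values)
    # Step S15: apply the (assumed linear) determination formula.
    score = intercept + sum(coefficients[f] * average_data[f] for f in disease_families)
    return score > 0, average_data


# Toy data; every number below is invented for illustration.
stored_healthy = {"GNG3": [100, 120, 140], "GP9": [50, 60, 70]}
disease_families = {"G protein": ["GNG3"], "Blood coagulation": ["GP9"]}
coefficients = {"G protein": 1.0, "Blood coagulation": 1.0}

patient_like, patient_avg = determine(
    {"GNG3": 160, "GP9": 60}, stored_healthy, disease_families, coefficients, -1.0
)
healthy_like, _ = determine(
    {"GNG3": 120, "GP9": 60}, stored_healthy, disease_families, coefficients, -1.0
)
```

Only the per-family averages enter the formula, so the number of inputs equals the number of disease-determining gene families rather than the number of measured genes.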
- FIG. 4 is a flow chart specifically showing how the program of the invention runs on the computer to enable it to function as disease-determining gene-identifying means. In this embodiment, the hard disk 110 d stores data on a classification system based on the function of molecules encoded by genes (hereinafter referred to as “classification system data”).
- First, when the levels of expression of transcription products of genes in a plurality of patients and a plurality of healthy subjects are measured in the gene transcription product expression level-measuring device 1, the transcription product expression level-measuring device 1 outputs, to the computer 2, data on the measured expression levels in the plurality of patients (hereinafter referred to as “measured patient expression level data”) and data on the measured expression levels in the plurality of healthy subjects (hereinafter referred to as “measured healthy subject expression level data”). The CPU 110 a receives the output measured patient expression level data and the output measured healthy subject expression level data, and stores the data into the RAM 110 c (step S21).
- Subsequently, the CPU 110 a standardizes the measured patient expression level data for each of the plurality of patients based on the measured healthy subject expression level data on the transcription products of the corresponding genes in the plurality of healthy subjects, so that data showing values representing deviations are obtained for each of the plurality of patients (hereinafter referred to as “patient deviation data”), and the CPU 110 a also standardizes the measured expression level data for each of the plurality of healthy subjects, so that data showing values representing deviations are obtained for each of the plurality of healthy subjects (hereinafter referred to as “healthy subject deviation data”) (step S22).
- Subsequently, the CPU 110 a reads out the classification system data, which has previously been stored on the hard disk 110 d, and classifies the patient deviation data according to gene family, based on the genes for the patient deviation data. The CPU 110 a also classifies the healthy subject deviation data according to gene family, based on the genes for the healthy subject deviation data (step S23).
- Subsequently, the CPU 110 a uses the patient deviation data classified according to gene family to obtain data showing the average of values representing deviations for each of the gene families (hereinafter referred to as “patient average data”). The CPU 110 a also uses the healthy subject deviation data classified according to gene family to obtain data showing the average of values representing deviations for each of the gene families (hereinafter referred to as “healthy subject average data”) (step S24).
- Subsequently, the CPU 110 a uses the resulting patient average data and healthy subject average data for each gene family to obtain data showing the significance probability between the average for the plurality of patients and the average for the plurality of healthy subjects (hereinafter referred to as “significance probability data”) (step S25).
- Subsequently, the CPU 110 a uses the resulting significance probability data to identify the gene family for which the significance probability is 0.05 or less (step S26).
- Subsequently, the CPU 110 a stores the identified gene family into the RAM 110 c and displays it on the display 120 of the computer through the image output interface 110 g (step S27).
- While, in this embodiment, the CPU 110 a obtains the measured patient expression level data and the measured healthy subject expression level data from the transcription product expression level-measuring device 1 through the input-output interface 110 f, any other configuration may also be used. For example, the levels of expression of the gene transcription products in the plurality of patients and healthy subjects may be determined in a transcription product expression level-measuring device independent of the computer 2, and the operator may use the input unit 130 to input the measured patient expression level data and the measured healthy subject expression level data to the computer 2.
- While, in this embodiment, the identified gene family is displayed on the display 120 in step S27, the data on the identified gene family may also only be stored as disease-determining gene family data into the RAM 110 c. The stored disease-determining gene family data may also be used, for example, in the operation of the computer 2 shown in FIG. 2.
- The invention is more specifically described in the examples below, which are not intended to limit the scope of the invention.
- (1) Identification of Crohn's Disease-Determining Gene Families
- Example 1 used data from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo), a gene expression data bank. The data were the levels of expression of gene transcription products in the blood of Crohn's disease patients and healthy subjects. They were normalized data obtained by normalization of raw measured signal data and are available from http://www.ncbi.nlm.nih.gov/sites/GDSbrowser?acc=GDS1615.
- (1-1) Selection of Samples and Probe Sets
- Data on Crohn's disease patients 1 (29 samples) and data on healthy subjects 1 (21 samples) were randomly selected from the data described above, and these data were used to identify Crohn's disease-determining gene families.
- The data on Crohn's disease patients and healthy subjects obtained from the GEO were produced by analysis using GeneChip® U133A (Affymetrix, Inc.), a DNA chip. The DNA chip has 22,283 probe sets, which include probe sets for the same gene.
- Concerning the same gene for which a plurality of probe sets are provided on the DNA chip, therefore, only a probe set showing the maximum signal value was taken from the probe sets for the same gene. In addition, probe sets with a signal value of 50 or less were also excluded, because the reproducibility of the measured values was considered to be low. As a result, genes for 9,331 probe sets were subjected to the analysis described below.
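The probe set selection described above can be sketched as follows. The text does not say how a single "signal value" is derived for a probe set measured across many samples, so this sketch assumes the mean signal across samples. The function name, the probe set identifiers, and the gene names are hypothetical.

```python
from statistics import mean


def select_probe_sets(signals, probe_to_gene, min_signal=50.0):
    """Keep one probe set per gene: the one with the highest signal.

    signals:       {probe set ID: [signal value per sample]}
    probe_to_gene: {probe set ID: gene symbol}
    Probe sets whose signal is min_signal or less are excluded.
    """
    best = {}  # gene -> (probe set ID, signal level)
    for probe, values in signals.items():
        level = mean(values)  # assumption: summarize a probe set by its mean signal
        if level <= min_signal:
            continue  # low signals are treated as poorly reproducible
        gene = probe_to_gene[probe]
        if gene not in best or level > best[gene][1]:
            best[gene] = (probe, level)
    return {gene: probe for gene, (probe, _) in best.items()}


# Hypothetical probe sets: two for GENE_A, one weak probe set for GENE_B.
signals = {
    "probe_a1": [100.0, 200.0],
    "probe_a2": [300.0, 500.0],
    "probe_b1": [10.0, 20.0],
}
probe_to_gene = {"probe_a1": "GENE_A", "probe_a2": "GENE_A", "probe_b1": "GENE_B"}
selected = select_probe_sets(signals, probe_to_gene)
```

Filtering before or after taking the maximum gives the same result: a gene is dropped exactly when its strongest probe set is at or below the threshold.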
- (1-2) Obtaining Expression Level z-Scores
- Averages and standard deviations were calculated using all signal values obtained from the healthy subjects 1 (21 samples) with respect to transcription products of the genes for the 9,331 probe sets selected as described above. Values representing deviations (z-scores) were calculated for each of the 9,331 genes using these values and the following formula: z-score={(the signal value of the transcription product of each gene)−(the average of the signal values of the transcription product of the corresponding gene in the healthy subjects 1 (21 samples))}/(the standard deviation of the signal values of the transcription product of the corresponding gene in the healthy subjects 1 (21 samples))
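The z-score formula above translates directly into code. The sketch below is illustrative only: the function name and data layout are assumptions, and because the text does not state whether the sample or the population standard deviation was used, the sample standard deviation (`statistics.stdev`) is assumed here.

```python
from statistics import mean, stdev


def z_scores(sample, healthy_reference):
    """Compute a z-score per gene for one sample.

    sample:            {gene: signal value in this sample}
    healthy_reference: {gene: [signal values in the healthy subjects]}
    """
    scores = {}
    for gene, signal in sample.items():
        reference = healthy_reference[gene]
        # z = (signal - healthy average) / healthy standard deviation
        scores[gene] = (signal - mean(reference)) / stdev(reference)
    return scores


# Invented signal values for two genes taken from Table 2.
healthy_reference = {"GNG3": [100.0, 120.0, 140.0], "GPX1": [400.0, 500.0, 600.0]}
z = z_scores({"GNG3": 160.0, "GPX1": 450.0}, healthy_reference)
```

The same function applies to healthy samples as well as patient samples, which is how z-scores for both groups are obtained against the same healthy reference.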
- (1-3) Gene Classification and Obtaining Average for Each Gene Family
- The 9,331 genes were classified into gene families (GO Terms) based on the classification of Gene Ontology (readable from http://www.geneontology.org/index.shtml), and the average of the z-scores obtained in the section (1-2) for the Crohn's disease patients 1 (29 samples) was calculated over the genes within each GO Term.
- The average of the z-scores for the healthy subjects 1 (21 samples) was also calculated in the same manner over the genes within each GO Term.
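Averaging z-scores within each gene family for a single sample can be sketched as below. The function name and the dictionary layout are assumptions. Skipping genes that have no z-score (for example because their probe sets were excluded) is also an assumed policy, since the text does not address that case.

```python
from statistics import mean


def family_averages(z, families):
    """Average the z-scores of the genes belonging to each gene family.

    z:        {gene: z-score} for one sample
    families: {family name (e.g. a GO Term): [genes in the family]}
    Families with no measured gene are omitted from the result.
    """
    averages = {}
    for family, genes in families.items():
        present = [z[g] for g in genes if g in z]
        if present:
            averages[family] = mean(present)
    return averages


# Invented z-scores; F13A1 is deliberately missing from the sample.
z = {"GNG3": 2.0, "GNG7": 1.0, "GP9": -0.5}
families = {
    "G protein": ["GNG3", "GNG7"],
    "Blood coagulation": ["GP9", "F13A1"],
    "Phagocytosis": ["VAMP7"],
}
avg = family_averages(z, families)
```

Running this once per sample yields one average per family per sample, which is the input to the t-test in the next section.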
- (1-4) Selecting Gene Families Having Significant Difference Between Healthy Subjects and Crohn's Disease Patients
- A t-test was performed using the averages obtained as described above for the healthy subjects and the Crohn's disease patients with respect to each GO Term, so that a significance probability (p-value) was obtained.
- GO Terms for which the resulting p-value was 0.05 or less (p-value≦5.0E-02) were extracted from the GO Terms used.
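The selection of GO Terms by p-value can be sketched using only the Python standard library. The text says only that "a t-test" was performed, so Welch's unequal-variance form is an assumption here, as are all function and variable names. The p-value is obtained by numerically integrating the density of Student's t distribution. That is adequate for comparing against the 0.05 threshold, but it cannot resolve very small p-values such as those listed in Table 2, for which a statistics package would be used.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * P(0 < T < |t|), integrated by Simpson's rule."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two groups of averages."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def significant_families(patient_avgs, healthy_avgs, alpha=0.05):
    """Return {family: p-value} for families with p <= alpha."""
    selected = {}
    for family in patient_avgs:
        t, df = welch_t(patient_avgs[family], healthy_avgs[family])
        p = two_sided_p(t, df)
        if p <= alpha:
            selected[family] = p
    return selected


# Invented per-sample family averages for two hypothetical families.
patient_avgs = {"family_A": [2.1, 1.8, 2.4, 2.0, 1.9],
                "family_B": [0.1, -0.2, 0.3, 0.0, -0.1]}
healthy_avgs = {"family_A": [0.1, -0.2, 0.0, 0.3, -0.1],
                "family_B": [0.0, 0.2, -0.3, 0.1, -0.1]}
selected = significant_families(patient_avgs, healthy_avgs)
```

In this toy data only `family_A` differs clearly between the groups, so it is the only family retained.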
- Subsequently, hierarchical clustering was performed using the z-scores for all genes contained in the extracted GO Terms, and synchronously varying gene clusters were selected. The clustering was performed using software Cluster 3.0 (available from http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm), and the result was displayed using Java Tree View (available from http://sourceforge.net/projects/jtreeview/files/).
- The average of the z-scores for the genes contained in each cluster was used as a cluster score, and a t-test was performed on the cluster scores of the healthy subjects 1 (21 samples) and the Crohn's disease patients 1 (29 samples). From the clusters for which the resulting p-value was 0.05 or less, the G protein-related gene family, blood coagulation-related gene family, oxidative stress-related gene family, phagocytosis-related gene family, and fat oxidation-related gene family were selected as Crohn's disease-determining gene families. Table 2 shows these gene families, genes belonging to each family, and the p-value for each family.
- FIG. 5 shows the distribution of the average of the z-scores for the healthy subjects 1 and the Crohn's disease patients 1 with respect to each gene family selected as described above.

TABLE 2
| Gene families | Gene symbol | Gene title |
|---|---|---|
| G protein (p = 1.20E−12) | GNG3 | guanine nucleotide binding protein (G protein), gamma 3 |
| | GNG7 | guanine nucleotide binding protein (G protein), gamma 7 |
| | GNA15 | guanine nucleotide binding protein (G protein), alpha 15 (Gq class) |
| | GNB5 | guanine nucleotide binding protein (G protein), beta 5 |
| | GNAS | GNAS complex locus |
| | GNG5 | guanine nucleotide binding protein (G protein), gamma 5 |
| | GNG11 | guanine nucleotide binding protein (G protein), gamma 11 |
| | GNB1 | guanine nucleotide binding protein (G protein), beta polypeptide 1 |
| | GNG4 | guanine nucleotide binding protein (G protein), gamma 4 |
| Blood coagulation (p = 4.70E−05) | GP1BA | glycoprotein Ib (platelet), alpha polypeptide |
| | GP1BB | glycoprotein Ib (platelet), beta polypeptide///septin 5 |
| | ITGB3 | integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61) |
| | GP9 | glycoprotein IX (platelet) |
| | F13A1 | coagulation factor XIII, A1 polypeptide |
| Fat oxidation (p = 3.80E−10) | ACOX1 | acyl-Coenzyme A oxidase 1, palmitoyl |
| | ADIPOR2 | adiponectin receptor 2 |
| | ADIPOR1 | adiponectin receptor 1 |
| | ALOX12 | arachidonate 12-lipoxygenase |
| Oxidative stress (p = 6.90E−10) | GPX1 | glutathione peroxidase 1 |
| | PTGS1 | prostaglandin-endoperoxide synthase 1 (prostaglandin G/H synthase and cyclooxygenase) |
| | CLU | clusterin |
| | PDLIM1 | PDZ and LIM domain 1 |
| Phagocytosis (p = 2.00E−07) | FCER1G | Fc fragment of IgE, high affinity I, receptor for; gamma polypeptide |
| | CLEC7A | C-type lectin domain family 7, member A |
| | VAMP7 | vesicle-associated membrane protein 7 |
| | FCGR1A | Fc fragment of IgG, high affinity Ia, receptor (CD64)///Fc fragment of IgG, high affinity Ic, receptor (CD64) |

- (2) Evaluating the Accuracy of the Determination Method of the Invention
- (2-1) Determination for the Samples Used in the Identification of Crohn's Disease-Determining Gene Families
- The averages for the healthy subjects 1 (21 samples) and the Crohn's disease patients 1 (29 samples) with respect to each of the five Crohn's disease-determining gene families were each input to a support vector machine (SVM incorporated in statistical analysis software GeneSpring). The SVM containing the input averages for the 50 samples was then used to determine whether each sample was positive (or had Crohn's disease) or negative (or healthy).
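To make the classification step concrete, the sketch below trains a small linear support vector machine on per-family average z-scores and then labels samples as positive or negative. It is not the SVM incorporated in GeneSpring, whose kernel and settings are not given in the text. The linear kernel, the subgradient-descent training on the hinge loss, the hyperparameters, and all names and numbers are assumptions chosen to keep the example self-contained.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=100):
    """Train a linear soft-margin SVM by subgradient descent on the hinge loss.

    samples: list of feature vectors (one family-average z-score per family)
    labels:  +1 for patients, -1 for healthy subjects
    Returns the weight vector w and the bias b.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The L2 penalty shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # The sample is inside the margin: move the boundary away from it.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Return 'positive' (disease) or 'negative' (healthy) for one sample."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "positive" if score > 0 else "negative"


# Invented averages for two gene families; patients first, then healthy subjects.
training = [[2.0, 2.0], [3.0, 2.0], [0.0, 0.0], [-1.0, 0.0]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(training, labels)

training_calls = [classify(w, b, x) for x in training]
new_calls = [classify(w, b, [2.5, 2.5]), classify(w, b, [-0.5, 0.2])]
```

Classifying the training samples corresponds to the evaluation in this section, while classifying samples that were not used for training corresponds to the reproducibility evaluation in the section (2-2).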
- The result is shown in FIG. 6A. In FIG. 6A, “sensitivity” is the rate at which the Crohn's disease patients are determined to be “positive,” and “specificity” is the rate at which the healthy subjects are correctly identified. In the drawing, “concordance rate” is the rate at which the Crohn's disease patients and the healthy subjects are determined to be “positive (+)” and “negative (−),” respectively. The result shows that the determination method of the invention makes it possible to identify Crohn's disease patients and healthy subjects at a sensitivity of 90% or more and a specificity of 90% or more.
- (2-2) Evaluating the Reproducibility of the Determination Method of the Invention
- Additionally, data on Crohn's disease patients 2 (30 samples) and healthy subjects 2 (21 samples), which were different from the data selected in the section (1-1), were used to evaluate the reproducibility of the determination method of the invention. The determination was performed on these data using the SVM containing the input averages for the samples used in the identification of Crohn's disease-determining gene families in the section (2-1).
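The three rates used to report these evaluations (sensitivity, specificity, and concordance rate, as defined for FIG. 6A) can be computed from the determinations as sketched below. The function name and the boolean encoding (True for positive or diseased) are assumptions, and the example determinations are invented rather than taken from the figures.

```python
def evaluate_determination(predicted, actual):
    """Compute sensitivity, specificity, and concordance rate.

    predicted: list of bool, True if the sample was determined to be positive
    actual:    list of bool, True if the sample came from a patient
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if a and p)          # patients called positive
    fn = sum(1 for p, a in pairs if a and not p)      # patients called negative
    tn = sum(1 for p, a in pairs if not a and not p)  # healthy called negative
    fp = sum(1 for p, a in pairs if not a and p)      # healthy called positive
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "concordance_rate": (tp + tn) / len(pairs),
    }


# Invented example: 4 patients followed by 4 healthy subjects.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
rates = evaluate_determination(predicted, actual)
```

A drop in specificity on new samples, as reported for the conventional method in the comparative examples, shows up here as an increase in `fp`.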
- The result is shown in FIG. 6B. The result shows that even for samples different from those used in the identification of Crohn's disease-determining gene families, the determination method of the invention makes it possible to stably distinguish between healthy subjects and Crohn's disease patients at a sensitivity of 95% or more and a specificity of 90% or more.
- In this comparative example, a method of determining the presence of a disease directly based on the levels of expression of gene transcription products in healthy subjects and patients was used as a conventional determination method. The accuracy of the determination of the presence of Crohn's disease by such a conventional method was evaluated.
- (1) Determination Using Genes Belonging to Crohn's Disease-Determining Gene Families
- (1-1) Samples Used in the Identification of Crohn's Disease-Determining Gene Families
- The expression levels in the healthy subjects 1 (21 samples) and the Crohn's disease patients 1 (29 samples) with respect to each of the 26 genes in Table 2 were input to the SVM. The accuracy of determining whether each sample was positive or negative was evaluated using the SVM containing the input expression levels in the 50 samples.
- The result is shown in FIG. 7A. The result shows that the conventional method identified the Crohn's disease patients and the healthy subjects at a sensitivity of 100% and a specificity of 100%.
- (1-2) Evaluating the Reproducibility of the Conventional Determination Method
- Data on Crohn's disease patients 2 (30 samples) and healthy subjects 2 (21 samples) were then used to evaluate the reproducibility of the conventional determination method. The determination was performed on these samples using the SVM to which the expression levels in the healthy subjects 1 (21 samples) and the Crohn's disease patients 1 (29 samples) were input in the section (1-1).
- The result is shown in FIG. 7B. The result shows that for samples different from those used in the identification of Crohn's disease-determining gene families, the specificity of the conventional determination method was reduced to 65% or less, although the sensitivity was 90% or more. It is therefore apparent that the conventional determination method is more likely to misidentify healthy subjects as Crohn's disease patients than the determination method of the invention.
- (2) Determination Using Genes Other than Those Belonging to Crohn's Disease-Determining Gene Families
- (2-1) Samples Used in the Identification of Crohn's Disease-Determining Gene Families
- Genes other than those belonging to Crohn's disease-determining gene families (26 genes in Table 2) were further identified so that an examination could be performed using such genes. Specifically, a t-test was performed to calculate the significance probability (p-value) between the expression levels in the healthy subjects 1 (21 samples) and the Crohn's disease patients 1 (29 samples), and the gene for which the resulting p-value was 0.05 or less with respect to the expression level was determined to be used for the determination. As a result, five genes were identified. Table 3 shows these genes and the p-value for each gene. FIG. 8 also shows the distribution of the level of expression of the transcription product of each gene in the healthy subjects 1 and the Crohn's disease patients 1.

TABLE 3
| Probe set ID | Gene symbol | Gene title | p-value |
|---|---|---|---|
| 202162_s_at | CNOT8 | CCR4-NOT transcription complex, subunit 8 | 8.06E−15 |
| 200828_s_at | ZNF207 | zinc finger protein 207 | 8.60E−15 |
| 201133_s_at | PJA2 | praja ring finger 2 | 5.92E−14 |
| 204725_s_at | NCK1 | NCK adaptor protein 1 | 1.11E−13 |
| 203432_at | AW272611 | thymopoietin | 3.16E−13 |

- The expression levels in the healthy subjects 1 (21 samples) and the Crohn's disease patients 1 (29 samples) with respect to each of these genes were each input to the SVM. The accuracy of determining whether each sample was positive or negative was evaluated using the SVM containing the input expression levels in the 50 samples.
- The result is shown in FIG. 9A. The result shows that the conventional method using genes other than those belonging to Crohn's disease-determining gene families identified the Crohn's disease patients and the healthy subjects at a sensitivity of 95% or more and a specificity of 95% or more.
- (2-2) Evaluating the Reproducibility of the Conventional Determination Method
- Data on the Crohn's disease patients 2 (30 samples) and the healthy subjects 2 (21 samples) were then used to evaluate the reproducibility of the conventional determination method using the five genes. The determination was performed on these samples using the SVM to which the expression levels in the healthy subjects 1 (21 samples) and the Crohn's disease patients 1 (29 samples) were input in the section (2-1).
- The result is shown in FIG. 9B. The result shows that for samples different from those used in the identification of Crohn's disease-determining gene families, the specificity of the conventional determination method was reduced to 40% or less, although the sensitivity was 90% or more. It is therefore apparent that the conventional determination method using genes other than those belonging to Crohn's disease-determining gene families is more likely to misidentify healthy subjects as Crohn's disease patients than the determination method of the invention.
- The results of Example 1 and Comparative Example 1 show that the determination method of the invention can achieve more accurate and more stable determination than conventional methods in which the presence of Crohn's disease is determined directly based on the levels of expression of gene transcription products in healthy subjects and Crohn's disease patients.
- (1) Identification of Huntington's Disease-Determining Gene Families
- Example 2 used data obtained from GEO on the levels of expression of gene transcription products in the blood of Huntington's disease patients and healthy subjects. The data were normalized data obtained by normalization of raw measured signal data and are available from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1751.
- (1-1) Selection of Samples and Probe Sets
- Data on Huntington's disease patients 1 (6 samples) and data on healthy subjects 3 (7 samples) were randomly selected from the data described above, and these data were used to identify Huntington's disease-determining gene families.
- The data on Huntington's disease patients and healthy subjects obtained from the GEO were produced by analysis using GeneChip® U133A (Affymetrix, Inc.). Similarly to the section (1-1) of Example 1, concerning the same gene for which a plurality of probe sets are provided on the DNA chip, only a probe set showing the maximum signal value was taken from the probe sets for the same gene. In addition, probe sets with a signal value of 50 or less were also excluded, because the reproducibility of the measured values was considered to be low. As a result, genes for 8,370 probe sets were subjected to the analysis described below.
- (1-2) Obtaining Expression Level z-Scores
- Averages and standard deviations were calculated using all signal values obtained from the healthy subjects 3 (7 samples) with respect to transcription products of the genes for the 8,370 probe sets selected as described above. Values representing deviations (z-scores) were calculated for each of the 8,370 genes using these values and the following formula: z-score={(the signal value of the transcription product of each gene)−(the average of the signal values of the transcription product of the corresponding gene in the healthy subjects 3 (7 samples))}/(the standard deviation of the signal values of the transcription product of the corresponding gene in the healthy subjects 3 (7 samples))
- (1-3) Gene Classification and Obtaining Average for Each Gene Family
- The 8,370 genes were classified into gene families (GO Terms) based on the classification of Gene Ontology, and the average of the z-scores obtained in the section (1-2) for the Huntington's disease patients 1 (6 samples) was calculated over the genes within each GO Term.
- The average of the z-scores for the healthy subjects 3 (7 samples) was also calculated in the same manner over the genes within each GO Term.
- (1-4) Selecting Gene Families Having Significant Difference Between Healthy Subjects and Huntington's Disease Patients
- A t-test was performed using the averages obtained as described above for the healthy subjects and the Huntington's disease patients with respect to each GO Term, so that a significance probability (p-value) was obtained.
- GO Terms for which the resulting p-value was 0.05 or less (p-value≦5.0E-02) were extracted from the GO Terms used.
- Subsequently, hierarchical clustering was performed using the z-scores for all genes contained in the extracted GO Terms, and synchronously varying gene clusters were selected. The clustering was performed using software Cluster 3.0 (available from http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm), and the result was displayed using Java Tree View (available from http://sourceforge.net/projects/jtreeview/files/).
- The average of the z-scores for the genes contained in each cluster was used as a cluster score, and a t-test was performed on the cluster scores of the healthy subjects 3 (7 samples) and the Huntington's disease patients 1 (6 samples). From the clusters for which the resulting p-value was 0.05 or less, the microtubule-related gene family, mitochondria-related gene family, and prostaglandin-related gene family were selected as Huntington's disease-determining gene families. Table 4 shows these gene families, genes belonging to each family, and the p-value for each family.
- FIG. 10 shows the distribution of the average of the z-scores for the healthy subjects 3 and the Huntington's disease patients 1 with respect to each gene family selected as described above.

TABLE 4
| Gene families | Gene symbol | Gene title |
|---|---|---|
| Microtubule (p = 2.62E−02) | DYNC1LI1 | dynein, cytoplasmic 1, light intermediate chain 1 |
| | DYNLL1 | dynein, light chain, LC8-type 1 |
| | DYNLT1 | dynein, light chain, Tctex-type 1 |
| | DYNLT3 | dynein, light chain, Tctex-type 3 |
| Mitochondria (p = 3.28E−02) | ATP5F1 | ATP synthase, H+ transporting, mitochondrial F0 complex, subunit B1 |
| | ATP5J | ATP synthase, H+ transporting, mitochondrial F0 complex, subunit F6 |
| | ATP5L | ATP synthase, H+ transporting, mitochondrial F0 complex, subunit G |
| | ATP5C1 | ATP synthase, H+ transporting, mitochondrial F1 complex, gamma polypeptide 1 |
| | ATP5O | ATP synthase, H+ transporting, mitochondrial F1 complex, O subunit (oligomycin sensitivity conferring protein) |
| | COX6A1 | cytochrome c oxidase subunit VIa polypeptide 1 |
| | COX7A2 | cytochrome c oxidase subunit VIIa polypeptide 2 (liver) |
| | CYCS | cytochrome c, somatic |
| | MRPL18 | mitochondrial ribosomal protein L18 |
| | MRPS35 | mitochondrial ribosomal protein S35 |
| | NDUFA4 | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 4, 9 kDa |
| | NDUFA9 | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 9, 39 kDa |
| | NDUFB1 | NADH dehydrogenase (ubiquinone) 1 beta subcomplex, 1, 7 kDa |
| | NDUFB3 | NADH dehydrogenase (ubiquinone) 1 beta subcomplex, 3, 12 kDa |
| | NDUFB5 | NADH dehydrogenase (ubiquinone) 1 beta subcomplex, 5, 16 kDa |
| | NDUFC1 | NADH dehydrogenase (ubiquinone) 1, subcomplex unknown, 1, 6 kDa |
| | NDUFS4 | NADH dehydrogenase (ubiquinone) Fe—S protein 4, 18 kDa (NADH-coenzyme Q reductase) |
| | TIMM17A | translocase of inner mitochondrial membrane 17 homolog A |
| | TIMM8B | translocase of inner mitochondrial membrane 8 homolog B |
| | TOMM20 | translocase of outer mitochondrial membrane 20 homolog |
| | TOMM7 | translocase of outer mitochondrial membrane 7 homolog |
| | UQCRH | ubiquinol-cytochrome c reductase hinge protein |
| | UQCR | ubiquinol-cytochrome c reductase, 6.4 kDa subunit |
| | UQCRQ | ubiquinol-cytochrome c reductase, complex III subunit VII, 9.5 kDa |
| Prostaglandin (p = 7.84E−03) | PTGER2 | prostaglandin E receptor 2 (subtype EP2), 53 kDa |
| | PTGER4 | prostaglandin E receptor 4 (subtype EP4) |
| | PTGES3 | prostaglandin E synthase 3 (cytosolic) |

- (2) Evaluating the Accuracy of the Determination Method of the Invention
- (2-1) Determination for the Samples Used in the Identification of Huntington's Disease-Determining Gene Families
- The averages for the healthy subjects 3 (7 samples) and the Huntington's disease patients 1 (6 samples) with respect to each of the three Huntington's disease-determining gene families were each input to an SVM. The SVM containing the input averages for the 13 samples was then used to determine whether each sample was positive (or had Huntington's disease) or negative (or healthy).
- The result is shown in FIG. 11A. The result shows that the determination method of the invention makes it possible to identify Huntington's disease patients and healthy subjects at a sensitivity of 100% and a specificity of 100%.
- (2-2) Evaluating the Reproducibility of the Determination Method of the Invention
- Additionally, data on Huntington's disease patients 2 (6 samples) and healthy subjects 4 (7 samples), which were different from the data selected in the section (1-1), were used to evaluate the reproducibility of the determination method of the invention. The determination was performed on these data using the SVM containing the input averages for the samples used in the identification of Huntington's disease-determining gene families in the section (2-1).
- The result is shown in FIG. 11B. The result shows that even for samples different from those used in the identification of Huntington's disease-determining gene families, the determination method of the invention makes it possible to stably distinguish between healthy subjects and Huntington's disease patients at a sensitivity of 80% or more and a specificity of 100%.
- In this comparative example, a method of determining the presence of a disease directly based on the levels of expression of gene transcription products in healthy subjects and patients was used as a conventional determination method. The accuracy of the determination of the presence of Huntington's disease by such a conventional method was evaluated.
- (1) Determination Using Genes Belonging to Huntington's Disease-Determining Gene Families
- (1-1) Samples Used in the Identification of Huntington's Disease-Determining Gene Families
- The expression levels in the healthy subjects 3 (7 samples) and the Huntington's disease patients 1 (6 samples) with respect to each of the 27 genes in Table 4 were input to the SVM. The accuracy of determining whether each sample was positive or negative was evaluated using the SVM containing the input expression levels in the 13 samples.
- The result is shown in FIG. 12A. The result shows that the conventional method identified the Huntington's disease patients and the healthy subjects at a sensitivity of 100% and a specificity of 100%.
- (1-2) Evaluating the Reproducibility of the Conventional Determination Method
- Data on the Huntington's disease patients 2 (6 samples) and healthy subjects 4 (7 samples) were then used to evaluate the reproducibility of the conventional determination method. The determination was performed on these samples using the SVM to which the expression levels in the healthy subjects 3 (7 samples) and the Huntington's disease patients 1 (6 samples) were input in the section (1-1).
- The result is shown in FIG. 12B. The result shows that for samples different from those used in the identification of Huntington's disease-determining gene families, the sensitivity of the conventional determination method was reduced to 70% or less, although the specificity was 100%. It is therefore apparent that the conventional determination method is more likely to misidentify Huntington's disease patients as healthy subjects than the determination method of the invention.
- (2) Determination Using Genes Other than Those Belonging to Huntington's Disease-Determining Gene Families
- (2-1) Samples Used in the Identification of Huntington's Disease-Determining Gene Families
- Genes other than those belonging to Huntington's disease-determining gene families (27 genes in Table 3) were further identified so that an examination could be performed using such genes. Specifically, a t-test was performed to calculate the significance probability (p-value) between the expression levels in the healthy subjects 3 (7 samples) and the Huntington's disease patients 1 (6 samples), and genes whose expression levels gave a p-value of 0.05 or less were selected for use in the determination. As a result, ten genes were identified. Table 5 shows these genes and the p-value for each gene. FIG. 13 also shows the distribution of the level of expression of the transcription product of each gene in the healthy subjects 3 and the Huntington's disease patients 1.
TABLE 5
ProbeSet ID | Gene symbol | Gene title | p-value |
---|---|---|---|
203909_at | SLC9A6 | solute carrier family 9 (sodium/hydrogen exchanger), member 6 | 6.59E−07 |
219065_s_at | MEMO1 | mediator of cell motility 1 | 2.26E−06 |
218854_at | DSE | dermatan sulfate epimerase | 2.63E−06 |
220933_s_at | ZCCHC6 | zinc finger, CCHC domain containing 6 | 3.26E−06 |
203024_s_at | C5orf15 | chromosome 5 open reading frame 15 | 4.00E−06 |
208801_at | SRP72 | signal recognition particle 72 kDa | 5.40E−06 |
215492_x_at | LOC441150 | similar to RIKEN cDNA 2310039H08 /// ribosomal protein L7-like 1 /// pre T-cell antigen receptor alpha /// KIAA0240 /// canopy 3 homolog | 8.86E−06 |
208335_s_at | DARC | Duffy blood group, chemokine receptor | 1.12E−05 |
203474_at | IQGAP2 | IQ motif containing GTPase activating protein 2 | 1.29E−05 |
218005_at | ZNF22 | zinc finger protein 22 (KOX 15) | 1.31E−05 |
- The expression levels in the healthy subjects 3 (7 samples) and the Huntington's disease patients 1 (6 samples) with respect to each of these genes were input to the SVM. The accuracy of determining whether each sample was positive or negative was evaluated using the SVM containing the input expression levels in the 13 samples.
- The result is shown in FIG. 14A. The result shows that the conventional method using genes other than those belonging to Huntington's disease-determining gene families identified the Huntington's disease patients and the healthy subjects at a sensitivity of 100% and a specificity of 100%.
- (2-2) Evaluating the Reproducibility of the Conventional Determination Method
- Data on the Huntington's disease patients 2 (6 samples) and the healthy subjects 4 (7 samples) were then used to evaluate the reproducibility of the conventional determination method using the ten genes. The determination was performed on these samples using the SVM to which the expression levels in the healthy subjects 3 (7 samples) and the Huntington's disease patients 1 (6 samples) were input in the section (2-1).
- The result is shown in FIG. 14B. The result shows that for samples different from those used in the identification of Huntington's disease-determining gene families, the sensitivity of the conventional determination method was reduced to 50%, although the specificity was 100%. It is therefore apparent that the conventional determination method using genes other than those belonging to Huntington's disease-determining gene families is more likely to misidentify Huntington's disease patients as healthy subjects than the determination method of the invention.
- The results of Example 2 and Comparative Example 2 show that the determination method of the invention can achieve more accurate and more stable determination than conventional methods in which the presence of Huntington's disease is determined directly based on the levels of expression of gene transcription products in healthy subjects and Huntington's disease patients.
- (1) Identification of Endometriosis-Determining Gene Families
- Example 3 used data obtained from GEO on the levels of expression of gene transcription products in normal tissues and lesion tissues of endometriosis patients. The data were normalized data obtained by normalization of raw measured signal data, and are available from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7305 and http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6364.
- (1-1) Selection of Samples and Probe Sets
- Data on lesion tissues 1 (9 samples) and data on normal tissues 1 (8 samples) were randomly selected from the data described above, and these data were used to identify endometriosis-determining gene families.
- The data on lesion tissues and normal tissues obtained from the GEO were produced by analysis using GeneChip® U133 plus2.0 (Affymetrix, Inc.), a DNA chip. The DNA chip has 54,675 probe sets, which include probe sets for the same gene.
- Therefore, for each gene represented by a plurality of probe sets on the DNA chip, only the probe set showing the maximum signal value was used. In addition, probe sets with a signal value of 100 or less were excluded, because the reproducibility of the measured values was considered to be low. As a result, genes for 16,207 probe sets were subjected to the analysis described below.
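The probe set selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used: the text does not say how a single signal value per probe set was derived from several samples, so the sketch assumes one representative signal per probe set. The function name, probe set IDs, and gene symbols are hypothetical.

```python
def select_probe_sets(signals, probe_to_gene, min_signal=100.0):
    """Keep, per gene, the probe set with the highest signal above min_signal.

    signals: dict mapping probe set ID -> representative signal (assumed input).
    probe_to_gene: dict mapping probe set ID -> gene symbol.
    Returns a dict mapping gene symbol -> selected probe set ID.
    """
    best = {}  # gene symbol -> (probe set ID, signal)
    for probe, signal in signals.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe set without a gene annotation
        if signal <= min_signal:
            continue  # "100 or less" is excluded as poorly reproducible
        if gene not in best or signal > best[gene][1]:
            best[gene] = (probe, signal)
    return {gene: probe for gene, (probe, _) in best.items()}


# Hypothetical example data (not taken from the GEO data sets)
signals = {"probe_1": 250.0, "probe_2": 900.0, "probe_3": 80.0, "probe_4": 100.0}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A",
                 "probe_3": "GENE_B", "probe_4": "GENE_C"}
selected = select_probe_sets(signals, probe_to_gene)
```

Filtering before or after taking the per-gene maximum gives the same result, because a gene whose strongest probe set is at or below the threshold is dropped either way.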
- (1-2) Obtaining Expression Level z-Scores
- Averages and standard deviations were calculated using all signal values obtained from the normal tissues 1 (8 samples) with respect to transcription products of the genes for the 16,207 probe sets selected as described above. Values representing deviations (z-scores) were calculated for each of the 16,207 genes using these values and the following formula: z-score={(the signal value of the transcription product of each gene)−(the average of the signal values of the transcription product of the corresponding gene in the normal tissues 1 (8 samples))}/(the standard deviation of the signal values of the transcription product of the corresponding gene in the normal tissues 1 (8 samples))
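The z-score formula above can be illustrated with a short Python sketch. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def reference_stats(normal_signals):
    """Per-gene mean and standard deviation over the normal (reference) samples.

    normal_signals: dict mapping gene symbol -> list of signal values.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    return {gene: (mean(values), stdev(values))
            for gene, values in normal_signals.items()}


def z_scores(sample_signals, stats):
    """Standardize one sample's signals against the reference statistics."""
    result = {}
    for gene, (mu, sd) in stats.items():
        if gene in sample_signals and sd > 0:
            result[gene] = (sample_signals[gene] - mu) / sd
    return result


# Hypothetical example: three normal samples and one sample to score
stats = reference_stats({"GENE_A": [100.0, 120.0, 140.0]})
z = z_scores({"GENE_A": 160.0}, stats)
```

Here the reference mean is 120 and the sample standard deviation is 20, so a signal of 160 lies two standard deviations above the normal-tissue average.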
- (1-3) Gene Classification and Obtaining Average for Each Gene Family
- The 16,207 genes were classified into gene families (GO Terms) based on the classification of Gene Ontology, and the average of the z-scores for the lesion tissues 1 (9 samples) obtained in the section (1-2) was calculated with respect to the genes within each GO Term.
- The average of the z-scores for the normal tissues 1 (8 samples) was also calculated in the same manner with respect to the genes within each GO Term.
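As a rough illustration of this averaging step, the following Python sketch computes one average per gene family for a single sample. The family names, gene symbols, and z-scores are hypothetical, and skipping genes that have no z-score is an assumption rather than something the text states.

```python
from statistics import mean


def family_averages(z, families):
    """Average the z-scores of the genes in each gene family for one sample.

    z: dict mapping gene symbol -> z-score for this sample.
    families: dict mapping family name (e.g. a GO Term) -> list of gene symbols.
    Genes without a z-score are skipped (assumption); families with no
    measured gene are omitted from the result.
    """
    averages = {}
    for family, genes in families.items():
        values = [z[g] for g in genes if g in z]
        if values:
            averages[family] = mean(values)
    return averages


# Hypothetical example data
z = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.0}
families = {
    "family_1": ["GENE_A", "GENE_B"],
    "family_2": ["GENE_C", "GENE_D"],  # GENE_D was not measured
    "family_3": ["GENE_E"],            # no measured gene at all
}
averages = family_averages(z, families)
```

Applying the function to every sample yields, per family, one list of averages for the lesion tissues and one for the normal tissues.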
- (1-4) Selecting Gene Families Having Significant Difference Between Normal Tissues and Lesion Tissues
- A t-test was performed using the averages obtained as described above for the normal tissues and the lesion tissues with respect to each GO Term, so that a significance probability (p-value) was obtained.
- GO Terms for which the resulting p-value was 0.05 or less (p-value≦5.0E-02) were extracted from the GO Terms used.
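The selection of GO Terms by p-value can be sketched as below. The text says only that "a t-test" was performed, so the sketch assumes a two-sided Student's t-test with pooled variance; the p-value is obtained by numerically integrating the t density with Simpson's rule so that only the Python standard library is needed. All function names and data values are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|] (Simpson)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test (assumed variant). Returns (t, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


def significant_families(lesion, normal, alpha=0.05):
    """Return the families whose lesion/normal averages differ with p <= alpha."""
    return [f for f in lesion if student_t_test(lesion[f], normal[f])[1] <= alpha]


# Hypothetical per-sample family averages
lesion = {"family_1": [1.2, 1.5, 1.1, 1.4], "family_2": [0.1, -0.2, 0.3, 0.0]}
normal = {"family_1": [0.0, 0.1, -0.1, 0.05], "family_2": [0.0, 0.1, -0.1, 0.05]}
kept = significant_families(lesion, normal)
```

With an installed statistics package, a library t-test routine would normally replace the hand-written integration; it is written out here only to keep the sketch self-contained.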
- Subsequently, hierarchical clustering was performed using the z-scores for all genes contained in the extracted GO Terms, and synchronously varying gene clusters were selected. The clustering was performed using software Cluster 3.0 (available from http://bonsai.ims.u-tokyo.ac.jp/˜mdehoon/software/cluster/software.htm), and the result was displayed using Java Tree View (available from http://sourceforge.net/projects/jtreeview/files/).
- The average of the z-scores for the genes contained in each cluster was used as a cluster score, and a t-test was performed on the cluster scores for the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples). From the clusters for which the resulting p-value was 0.05 or less, the cytokine synthesis process-related gene family, cytokine-mediated signaling-related gene family, and immunoglobulin-mediated immune response-related gene family were selected as endometriosis-determining gene families. Table 6 shows these gene families, genes belonging to each family, and the p-value for each family.
- FIG. 15 shows the distribution of the average of the z-scores for the normal tissues 1 and the lesion tissues 1 with respect to each gene family selected as described above.
TABLE 6
Gene families | Gene symbol | Gene title |
---|---|---|
Cytokine synthesis process (p = 1.25E−03) | CEBPE | CCAAT/enhancer binding protein (C/EBP), epsilon |
 | CD28 | CD28 molecule |
Cytokine-mediated signaling pathway (p = 4.10E−03) | EREG | epiregulin |
 | STAT3 | signal transducer and activator of transcription 3 (acute-phase response factor) |
 | STAT5A | signal transducer and activator of transcription 5A |
 | STAT5B | signal transducer and activator of transcription 5B |
 | SOCS1 | suppressor of cytokine signaling 1 |
 | SOCS5 | suppressor of cytokine signaling 5 |
 | RELA | v-rel reticuloendotheliosis viral oncogene homolog A, nuclear factor of kappa light polypeptide gene enhancer in B-cells 3, p65 (avian) |
 | CEBPA | CCAAT/enhancer binding protein (C/EBP), alpha |
 | DUOX2 | dual oxidase 2 |
 | DUOX1 | dual oxidase 1 |
 | STAT4 | signal transducer and activator of transcription 4 |
 | ZNF675 | zinc finger protein 675 |
 | IL2RB | interleukin 2 receptor, beta |
 | IRAK3 | interleukin-1 receptor-associated kinase 3 |
 | KIT | v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog |
 | LRP8 | low density lipoprotein receptor-related protein 8, apolipoprotein e receptor |
 | TNFRSF1A | tumor necrosis factor receptor superfamily, member 1A |
 | PLP2 | proteolipid protein 2 (colonic epithelium-enriched) |
 | TNFRSF1B | tumor necrosis factor receptor superfamily, member 1B |
 | TGM2 | transglutaminase 2 (C polypeptide, protein-glutamine-gamma-glutamyltransferase) |
 | CCR1 | chemokine (C-C motif) receptor 1 |
 | CCR2 | chemokine (C-C motif) receptor 2 |
 | PF4 | platelet factor 4 (chemokine (C-X-C motif) ligand 4) |
 | CX3CL1 | chemokine (C-X3-C motif) ligand 1 |
 | IL1R1 | interleukin 1 receptor, type I |
 | CSF2RB | colony stimulating factor 2 receptor, beta, low-affinity (granulocyte-macrophage) |
 | CLCF1 | cardiotrophin-like cytokine factor 1 |
 | NUP85 | nucleoporin 85 kDa |
Immunoglobulin-mediated immune response (p = 7.50E−03) | IGHG3 | immunoglobulin heavy constant gamma 3 (G3m marker) |
 | IGHM | immunoglobulin heavy constant mu |
 | CD74 | CD74 molecule, major histocompatibility complex, class II invariant chain |
 | FCER1G | Fc fragment of IgE, high affinity I, receptor for; gamma polypeptide |
 | BCL10 | B-cell CLL/lymphoma 10 |
 | PRKCD | protein kinase C, delta |
 | CD27 | CD27 molecule |
 | MYD88 | myeloid differentiation primary response gene (88) |
 | TLR8 | toll-like receptor 8 |
- (2) Evaluating the Accuracy of the Determination Method of the Invention
- (2-1) Determination for the Samples Used in the Identification of Endometriosis-Determining Gene Families
- The averages for the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples) with respect to each of the three endometriosis-determining gene families were each input to an SVM. The SVM containing the input averages for the 17 samples was then used to determine whether each sample was positive (or had endometriosis) or negative (or healthy).
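To make the classification step concrete, the following Python sketch trains a small classifier on three family averages per sample. The text does not state the kernel or software used for the SVM, so a linear SVM fitted by plain subgradient descent on the hinge loss is swapped in here as an assumption; the training values, hyperparameters, and function names are invented for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent.

    X: list of feature vectors (one average per gene family).
    y: list of labels, +1 for positive (disease) and -1 for negative.
    Returns (weights, bias). Hyperparameters are illustrative assumptions.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # The L2 penalty shrinks the weights slightly on every step
            w = [wj * (1.0 - lr * lam) for wj in w]
            if yi * score < 1.0:
                # Sample is inside the margin: move the boundary away from it
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(model, x):
    """Return +1 (positive) or -1 (negative) for one feature vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical family averages: three normal and three lesion samples
X_train = [[0.1, -0.2, 0.0], [-0.1, 0.1, 0.2], [0.0, 0.0, -0.1],
           [1.5, 1.2, 1.8], [2.0, 1.7, 1.4], [1.1, 1.6, 1.3]]
y_train = [-1, -1, -1, 1, 1, 1]
model = train_linear_svm(X_train, y_train)
train_predictions = [predict(model, x) for x in X_train]
```

Because each sample is reduced to a handful of family averages rather than thousands of individual gene levels, the classifier has very few parameters to fit.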
- The result is shown in FIG. 16A. The result shows that the determination method of the invention makes it possible to identify samples with lesion tissues and samples with normal tissues at a sensitivity of 85% or more and a specificity of 100%.
- (2-2) Evaluating the Reproducibility of the Determination Method of the Invention
- Additionally, data on lesion tissues 2 (9 samples) and normal tissues 2 (8 samples), which were different from the data selected in the section (1-1), were used to evaluate the reproducibility of the determination method of the invention. The determination was performed on these data using the SVM containing the input averages for the samples used in the identification of endometriosis-determining gene families in the section (2-1).
- The result is shown in FIG. 16B. The result shows that even for samples different from those used in the identification of endometriosis-determining gene families, the determination method of the invention makes it possible to stably distinguish between samples with normal tissues and samples with lesion tissues at a sensitivity of 75% and a specificity of 85% or more.
- In this comparative example, a method of determining the presence of a disease directly based on the levels of expression of gene transcription products in healthy subjects and patients was used as a conventional determination method. The accuracy of the determination of the presence of endometriosis lesion tissues in samples by such a conventional method was evaluated.
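Sensitivity and specificity, the two accuracy measures reported throughout these examples, can be computed as in the following Python sketch. The labels below are invented for illustration and do not reproduce any figure in this document; the function name is hypothetical.

```python
def sensitivity_specificity(actual, predicted):
    """Compute (sensitivity, specificity) from true and predicted labels.

    actual, predicted: lists of booleans, True meaning "has the disease".
    Sensitivity = true positives / all actual positives.
    Specificity = true negatives / all actual negatives.
    Both classes must be present, otherwise a division by zero occurs.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcome: 3 of 4 patients detected, all 4 healthy subjects cleared
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(actual, predicted)
```

A drop in sensitivity with unchanged specificity, as reported for the conventional method on new samples, means that patients are being missed while healthy subjects are still classified correctly.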
- (1) Determination Using Genes Belonging to Endometriosis-Determining Gene Families
- (1-1) Samples Used in the Identification of Endometriosis-Determining Gene Families
- The expression levels in the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples) with respect to each of the 39 genes in Table 6 were input to the SVM. The accuracy of determining whether each sample was positive or negative was evaluated using the SVM containing the input expression levels in the 17 samples.
- The result is shown in FIG. 17A. The result shows that the conventional method identified the normal tissues and the lesion tissues at a sensitivity of 100% and a specificity of 100%.
- (1-2) Evaluating the Reproducibility of the Conventional Determination Method
- Data on the normal tissues 2 (8 samples) and lesion tissues 2 (9 samples) were then used to evaluate the reproducibility of the conventional determination method. The determination was performed on these samples using the SVM to which the expression levels in the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples) were input in the section (1-1).
- The result is shown in FIG. 17B. The result shows that for samples different from those used in the identification of endometriosis-determining gene families, the sensitivity of the conventional determination method was reduced to 65% or less, although the specificity was 100%. It is therefore apparent that the conventional determination method is more likely to misidentify endometriosis patients as healthy subjects than the determination method of the invention.
- (2) Determination Using Genes Other than Those Belonging to Endometriosis-Determining Gene Families
- (2-1) Samples Used in the Identification of Endometriosis-Determining Gene Families
- Genes other than those belonging to endometriosis-determining gene families (39 genes in Table 6) were further identified so that an examination could be performed using such genes. Specifically, a t-test was performed to calculate the significance probability (p-value) between the expression levels in the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples), and genes whose expression levels gave a p-value of 0.05 or less were selected for use in the determination. As a result, ten genes were identified. Table 7 shows these genes and the p-value for each gene. FIG. 18 also shows the distribution of the level of expression of the transcription product of each gene in the normal tissues 1 and the lesion tissues 1.
TABLE 7
ProbeSet ID | Gene symbol | Gene title | P value |
---|---|---|---|
202659_at | PSMB10 | proteasome (prosome, macropain) subunit, beta type, 10 | 1.08E−04 |
241425_at | NUPL1 | nucleoporin like 1 | 1.50E−04 |
223158_s_at | NEK6 | myeloproliferative disease associated tumor antigen 5 /// NIMA (never in mitosis gene a)-related kinase 6 | 1.62E−04 |
221230_s_at | ARID4B | AT rich interactive domain 4B (RBP1-like) | 1.76E−04 |
214523_at | CEBPE | CCAAT/enhancer binding protein (C/EBP), epsilon | 3.49E−04 |
1561850_at | MGC15613 | hypothetical protein MGC15613 | 3.98E−04 |
218512_at | WDR12 | WD repeat domain 12 | 5.90E−04 |
228937_at | C13orf31 | chromosome 13 open reading frame 31 | 6.26E−04 |
238331_at | SPRN | shadow of prion protein homolog | 6.91E−04 |
227833_s_at | MBD6 | methyl-CpG binding domain protein 6 | 6.96E−04 |
- The expression levels in the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples) with respect to each of these genes were input to the SVM. The accuracy of determining whether each sample was positive or negative was evaluated using the SVM containing the input expression levels in the 17 samples.
- The result is shown in FIG. 19A. The result shows that the conventional method using genes other than those belonging to endometriosis-determining gene families identified the samples with lesion tissues and the samples with normal tissues at a sensitivity of 100% and a specificity of 100%.
- (2-2) Evaluating the Reproducibility of the Conventional Determination Method
- Data on the lesion tissues 2 (9 samples) and the normal tissues 2 (8 samples) were then used to evaluate the reproducibility of the conventional determination method using the ten genes. The determination was performed on these samples using the SVM to which the expression levels in the normal tissues 1 (8 samples) and the lesion tissues 1 (9 samples) were input in the section (2-1).
- The result is shown in FIG. 19B. The result shows that for samples different from those used in the identification of endometriosis-determining gene families, the sensitivity of the conventional determination method was reduced to 0%, although the specificity was 100%. It is therefore apparent that the conventional determination method using genes other than those belonging to endometriosis-determining gene families is far more likely to misidentify endometriosis patients as healthy subjects than the determination method of the invention.
- The results of Example 3 and Comparative Example 3 show that the determination method of the invention can achieve more accurate and more stable determination than conventional methods in which the presence of endometriosis is determined directly based on the levels of expression of gene transcription products in healthy subjects and endometriosis patients.
Claims (18)
1. A method for determining presence of a disease, comprising steps of:
measuring the levels of expression of transcription products of genes in a biological sample obtained from a subject suspected of having a target disease, wherein the genes comprise at least one gene belonging to each of at least two disease-determining gene families related to the target disease;
obtaining values representing deviations by standardizing the levels of the expression based on the levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects;
obtaining the average of values representing deviations with respect to the gene belonging to each of the disease-determining gene families; and
determining, using the average, whether or not the subject has the target disease.
2. The method according to claim 1, wherein the disease-determining gene families in the measuring step are identified by the following steps:
(a) measuring the levels of expression of transcription products of genes in a biological sample obtained from each of a plurality of patients having the target disease and a plurality of healthy subjects;
(b) standardizing the levels of expression of the gene transcription products in each of the plurality of patients based on the levels of expression of transcription products of the corresponding genes in the plurality of healthy subjects to obtain values representing deviations for each of the plurality of patients;
standardizing the levels of expression of the gene transcription products in each of the plurality of healthy subjects to obtain values representing deviations for each of the plurality of healthy subjects;
(c) classifying the genes, whose expression levels are measured, into at least two gene families using a classification system based on the function of molecules encoded by the genes;
obtaining, as an average for each gene family, the average of values representing deviations for the gene belonging to each of the gene families with respect to each of the plurality of patients and the plurality of healthy subjects;
(d) obtaining a significance probability between the average for each gene family with respect to the plurality of patients and the average for each corresponding gene family with respect to the plurality of healthy subjects; and
(e) identifying the gene family as a disease-determining gene family related to the target disease, when the significance probability for the gene family is 0.05 or less.
3. The method according to claim 2, wherein the classification system based on the function of molecules encoded by the genes is Gene Ontology, Kyoto Encyclopedia of Genes and Genomes (KEGG), MetaCyc, GenMAPP, BioCarta, KeyMolnet, or Online Mendelian Inheritance in Man (OMIM).
4. The method according to claim 1, wherein the target disease is selected from Crohn's disease, Huntington's disease, and endometriosis.
5. The method according to claim 1, wherein
the target disease is Crohn's disease, and
the disease-determining gene families are at least two selected from a G protein-related gene family, a blood coagulation-related gene family, an oxidative stress-related gene family, a phagocytosis-related gene family, and a fat oxidation-related gene family.
6. The method according to claim 1, wherein
the target disease is Huntington's disease, and
the disease-determining gene families are at least two selected from a microtubule-related gene family, a mitochondria-related gene family, and a prostaglandin-related gene family.
7. The method according to claim 1, wherein
the target disease is endometriosis, and
the disease-determining gene families are at least two selected from a cytokine synthesis process-related gene family, a cytokine-mediated signaling-related gene family, and an immunoglobulin-mediated immune response-related gene family.
8. The method according to claim 1, wherein the step of measuring the levels of expression of gene transcription products comprises measuring the level of expression of at least one gene belonging to each of at least three disease-determining gene families.
9. The method according to claim 5, wherein
the G protein-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: GNG3, GNG7, GNA15, GNB5, GNAS, GNG5, GNG11, GNB1, and GNG4,
the blood coagulation-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: GP1BA, GP1BB, ITGB3, GP9, and F13A1,
the oxidative stress-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: GPX1, PTGS1, CLU, and PDLIM1,
the phagocytosis-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: FCER1G, CLEC7A, VAMP7, and FCGR1A, and
the fat oxidation-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: ACOX1, ADIPOR2, ADIPOR1, and ALOX12.
10. The method according to claim 6, wherein
the microtubule-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: DYNC1LI1, DYNLL1, DYNLT1, and DYNLT3,
the mitochondria-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: ATP5F1, ATP5J, ATP5L, ATP5C1, ATP5O, COX6A1, COX7A2, CYCS, MRPL18, MRPS35, NDUFA4, NDUFA9, NDUFB1, NDUFB3, NDUFB5, NDUFC1, NDUFS4, TIMM17A, TIMM8B, TOMM20, TOMM7, UQCRH, UQCR, and UQCRQ, and
the prostaglandin-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: PTGER2, PTGER4, and PTGES3.
11. The method according to claim 7, wherein
the cytokine synthesis process-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: CEBPE and CD28,
the cytokine-mediated signaling-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: EREG, STAT3, STAT5A, STAT5B, SOCS1, SOCS5, RELA, CEBPA, DUOX2, DUOX1, STAT4, ZNF675, IL2RB, IRAK3, KIT, LRP8, TNFRSF1A, PLP2, TNFRSF1B, TGM2, CCR1, CCR2, PF4, CX3CL1, IL1R1, CSF2RB, CLCF1, and NUP85, and
the immunoglobulin-mediated immune response-related gene family contains at least one gene selected from the group consisting of genes represented by the following gene symbols: IGHG3, IGHM, CD74, FCER1G, BCL10, PRKCD, CD27, MYD88, and TLR8.
12. The method according to claim 1, wherein the biological sample is blood.
13. The method according to claim 1, wherein the determination is made by inputting, to a determination formula, the average obtained from the subject suspected of having the target disease, wherein the determination formula is obtained based on: averages previously obtained in the same manner as in the measuring step and the obtaining step using biological samples collected from healthy subjects; and averages previously obtained in the same manner as in the measuring step and the obtaining step using biological samples collected from patients having the target disease.
14. The method according to claim 13, wherein the determination formula is prepared using a discriminant analysis method.
15. The method according to claim 14, wherein the discriminant analysis method is a support vector machine, a linear discriminant analysis, a neural network, a k-neighborhood discriminator, a decision tree, or a random forest.
16. A computer program product, comprising:
a computer readable medium; and
software instructions, on the computer readable medium, for enabling a computer to perform operations comprising:
receiving the levels of expression of transcription products of genes in a biological sample obtained from a subject suspected of having a target disease, wherein the genes comprise at least one gene belonging to each of at least two disease-determining gene families related to the target disease;
obtaining values representing deviations by standardizing the levels of the expression based on the levels of expression of transcription products of the corresponding genes in a plurality of healthy subjects;
obtaining the average of values representing deviations with respect to the gene belonging to each of the disease-determining gene families;
determining whether or not the subject has the target disease by using the average; and
outputting the result of the determination.
17. The computer program product according to claim 16, wherein the operations further comprise:
receiving the levels of expression of transcription products of genes in a biological sample obtained from each of a plurality of patients having the target disease and a plurality of healthy subjects;
standardizing the levels of expression of the gene transcription products in each of the plurality of patients based on the levels of expression of the transcription products of the corresponding genes in the plurality of healthy subjects to obtain values representing deviations for each of the plurality of patients;
standardizing the levels of expression of the gene transcription products in each of the plurality of healthy subjects to obtain values representing deviations for each of the plurality of healthy subjects;
classifying the genes, whose expression levels are measured, into at least two gene families according to a classification system based on the function of molecules encoded by the genes;
obtaining, as an average for each gene family, the average of values representing deviations for the gene belonging to each of the gene families with respect to each of the plurality of patients and the plurality of healthy subjects;
obtaining a significance probability between the average for each gene family with respect to the plurality of patients and the average for each corresponding gene family with respect to the plurality of healthy subjects; and
identifying the gene family as a disease-determining gene family related to the target disease, when the significance probability for the gene family is 0.05 or less.
18. The computer program product according to claim 16, wherein the determination comprises a discriminant analysis method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/298,386 US9898574B2 (en) | 2009-10-30 | 2014-06-06 | Method for determining the presence of disease |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009251017A JP5503942B2 (en) | 2009-10-30 | 2009-10-30 | Determination method of disease onset |
JP2009-251017 | 2009-10-30 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/298,386 Division US9898574B2 (en) | 2009-10-30 | 2014-06-06 | Method for determining the presence of disease |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110106739A1 true US20110106739A1 (en) | 2011-05-05 |
Family
ID=43827500
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/915,981 Abandoned US20110106739A1 (en) | 2009-10-30 | 2010-10-29 | Method for determining the presence of disease |
US14/298,386 Active 2031-07-15 US9898574B2 (en) | 2009-10-30 | 2014-06-06 | Method for determining the presence of disease |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/298,386 Active 2031-07-15 US9898574B2 (en) | 2009-10-30 | 2014-06-06 | Method for determining the presence of disease |
Country Status (4)
Country | Link |
---|---|
US (2) | US20110106739A1 (en) |
EP (1) | EP2328105A3 (en) |
JP (1) | JP5503942B2 (en) |
CN (1) | CN102051412B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107025386A (en) * | 2017-03-22 | 2017-08-08 | 杭州电子科技大学 | A kind of method that gene association analysis is carried out based on deep learning algorithm |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2733634A1 (en) * | 2012-11-16 | 2014-05-21 | Siemens Aktiengesellschaft | Method for obtaining gene signature scores |
CN108779496B (en) * | 2016-02-10 | 2022-07-05 | 公立大学法人福岛县立医科大学 | Method for identifying esophageal basal cell-like squamous cell carcinoma |
US12006547B2 (en) * | 2017-10-02 | 2024-06-11 | Oxford BioDynamics PLC | Detection of chromosome interactions as indicative of amyotrophic lateral sclerosis |
US20210233615A1 (en) * | 2018-04-22 | 2021-07-29 | Viome, Inc. | Systems and methods for inferring scores for health metrics |
CN111383736A (en) * | 2018-12-28 | 2020-07-07 | 康多富国际有限公司 | Method for determining health food composition for immune system diseases and readable storage medium thereof |
KR102176721B1 (en) * | 2019-03-20 | 2020-11-09 | 한국과학기술원 | System and method for disease prediction based on group marker consisting of genes having similar function |
CN113943798B (en) * | 2020-07-16 | 2023-10-27 | 中国农业大学 | Application of circRNA as hepatocellular carcinoma diagnosis marker and therapeutic target |
CN112017732B (en) * | 2020-10-23 | 2021-02-05 | 平安科技(深圳)有限公司 | Terminal device, apparatus, disease classification method and readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080280774A1 (en) * | 2005-02-16 | 2008-11-13 | Wyeth | Methods and Systems for Diagnosis, Prognosis and Selection of Treatment of Leukemia |
US20090297494A1 (en) * | 2004-01-15 | 2009-12-03 | Michel Cuenod | Diagnostic and treatment of a mental disorder |
US20110257888A1 (en) * | 2010-04-14 | 2011-10-20 | Sysmex Corporation | Method of determining chronic fatigue syndrome |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002059367A2 (en) * | 2000-11-30 | 2002-08-01 | Board Of Supervisors Of Louisiana State University And Agricultural And Mechanical College | Diagnostic microarray for inflammatory bowel disease, crohn's disease and ulcerative colitis |
US20040018513A1 (en) * | 2002-03-22 | 2004-01-29 | Downing James R | Classification and prognosis prediction of acute lymphoblastic leukemia by gene expression profiling |
JP2005323573A (en) | 2004-05-17 | 2005-11-24 | Sumitomo Pharmaceut Co Ltd | Method for analyzing gene expression data and, method for screening disease marker gene and its utilization |
JPWO2006030822A1 (en) * | 2004-09-14 | 2008-05-15 | 株式会社東京大学Tlo | Gene expression data processing method and processing program |
BRPI0520012A2 (en) * | 2005-02-18 | 2009-04-14 | Us Gov Health & Human Serv | identification of diagnostic molecular markers for blood lymphocyte endometriosis |
US20070015183A1 (en) * | 2005-06-03 | 2007-01-18 | The General Hospital Corporation | Biomarkers for huntington's disease |
- 2009-10-30: JP application JP2009251017A, JP5503942B2 (en), status: Active
- 2010-10-29: EP application EP10189410.3A, EP2328105A3 (en), status: Withdrawn
- 2010-10-29: US application US12/915,981, US20110106739A1 (en), status: Abandoned
- 2010-10-29: CN application CN201010526277.XA, CN102051412B (en), status: Active
- 2014-06-06: US application US14/298,386, US9898574B2 (en), status: Active
Also Published As
Publication number | Publication date |
---|---|
US9898574B2 (en) | 2018-02-20 |
EP2328105A2 (en) | 2011-06-01 |
JP2011092137A (en) | 2011-05-12 |
CN102051412B (en) | 2014-06-18 |
CN102051412A (en) | 2011-05-11 |
EP2328105A3 (en) | 2016-05-18 |
JP5503942B2 (en) | 2014-05-28 |
US20140287965A1 (en) | 2014-09-25 |
Similar Documents
Publication | Title |
---|---|
US9898574B2 (en) | Method for determining the presence of disease |
Exarchos et al. | Artificial intelligence techniques in asthma: a systematic review and critical appraisal of the existing literature |
US11954614B2 (en) | Systems and methods for visualizing a pattern in a dataset |
Kuehn et al. | Using GenePattern for gene expression analysis |
KR101642270B1 (en) | Evolutionary clustering algorithm |
CN105005680B (en) | Use categorizing system and its method of kit identification and diagnosis pulmonary disease |
Larsson et al. | Comparative microarray analysis |
JP2013505730A (en) | System and method for classifying patients |
Baron et al. | Utilization of lymphoblastoid cell lines as a system for the molecular modeling of autism |
Spang et al. | Prediction and uncertainty in the analysis of gene expression profiles |
Kuo et al. | A primer on gene expression and microarrays for machine learning researchers |
Evans et al. | Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing data sets |
Zhang et al. | Identifying the RNA signatures of coronary artery disease from combined lncRNA and mRNA expression profiles |
CN112786103A (en) | Method and device for analyzing feasibility of target sequencing Panel for estimating tumor mutation load |
Liu et al. | Cross-generation and cross-laboratory predictions of Affymetrix microarrays by rank-based methods |
Simon | BRB-ArrayTools Version 4.3 |
Grewal et al. | Analysis of expression data: an overview |
Haverty et al. | Limited agreement among three global gene expression methods highlights the requirement for non-global validation |
US20090006055A1 (en) | Automated Reduction of Biomarkers |
Saviozzi et al. | Microarray data analysis and mining |
US20240354607A1 (en) | Systems and methods for visualizing a pattern in a dataset |
Kuijjer et al. | Expression Analysis |
CN118313354B (en) | Automatic annotation method for cell subpopulations, computer program and storage medium |
Riccadonna et al. | Supervised classification of combined copy number and gene expression data |
EP2433232A1 (en) | Biomarkers based on sets of molecular signatures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: SYSMEX CORPORATION, JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: YOSHIDA, YUICHIRO; KOBAYASHI, MASAKI; OTOMO, YASUHIRO; SIGNING DATES FROM 20101026 TO 20101028; REEL/FRAME: 025341/0993 |
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |