WO2007015459A1 - Gene set for use in prediction of occurrence of lymph node metastasis of colorectal cancer - Google Patents


Info

Publication number
WO2007015459A1
Authority
WO
WIPO (PCT)
Prior art keywords
lymph node
node metastasis
genes
gene
absence
Prior art date
Application number
PCT/JP2006/315143
Other languages
French (fr)
Japanese (ja)
Inventor
Ichiro Takemasa
Takenobu Tasaki
Hikaru Sonoda
Hirofumi Higuchi
Kenichi Matsubara
Original Assignee
Osaka University
Priority date
Filing date
Publication date
Application filed by Osaka University
Publication of WO2007015459A1


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/112: Disease subtyping, staging or classification
    • C12Q2600/158: Expression markers

Definitions

  • The present invention relates to a gene group useful for predicting the presence or absence of lymph node metastasis of colorectal cancer, and to a method of using the expression information of those genes.
  • Colorectal cancer is one of the cancers in which molecular-biological research, including elucidation of the multistep process of carcinogenesis, is most advanced, and many reports on individual genes such as APC, K-ras, p53 and DCC have been published. However, focusing on any one of these genes alone is not sufficient to capture the individual character of a colorectal cancer, and in recent years attempts have been made to obtain useful new knowledge by acquiring expression information on a large number of genes at once with DNA microarrays or similar tools.
  • In Non-Patent Document 4, it was verified that accurate classification results are obtained even when test-sample data are input to an artificial neural network model derived from a randomly extracted subset of the full data set. This suggests that the artificial neural network model derived there can be applied generally, beyond the data in that paper, to distinguishing the four types of cancer belonging to the small round blue cell tumors. However, results obtained with artificial neural network models are generally unsatisfactory in that their mathematical basis cannot be clearly explained.
  • A recent DNA microarray study aimed at identifying molecular targets involved in liver metastasis of colorectal cancer is the report by Yanagawa et al. (Non-Patent Document 5). The authors amplified cDNA fragments by PCR using human cDNA as the template and, as primers, oligo DNA designed from human cDNA sequences registered in a public gene database. Using a DNA microarray on which these cDNA fragments were printed, they examined the gene expression profiles of primary colon cancers and colon cancer liver metastases from 10 colon cancer patients.
  • As a result, they identified 40 genes whose expression was higher, and 7 genes whose expression was lower, in liver metastases than in the primary lesions, yielding a set of candidate genes that may be involved in liver metastasis.
  • A method is known (Patent Document 1) in which the DNA microarray method is used to obtain expression information on genes specifically expressed in primary colon cancer tissue, and statistical analysis based on gene discriminant analysis is applied to that information to identify a gene set effective for predicting liver metastasis of colorectal cancer. The gene set identified by this method and its expression information in primary colorectal cancer tissue are also known from that document.
  • This gene set and method provide information useful for predicting metachronous liver metastasis of colorectal cancer, and are suitable as material for identifying important genes specifically expressed in colorectal cancer.
  • However, because lymph node metastasis of colorectal cancer differs completely from liver metastasis in its pathology, these gene sets and methods for liver metastasis cannot be applied directly to lymph node metastasis of colorectal cancer.
  • It has also been shown that an original DNA microarray can be prepared using probes selected from a cDNA library made from primary colon cancer tissue, colon cancer liver metastasis tissue and normal colonic mucosa tissue, and that gene expression analysis of colorectal cancer tissues with this microarray can identify candidate genes considered to be related to the development and progression of colorectal cancer (Non-Patent Document 6).
  • As described above, the determination of the presence or absence of lymph node metastasis of colorectal cancer relies on the classical histopathological technique of microscopic observation, and this determination method is not necessarily accurate enough.
  • Postoperative adjuvant therapy given after resection of the primary colorectal cancer can improve the prognosis of patients with lymph node metastasis. However, it may cause side effects such as anorexia, upper abdominal discomfort and nausea, so whether it is needed must be decided in light of the patient's quality of life (QOL), medical costs, and the patient's condition and disease state. A more accurate method for determining lymph node metastasis would therefore serve as a useful index when deciding on postoperative adjuvant therapy, helping patients receive appropriate treatment and ultimately benefiting them.
  • Patent Document 1: Japanese Patent Application Laid-Open No. 2004-33082
  • Non-Patent Document 1: Troisi R.J., et al., 1999, Cancer, vol. 85, p. 1670-1676
  • Non-Patent Document 2: Cohen A.M., et al., 1997, Curr. Probl. Surg., vol. 34, p. 601-676
  • Non-Patent Document 3: Alizadeh et al., 2000, Nature, vol. 403, p. 503-511
  • Non-Patent Document 4: Khan et al., 2001, Nature Medicine, vol. 7, p. 673-679
  • Non-Patent Document 5: Yanagawa et al., 2001, Neoplasia, vol. 3, no. 5, p. 395-401
  • Non-Patent Document 6: Takemasa et al., 2001, Biochem. Biophys. Res. Commun., vol. 285, p. 1244-1
  • The conventional method for determining the presence or absence of lymph node metastasis of colorectal cancer involves excising a number of lymph nodes around the tumor and examining them under a microscope, and its accuracy has been a problem.
  • An object of the present invention is to provide a method for predicting the presence or absence of lymph node metastasis of colorectal cancer by examining the gene expression profile of primary colorectal cancer tissue. To make it possible to predict whether cancer cells have metastasized to the lymph nodes, the present invention further aims to provide a set of genes usable for determining lymph node metastasis of colorectal cancer and a discriminant, based on their expression information, for determining the presence or absence of lymph node metastasis.
  • The inventors conducted intensive studies. They created an original DNA microarray using probes selected from a cDNA library prepared from primary colon cancer tissue, colon cancer liver metastasis tissue and normal colonic mucosa tissue, used this microarray to obtain gene expression data for primary colorectal cancer lesions, and, through statistical analysis, found a set of genes usable for predicting the presence or absence of lymph node metastasis together with a discriminant for actually making that prediction from their expression levels, thereby completing the present invention.
  • The present invention provides the following method for selecting a gene set for predicting the presence or absence of lymph node metastasis of colorectal cancer.
  • A method for selecting a gene set for predicting the presence or absence of colorectal cancer lymph node metastasis, comprising the following steps (1) to (4):
  • The method above, wherein the variable selection method in step (4) is a stepwise variable selection method.
  • The present invention also provides the following gene sets for predicting the presence or absence of colorectal cancer lymph node metastasis.
  • A gene set for predicting the presence or absence of colorectal cancer lymph node metastasis, selected by any of the methods 1 to 4 above;
  • The gene set according to 5 above, comprising the genes represented by the database accession numbers (serial numbers) NM_003404 (G1592), NM_002128 (G2645), NM_052868 (G3031), NM_005034 (G3177), NM_001540 (G3753), NM_005722 (G3826) and NM_015315 (G4370);
  • The present invention further provides the following method for predicting the presence or absence of lymph node metastasis of colorectal cancer using the selected gene set.
  • A method for predicting the presence or absence of lymph node metastasis of colorectal cancer, characterized by using the gene set according to 5 or 6 above;
  • Gene group analysis methods used in the present invention include the following:
  • Logistic discriminant analysis (Logistic Discrimination) can be used.
  • Variable selection methods used in the present invention include the following:
  • The present invention is useful for determining, when a colorectal cancer patient undergoes resection of the primary colorectal cancer, whether colorectal cancer cells are likely to have metastasized to nearby lymph nodes.
  • According to the present invention, a series of gene sets and a discriminant, based on their gene expression information, for predicting the presence or absence of lymph node metastasis are provided.
  • A good lymph node metastasis determination result can be obtained by analyzing the expression information of the gene set in primary colorectal cancer tissue with a logistic regression equation. It therefore becomes possible to predict the presence or absence of cancer cell metastasis to the lymph nodes at the time of primary colorectal cancer resection.
  • The method of the present invention is characterized by a gene set effective for predicting the presence or absence of lymph node metastasis at the time of primary colorectal cancer resection, and by a discriminant, based on the expression levels of that gene set, for predicting the presence or absence of lymph node metastasis.
  • A gene set useful for predicting the presence or absence of lymph node metastasis is obtained by comprehensively examining gene expression in multiple samples of primary colon cancer tissue and selecting the genes usable for the determination.
  • Such comprehensive gene expression analysis can use microarrays, Northern analysis, the ATAC-PCR method (Kato et al., Nuc. Acids Res., vol. 25, p. 4694-4696, 1997), real-time PCR typified by TaqMan PCR (Applied Biosystems), SAGE (Velculescu et al., Science, vol. 270, p. 484-487, 1995) and the like.
  • More specifically, gene expression data were obtained with the above DNA microarray for a total of 150 primary colorectal cancer tissues collected with informed consent: 63 from patients in whom lymph node metastasis was found by histopathological examination during resection of the primary lesion, and 87 from patients without metastasis. As a comparative control, gene expression data from the normal colonic mucosa tissue surrounding the primary colon cancer tissue of 40 cases were used.
  • The gene expression data described above are obtained by hybridizing fluorescently labeled cDNA, prepared from total RNA extracted from cancer tissue, to the probes on the DNA microarray, and then detecting and quantifying the fluorescence signal emitted by the hybridized labeled cDNA with a dedicated scanner. A more specific procedure is described below.
  • RNA can be extracted from colorectal cancer tissue or normal colonic mucosa tissue using reagents such as TRIzol reagent (GIBCO BRL) or ISOGEN (Nippon Gene), following the method described in each reagent's package insert.
  • The total RNA prepared in this way can be used as is for preparing the labeled cDNA described below.
  • Alternatively, polyadenylated RNA (hereinafter also referred to as "mRNA") can be purified from the total RNA with a commercially available kit such as the mRNA Purification Kit (Amersham Biosciences), according to the attached protocol, and used for preparing the labeled cDNA.
  • Cy3-labeled cDNA derived from primary colorectal cancer tissue (hereinafter sometimes referred to as "Cy3 cDNA") is prepared by adding reverse transcriptase to a mixture containing the above total RNA or mRNA, oligo dT primer, dNTPs and Cy3-labeled dUTP, and incubating at 37 to 45 °C for 1 to 3 hours, preferably at 42 °C for 1 hour. Cy5-labeled cDNA derived from normal colonic mucosa (hereinafter also referred to as "Cy5 cDNA"), used as a comparative control, is prepared in the same manner from total RNA of normal colonic mucosa tissue.
  • Cy3 cDNA and Cy5 cDNA are each heat-treated in a denaturing solution at 65 to 70 °C for 10 to 20 minutes, preferably at 70 °C for 10 minutes, neutralized, and then mixed in equal amounts (hereinafter this mixture is sometimes referred to as "Cy5/Cy3 cDNA").
  • As the denaturing solution, 0.5 N or 1 N NaOH containing 50 mM EDTA can be used; 0.5 N NaOH containing 50 mM EDTA is preferred.
  • The Cy5/Cy3 cDNA is purified with a commercial kit such as Microcon-30 (Amicon) according to the attached protocol.
  • Hybridization of the Cy5/Cy3 cDNA to the probes printed on the DNA microarray is performed as follows. First, the DNA microarray is heat-treated to denature the probes; a hybridization solution containing Cy5/Cy3 cDNA that has been heat-treated at 100 °C for 2 minutes is then dropped onto it and covered with a cover glass, and the array is placed in a sealed container for hybridization. When the hybridization solution contains formamide, hybridization is carried out at 42 °C for 12 hours or more; when it does not, hybridization is carried out at about 68 °C for 12 hours or more.
  • The Cy3 and Cy5 fluorescence is captured as image data by scanning with a device such as ScanArray 4000 (GSI Lumonics). These image data are then analyzed with microarray data analysis software such as QuantArray (GSI Lumonics) to obtain the Cy3 and Cy5 fluorescence intensities of all probes as text data.
  • Instead of cDNA, a synthetic DNA having a chain length effective for hybridization may be used as the probe.
  • It is also possible to use a microarray in which a synthetic DNA of about 20 nucleotides or more, consisting of part of the sequence, is fixed as a probe to a glass substrate or the like.
  • For each probe, the ratio of the Cy3 and Cy5 fluorescence intensity values, Cy3/Cy5, is calculated and converted to a base-2 logarithm (hereinafter "log2(Cy3/Cy5)").
  • A standardized log2(Cy3/Cy5) value is obtained by subtracting the median of the log2(Cy3/Cy5) values of all probes from the value of each probe.
  • The standardized log2(Cy3/Cy5) value can be used as the expression level of each gene.
  • The standardized values for all cases obtained in this way (hereinafter sometimes referred to as "standardized numerical data") are pooled, and the following selection is performed to exclude probes with many missing values from subsequent analysis: only probes for which data were obtained in 128 or more cases, that is, in more than 85% of the 150 cases analyzed with the microarray, are retained. Only probe data containing 15% or fewer missing values are thus selected.
  • To eliminate factors arising from individual genetic background, a further selection is added: for each probe, the variance of the data for the 150 primary colon cancer lesions and the variance of the data for 12 normal colonic mucosa samples are calculated, and only probes for which the former is at least 1.1 times the latter are retained.
  • As one way of imputing missing values, the average of all available data for the gene containing the missing value can be used as the imputed value.
  • KNN: K-Nearest Neighbors
  • SVD: Singular Value Decomposition
  • The standardized numerical data with imputed missing values prepared in this way (hereinafter also referred to as "standardized gene expression data") are unaffected by background, contain no errors due to differences in detection sensitivity between Cy3 and Cy5, and contain no missing values. They also carry expression information only for genes whose range of expression variation in primary colorectal cancer lesions, relative to normal colonic mucosa, exceeds the range of variation due to individual differences. This ensures the reliability of the subsequent statistical analysis.
  • SVM: Support Vector Machine
  • To ensure statistical reliability, the data are divided into two groups, one for identifying predictive genes and one for evaluation. More specifically, the 150 cases are divided into 99 cases (42 with and 57 without lymph node metastasis) and 51 cases (21 with and 30 without lymph node metastasis). The former 99 cases are used to identify genes and to establish a discriminant for predicting the presence or absence of lymph node metastasis, and the discriminant is evaluated by applying it to the latter 51 cases. In the following description, the 99 cases used for identifying predictive genes and establishing the discriminant are sometimes referred to as "training data", and the 51 cases used for evaluating the discriminant as "test data".
  • Because of the very large amount of computation involved, approach (b) above is carried out only for the first two splits.
  • The 99 training cases are randomly divided into two parts at a ratio of 2:1, 1,250 times, and principal component analysis and neural network learning are repeated on these partitions.
  • Genes are ranked by their sensitivity for discriminating the presence or absence of lymph node metastasis and narrowed down: starting from 2,121 genes, learning continues with successive reductions to 1,536, 768, 384, 192, 96, 48 and 24 genes.
  • The number of genes in each gene set and the correct classification rate on the test data obtained with the established discriminant (the average over splits of (number of test cases whose discrimination result matches the actual result) / (number of test cases) × 100 (%)) were as follows: for (a), 144 genes with a correct classification rate of 80.2% (standard deviation 5.6%); for (b), 192 genes with 90.2%; for (c), 133 genes with 78.6% (standard deviation 6.2%); and for (d), 138 genes with 86.3% (standard deviation 4.5%). Sixteen genes are included in common in the gene sets selected by all four approaches.
  • Next, the target genes are restricted to the above 16 genes, and a statistical analysis is performed that takes into account not only the individual effect of each of these 16 genes (hereinafter the "main effect") but also the interaction between pairs of genes. This makes it possible to search for a discrimination rule over a wider range that includes gene-gene interactions rather than only the main effects of individual genes, and high discrimination performance is expected to be maintained.
  • CART analysis is performed again on each of the 100 training data sets used in the above analysis, with the presence or absence of lymph node metastasis as the response.
  • The rpart package of the free software R was used, with default values for all parameters. Each analysis yields 3 to 5 genes that appear as variables defining the data splits.
  • The lymph node metastasis discrimination performance of the selected gene set is evaluated by the leave-one-out (LOO) method. That is, a logistic discriminant containing the above six variables is estimated from the remaining 149 samples excluding one sample, that sample is then discriminated, and this operation is repeated for each of the 150 samples. As shown in Table 2, the correct classification rate of the selected gene set is estimated to be 88.7% (sensitivity 77.8%, specificity 96.6%). Thus, the present invention makes it possible to identify a gene set needed for predicting the presence or absence of lymph node metastasis of colorectal cancer with high accuracy.
  • Total RNA extracted from the normal colonic mucosa of 40 cases was pooled to obtain a standard normal colonic mucosa total RNA, which was used in all experiments.
  • The concentrations of these RNA samples were calculated in the usual way from the absorbance at 260 nm measured with a spectrophotometer.
  • The fluorescently labeled targets to be hybridized to the DNA microarray were prepared as follows. First, 25 μg of total RNA from a colon cancer sample (hereinafter "colon cancer RNA") and 25 μg of the standard normal colonic mucosa total RNA (hereinafter "standard colonic mucosa RNA") were placed in separate tubes, 2 μg of an 18-nucleotide oligo dT primer was added to each, the volume was brought to 14 μL with sterile distilled water, and the tubes were heated at 70 °C for 10 minutes and then immediately chilled on ice.
  • Cy3-dUTP / Cy5-dUTP: Cy3- or Cy5-labeled dUTP
  • The 5× First Strand Buffer, 0.1 M DTT and SuperScript II used in this reaction were all purchased from GIBCO BRL.
  • dATP, dCTP, dGTP and dTTP, Cy5-dUTP and Cy3-dUTP, and RNAguard were all purchased from Amersham Biosciences.
  • After the reaction, 5 μL of denaturing solution (0.5 N NaOH, 50 mM EDTA) was added to each tube, the tubes were heated at 70 °C for 10 minutes, and the solution was neutralized by adding 7.5 μL of 1 M Tris-HCl (pH 7.5).
  • The colon cancer labeled target and the standard colonic mucosa labeled target were mixed, and 10 μg of human COT-1 DNA (purchased from GIBCO BRL) was added.
  • TE buffer was added to bring this mixture to 500 μL, and the mixture was purified and concentrated with Microcon-30 (purchased from Amicon) to remove unincorporated Cy5-dUTP and Cy3-dUTP.
  • Purification and concentration followed the manual supplied with Microcon-30. The mixture was finally concentrated to a total volume of 5 μL, which was used as the labeled target for hybridization to the DNA microarray.
  • The DNA microarray was treated by immersing it in a masking solution (3 g of succinic anhydride, 190 mL of N-methyl-2-pyrrolidone and 21 mL of 0.2 M sodium borate) for 5 minutes and then in distilled water at 95 °C for 3 minutes to heat-denature the cDNA printed on the microarray. Immediately afterwards, it was immersed in 95% or higher ethanol for 1 minute, dehydrated and air-dried.
  • Using ScanArray 4000 (GSI Lumonics), a confocal laser scanner dedicated to microarrays, the Cy3 and Cy5 fluorescence was scanned independently, and the Cy3 and Cy5 fluorescence patterns derived from the cancer target and the standard colonic mucosa target hybridized to each probe on the microarray were obtained as 16-bit TIFF image data. These image data were then analyzed with QuantArray software (GSI Lumonics), analysis software dedicated to microarray data, to express the Cy3 and Cy5 fluorescence intensities of all probes numerically in text format.
  • For each probe, the fluorescence intensity of the area where no cDNA was printed (background) was subtracted from the probe's fluorescence intensity.
  • Because data with low fluorescence intensity are strongly affected by experimental error, those data were rejected, leaving approximately 3,000 data points with high fluorescence intensity.
  • For each probe, the ratio of the Cy3 and Cy5 fluorescence intensity values, Cy3/Cy5, was calculated and converted to a base-2 logarithm (hereinafter "log2(Cy3/Cy5)").
  • The standardized log2(Cy3/Cy5) value was obtained by subtracting the median of all log2(Cy3/Cy5) values from the log2(Cy3/Cy5) value of each probe.
  • For each missing value, the average expression level of the gene in question in the eight samples closest to the sample with the missing value was calculated and used as the imputed value.
  • The number of closest samples was determined by increasing it step by step and choosing the number that minimized the root mean square (RMS) error. All numerical data obtained by imputing missing values in this way are hereinafter referred to as standardized gene expression data.
  • A probe printed on a DNA microarray may be referred to as a gene.
  • PCA/ANN: Principal Component Analysis / Artificial Neural Network
  • Four approaches were used, including (c) Hierarchical Cluster Analysis (HCA) + Stepwise Logistic Discrimination and (d) Classification And Regression Tree (CART) (Breiman et al., Classification and Regression Trees, Wadsworth, 1983) + Logistic Discrimination.
  • The above data were divided into two parts 100 times, giving 100 different discriminations; after the genes were identified in each, the genes identified many times were selected. Approach (b), by contrast, was carried out only for the first two splits; in it, the 99 training cases were randomly divided into two parts at a ratio of 2:1, 1,250 times, and principal component analysis and neural network learning were repeated on these partitions. After learning, genes were ranked by their sensitivity for discriminating the presence or absence of lymph node metastasis and narrowed down: starting from 2,121 genes, learning proceeded with successive reductions to 1,536, 768, 384, 192, 96, 48 and 24 genes.
  • G1592, G3031, G3826, G4370, G2645, G3177 and G3753 are serial numbers assigned to the probes (genes) on the ColonoChip used in the present invention. The discriminant for determining the presence or absence of lymph node metastasis is
  • The discrimination performance for lymph node metastasis of the selected gene set was evaluated by the LOO method. That is, a logistic discriminant containing the above six variables was estimated from the remaining 149 samples excluding one sample, that sample was then discriminated, and this operation was carried out for each of the 150 samples. The results are shown in Table 2. From Table 2, the correct classification rate of the selected gene set was estimated to be 88.7% (sensitivity 77.8%, specificity 96.6%).
  • The determination of lymph node metastasis enabled by the present invention makes it possible to choose a better treatment policy for each case and can be expected to yield medical-economic benefits. For example, prognosis can be improved by aggressive treatment of patients with a high likelihood of lymph node metastasis, while for cases with a low likelihood of lymph node metastasis, postoperative adjuvant therapy can be made milder, reducing the physical and economic burden on the patient.
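The RNA quantification step in the Definitions above (absorbance at 260 nm, then 25 μg of RNA per labeling reaction) can be sketched as follows. This is a minimal illustration, assuming the conventional factor of about 40 μg/mL of RNA per A260 unit (1 cm path length), which the source does not state; the function names are hypothetical.

```python
# Minimal sketch: RNA quantification from A260 and the volume needed for a
# fixed input mass. Assumes the conventional factor of ~40 ug/mL of RNA per
# A260 unit (1 cm path); this factor is not stated in the source.

RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate RNA concentration (ug/mL) from absorbance at 260 nm."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor > 0")
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def volume_for_mass_ul(mass_ug: float, conc_ug_per_ml: float) -> float:
    """Volume (uL) of RNA solution that contains mass_ug micrograms."""
    if conc_ug_per_ml <= 0:
        raise ValueError("concentration must be positive")
    # ug / (ug/mL) gives mL; multiplying by 1000 first keeps the result in uL
    return mass_ug * 1000.0 / conc_ug_per_ml


if __name__ == "__main__":
    # Hypothetical reading: A260 = 0.25 measured on a 1:50 dilution
    conc = rna_concentration_ug_per_ml(0.25, dilution_factor=50)
    print(conc)                          # 500.0 (ug/mL)
    print(volume_for_mass_ul(25, conc))  # 50.0 (uL holding 25 ug)
```

The dilution factor is kept explicit because stock RNA is usually diluted before measurement so that the reading falls in the spectrophotometer's linear range.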
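The log-ratio standardization and the two probe filters described in the Definitions above (data present in more than 85% of cases, and tumor variance at least 1.1 times the normal-mucosa variance) can be sketched in plain Python. This is an illustration under assumptions not specified in the source: missing measurements are represented as None, the median is taken over the probes of a single array, sample variance (n - 1 denominator) is used, and "1.1 times" is read as "at least 1.1 times". The function names are hypothetical.

```python
import math
from statistics import median, variance
from typing import List, Optional, Sequence


def standardized_log_ratios(cy3: Sequence[float], cy5: Sequence[float]) -> List[float]:
    """Per-probe log2(Cy3/Cy5) for one array, centred on the array median.

    Assumes background-subtracted, positive intensities for every probe.
    """
    logs = [math.log2(a / b) for a, b in zip(cy3, cy5)]
    centre = median(logs)
    return [x - centre for x in logs]


def keep_probe(
    tumor_values: Sequence[Optional[float]],
    normal_values: Sequence[float],
    min_present_fraction: float = 0.85,
    min_variance_ratio: float = 1.1,
) -> bool:
    """Apply the missing-value filter, then the tumor/normal variance filter."""
    present = [v for v in tumor_values if v is not None]
    # With 150 cases, 0.85 * 150 = 127.5, so at least 128 observed values are needed
    if len(present) < min_present_fraction * len(tumor_values):
        return False
    if len(present) < 2 or len(normal_values) < 2:
        return False  # sample variance needs at least two values
    return variance(present) >= min_variance_ratio * variance(normal_values)


if __name__ == "__main__":
    print(standardized_log_ratios([100, 200, 400, 800], [100, 100, 100, 100]))
    # [-1.5, -0.5, 0.5, 1.5]
    print(keep_probe([1.0, -1.0, 2.0, -2.0], [0.1, -0.1, 0.2, -0.2]))  # True
    print(keep_probe([1.0, -1.0, 2.0, -2.0, None], [0.1, -0.1]))       # False (80% present)
```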
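The nearest-neighbour imputation described in the Definitions above (averaging the gene's value over the k closest samples, with k chosen by minimizing the RMS error; the source reports k = 8) can be sketched as follows. This is a minimal sketch under assumptions the source does not spell out: samples are rows and genes are columns, None marks a missing value, the distance between samples is the RMS difference over genes observed in both, and the RMS error used to choose k is computed on known entries that are temporarily masked. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[Optional[float]]]  # rows = samples, columns = genes


def _distance(a: Sequence[Optional[float]], b: Sequence[Optional[float]], skip: int) -> float:
    """RMS difference over genes observed in both samples, ignoring column `skip`."""
    sq = [(x - y) ** 2 for j, (x, y) in enumerate(zip(a, b))
          if j != skip and x is not None and y is not None]
    return math.sqrt(sum(sq) / len(sq)) if sq else math.inf


def knn_impute(data: Matrix, k: int) -> Matrix:
    """Replace each missing value by the mean of that gene in the k nearest samples."""
    out = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other samples in which gene j was measured
            candidates = sorted(
                (_distance(row, other, j), other[j])
                for r, other in enumerate(data)
                if r != i and other[j] is not None
            )
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                out[i][j] = sum(nearest) / len(nearest)
    return out


def choose_k(data: Matrix, ks: Sequence[int], held_out: Sequence[Tuple[int, int]]) -> int:
    """Return the k with the lowest RMS error on masked, originally known entries."""
    masked = [row[:] for row in data]
    for i, j in held_out:
        masked[i][j] = None
    best_k, best_rms = ks[0], math.inf
    for k in ks:
        imputed = knn_impute(masked, k)
        errors = [(imputed[i][j] - data[i][j]) ** 2
                  for i, j in held_out if imputed[i][j] is not None]
        rms = math.sqrt(sum(errors) / len(errors)) if errors else math.inf
        if rms < best_rms:
            best_k, best_rms = k, rms
    return best_k


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [1.1, 2.1, None], [5.0, 6.0, 7.0], [0.9, 1.9, 2.9]]
    print(knn_impute(data, 1)[1][2])          # 3.0 (nearest sample is row 0)
    print(choose_k(data, [1, 2], [(0, 2)]))  # 1
```

On a real data set, `ks` would be an increasing range such as `range(1, 21)` and `held_out` a random sample of observed entries; the held-out entries must be known (not None) in `data`.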
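The leave-one-out evaluation described in the Definitions above (fit on 149 samples, classify the held-out sample, repeat for all 150, then report the correct classification rate, sensitivity and specificity) can be sketched in plain Python. The patent's actual discriminant coefficients are not given in this section, so this sketch fits its own logistic model by batch gradient descent with a 0.5 probability cut-off; the learning rate, epoch count, cut-off and function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit intercept + coefficients by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float], cutoff: float = 0.5) -> int:
    """1 = lymph node metastasis predicted, 0 = no metastasis predicted."""
    return int(_sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= cutoff)


def loo_evaluate(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[str, float]:
    """Leave-one-out correct classification rate, sensitivity and specificity."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        w = fit_logistic(list(X[:i]) + list(X[i + 1:]), list(y[:i]) + list(y[i + 1:]))
        pred = predict(w, X[i])
        if y[i] == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    return {
        "accuracy": (tp + tn) / len(X),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    # Toy, clearly separable one-variable data (not patient data)
    X = [[-3.0], [-2.5], [-2.0], [2.0], [2.5], [3.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(loo_evaluate(X, y))
```

Here sensitivity is the fraction of metastasis-positive cases predicted positive and specificity the fraction of metastasis-negative cases predicted negative, matching how the two figures are paired with the correct classification rate in the text.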

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Disclosed is a method for prediction of the presence or absence of lymph node metastasis of colorectal cancer. Also disclosed is a set of genes which can be used in the method. A method for selecting a set of genes for use in the prediction of the presence or absence of lymph node metastasis of colorectal cancer, the method comprising the following steps (1) to (4): (1) analyzing the information about gene expression in a primary lesion of colorectal cancer in a patient who has been determined to have lymph node metastasis of colorectal cancer by a histopathological test, by at least four analysis methods involving at least one supervised learning analysis method, thereby selecting a group of genes which serve to determine the presence or absence of lymph node metastasis at a correct classification rate of 75% or higher for each of the analysis methods; (2) selecting the common genes, i.e. the genes included in the group of genes selected by every analysis method of step (1); (3) analyzing the above gene expression information to relate the presence or absence of lymph node metastasis to any combination of two or more genes, and then selecting one or more combinations of genes showing an interaction; and (4) performing variable selection in a logistic regression model whose response is the presence or absence of lymph node metastasis, using the common genes and the selected combination(s) of genes as explanatory variables; a set of genes for use in the prediction of the presence or absence of lymph node metastasis of colorectal cancer, which is selected by the method; and a method for prediction of the presence or absence of lymph node metastasis of colorectal cancer by using the set of genes.
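Steps (2) and (3) of the selection method in the abstract can be sketched as set and design-matrix operations. This is a minimal sketch under stated assumptions: each analysis method's selected genes are given as a set together with its correct classification rate, an interaction between two genes is encoded as the product of their expression values, and the interacting pairs are supplied by the caller (in the description they come from CART analysis). The stepwise logistic variable selection of step (4) is not shown, and the function names are hypothetical. The accuracies in the example are the rates reported for approaches (a) to (d); the gene sets, including the placeholder IDs "Gx" and "Gy", are toy values.

```python
from typing import List, Mapping, Sequence, Set, Tuple

Term = Tuple[str, ...]  # ("G1",) = main effect, ("G1", "G2") = interaction


def common_genes(selected: Mapping[str, Set[str]],
                 accuracy: Mapping[str, float],
                 min_methods: int = 4,
                 min_accuracy: float = 0.75) -> Set[str]:
    """Step (2): genes selected by every analysis method."""
    if len(selected) < min_methods:
        raise ValueError(f"need at least {min_methods} analysis methods")
    low = [m for m in selected if accuracy[m] < min_accuracy]
    if low:
        raise ValueError(f"methods below the accuracy threshold: {low}")
    return set.intersection(*(set(genes) for genes in selected.values()))


def candidate_terms(genes: Set[str], interacting_pairs: Sequence[Tuple[str, str]]) -> List[Term]:
    """Step (3): main effects plus the gene pairs judged to interact."""
    terms: List[Term] = [(g,) for g in sorted(genes)]
    terms += [tuple(sorted(pair)) for pair in interacting_pairs]
    return terms


def design_row(expression: Mapping[str, float], terms: Sequence[Term]) -> List[float]:
    """Explanatory variables for one sample; interactions as products (assumption)."""
    row = []
    for term in terms:
        value = 1.0
        for gene in term:
            value *= expression[gene]
        row.append(value)
    return row


if __name__ == "__main__":
    acc = {"a": 0.802, "b": 0.902, "c": 0.786, "d": 0.863}
    sel = {"a": {"G1592", "G3031", "Gx"}, "b": {"G1592", "G3031"},
           "c": {"G3031", "G1592", "Gy"}, "d": {"G1592", "G3031"}}
    genes = common_genes(sel, acc)
    terms = candidate_terms(genes, [("G3031", "G1592")])
    print(terms)  # [('G1592',), ('G3031',), ('G1592', 'G3031')]
    print(design_row({"G1592": 2.0, "G3031": -0.5}, terms))  # [2.0, -0.5, -1.0]
```

The rows produced by `design_row` are what a stepwise logistic regression in step (4) would choose among.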

Description

Specification
Gene set for predicting the presence or absence of lymph node metastasis of colorectal cancer
Technical Field
[0001] The present invention relates to a group of genes useful for predicting the presence or absence of lymph node metastasis of colorectal cancer, and to a method of utilizing the expression information of those genes.
Background Art
[0002] Colorectal cancer has a particularly high incidence in developed countries, and in Japan as well its incidence has been rising steadily year by year; it is one of the leading causes of cancer-related death. According to statistical reports (see, e.g., Non-Patent Document 1), in 37% of colorectal cancer patients the cancer is confined to the primary lesion without metastasis, in another 37% it is a localized cancer with metastasis only to the regional lymph nodes, and the remainder include cases with distant metastasis.
[0003] The Dukes classification, currently the most commonly used clinical system for grading the malignancy of colorectal cancer, uses pathological indices such as the depth of tumor invasion into the colonic wall and the extent of metastasis to the regional lymph nodes, and its correlation with prognosis is beyond doubt. However, the determination of lymph node metastasis, which strongly influences the classification result, still relies on the classical histopathological technique of examining under a microscope specimens prepared from only a portion of the many resected lymph nodes. As shown by a report that metastasis is later found in 20 to 40% of patients judged node-negative by this technique (see, e.g., Non-Patent Document 2), the conventional method of determining lymph node metastasis is not necessarily sufficiently accurate.
[0004] Colorectal cancer is one of the cancers most thoroughly studied at the molecular-biological level, including the mechanism of multistep carcinogenesis, and many reports have addressed individual genes such as APC, K-ras, p53, and DCC. However, focusing on any single one of these genes is insufficient to characterize the individuality of a colorectal cancer, so in recent years attempts have begun to obtain useful new knowledge by acquiring expression information on a very large number of genes at once, using DNA microarrays and similar technologies.
[0005] Alizadeh et al. measured B lymphocytes isolated from the peripheral blood of patients with diffuse large B-cell lymphoma using DNA microarrays and performed hierarchical clustering of the resulting gene expression data. They found that the peripheral-blood B lymphocytes of these patients fall into two types: those with a gene expression pattern resembling B cells in the germinal centers of lymphoid tissue, and those with a pattern resembling B cells activated in vitro (Non-Patent Document 3). Comparing the survival of the two groups with Kaplan-Meier plots showed that patients whose B cells had the latter expression pattern had a worse prognosis than patients whose B cells had the former pattern. Moreover, the results of the authors' clustering of gene expression information correlated more strongly with prognosis than prognostic predictions based on conventional pathological diagnosis. The work of Alizadeh et al. is significant in that a clinically useful rule was derived from gene expression information. However, whether that rule also applies to entirely new clinical cases was not verified, and the possibility that the result holds only within the scope of that paper cannot be ruled out.
[0006] Khan et al. reported that four types of cancer belonging to the small round blue cell tumors, which are difficult to distinguish histologically, can be accurately distinguished by analyzing gene expression information with an artificial neural network (Non-Patent Document 4). The report verified that accurate classification results are obtained even when test-sample data are input to an artificial neural network model derived from a randomly extracted subset of the full data set. This suggests that the derived model is not limited to the data in that paper but is generally applicable to distinguishing the four types of small round blue cell tumors. However, classification results obtained with artificial neural network models are generally not readily accepted, because their mathematical basis cannot be clearly explained.
[0007] A recent study that used DNA microarrays to identify molecular targets involved in liver metastasis of colorectal cancer is the report by Yanagawa et al. (Non-Patent Document 5). The authors performed PCR on human cDNA templates using oligo DNA primers designed from human cDNA sequences registered in public gene databases, obtaining 9,121 amplified cDNA fragments. Using a DNA microarray on which these cDNA fragments were printed as probes, they then examined the gene expression profiles of primary colorectal cancer lesions and colorectal cancer liver metastases isolated from 10 colorectal cancer patients. They identified 40 genes whose expression was higher in the liver metastases than in the primary lesions and 7 genes whose expression was lower, and thereby defined a candidate gene set possibly involved in liver metastasis of colorectal cancer.
[0008] With regard to gene sets involved in liver metastasis of colorectal cancer, the following are known (Patent Document 1): a method of identifying a gene set effective for predicting liver metastasis of colorectal cancer by applying statistical analysis based on gene discriminant analysis to expression information, obtained by the DNA microarray method, for genes specifically expressed in primary colorectal cancer tissue; the gene set identified by that method; and a method of predicting liver metastasis of colorectal cancer using the expression information of that gene set in primary colorectal cancer tissue. That gene set and method provide information useful for predicting metachronous liver metastasis of colorectal cancer and are suitable as material for identifying important genes specifically expressed in colorectal cancer. However, because lymph node metastasis of colorectal cancer is pathologically an entirely different condition from liver metastasis, the gene set and method for liver metastasis can by no means be applied as-is to lymph node metastasis.
[0009] It has also been shown that candidate genes thought to be related to the growth and progression of colorectal cancer can be identified by constructing an original DNA microarray with probes selected from cDNA libraries prepared from primary colorectal cancer tissue, colorectal cancer liver metastasis tissue, and normal colonic mucosa, and using it to analyze gene expression in colorectal cancer tissue (Non-Patent Document 6).
[0010] For lymph node metastasis of colorectal cancer, on the other hand, the determination of its presence or absence still relies, as described above, on the classical histopathological technique of microscopically examining specimens prepared from only a portion of the many resected lymph nodes, and this method is not necessarily sufficiently accurate. It is also known that postoperative adjuvant therapy given after resection of the primary colorectal lesion can improve the prognosis of patients with lymph node metastasis. However, adjuvant therapy can cause side effects such as anorexia, upper abdominal discomfort, and nausea, so whether it is needed must be judged from the individual patient's condition and disease status, with quality of life (QOL) and medical costs also taken into account. A more accurate method of determining lymph node metastasis could therefore serve as a useful decision-making index when choosing postoperative adjuvant therapy and, by enabling appropriate treatment, would ultimately benefit patients.
[0011] Patent Document 1: Japanese Patent Application Laid-Open No. 2004-33082
Non-Patent Document 1: Troisi R.J. et al., 1999, Cancer, vol. 85, pp. 1670-1676
Non-Patent Document 2: Cohen A.M. et al., 1997, Curr. Probl. Surg., vol. 34, pp. 601-676
Non-Patent Document 3: Alizadeh et al., 2000, Nature, vol. 403, pp. 503-511
Non-Patent Document 4: Khan et al., 2001, Nature Medicine, vol. 7, pp. 673-679
Non-Patent Document 5: Yanagawa et al., 2001, Neoplasia, vol. 3, no. 5, pp. 395-401
Non-Patent Document 6: Takemasa et al., 2001, Biochem. Biophys. Res. Commun., vol. 285, pp. 1244-1249
Disclosure of the Invention
Problems to Be Solved by the Invention
[0012] As described above, the conventional method of determining the presence or absence of lymph node metastasis of colorectal cancer resects multiple lymph nodes around the tumor and examines them under a microscope, and the accuracy of its results is problematic.
[0013] To remedy this, an object of the present invention is to provide a method for predicting the presence or absence of lymph node metastasis of colorectal cancer by examining the gene expression profile of the primary colorectal cancer tissue. A further object, so as to make it possible to predict whether cancer cells have metastasized to the lymph nodes, is to provide a gene set usable for determining lymph node metastasis of colorectal cancer, together with a discriminant that uses the expression information of those genes to determine the presence or absence of lymph node metastasis.
Means for Solving the Problems
[0014] As a result of intensive research to achieve the above objects, the present inventors constructed an original DNA microarray using probes selected from cDNA libraries prepared from primary colorectal cancer tissue, colorectal cancer liver metastasis tissue, and normal colonic mucosa. Through statistical analysis of gene expression data for primary colorectal cancer lesions obtained with this microarray, they succeeded in finding a gene set usable for predicting the presence or absence of lymph node metastasis and a discriminant that uses the expression levels of those genes to actually make that prediction, and thereby completed the present invention.
[0015] Specifically, the present invention provides the following methods for selecting a gene set for predicting the presence or absence of lymph node metastasis of colorectal cancer.
1. A method for selecting a gene set for predicting the presence or absence of lymph node metastasis of colorectal cancer, comprising the following steps (1) to (4):
(1) analyzing gene expression information from the primary colorectal cancer tissue of patients whose lymph node metastasis status has been established by histopathological determination, using four or more analysis methods including at least one supervised learning method, and selecting, for each analysis method, a group of genes that classifies the presence or absence of lymph node metastasis with a correct classification rate of 75% or higher;
(2) selecting, from the gene groups selected by the analysis methods used in (1), the common genes selected by every one of those analysis methods;
(3) analyzing the gene expression information to select, from combinations of any two or more genes, the gene combinations that support classification of the presence or absence of lymph node metastasis and show an interaction; and
(4) performing variable selection in a logistic regression model that uses the common genes and the selected gene combinations as explanatory variables and the presence or absence of lymph node metastasis as the response;
2. The method according to 1 above, wherein the analysis methods of (1) include at least one selected from the group consisting of (a) Support Vector Machine, (b) an extension of Principal Component Analysis Artificial Neural Network, (c) a combination of Hierarchical Cluster Analysis and Stepwise Logistic Discrimination, and (d) a combination of Classification And Regression Tree and Logistic Discrimination;
3. The method according to 1 or 2 above, wherein the analysis method of (3) is a combination of Classification And Regression Tree and Logistic Discrimination;
4. The method according to any one of 1 to 3 above, wherein the variable selection method of (4) is a stepwise variable selection method.
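Once each analysis method's gene group is known, step (2) reduces to a set intersection. The sketch below assumes each method's output has been summarized as a set of gene identifiers plus its correct classification rate; the function and type names are hypothetical, fitting the individual classifiers (SVM, CART, and so on) is not shown, and the requirement that at least one method be supervised is not checked.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical summary of one analysis method:
# (genes it selected, its correct classification rate)
MethodResult = Tuple[FrozenSet[str], float]


def common_genes(results: Dict[str, MethodResult],
                 min_methods: int = 4,
                 min_rate: float = 0.75) -> Set[str]:
    """Return the genes selected by every analysis method (step (2)).

    Raises ValueError if fewer than `min_methods` methods are supplied or if
    any method's gene group falls below `min_rate` (the step (1) threshold).
    """
    if len(results) < max(min_methods, 1):
        raise ValueError(f"need at least {min_methods} analysis methods")
    for name, (_, rate) in results.items():
        if rate < min_rate:
            raise ValueError(f"{name}: correct rate {rate:.2f} < {min_rate}")
    # Intersect the gene groups of all methods.
    return set.intersection(*(set(genes) for genes, _ in results.values()))
```

The threshold check is kept separate from the intersection so that a method failing the 75% criterion is reported instead of silently shrinking the common set.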
[0016] The present invention also provides the following gene sets for predicting the presence or absence of lymph node metastasis of colorectal cancer.
5. A gene set for predicting the presence or absence of lymph node metastasis of colorectal cancer, selected by the method according to any one of 1 to 4 above;
6. The gene set according to 5 above, comprising at least the genes identified by the database accession numbers (serial numbers) NM_003404 (G1592), NM_002128 (G2645), NM_052868 (G3031), NM_005034 (G3177), NM_001540 (G3753), NM_005722 (G3826), and NM_015315 (G4370).
[0017] The present invention further provides the following methods for predicting the presence or absence of lymph node metastasis of colorectal cancer using the selected gene set.
7. A method for predicting the presence or absence of lymph node metastasis of colorectal cancer, characterized by using the gene set according to 5 or 6 above;
8. The method according to 7 above, characterized by using the following discriminant:
$$
\begin{aligned}
D ={}& 0.2307 - 2.7132\,x_{\mathrm{G1592}} + 8.9509\,x_{\mathrm{G3031}} + 8.7975\,x_{\mathrm{G3826}} - 2.3098\,x_{\mathrm{G4370}} \\
&+ 3.5126\,x_{\mathrm{G2645}}\,x_{\mathrm{G3177}} - 8.8226\,x_{\mathrm{G3753}}\,x_{\mathrm{G3826}}
\end{aligned}
$$
where each $x$ is the expression level of the gene indicated by its subscript: G1592 = NM_003404, G3031 = NM_052868, G3826 = NM_005722, G4370 = NM_015315, G2645 = NM_002128, G3177 = NM_005034, G3753 = NM_001540. Lymph node metastasis is judged present when D > 0 and absent when D ≤ 0.
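Once the seven expression levels are available, the discriminant can be evaluated directly. The following minimal sketch assumes the expression levels are supplied as a mapping keyed by the identifiers G1592 and so on. The function names are hypothetical, and the sign of the 2.3098 term is not legible in the source text, so it is assumed to be negative. `metastasis_probability` further assumes that D is the log-odds (linear predictor) of the logistic regression model; under that reading, D > 0 corresponds to a fitted probability of metastasis above 0.5.

```python
import math
from typing import Mapping

INTERCEPT = 0.2307
# Main-effect coefficients keyed by gene identifier.
MAIN_EFFECTS = {
    "G1592": -2.7132,  # NM_003404
    "G3031": 8.9509,   # NM_052868
    "G3826": 8.7975,   # NM_005722
    "G4370": -2.3098,  # NM_015315 (sign illegible in source; minus assumed)
}
# Interaction (product) terms.
INTERACTIONS = {
    ("G2645", "G3177"): 3.5126,   # NM_002128 x NM_005034
    ("G3753", "G3826"): -8.8226,  # NM_001540 x NM_005722
}


def discriminant(expr: Mapping[str, float]) -> float:
    """Compute D from a mapping of gene identifier -> expression level."""
    d = INTERCEPT
    d += sum(coef * expr[g] for g, coef in MAIN_EFFECTS.items())
    d += sum(coef * expr[a] * expr[b] for (a, b), coef in INTERACTIONS.items())
    return d


def predict_metastasis(expr: Mapping[str, float]) -> bool:
    """True (metastasis present) when D > 0, False when D <= 0."""
    return discriminant(expr) > 0


def metastasis_probability(expr: Mapping[str, float]) -> float:
    """Logistic transform of D (assumes D is the model's log-odds)."""
    return 1.0 / (1.0 + math.exp(-discriminant(expr)))
```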
[0018] Methods of analyzing gene groups used in the present invention include the following:
1. Hierarchical Cluster Analysis
2. Logistic Discrimination (including variable selection methods)
3. Classification And Regression Tree
4. Principal Component Analysis Artificial Neural Network (including its extended method)
5. Projection Pursuit for supervised classification
6. Support Vector Machine
7. Self Organizing Map
8. AdaBoost
9. Gene selection procedures combining two or more of these methods.
[0019] Methods of analyzing gene expression information used in the present invention include Logistic Discrimination.
[0020] Variable selection methods used in the present invention include the following:
1. Stepwise selection (successive addition and removal)
2. Forward selection
3. Backward selection
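The three variable selection strategies differ only in which moves are tried at each step. The source does not state the selection criterion, so the sketch below takes it as a caller-supplied function where lower is better. An AIC computed from a logistic regression fitted on the candidate variables would be one common choice, but that is an assumption, and the model fitting itself is not shown. The function name is hypothetical.

```python
from typing import Callable, FrozenSet, Iterable

# Scores a candidate set of explanatory variables; lower is better
# (for example, the AIC of a logistic regression fitted on those variables).
Criterion = Callable[[FrozenSet[str]], float]


def select_variables(candidates: Iterable[str], criterion: Criterion,
                     direction: str = "both",
                     start: Iterable[str] = ()) -> FrozenSet[str]:
    """Greedy variable selection.

    "forward" only adds variables, "backward" only removes them (pass
    start=candidates), and "both" (stepwise) tries additions and removals at
    every step. Stops when no single move improves the criterion.
    """
    if direction not in ("forward", "backward", "both"):
        raise ValueError(f"unknown direction: {direction}")
    pool = frozenset(candidates)
    current = frozenset(start)
    best = criterion(current)
    while True:
        moves = []
        if direction in ("forward", "both"):
            moves += [current | {v} for v in pool - current]
        if direction in ("backward", "both"):
            moves += [current - {v} for v in current]
        if not moves:
            return current
        # Pick the best single move, comparing scores only.
        score, move = min(((criterion(m), m) for m in moves),
                          key=lambda t: t[0])
        if score >= best:
            return current
        current, best = move, score
```

Because each accepted move must strictly improve the criterion, the loop always terminates.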
Effects of the Invention
[0021] The present invention provides a series of gene sets useful for determining whether colorectal cancer cells are likely to have metastasized to nearby lymph nodes at the time a colorectal cancer patient undergoes resection of the primary lesion, together with a discriminant for predicting the presence or absence of lymph node metastasis from the expression information of those genes. According to the method of the present invention, good lymph node metastasis determination performance can be obtained by analyzing the expression information of the gene set in the primary colorectal cancer tissue with a logistic regression equation. It therefore becomes possible to predict, at the time of primary lesion resection, whether cancer cells have metastasized to the lymph nodes.
Best Mode for Carrying Out the Invention
[0022] The method of the present invention is characterized by a gene set effective for predicting the presence or absence of lymph node metastasis at the time of resection of the primary colorectal cancer lesion, and by a discriminant that uses the expression levels of that gene set to actually predict the presence or absence of lymph node metastasis.
[0023] A gene set useful for predicting the presence or absence of lymph node metastasis is obtained by comprehensively examining gene expression in many samples of primary colorectal cancer tissue and selecting from them a set of genes usable for the determination. Various methods are available for such comprehensive gene expression analysis, including microarrays, Northern analysis, the ATAC-PCR method (Kato et al., Nucleic Acids Res., vol. 25, pp. 4694-4696, 1997), real-time PCR methods typified by the TaqMan PCR method (Applied Biosystems), and SAGE (Velculescu et al., Science, vol. 270, pp. 484-487, 1995).
[0024] In a preferred embodiment of the present invention, the analysis is performed with a DNA microarray according to the method described in Japanese Patent Application Laid-Open No. 2004-33082. More specifically, gene expression data were acquired with that DNA microarray for a total of 150 tissues collected with informed consent: 63 primary colorectal cancer tissues from patients in whom lymph node metastasis was found by histopathological examination at the time of primary lesion resection, and 87 primary lesion tissues from patients in whom no lymph node metastasis was found. As a reference, gene expression data obtained from the normal colonic mucosa surrounding the primary colorectal cancer tissue of 40 cases were used.
[0025] The gene expression data are acquired by hybridizing fluorescently labeled cDNA, prepared from total RNA extracted from the cancer tissue, to the DNA microarray, and then using a dedicated scanner to detect and quantify the fluorescent signal emitted by the labeled cDNA hybridized to the probes on the microarray. A more specific procedure is described below.
[0026] Total RNA can be extracted from colorectal cancer tissue or normal colonic mucosa using reagents such as TRIzol reagent (GIBCO BRL) or ISOGEN (Nippon Gene), following the method described in each reagent's package insert. The total RNA prepared in this way can be used directly to prepare the labeled cDNA described below. Alternatively, polyadenylated RNA (hereinafter also referred to as "mRNA") can be purified from the total RNA with a commercially available kit such as the mRNA Purification Kit (Amersham Biosciences), following the accompanying protocol, and used to prepare the labeled cDNA.
[0027] Cy3-labeled cDNA derived from primary colorectal cancer tissue (hereinafter also referred to as "Cy3 cDNA") is prepared by adding reverse transcriptase to a mixture containing the above total RNA or mRNA, an oligo(dT) primer, dNTPs, and Cy3-labeled dUTP, and incubating at 37 to 45°C for 1 to 3 hours, preferably at 42°C for 1 hour. Cy5-labeled cDNA derived from normal colonic mucosa (hereinafter also referred to as "Cy5 cDNA"), used as the reference, is prepared by the same method from total RNA of normal colonic mucosa. The Cy3 cDNA and Cy5 cDNA thus obtained are each heat-treated in a denaturing solution at 65 to 70°C for 10 to 20 minutes, preferably at 70°C for 10 minutes, neutralized, and then mixed in equal amounts (hereinafter this mixture may be referred to as "Cy5/Cy3 cDNA"). The denaturing solution may be, for example, 0.5 N NaOH or 1 N NaOH containing 50 mM EDTA; 0.5 N NaOH containing 50 mM EDTA is preferred. The Cy5/Cy3 cDNA is purified with a commercially available kit such as Microcon-30 (Amicon), following the accompanying protocol.
[0028] Hybridization of the Cy5/Cy3 cDNA to the probes printed on the DNA microarray is performed as follows. First, the DNA microarray is heat-treated to denature the probes. A hybridization solution containing Cy5/Cy3 cDNA, itself heat-treated at 100°C for 2 minutes, is dropped onto the array and covered with a cover glass, and the microarray is placed in a sealed container for hybridization. When the hybridization solution contains formamide, hybridization is carried out at 42°C for 12 hours or longer; when it does not, hybridization is carried out at about 68°C for 12 hours or longer. After hybridization, the Cy3 and Cy5 fluorescence is scanned with an instrument such as the ScanArray 4000 (GSI Lumonics) to obtain the fluorescence pattern as image data. These image data are then analyzed with dedicated microarray analysis software such as QuantArray (GSI Lumonics), yielding the Cy3 and Cy5 fluorescence intensities of all probes as numerical data in text format.
[0029] Although the above DNA microarray is used in a preferred embodiment of the present invention, similar results can be obtained by using instead synthetic DNA with a length effective for hybridization. For example, synthetic DNA of about 20 nucleotides or longer, consisting of part of the sequence of a gene disclosed in the present invention (designed from its gene name or sequence information), can be immobilized on a glass substrate or the like and used as probes.
[0030] In general, low-intensity data are strongly affected by background, so data from low-intensity probes are discarded and treated as missing values, for example by keeping only the 3,000 data points with the highest fluorescence intensity. Next, a normalization step corrects for differences in the Cy3 and Cy5 detection sensitivity settings that can arise during scanning. For each probe, the ratio of the Cy3 to Cy5 fluorescence intensities, Cy3/Cy5, is calculated and converted to its base-2 logarithm (hereinafter written "log(Cy3/Cy5)"). The median of all log(Cy3/Cy5) values is then subtracted from each probe's log(Cy3/Cy5) value to obtain the normalized log(Cy3/Cy5) value, which can be used as the expression level of each gene.
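For a single array, the intensity filter and median-centering of [0030] can be sketched as follows. This illustration rests on several assumptions. Spots are ranked by the sum of their Cy3 and Cy5 intensities, because the text says only that the strongest 3,000 data points are kept. Intensities are taken to be positive. The median is taken over the retained spots. The function name is hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple


def normalize_array(spots: Sequence[Tuple[float, float]],
                    keep_top: int = 3000) -> List[Optional[float]]:
    """Median-centred log2(Cy3/Cy5) per probe; weak spots become None.

    `spots` holds (Cy3, Cy5) intensities, assumed positive. Ranking by
    Cy3 + Cy5 is an assumption made for this sketch.
    """
    order = sorted(range(len(spots)),
                   key=lambda i: spots[i][0] + spots[i][1], reverse=True)
    kept = set(order[:keep_top])
    # Discarded spots are treated as missing values (None).
    ratios: List[Optional[float]] = [
        math.log2(cy3 / cy5) if i in kept else None
        for i, (cy3, cy5) in enumerate(spots)
    ]
    med = statistics.median(r for r in ratios if r is not None)
    return [None if r is None else r - med for r in ratios]
```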
[0031] The standardized numerical data obtained in this way for all cases (hereinafter sometimes "standardized numerical data") are pooled. The following selection is then applied to exclude probes with many missing values from further analysis. Only probes for which data were obtained in at least 128 of the 150 cases analyzed on the microarray (85% or more) are retained, so that only probes with at most 15% missing values are selected. A further selection is applied to exclude effects of individual genetic background. For each probe, the variance across the data for the 150 primary colorectal cancer lesions and the variance across the data for 12 normal colonic mucosa samples are calculated. Only probes for which the former exceeds 1.1 times the latter are selected.
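A minimal sketch of the two probe filters in [0031] is shown below. It assumes per-probe lists of standardized values, with `None` for missing entries. `select_probes` is a hypothetical helper. Computing sample variances over observed values only is also an assumption, because the text does not state how missing values enter the variance.

```python
from statistics import variance


def select_probes(tumor, normal, min_observed=128, ratio=1.1):
    """Return indices of probes passing both filters.

    tumor:  per-probe lists of tumor values (150 per probe in the text), None = missing
    normal: per-probe lists of normal-mucosa values (12 per probe in the text)
    Assumption: sample variances are computed over observed values only.
    """
    selected = []
    for j, (t_vals, n_vals) in enumerate(zip(tumor, normal)):
        t_obs = [v for v in t_vals if v is not None]
        n_obs = [v for v in n_vals if v is not None]
        if len(t_obs) < min_observed:  # too many missing tumor values
            continue
        if len(n_obs) < 2:  # a sample variance needs at least two points
            continue
        # Keep probes whose tumor variability exceeds normal variability.
        if variance(t_obs) > ratio * variance(n_obs):
            selected.append(j)
    return selected
```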
[0032] This series of selections leaves standardized numerical data for 2,121 probes for subsequent analysis. Missing values remaining in the selected standardized numerical data would interfere with the later statistical analysis, so they must be imputed by some method.
[0033] Various imputation methods can be applied. For example, a missing value can be imputed as the mean of all data for the case containing the missing value, plus the mean of the data for the gene in question across all cases, minus the mean of the data for all genes across all cases. In addition, Troyanskaya et al. (Bioinformatics, vol. 17, p. 520-525, 2001) describe imputation by three methods: the K-Nearest Neighbors (KNN) method, the Singular Value Decomposition (SVD) based method, and the row average method. Any of these methods can be applied to impute all missing values.
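The first imputation rule in [0033] (case mean plus gene mean minus grand mean) can be written as the short sketch below. The function name is hypothetical. The sketch assumes that every case and every gene has at least one observed value.

```python
def impute_additive(matrix):
    """matrix[i][j]: case i, gene j; None = missing. Returns a filled copy.

    Each missing value becomes: mean of case i + mean of gene j - grand mean,
    with every mean taken over observed values only.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    n_cases, n_genes = len(matrix), len(matrix[0])
    case_means = [mean([v for v in row if v is not None]) for row in matrix]
    gene_means = [
        mean([matrix[i][j] for i in range(n_cases) if matrix[i][j] is not None])
        for j in range(n_genes)
    ]
    grand = mean([v for row in matrix for v in row if v is not None])
    return [
        [v if v is not None else case_means[i] + gene_means[j] - grand
         for j, v in enumerate(row)]
        for i, row in enumerate(matrix)
    ]
```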
[0034] The standardized numerical data with imputed missing values prepared in this way (hereinafter sometimes "standardized gene expression data") are free of background effects. They contain no errors due to differences in Cy3 and Cy5 detection sensitivity and no missing values. They also carry expression information only for genes whose variation in expression in primary colorectal cancer lesions, relative to normal colonic mucosa, exceeds the variation attributable to inter-individual differences. These data can therefore ensure the reliability of the subsequent statistical analysis.
[0035] No established, general method exists for statistically processing the large volume of gene expression data obtained from DNA microarray measurements to derive a gene set suited to a given purpose. In practice, this requires considerable ingenuity on the part of the researcher. In the present invention, analyses are first performed using four different approaches, and for each one a group of genes usable for predicting the presence or absence of lymph node metastasis is identified.
[0036] The four approaches taken in the present invention are:
(a) Support Vector Machine (SVM) (Hastie et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001);
(b) an extension of Principal Component Analysis/artificial Neural Network (PCA/aNN) (Khan et al., Nature Medicine, vol. 7, p. 673-679, 2001);
(c) Hierarchical Cluster Analysis (HCA) + Stepwise Logistic Discrimination; and
(d) Classification And Regression Tree (CART) (Breiman et al., Classification and Regression Trees, Wadsworth, 1983) + Logistic Discrimination.
To ensure statistical reliability when identifying the gene groups, all data are divided into two groups: one for identifying predictive genes and one for evaluation. More specifically, the 150 cases are divided into 99 cases (42 with and 57 without lymph node metastasis) and 51 cases (21 with and 30 without lymph node metastasis). The data for the former 99 cases are used to identify genes for predicting the presence or absence of lymph node metastasis and to establish a discriminant. The discriminant is then evaluated by applying it to the data for the latter 51 cases. In the following description, the data for the former 99 cases, used to identify the predictive genes and establish the discriminant, are referred to as "training data". The data for the latter 51 cases, used to evaluate the discriminant, are referred to as "test data".
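The fixed-count split into training and test data can be sketched as a class-stratified random split, in which 42 of the 63 positive cases and 57 of the 87 negative cases go to training. `split_cases` is a hypothetical helper. The text does not describe how the random split was drawn.

```python
import random


def split_cases(labels, n_train_pos=42, n_train_neg=57, seed=None):
    """Randomly split case indices into training and test sets within each class.

    labels: 1 = lymph node metastasis present, 0 = absent.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    # The class counts are fixed; only which cases land where is random.
    train = sorted(pos[:n_train_pos] + neg[:n_train_neg])
    test = sorted(pos[n_train_pos:] + neg[n_train_neg:])
    return train, test
```

Calling this 100 times with different seeds would reproduce the repeated random division described in [0037].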
[0037] For three of the four approaches, (a), (c) and (d), the above two-way split is performed randomly 100 times to account for sampling variation, and each split is analyzed. This yields 100 sets of discriminating genes, and the genes identified most frequently are adopted.
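Aggregating the 100 random splits amounts to counting how often each gene is selected. The per-split selector is not shown here. The cutoff `min_count` is a hypothetical parameter, because the text says only that the most frequently identified genes were adopted.

```python
from collections import Counter


def frequent_genes(selections, min_count):
    """selections: one iterable of selected gene IDs per random split.

    Returns the gene IDs selected in at least min_count splits, sorted.
    """
    # set() ensures a gene counts at most once per split.
    counts = Counter(g for sel in selections for g in set(sel))
    return sorted(g for g, c in counts.items() if c >= min_count)
```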
[0038] Approach (b), by contrast, is carried out only on the first split because of its very large computational cost. Within that split, however, the 99 training cases are randomly divided 1,250 times in a 2:1 ratio, and principal component analysis and neural network training are repeated on these divisions. After training, the genes are ranked by their sensitivity for discriminating the presence or absence of lymph node metastasis, and the gene set is narrowed down. Starting from 2,121 genes, training proceeds successively with the set narrowed to 1,536, 768, 384, 192, 96, 48 and 24 genes.
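The stepwise narrowing in approach (b) follows a fixed schedule of gene-set sizes. In the sketch below, `rank_fn` is a hypothetical stand-in for retraining the PCA and neural networks on the 1,250 random 2:1 divisions (66 and 33 of the 99 training cases) and returning a sensitivity score per gene. Only the narrowing loop is illustrated.

```python
def narrow_genes(genes, rank_fn, schedule=(1536, 768, 384, 192, 96, 48, 24)):
    """Repeatedly keep the top-ranked genes according to a fixed size schedule.

    rank_fn(current_genes) -> dict mapping gene -> score (higher = more useful).
    Returns a dict: gene-set size -> genes retained at that step.
    """
    current = list(genes)
    history = {len(current): current}
    for size in schedule:
        scores = rank_fn(current)  # re-rank on the current, smaller set
        current = sorted(current, key=lambda g: scores[g], reverse=True)[:size]
        history[size] = current
    return history
```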
[0039] These analyses give the number of genes in each gene set and the mean correct classification rate on the test data using the established discriminant. The correct classification rate is defined as the number of cases in which the discriminant's result agrees with the histopathological result ÷ the number of test cases × 100 (%). The results are: for (a), 144 genes with a correct classification rate of 80.2% (standard deviation 5.6%); for (b), 192 genes with 90.2%; for (c), 133 genes with 78.6% (standard deviation 6.2%); and for (d), 138 genes with 86.3% (standard deviation 4.5%). Sixteen genes are common to the gene sets selected by each of the approaches.
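The evaluation quantities in [0039] reduce to simple arithmetic. They are the correct classification rate per split, its mean and standard deviation over repeated splits, and the intersection of the gene sets from the four approaches. A minimal sketch with hypothetical function names:

```python
from statistics import mean, stdev


def correct_classification_rate(predicted, actual):
    """Percentage of test cases whose predicted status matches histopathology."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * matches / len(actual)


def summarize(rates):
    """Mean and sample standard deviation of rates over repeated splits."""
    return mean(rates), stdev(rates)


def common_genes(gene_sets):
    """Genes contained in every approach's gene set."""
    return set.intersection(*(set(s) for s in gene_sets))
```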
[0040] Next, the number of genes used for prediction is reduced without lowering the correct classification rate. To do this, the target genes are first restricted to the above 16 genes. A statistical analysis is then performed that considers not only the contribution of each of these 16 genes (hereinafter "main effect") but also two-gene interactions. This search for a discrimination rule covers a wider space, including interactions between genes as well as the main effects of individual genes, and is therefore expected to maintain high discrimination performance.
[0041] To search for interactions, CART analysis with the presence or absence of lymph node metastasis as the response is performed again on each of the 100 training data sets used above. This CART analysis used rpart in the free software R, with all control parameters at their default values. Each analysis yields 3 to 5 genes that appear as the variables used to split the data.
[0042] If, for example, 3 genes appear, there are 3 possible gene pairs, and all of them are treated as interactions. Likewise, 6 pairs are treated as interactions when 4 genes appear, and 10 pairs when 5 genes appear. Then, to cover as many candidates as possible, the 18 gene pairs that appear 12 or more times in the 100 analyses are selected as interaction candidates.
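Enumerating gene pairs among the CART split variables is an n-choose-2 count, giving 3, 6 and 10 pairs for 3, 4 and 5 genes. The sketch below assumes each CART run has already been reduced to the set of genes used in its splits. `interaction_candidates` is a hypothetical helper. The threshold of 12 comes from the text.

```python
from collections import Counter
from itertools import combinations


def interaction_candidates(split_variable_sets, min_count=12):
    """Count gene pairs among CART split variables across analyses.

    Returns (sorted pairs seen at least min_count times, full pair counts).
    """
    counts = Counter()
    for genes in split_variable_sets:
        # Every unordered pair of split genes is a candidate interaction.
        counts.update(combinations(sorted(set(genes)), 2))
    candidates = sorted(p for p, c in counts.items() if c >= min_count)
    return candidates, counts
```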
[0043] Next, to establish the discriminant, stepwise variable selection is performed on the data for all 150 cases. The logistic regression model uses the 16 gene main effects and the 18 interactions as explanatory variables, and the presence or absence of lymph node metastasis as the response. The P value of the significance test for each regression coefficient is used as the criterion for entering a variable (less than 0.05) and for removing it (greater than 0.05). This selects six variables: G1592, G3031, G3826, G4370, the G2645 × G3177 interaction, and the G3753 × G3826 interaction. The discriminant for predicting the presence or absence of lymph node metastasis is estimated as
$$
\begin{aligned}
D = {} & 0.2307 - 2.7132\,x_{\mathrm{G1592}} + 8.9509\,x_{\mathrm{G3031}} + 8.7975\,x_{\mathrm{G3826}} - 2.3098\,x_{\mathrm{G4370}} \\
& + 3.5126\,x_{\mathrm{G2645}}\,x_{\mathrm{G3177}} - 8.8226\,x_{\mathrm{G3753}}\,x_{\mathrm{G3826}}
\end{aligned}
$$
where $x_{g}$ denotes the expression level of gene $g$.
This yields the discrimination rule that lymph node metastasis is predicted to be present when D > 0 and absent when D ≤ 0. The seven genes appearing in this discriminant (G1592, G2645, G3031, G3177, G3753, G3826 and G4370) are selected as the set of genes contributing to discrimination of the presence or absence of lymph node metastasis. Their gene names are listed in Table 1.
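Applying the discriminant to a new sample is a direct evaluation of D. The sketch assumes the inputs are the standardized log2(Cy3/Cy5) values described in [0030]. It also treats D as the log-odds of the fitted logistic model, so D > 0 corresponds to an estimated probability above 0.5. The function names are hypothetical.

```python
import math


def discriminant(expr):
    """expr: dict mapping gene ID -> standardized expression value."""
    return (0.2307
            - 2.7132 * expr["G1592"]
            + 8.9509 * expr["G3031"]
            + 8.7975 * expr["G3826"]
            - 2.3098 * expr["G4370"]
            + 3.5126 * expr["G2645"] * expr["G3177"]   # interaction term
            - 8.8226 * expr["G3753"] * expr["G3826"])  # interaction term


def predict_metastasis(expr):
    """True when D > 0 (metastasis predicted), False when D <= 0."""
    return discriminant(expr) > 0


def metastasis_probability(expr):
    """Logistic transform of D, assuming D is the model's linear predictor."""
    return 1.0 / (1.0 + math.exp(-discriminant(expr)))
```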
[0044] Finally, the performance of the selected gene set in discriminating lymph node metastasis is evaluated by the leave-one-out (LOO) method. For each of the 150 samples, the logistic discriminant containing the above six variables is estimated from the data for the remaining 149 samples and is then used to classify the held-out sample. As shown in Table 2, this gives an estimated correct classification rate for the selected gene set of 88.7% (sensitivity 77.8%, specificity 96.6%). As described above, the present invention identifies the gene set needed to predict the presence or absence of lymph node metastasis of colorectal cancer with high accuracy.
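The leave-one-out evaluation and the reported metrics can be sketched generically. `fit` and `predict` are hypothetical callables standing in for estimating and applying the six-variable logistic discriminant. The usage counts below (49 of 63 positives and 84 of 87 negatives classified correctly) are back-calculated from the reported percentages and are not stated in the text.

```python
def leave_one_out(X, y, fit, predict):
    """Refit on all cases but one, then predict the held-out case, for each case."""
    preds = []
    for i in range(len(y)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(model, X[i]))
    return preds


def classification_metrics(preds, y):
    """Accuracy, sensitivity and specificity in percent (1 = metastasis present)."""
    tp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 0)
    pos = sum(1 for t in y if t == 1)
    neg = len(y) - pos
    return {
        "accuracy": 100.0 * (tp + tn) / len(y),
        "sensitivity": 100.0 * tp / pos,
        "specificity": 100.0 * tn / neg,
    }
```

With `y = [1] * 63 + [0] * 87` and predictions that get 49 positives and 84 negatives right, the rounded metrics are 88.7, 77.8 and 96.6.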
[0045] The present invention is described in more detail below by way of examples, but it is in no way limited by these examples. Unless otherwise noted, the reagents used in the examples were purchased from Nacalai Tesque, Inc.
Example 1
[0046] (1) Preparation of total RNA from colorectal tissue samples
The samples for gene expression analysis of colorectal cancer using DNA microarrays were 150 primary colorectal cancer tissue specimens, resected during colorectal cancer surgery and collected with informed consent. These comprised 63 cases from patients in whom histopathological examination at the time of primary tumor resection found lymph node metastasis (hereinafter "lymph node metastasis-positive cases"). The remaining 87 cases came from patients in whom no lymph node metastasis was found (hereinafter "lymph node metastasis-negative cases"). Total RNA was extracted from these colorectal cancer tissue samples using TRIzol reagent (purchased from GIBCO BRL), essentially following the manual supplied with the reagent. In addition, total RNA was extracted from normal colonic mucosa from 40 cases and pooled to give a standard normal colonic mucosa total RNA, which was used throughout all experiments. The concentration of each RNA sample was calculated in the usual way from its absorbance at 260 nm measured with a spectrophotometer.
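The concentration calculation from absorbance at 260 nm conventionally uses a factor of about 40 µg/mL of RNA per A260 unit at a 1 cm path length. The text does not state the factor used, so the sketch below assumes this convention. The helper names are hypothetical.

```python
def rna_concentration_ug_per_ml(a260, dilution_factor=1.0, ext_coeff=40.0):
    """Estimate RNA concentration; assumes 1 A260 unit ~ 40 ug/mL RNA (1 cm path)."""
    return a260 * ext_coeff * dilution_factor


def volume_for_mass_ul(mass_ug, conc_ug_per_ml):
    """Microliters of stock needed to deliver mass_ug of RNA."""
    return mass_ug * 1000.0 / conc_ug_per_ml
```

For example, a 1:50 dilution reading 0.25 corresponds to 500 µg/mL, so 25 µg of RNA (the amount used for labeling below) would be 50 µL of stock.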
[0047] (2) Preparation of fluorescently labeled targets
The fluorescently labeled targets for hybridization to the DNA microarray were prepared as follows. First, 25 µg of total RNA from a colorectal cancer sample (hereinafter "colorectal cancer RNA") and 25 µg of standard normal colonic mucosa total RNA (hereinafter "standard colonic mucosa RNA") were placed in separate tubes. To each tube, 2 µg of an 18-nucleotide oligo(dT) primer was added, and the volume was brought to 14 µL with sterile distilled water. The tubes were heated at 70°C for 10 minutes and then immediately chilled on ice. Then 6 µL of 5× First Strand Buffer, 3 µL of 0.1 M DTT, 1.5 µL of 20× dNTP mix (a mixture of 10 mM each of dATP, dCTP and dGTP and 6 mM dTTP) and 0.5 µL of RNAguard were added to each tube.
[0048] Next, 3 µL of dUTP labeled with the fluorescent dye Cy3 (hereinafter "Cy3-dUTP"; 1 mM) was added to the tube containing colorectal cancer RNA. Likewise, 3 µL of Cy5-labeled dUTP (hereinafter "Cy5-dUTP"; 1 mM) was added to the tube containing standard colonic mucosa RNA, and both tubes were incubated at 42°C for 2 minutes. Then 2 µL of the reverse transcriptase SuperScript II was added to each tube, and the labeling reaction was carried out by incubating at 42°C for a further 1 hour. During cDNA synthesis, Cy3-dUTP is incorporated using colorectal cancer RNA as the template, and Cy5-dUTP using standard colonic mucosa RNA as the template. This produces a colorectal cancer labeled target fluorescently labeled with Cy3 and a standard colonic mucosa labeled target fluorescently labeled with Cy5.
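As a consistency check on the labeling reaction, the component volumes in [0047] and [0048] sum to 30 µL, which brings the 5× buffer to 1×. The sketch below tallies final concentrations. The component labels are descriptive only. RNA, RNAguard and the enzyme are listed without concentrations because none are given as stock concentrations in the text.

```python
# (component, volume in uL, {species: stock concentration}) per tube, from [0047]-[0048]
REACTION_UL = [
    ("RNA + oligo(dT) in water", 14.0, {}),
    ("5x First Strand Buffer", 6.0, {"buffer (x)": 5.0}),
    ("0.1 M DTT", 3.0, {"DTT (mM)": 100.0}),
    ("20x dNTP mix", 1.5, {"dATP (mM)": 10.0, "dCTP (mM)": 10.0,
                           "dGTP (mM)": 10.0, "dTTP (mM)": 6.0}),
    ("RNAguard", 0.5, {}),
    ("Cy3- or Cy5-dUTP, 1 mM", 3.0, {"Cy-dUTP (mM)": 1.0}),
    ("SuperScript II", 2.0, {}),
]


def final_concentrations(components):
    """Return the total volume and each species' final concentration (C1*V1/Vtotal)."""
    total = sum(vol for _, vol, _ in components)
    finals = {}
    for _, vol, species in components:
        for name, stock in species.items():
            finals[name] = finals.get(name, 0.0) + stock * vol / total
    return total, finals
```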
[0049] The 5× First Strand Buffer, 0.1 M DTT and SuperScript II used in this reaction were all purchased from GIBCO BRL. The dATP, dCTP, dGTP, dTTP, Cy5-dUTP, Cy3-dUTP and RNAguard were all purchased from Amersham Biosciences. After the reaction, 5 µL of denaturing solution (0.5 N NaOH, 50 mM EDTA) was added to each tube. The tubes were heated at 70°C for 10 minutes and then neutralized by adding 7.5 µL of 1 M Tris-HCl (pH 7.5). At this stage, the colorectal cancer labeled target and the standard colonic mucosa labeled target were mixed, and 10 µg of human COT-1 DNA (purchased from GIBCO BRL) was added. The mixture was brought to 500 µL with TE buffer, then purified and concentrated using a Microcon-30 (purchased from Amicon) to remove unreacted Cy5-dUTP, Cy3-dUTP and other components. Purification and concentration followed the manual supplied with the Microcon-30. The sample was finally concentrated to a total volume of 5 µL and used as the labeled target for hybridization to the DNA microarray.
[0050] (3) Pretreatment of the DNA microarray
The DNA microarray was masked by immersing it for 5 minutes in a masking solution (a mixture of 3 g of succinic anhydride, 190 mL of N-methyl-2-pyrrolidone and 21 mL of 0.2 M sodium borate). The cDNA printed on the microarray was then heat-denatured by immersing the array in distilled water at 95°C for 3 minutes. The array was immediately dehydrated by immersion in 95% or stronger ethanol for 1 minute and then air-dried.
[0051] (4) Hybridization of the labeled target to the DNA microarray
The following were added to 5 µL of the labeled target solution prepared as described above: 2.5 µL of 10 mg/mL polyadenine (purchased from Roche), 0.5 µL of 10% SDS solution, 3 µL of 20× PM solution (a mixture of 0.4% BSA and 1% SDS), 15 µL of formamide, 3 µL of 20× SSC (3 M sodium chloride, 0.3 M sodium citrate, pH 7.0) and 1 µL of sterile distilled water. The mixture was heated at 100°C for 2 minutes and then left at room temperature in the dark for about 30 minutes. It was then dropped onto the cDNA-printed area of a DNA microarray pretreated as described in the previous section and covered with a 24 × 40 mm cover glass (purchased from Matsunami Glass Ind.). The microarray was placed in a sealed container, and the container was kept in a 42°C incubator for about 16 hours to hybridize the labeled target to the cDNA on the microarray. After hybridization, the microarray was washed by immersion for 10 minutes in 2× SSC containing 0.1% SDS, and then for 10 minutes in 0.1× SSC containing 0.1% SDS. It was further washed twice for 5 minutes each in 0.1× SSC, drained, and air-dried in the dark.
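The hybridization mixture can be checked arithmetically. The seven components total 30 µL, giving 2× SSC and, assuming the formamide is undiluted, 50% formamide. The sketch also converts an SSC strength to its sodium chloride and sodium citrate molarities using the 20× composition given above. All names are hypothetical.

```python
# 20x SSC composition given in the text (molar).
SSC_20X = {"NaCl (M)": 3.0, "sodium citrate (M)": 0.3}

# Hybridization mixture volumes in microliters, as listed in [0051].
HYB_MIX_UL = {
    "labeled target": 5.0,
    "10 mg/mL polyadenine": 2.5,
    "10% SDS": 0.5,
    "20x PM solution": 3.0,
    "formamide": 15.0,
    "20x SSC": 3.0,
    "sterile water": 1.0,
}


def ssc_composition(strength_x):
    """Molarities of NaCl and sodium citrate in SSC of the given strength."""
    return {k: v * strength_x / 20.0 for k, v in SSC_20X.items()}


def hyb_summary(mix=HYB_MIX_UL):
    total = sum(mix.values())
    return {
        "total_ul": total,
        "formamide_pct": 100.0 * mix["formamide"] / total,  # assumes neat formamide
        "ssc_x": 20.0 * mix["20x SSC"] / total,
        "polyA_mg_per_ml": 10.0 * mix["10 mg/mL polyadenine"] / total,
    }
```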
[0052] (5) Microarray scanning and data analysis
The washed and air-dried microarray was scanned separately for Cy3 and Cy5 fluorescence with a ScanArray 4000 (GSI Lumonics), a confocal laser scanner dedicated to microarrays. This produced the Cy3 and Cy5 fluorescence patterns of the colorectal cancer target and the standard colonic mucosa target hybridized to each probe, as 16-bit TIFF scan image data. The image data were then analyzed with QuantArray software (GSI Lumonics), dedicated microarray analysis software, to obtain the Cy3 and Cy5 fluorescence intensities of all probes as numerical data in text format. For background correction, the fluorescence intensity of areas where no cDNA was printed was subtracted from the fluorescence intensity of each probe. Because low-intensity values are strongly affected by experimental error, approximately 3,000 data points with the highest fluorescence intensity were retained and the remaining data were discarded. For each probe, the ratio of the Cy3 and Cy5 fluorescence intensities (Cy3/Cy5) was calculated and converted to a base-2 logarithm (hereinafter "log(Cy3/Cy5)"). Scanning can introduce differences in the Cy3 and Cy5 detection-sensitivity settings. To correct for these, the median of all log(Cy3/Cy5) values was subtracted from each probe's log(Cy3/Cy5) value to obtain the standardized log(Cy3/Cy5) value.
[0053] This procedure yielded log-transformed, standardized numerical data for the relative expression intensities of the primary colorectal cancer lesions in 63 cases with and 87 cases without lymph node metastasis, using the standard colonic mucosa RNA as the reference. Numerical data for 12 normal colonic mucosa samples were obtained by the same procedure, also with the standard colonic mucosa RNA as the reference. Only data for a total of 2,121 probes were used in the subsequent statistical analysis. These were the probes for which data were obtained in at least 128 (85%) of the 150 primary colorectal cancer lesions analyzed. In addition, their variance within the data for the 150 primary lesions had to exceed 1.1 times their variance within the data for the 12 normal colonic mucosa samples.
[0054] In some cases, data for these 2,121 probes had been discarded because the expression level fell below the cutoff, and these data are missing values. There were 8,816 such missing values in total. Since the total number of data points is 150 × 2,121 = 318,150, the proportion of missing values is about 2.8%. Because these missing values would interfere with the later statistical analysis, they were imputed using the k-Nearest Neighbor method. Specifically, the distances between all pairs of samples were calculated in the 150-case × 2,121-probe data matrix. Each missing expression value was then replaced by the mean expression level of that gene in the 8 samples closest to the sample containing the missing value. The number of nearest samples was determined by increasing it step by step and choosing the number that minimized the root mean square (RMS) error. All numerical data obtained by imputing the missing values in this way are hereinafter referred to as standardized gene expression data.
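A minimal sketch of the sample-based k-nearest-neighbor imputation described in [0054] is shown below. It makes two assumptions the text does not state. Distances are root-mean-square differences over genes observed in both samples, and only neighbors in which the gene is observed contribute. The RMS-based search that selected k = 8 is not shown. `knn_impute` is a hypothetical name.

```python
import math


def knn_impute(matrix, k=8):
    """matrix[i][j]: sample i, gene j; None = missing. Returns a filled copy."""
    n = len(matrix)

    def dist(a, b):
        # RMS difference over genes observed in both samples (assumption).
        diffs = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf

    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank all other samples by distance to sample i.
        order = sorted((dist(row, matrix[o]), o) for o in range(n) if o != i)
        for j in missing:
            # The k nearest samples in which gene j was observed.
            donors = [matrix[o][j] for _, o in order if matrix[o][j] is not None][:k]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled
```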
Example 2
[0055] Determination of a gene set with high discriminating power for predicting lymph node metastasis by statistical analysis of the standardized gene expression data
In this example, the probes printed on the DNA microarray are sometimes referred to as genes.
As a first attempt, the following four approaches were applied to identify a gene set for discrimination: (a) Support Vector Machine (SVM) (Hastie et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001); (b) an extension of Principal Component Analysis/artificial Neural Network (PCA/aNN) (Khan et al., Nature Medicine, vol. 7, p. 673-679, 2001); (c) Hierarchical Cluster Analysis (HCA) + Stepwise Logistic Discrimination; and (d) Classification And Regression Tree (CART) (Breiman et al., Classification and Regression Trees, Wadsworth, 1983) + Logistic Discrimination.
[0056] To ensure statistical reliability in identifying the gene groups, all data were divided into two groups: one for identifying discriminating genes and one for evaluation. More specifically, the 150 cases were divided into 99 cases (42 with and 57 without lymph node metastasis) and 51 cases (21 with and 30 without lymph node metastasis). The data for the former 99 cases were used to identify genes for determining the presence or absence of lymph node metastasis and to establish a discriminant. The gene identification and the discriminant were then evaluated by applying the discriminant to the data for the latter 51 cases. In the following description, data used to identify discriminating genes and establish the discriminant, such as the former 99 cases, are referred to as "training data". Data used to evaluate the discriminant, such as the latter 51 cases, are referred to as "test data".
[0057] For approaches (a), (c) and (d), the above two-way split was performed randomly 100 times and each split was analyzed. This yielded 100 sets of discriminating genes, and the genes identified most frequently were adopted. Approach (b), by contrast, was carried out only on the first split. Within that split, however, the 99 training cases were randomly divided 1,250 times in a 2:1 ratio, and principal component analysis and neural network training were repeated on these divisions. After training, the genes were ranked by their sensitivity for discriminating the presence or absence of lymph node metastasis, and the gene set was narrowed down. Starting from 2,121 genes, training proceeded successively with the set narrowed to 1,536, 768, 384, 192, 96, 48 and 24 genes.
[0058] As a result of the above study, a discriminating gene set and a discriminant were established for each approach. The number of genes in each gene set, and the mean correct classification rate on the test data using the established discriminant (number of cases in which the discriminant's result agreed with the histopathological result / number of test cases × 100 (%)), were as follows: (a) 144 genes, 80.2% (standard deviation 5.6%); (b) 192 genes, 90.2% (evaluated on the first split only); (c) 133 genes, 78.6% (standard deviation 6.2%); and (d) 138 genes, 86.3% (standard deviation 4.5%). In addition, 16 genes were included in common in the gene sets selected by all four approaches.
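The correct classification rate defined in [0058], its mean and standard deviation over repeated splits, and the genes shared by all four approaches can be computed as follows. This is a sketch: labels are assumed to be booleans (True = metastasis), and because the patent does not say whether the reported standard deviations are sample or population values, using `statistics.stdev` (sample SD) is an assumption.

```python
import statistics

def correct_classification_rate(predicted, actual):
    """Percentage of cases where the discriminant agrees with histopathology."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty label sequences")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual) * 100

def summarize(rates):
    """Mean and sample standard deviation of the rates over repeated splits."""
    return statistics.mean(rates), statistics.stdev(rates)

def common_genes(*gene_sets):
    """Genes present in every approach's gene set."""
    return set.intersection(*(set(s) for s in gene_sets))

print(correct_classification_rate([True, False, True, True],
                                  [True, False, False, True]))  # 75.0
```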
[0059] As described above, each of the four different approaches achieved high correct classification rates of around 80% or more (78.6% to 90.2%). However, every approach used more than 100 genes for discrimination, which we considered too many for practical use as an assay method.
[0060] We therefore sought to establish a new discrimination rule that uses fewer genes for the determination without lowering the correct classification rate. To this end, the candidate genes were first restricted to the 16 genes included in common in the gene sets selected by the four approaches above. Next, in addition to the contribution of each of these 16 genes (hereinafter, the "main effect"), interactions between pairs of genes were also taken into account. Searching for discrimination rules over a wider space, one that includes interactions between genes as well as the main effects of individual genes, was expected to preserve high discrimination performance.
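Adding two-gene interactions as explanatory variables amounts to adding product terms, which is how the interactions enter the fitted discriminant in [0062]. A minimal sketch with hypothetical function names and expression values:

```python
from itertools import combinations

def build_features(expr, genes, pairs):
    """Main effects plus product-encoded two-gene interactions for one sample.

    expr  : {gene_id: expression level}
    genes : gene IDs used as main effects (e.g. the 16 common genes)
    pairs : (gene_a, gene_b) tuples used as interaction terms
    """
    features = {g: expr[g] for g in genes}
    for a, b in pairs:
        features[f"{a}x{b}"] = expr[a] * expr[b]
    return features

def all_pairs(genes):
    """Every unordered pair of distinct genes."""
    return list(combinations(sorted(genes), 2))

expr = {"G1": 0.5, "G2": 2.0, "G3": 1.0}  # hypothetical values
print(build_features(expr, ["G1", "G2", "G3"], [("G1", "G2")]))
# {'G1': 0.5, 'G2': 2.0, 'G3': 1.0, 'G1xG2': 1.0}
print(len(all_pairs([f"G{i}" for i in range(16)])))  # 120
```

With 16 genes there are 120 possible pairs, so restricting the search to the 18 CART-derived candidates in [0061] greatly reduces the number of interaction terms entering the regression.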
[0061] To search for interactions, CART analysis with the presence or absence of lymph node metastasis as the response was performed again on each of the 100 training data sets used in the analysis above. The number of genes that appeared as splitting variables ranged from 3 to 5 per analysis. When, for example, 3 genes appeared, there are 3 possible gene pairs, and all of them were treated as interactions; likewise, 6 pairs were treated as interactions when 4 genes appeared, and 10 pairs when 5 genes appeared. The 18 gene pairs that appeared in 12 or more of the 100 analyses were then selected as interaction candidates.
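The pair-counting step in [0061] can be sketched as follows, assuming the splitting genes of each CART run have already been extracted (with scikit-learn's CART-style `DecisionTreeClassifier`, for example, a fitted tree's `tree_.feature` array holds the feature index used at each node, with a negative value at leaves). The function name and the runs below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def interaction_candidates(split_genes_per_run, min_count=12):
    """Count gene pairs among each run's CART splitting variables.

    split_genes_per_run: one collection of gene IDs per analysis.
    Returns the pairs seen in at least `min_count` runs, most frequent first.
    """
    counts = Counter()
    for genes in split_genes_per_run:
        # 3 genes -> 3 pairs, 4 genes -> 6 pairs, 5 genes -> 10 pairs
        counts.update(combinations(sorted(set(genes)), 2))
    return [pair for pair, c in counts.most_common() if c >= min_count]

runs = [["A", "B", "C"]] * 12 + [["A", "D"]] * 5  # hypothetical runs
print(interaction_candidates(runs))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```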
[0062] Next, to establish the discriminant, stepwise variable selection was performed on the data from all 150 cases in a logistic regression model with the main effects of the 16 genes and the 18 interactions as explanatory variables and the presence or absence of lymph node metastasis as the response. The p-value of the significance test for each regression coefficient was used as the entry criterion (less than 0.05) and the removal criterion (greater than 0.05). As a result, six variables were selected: G1592, G3031, G3826, G4370, the G2645 × G3177 interaction, and the G3753 × G3826 interaction. G1592, G3031, G3826, G4370, G2645, G3177, and G3753 are serial numbers assigned to the individual probes (genes) on the ColonoChip used in the present invention. The discriminant for determining the presence or absence of lymph node metastasis was estimated as
$$
\begin{aligned}
D = {} & 0.2307 - 2.7132\,x_{\mathrm{G1592}} + 8.9509\,x_{\mathrm{G3031}} + 8.7975\,x_{\mathrm{G3826}} - 2.3098\,x_{\mathrm{G4370}} \\
       & + 3.5126\,x_{\mathrm{G2645}}\,x_{\mathrm{G3177}} - 8.8226\,x_{\mathrm{G3753}}\,x_{\mathrm{G3826}}
\end{aligned}
$$
where $x_{G}$ denotes the expression level of probe (gene) $G$.
From this, the discrimination rule was derived that lymph node metastasis is present when D > 0 and absent when D ≤ 0.
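The stepwise selection in [0062], with p < 0.05 to enter and p > 0.05 to leave, can be sketched with the model fitting abstracted away. `fit_pvalues` is a hypothetical callable that should fit the logistic regression on the given variables and return each coefficient's p-value (with statsmodels, for example, `sm.Logit(y, sm.add_constant(X)).fit(disp=0).pvalues` provides these). The patent does not specify the order of forward and backward steps or how ties are broken, so those details are assumptions.

```python
from typing import Callable, Dict, List, Sequence

PValueFn = Callable[[List[str]], Dict[str, float]]

def stepwise_select(candidates: Sequence[str], fit_pvalues: PValueFn,
                    enter: float = 0.05, remove: float = 0.05,
                    max_steps: int = 100) -> List[str]:
    """Bidirectional stepwise selection driven by coefficient p-values."""
    selected: List[str] = []
    for _ in range(max_steps):  # guard against add/remove cycles
        changed = False
        # Forward step: add the candidate with the smallest p-value below `enter`.
        best_var, best_p = None, enter
        for var in candidates:
            if var in selected:
                continue
            p = fit_pvalues(selected + [var])[var]
            if p < best_p:
                best_var, best_p = var, p
        if best_var is not None:
            selected.append(best_var)
            changed = True
        # Backward step: drop the variable with the largest p-value above `remove`.
        if selected:
            pvals = fit_pvalues(selected)
            worst = max(selected, key=lambda v: pvals[v])
            if pvals[worst] > remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected

# Toy stand-in whose p-values do not depend on the model (illustration only).
fixed = {"A": 0.01, "B": 0.2, "C": 0.03}
print(stepwise_select(["A", "B", "C"], lambda vs: {v: fixed[v] for v in vs}))  # ['A', 'C']
```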
[0063] The seven genes appearing in this discriminant, namely G1592, G2645, G3031, G3177, G3753, G3826, and G4370, were selected as the set of genes that contributes to discriminating the presence or absence of lymph node metastasis. Their gene names are listed in Table 1, together with the accession number of each gene in the RefSeq database. The RefSeq database can be accessed from the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/ReSeq/index.html).
[0064] [Table 1]
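The discriminant and decision rule from [0062] can be written as a small function keyed by ColonoChip probe ID, with the RefSeq accessions taken from claim [6]. This is a sketch, not a validated classifier: the coefficients are only meaningful for expression values on the same normalized ColonoChip scale used in the patent, which this text does not specify. Reading D as the linear predictor of the fitted logistic model (so that D > 0 corresponds to an estimated probability above 0.5) is an assumption.

```python
import math

# ColonoChip probe serial numbers and their RefSeq accessions (claim [6]).
REFSEQ = {
    "G1592": "NM_003404", "G2645": "NM_002128", "G3031": "NM_052868",
    "G3177": "NM_005034", "G3753": "NM_001540", "G3826": "NM_005722",
    "G4370": "NM_015315",
}

def discriminant(x):
    """D from paragraph [0062]; x maps probe ID -> expression level."""
    return (0.2307
            - 2.7132 * x["G1592"]
            + 8.9509 * x["G3031"]
            + 8.7975 * x["G3826"]
            - 2.3098 * x["G4370"]
            + 3.5126 * x["G2645"] * x["G3177"]   # G2645 x G3177 interaction
            - 8.8226 * x["G3753"] * x["G3826"])  # G3753 x G3826 interaction

def predicts_metastasis(x):
    """Decision rule: metastasis if D > 0, no metastasis if D <= 0."""
    return discriminant(x) > 0

def metastasis_probability(x):
    """Logistic transform of D (assumes D is the linear predictor); overflow-safe."""
    d = discriminant(x)
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

ones = {probe: 1.0 for probe in REFSEQ}  # hypothetical input, not patient data
print(round(discriminant(ones), 4), predicts_metastasis(ones))  # 7.6461 True
```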
[0065] Finally, the performance of the selected gene set in determining lymph node metastasis was evaluated by the leave-one-out (LOO) method. That is, for each of the 150 samples, a logistic discriminant containing the six variables above was estimated from the data of the remaining 149 samples and then used to classify the held-out sample. The results are shown in Table 2. From Table 2, the correct classification rate of the selected gene set was estimated to be 88.7% (sensitivity: 77.8%, specificity: 96.6%).
[0066] [Table 2]
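The leave-one-out evaluation in [0065], and the sensitivity and specificity it reports, can be sketched generically. The `fit` and `predict` callables are hypothetical placeholders; in the patent's setting `fit` would re-estimate the six-variable logistic discriminant on the 149 retained samples, which this sketch does not implement. Labels are booleans (True = metastasis), and both classes must be present for the metrics to be defined.

```python
def leave_one_out(X, y, fit, predict):
    """Hold out each sample in turn, fit on the rest, and predict the held-out one."""
    preds = []
    for i in range(len(y)):
        X_train = [row for j, row in enumerate(X) if j != i]
        y_train = [lab for j, lab in enumerate(y) if j != i]
        model = fit(X_train, y_train)
        preds.append(predict(model, X[i]))
    return preds

def confusion_metrics(preds, actual):
    """Return (accuracy, sensitivity, specificity) as fractions."""
    tp = sum(p and a for p, a in zip(preds, actual))
    tn = sum((not p) and (not a) for p, a in zip(preds, actual))
    fp = sum(p and (not a) for p, a in zip(preds, actual))
    fn = sum((not p) and a for p, a in zip(preds, actual))
    return (tp + tn) / len(actual), tp / (tp + fn), tn / (tn + fp)

# Toy majority-class "model", for illustration only.
X = [[0], [1], [2], [3]]
y = [True, True, True, False]
preds = leave_one_out(X, y, fit=lambda Xs, ys: sum(ys) * 2 > len(ys),
                      predict=lambda model, x: model)
print(confusion_metrics(preds, y))  # (0.75, 1.0, 0.0)
```

Assuming the class sizes from [0056] (63 cases with and 87 without metastasis), the reported sensitivity of 77.8% and specificity of 96.6% correspond to 49 of 63 and 84 of 87 correctly classified cases, which together give 133 of 150 = 88.7%. This is a back-calculation from the percentages, not a reading of Table 2.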
Industrial applicability
[0067] The determination of lymph node metastasis made possible by the present invention allows a better treatment strategy to be selected for each case, and economic benefits for medical care can also be expected. For example, aggressive treatment of cases with a high likelihood of lymph node metastasis can be expected to improve prognosis, whereas for cases with a low likelihood of lymph node metastasis, postoperative adjuvant therapy can be made less intensive, reducing the physical and financial burden on the patient.
[0068] Furthermore, since the individual genes in the gene set disclosed in the present invention may themselves act as causes of lymph node metastasis, drugs targeting these genes and their expression products could also be developed to suppress lymph node metastasis directly.

Claims

[1] A method of selecting a gene set for predicting the presence or absence of lymph node metastasis of colorectal cancer, comprising the following steps (1) to (4):
(1) a step of analyzing gene expression information from primary colorectal cancer tissue of patients whose lymph node metastasis status has been established by histopathological examination, using four or more analysis methods including at least one supervised learning method, and selecting, for each analysis method, a group of genes capable of classifying the presence or absence of lymph node metastasis with a correct classification rate of 75% or more;
(2) a step of selecting, from the gene groups selected by the respective analysis methods used in (1), the common genes selected by all of the analysis methods;
(3) a step of analyzing the gene expression information to select, from among combinations of any two or more genes, combinations of genes that direct the classification of the presence or absence of lymph node metastasis and exhibit an interaction; and
(4) a step of performing variable selection in a logistic regression model with the common genes and the gene combinations as explanatory variables and the presence or absence of lymph node metastasis as the response.
[2] The method according to claim 1, wherein the analysis methods of (1) include at least one selected from the group consisting of (a) Support Vector Machine, (b) an extension of the Principal Component Analysis-Artificial Neural Network method, (c) a combination of Hierarchical Cluster Analysis and Stepwise Logistic Discrimination, and (d) a combination of Classification And Regression Tree and Logistic Discrimination.
[3] The method according to claim 1 or 2, wherein the analysis method of (3) is (d) a combination of Classification And Regression Tree and Logistic Discrimination.
[4] The method according to any one of claims 1 to 3, wherein the variable selection method of (4) is a stepwise variable selection method.
[5] A gene set for predicting the presence or absence of lymph node metastasis of colorectal cancer, selected by the method according to any one of claims 1 to 4.
[6] The gene set according to claim 5, comprising at least the genes represented by the database accession numbers (serial numbers) NM_003404 (G1592), NM_002128 (G2645), NM_052868 (G3031), NM_005034 (G3177), NM_001540 (G3753), NM_005722 (G3826), and NM_015315 (G4370).
[7] A method for predicting the presence or absence of lymph node metastasis of colorectal cancer, characterized by using the gene set according to claim 5 or 6.
[8] The method according to claim 7, characterized by using the following discriminant:
$$
\begin{aligned}
D = {} & 0.2307 - 2.7132\,x_{\mathrm{NM\_003404}} + 8.9509\,x_{\mathrm{NM\_052868}} + 8.7975\,x_{\mathrm{NM\_005722}} - 2.3098\,x_{\mathrm{NM\_015315}} \\
       & + 3.5126\,x_{\mathrm{NM\_002128}}\,x_{\mathrm{NM\_005034}} - 8.8226\,x_{\mathrm{NM\_001540}}\,x_{\mathrm{NM\_005722}}
\end{aligned}
$$
where $x_{A}$ denotes the expression level of the gene with accession number $A$ (NM_003404 = G1592, NM_052868 = G3031, NM_005722 = G3826, NM_015315 = G4370, NM_002128 = G2645, NM_005034 = G3177, NM_001540 = G3753)
(lymph node metastasis is determined to be present when D > 0 and absent when D ≤ 0).
PCT/JP2006/315143 2005-08-01 2006-07-31 Gene set for use in prediction of occurrence of lymph node metastasis of colorectal cancer WO2007015459A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2005222995A JP2007037421A (en) 2005-08-01 2005-08-01 Gene set for predicting the presence or absence of colon cancer lymph node metastasis
JP2005-222995 2005-08-01

Publications (1)

Publication Number Publication Date
WO2007015459A1 (en) 2007-02-08

Family

ID=37708739

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2006/315143 WO2007015459A1 (en) 2005-08-01 2006-07-31 Gene set for use in prediction of occurrence of lymph node metastasis of colorectal cancer

Country Status (2)

Country Link
JP (1) JP2007037421A (en)
WO (1) WO2007015459A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012107786A1 (en) 2011-02-09 2012-08-16 Rudjer Boskovic Institute System and method for blind extraction of features from measurement data
CN112948687A (en) * 2021-03-25 2021-06-11 重庆高开清芯智联网络科技有限公司 Node message recommendation method based on name card file characteristics
CN113436684A (en) * 2021-07-02 2021-09-24 南昌大学 Cancer classification and characteristic gene selection method

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
EP2436763B1 (en) 2009-05-27 2018-07-04 National University Corporation Akita University Method for assessing lymph node metastasis of cancer or the risk thereof, and rapid assessment kit for said method
WO2018088826A2 (en) * 2016-11-09 2018-05-17 한국생명공학연구원 Composition for diagnosing colorectal cancer metastasis or predicting prognosis and use thereof
CN108492884A (en) * 2018-02-08 2018-09-04 浙江大学 Pancreatic Neuroendocrine Tumors lymphatic metastasis forecasting system based on Logistic regression models
KR102605248B1 (en) * 2021-06-09 2023-11-23 사회복지법인 삼성생명공익재단 Metastasis of lymph node predicting method using image of endoscopically resected specimens in early colorectal cancer and analysis apparatus

Citations (3)

Publication number Priority date Publication date Assignee Title
JP2004033082A (en) * 2002-07-02 2004-02-05 Ichiro Takemasa Gene set for prediction of liver metastasis of large intestinal tumor
WO2005028676A2 (en) * 2003-09-24 2005-03-31 Oncotherapy Science, Inc. Method of diagnosing breast cancer
WO2005054508A2 (en) * 2003-12-01 2005-06-16 Ipsogen Gene expression profiling of colon cancer by dna microarrays and correlation with survival and histoclinical parameters

Non-Patent Citations (11)

Title
BOULESTEIX A.-L. ET AL.: "A CART-based approach to discover emerging patterns in microarray data", BIOINFORMATICS, vol. 19, no. 18, 2003, pages 2465 - 2472, XP003008212 *
COFFEY C.S. ET AL.: "An application of conditional logistic regression and multifactor dimensionality reduction for detecting gene-gene interactions on risk of myocardial infarctions: the importance of model validation", BMC BIOINFORMATICS, vol. 5, 2004, pages 49, XP021000631 *
DATABASE BIOSIS [online] ARTINYAN A. ET AL.: "Increased risk of lymph node metastasis in T3 colon cancers with decreased thymidylate synthase expression", XP003008214, Database accession no. (2006:211136) *
GASTROENTEROLOGY, vol. 128, no. 4, SUPPL. 2, April 2005 (2005-04-01), pages A784 *
KOMORI T. ET AL.: "Gene expression profiling for the prediction of lymph node metastasis in colorectal cancer", THE MOLECULAR BIOLOGY SOCIETY OF JAPAN, vol. 27, 2004, pages 1019, XP003008211 *
KOMURO K. ET AL.: "Right- and left-sided colorectal cancers display distinct expression profiles and the anatomical stratification allows a high accuracy prediction of lymph node metastasis", J. SURGICAL RES., vol. 124, April 2005 (2005-04-01), pages 216 - 224, XP004835905 *
KUNIYASU H. ET AL.: "Co-expression of receptor for advanced glycation end products and the ligand amphoterin associates closely with metastasis of colorectal cancer", ONCOLOGY REPORTS, vol. 10, no. 2, 2003, pages 445 - 448, XP003008217 *
MASAKI T. ET AL.: "Expression of adhesion molecules as a novel risk factor for lymph node metastasis or local recurrence in early invasive (T1) colorectal carcinomas", JOURNAL OF THE JAPAN SOCIETY OF COLOPROCTOLOGY, vol. 55, no. 10, 2002, pages 858 - 866, XP003008215 *
ORIAN-ROUSSEAU V. ET AL.: "Genes upregulated in a metastasizing human colon carcinoma cell line", INT. J. CANCER, vol. 113, February 2005 (2005-02-01), pages 699 - 705, XP003008216 *
OTSUBO T. ET AL.: "Involvement of Arp2/3 complex in the process of colorectal carcinogenesis", MODERN PATHOLOGY, vol. 17, 2004, pages 461 - 467, XP003008218 *
YANG X. ET AL.: "The relationship between expression of c-ras, c-erbB-2, nm23, and p53 gene products and development of trophoblastic tumor and their predictive significance for the malignant transformation of complete hydatidiform mole", GYNECOLOGIC ONCOLOGY, vol. 85, 2002, pages 438 - 444, XP003008213 *

Also Published As

Publication number Publication date
JP2007037421A (en) 2007-02-15

Similar Documents

Publication Publication Date Title
Hicks et al. Novel patterns of genome rearrangement and their association with survival in breast cancer
JP6464316B2 (en) Genetic marker for discrimination and detection of aquatic product infectious disease-causing virus, and method for discriminating and detecting the virus using the same
TWI582236B (en) Prognosis prediction for melanoma cancer
US20130023437A1 (en) Diagnostic for lung disorders using class prediction
JP7005596B2 (en) Detection of chromosomal interactions associated with breast cancer
WO2007015459A1 (en) Gene set for use in prediction of occurrence of lymph node metastasis of colorectal cancer
KR20200035427A (en) Augmentation of cancer screening using cell-free viral nucleic acids
JP2021531016A (en) Cell-free DNA damage analysis and its clinical application
WO2008118839A1 (en) Exon grouping analysis
CN107735500A (en) For detecting the grand genome composition and method of breast cancer
WO2023093782A1 (en) Molecular analyses using long cell-free dna molecules for disease classification
KR20230038263A (en) Nuclease-related end characterization of cell-free nucleic acids
US20220098677A1 (en) Method for determining rcc subtypes
JP2020068673A (en) Oral cancer determination device, oral cancer determination method, program and oral cancer determination kit
Terp et al. Extraction of cell-free DNA: evaluation of efficiency, quantity, and quality
CN101457254B (en) Gene chip and kit for liver cancer prognosis
JP4229647B2 (en) Gene set for predicting liver metastasis of colorectal cancer
WO2023021978A1 (en) Method for examining autoimmune disease
CN106414775A (en) Compositions and methods for metagenome biomarker detection
JP5866669B2 (en) Breast cancer susceptibility determination method
CN107385089A (en) A kind of Nucleic acid combinations, kit and method for detecting Cryptosporidium parvum oocysts suspended
WO2013157215A1 (en) Method for assessing endometrial cancer susceptibility
JP2004350576A (en) Kit for detecting bladder cancer
JP2024527370A (en) Circulating microRNA signatures for pancreatic cancer
WO2009157251A1 (en) Method of diagnosing integration dysfunction syndrome

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06782021

Country of ref document: EP

Kind code of ref document: A1