WO2010026821A1

WO2010026821A1 - Method for discrimination between bipolar disorder and schizophrenia

Info

Publication number: WO2010026821A1
Application number: PCT/JP2009/061158
Authority: WO
Inventors: 秀幸青島; 一男竹村; 健太朗飯嶋; 浩志林
Original assignee: 株式会社エスアールエル
Priority date: 2008-09-03
Filing date: 2009-06-19
Publication date: 2010-03-11
Also published as: JP2010057407A

Abstract

Disclosed is a novel means for objectively and accurately discriminating between schizophrenia and bipolar disorder using blood collected from a patient as a sample. Specifically disclosed is a method for discriminating between schizophrenia and bipolar disorder that uses, as an index, the expression levels of 12 specific genes in a sample isolated from a living body. The method enables objective and highly accurate differential diagnosis of schizophrenia and bipolar disorder. It was confirmed, using a large number of samples, that both the sensitivity (true positive rate) and the specificity (true negative rate) of detection are 80% or more. Because the method uses blood as the sample, it can be carried out conveniently.

Description

How to distinguish between bipolar disorder and schizophrenia

The present invention relates to a method for discriminating between bipolar disorder and schizophrenia.

The incidence of schizophrenia in Japan is about 0.8% of the population, and onset occurs mainly in adolescence. The prognosis of the disease varies. About one third of all patients show significant and lasting improvement. Another third improve somewhat but are left with intermittent relapses and residual disability. The remaining third are severely ill, with serious and permanent impairment of social functioning.

Bipolar disorder (manic-depressive illness) is a mental illness with a high incidence, comparable to that of schizophrenia, in which manic and depressive states recur. The lifetime prevalence of bipolar disorder is said to be 0.2-1.6%; it frequently recurs and is often said to require lifelong drug treatment.

It is important that both bipolar disorder and schizophrenia be treated appropriately at an early stage. Traditionally, these mental disorders have been diagnosed by comprehensive evaluation based on DSM-IV (Diagnostic and Statistical Manual of Mental Disorders-IV), the diagnostic and statistical manual for mental disorders established by the American Psychiatric Association (APA). However, such a method depends greatly on the subjectivity and skill of the diagnostician. Because the manic state of bipolar disorder resembles the positive symptoms of schizophrenia and the depressive state resembles its negative symptoms, it is sometimes difficult to diagnose the disease objectively and early, and there are many cases that become severe without appropriate treatment.

If an objective diagnostic method using biological markers for schizophrenia or bipolar disorder is established, early diagnosis and early treatment will become possible, making it possible to prevent aggravation and improve the cure rate. Diagnostic methods using biological markers that have been reported so far include a method of diagnosing schizophrenia using the serum concentration of epidermal growth factor as an index (Patent Document 1) and a method that uses blood as a sample and the expression level of specific genes as an index (Patent Document 2). However, these methods cannot accurately discriminate between schizophrenia and bipolar disorder.

Patent Document 1: Japanese Patent No. 3706913
Patent Document 2: JP 2004-135667 A

Therefore, an object of the present invention is to provide means capable of objectively diagnosing schizophrenia or bipolar disorder with high accuracy using patient blood as a sample.

Using blood as the sample, the inventors of the present application compared the expression levels of about 55,000 genes between bipolar disorder patients and schizophrenia patients and selected genes whose expression levels differ significantly. They further narrowed down the classification prediction candidate genes to a considerable extent according to criteria of their own devising, described later, and produced a low-cost, highly versatile microarray carrying these genes. From the expression data measured with this microarray, they constructed a classification prediction algorithm by making full use of a neural network together with the variable increment method and the cross-validation method, and found, using a large number of samples, that both the detection sensitivity (true positive rate) and the specificity (true negative rate) actually exceed 80%, thereby completing the present invention.

That is, the present invention provides a method for discriminating between bipolar disorder and schizophrenia using the expression levels of the following gene groups (1) to (12) in a sample isolated from a living body as an index.
(1) FLJ21881 (SEQ ID NO: 1)
(2) DLGAP3 (SEQ ID NO: 2)
(3) FAM20A (SEQ ID NO: 3)
(4) MAX (SEQ ID NO: 4)
(5) ZNF74 (SEQ ID NO: 5)
(6) DIAPH2 (SEQ ID NO: 6)
(7) CR1 (SEQ ID NO: 7)
(8) RAD54B (SEQ ID NO: 8)
(9) GPR30 (SEQ ID NO: 9)
(10) SCD5 (SEQ ID NO: 10)
(11) IMAGE: 5785888 (SEQ ID NO: 11)
(12) INSL3 (SEQ ID NO: 12)

The present invention provides for the first time a means capable of objectively diagnosing either bipolar disorder or schizophrenia with high accuracy.

The first figure shows the relationship between the number of probes and the correct answer rate in the classification prediction model based on a neural network, as carried out in the examples of the present invention. The second figure shows the dependent variable of each sample of the learning examples and test examples, as obtained by multiple regression analysis.

As described above, the present invention uses the expression levels of the gene group (1) to (12) as an index. The sample in which the expression level of each gene is measured is not particularly limited as long as it is a sample isolated from a living body; however, because the gene group was selected using blood as the sample, as described in detail in the examples below, it is preferable to use blood. The gene group includes both genes whose expression level is higher and genes whose expression level is lower in schizophrenia patients than in bipolar disorder patients. It is also preferable to make the determination based on the expression levels of only the above 12 genes, for which a detection sensitivity (true positive rate) and specificity (true negative rate) of 80% or more were confirmed in the examples below. As specifically described in the examples below, it is preferable to measure the expression levels of other genes, such as various genes for normalization, at the same time in order to ensure measurement accuracy; “based on the expression levels of only the 12 genes” means that only the expression levels of these 12 genes are used as direct variables for classification prediction. The true positive rate representing sensitivity is a / (a + b) in Table 1 below, and the true negative rate representing specificity is 1 - false positive rate = d / (c + d). The correct answer rate is (a + d) / (a + b + c + d).
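The three rates defined above can be computed as in the following minimal Python sketch. Table 1 is not reproduced here, so the meaning of the cells is inferred from the formulas themselves (a: true positives, b: false negatives, c: false positives, d: true negatives); the function name and the counts are hypothetical and are not results from this document.

```python
def confusion_metrics(a, b, c, d):
    """Rates from a 2x2 table.

    Assumed cell meanings (inferred from the formulas in the text):
    a = true positives, b = false negatives,
    c = false positives, d = true negatives.
    """
    sensitivity = a / (a + b)            # true positive rate
    specificity = d / (c + d)            # true negative rate = 1 - false positive rate
    accuracy = (a + d) / (a + b + c + d)  # correct answer rate
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, acc = confusion_metrics(a=17, b=3, c=2, d=14)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With these illustrative counts, both sensitivity and specificity are above the 80% level discussed in the text.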

The sequences of the above 12 genes are shown under the respective SEQ ID NOs given above. Table 2 below shows the GenBank accession number of each gene, its gene product, and the number and SEQ ID NO of the probe used to measure the expression level of each gene in the examples below.

The expression level of each gene in the sample can itself be measured by a known method. The measurement method is not particularly limited, but a method using a single-stranded oligonucleotide probe that hybridizes to the sense strand or antisense strand of each gene, preferably a DNA array on which DNA probes are immobilized, is simple and preferred. For example, as specifically described in the examples below, total mRNA is extracted from blood, biotin-labeled cRNA is prepared from the extracted mRNA, and the cRNA is applied to an array on which oligonucleotide probes that hybridize with the cRNA derived from each gene are immobilized. After the cRNA and the probes are hybridized and the array is washed, the amount of label remaining on the substrate is measured; this gives the amount of cRNA, hence the amount of mRNA, that is, the gene expression level.

The probe to be immobilized has a size that allows it to hybridize specifically with the cRNA, usually about 18 to 50 bases and preferably about 20 to 40 bases. The probe is preferably completely complementary to the region of the RNA to which it hybridizes, but a small number of mismatches (usually 1 or 2) is acceptable as long as the probe hybridizes under the normal hybridization conditions used with a DNA array, as specifically described in the examples below. Therefore, even when a naturally occurring SNP is present in a gene, the gene can be measured using the same DNA array.

In the examples below, the expression levels of the gene group are measured using oligonucleotide probes having the base sequences shown in SEQ ID NO: 18, SEQ ID NO: 44, SEQ ID NO: 61, SEQ ID NO: 90, SEQ ID NO: 97, SEQ ID NO: 120, SEQ ID NO: 125, SEQ ID NO: 161, SEQ ID NO: 167, SEQ ID NO: 195, SEQ ID NO: 218 and SEQ ID NO: 220, and a DNA array on which these probes are immobilized can preferably be used. According to the values measured with these probes, 10 of the 12 probes showed a significant difference (p < 0.05, t-test) in expression level between schizophrenia patients and healthy subjects.

The determination based on the expression levels of the gene group is basically made by comparing the expression levels of the gene group in the subject with those in known schizophrenia patients and bipolar disorder patients. This comparison is preferably performed by a neural network trained by the variable increment method (forward selection) using the expression levels of the gene group in known schizophrenia patients and bipolar disorder patients. The measured expression levels of the above 12 genes are input to the constructed, trained neural network (the construction method is described later), the network outputs the predicted probability of the group into which the sample is classified, and this predicted probability is used as the criterion for determining whether the subject has schizophrenia or bipolar disorder.

Alternatively, the above comparison is preferably performed by multiple regression analysis. A prediction formula (multiple regression equation) can be obtained by performing multiple regression analysis on the expression levels of the genes in known schizophrenia patients and bipolar disorder patients, using the measured expression levels of the 12 genes as explanatory variables. The expression levels of the gene group in the subject whose disease is to be determined are entered into the obtained prediction formula to obtain a dependent variable, and by comparing the value of this dependent variable with the dependent variables of known schizophrenia patients and bipolar disorder patients, it can be determined whether the subject has schizophrenia or bipolar disorder. This comparison can be made, for example, by calculating the dependent variable for each patient in the known schizophrenia and bipolar disorder patient groups, setting as a cutoff value a value of the dependent variable that separates the two groups well, and comparing the subject's dependent variable with this cutoff value. For example, if the analysis is set up so that the dependent variable is larger in schizophrenia patients, a subject whose dependent variable is greater than the cutoff value can be predicted to have schizophrenia. The cutoff value can be determined appropriately by routine statistical processing based on the dependent variables calculated for known schizophrenia patients and bipolar disorder patients. The technique of multiple regression analysis itself is well known, various software for performing it is available, including many commercial products, and any such software may be used in the present invention. Once the analysis of known patients has been carried out, the prediction formula is fixed. It is therefore not necessary to analyze the known patient group every time the method is performed; a prediction formula obtained once can be used repeatedly.
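As a rough illustration of how an already obtained prediction formula and a cutoff value might be applied, consider the following Python sketch. It is a sketch under stated assumptions, not the procedure of the examples: the coefficients, the constant, and the expression levels are made-up values for a two-gene toy case (the method itself uses 12 genes), and taking the midpoint between the two group means as the cutoff is only one simple choice among the routine statistical options the text allows.

```python
from statistics import mean


def dependent_variable(levels, coefficients, constant):
    """Linear prediction formula: sum of (coefficient * expression level) plus a constant."""
    return sum(a * x for a, x in zip(coefficients, levels)) + constant


def midpoint_cutoff(scz_values, bd_values):
    """Assumed cutoff rule: midpoint between the mean dependent variables of the two known groups."""
    return (mean(scz_values) + mean(bd_values)) / 2


def classify(value, cutoff):
    """Assumes the formula was fitted so that schizophrenia gives the larger dependent variable."""
    return "schizophrenia" if value > cutoff else "bipolar disorder"


# Hypothetical coefficients and expression levels (two genes only, for brevity).
coefficients = [0.5, -0.25]
constant = 0.1
known_scz = [dependent_variable(x, coefficients, constant) for x in ([2.0, 0.4], [1.8, 0.8])]
known_bd = [dependent_variable(x, coefficients, constant) for x in ([0.6, 1.2], [0.4, 0.8])]

cutoff = midpoint_cutoff(known_scz, known_bd)
label = classify(dependent_variable([1.6, 0.4], coefficients, constant), cutoff)
print(cutoff, label)
```

Because the formula and cutoff are fixed once the known patients have been analyzed, only the last two steps need to be repeated for each new subject.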

In the present invention, “multiple regression analysis” broadly covers any analysis method that includes a step of obtaining the dependent variable of a sample using an already obtained multiple regression equation; the analysis step of deriving the multiple regression equation need not be included. Therefore, as described above, any method of discriminating the disease using an already obtained multiple regression equation falls within the “discrimination method in which the comparison is performed by multiple regression analysis” of the present invention.

The measured value of the expression level used in the present invention is preferably a value obtained by normalizing the measured signal intensity by the global normalization method, as described in the examples below. Here, the global normalization method calculates the median of the expression levels of all genes mounted on the DNA microarray and divides the expression level of each gene by this median to give a relative expression level.
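The global normalization described above amounts to a single division per probe. The following Python sketch shows the idea for one array; the probe names and signal values are hypothetical, and the actual examples perform this step with the array vendor's analysis software rather than with code like this.

```python
from statistics import median


def global_normalize(signals):
    """Divide every probe's signal by the median signal of the whole array.

    signals: dict mapping probe id -> raw signal intensity for one array.
    """
    array_median = median(signals.values())
    return {probe: value / array_median for probe, value in signals.items()}


# Hypothetical raw signal intensities for a three-probe toy array.
raw = {"probe_a": 50.0, "probe_b": 100.0, "probe_c": 400.0}
normalized = global_normalize(raw)
print(normalized)
```

After normalization, a value of 1.0 corresponds to the median expression level on that array, which makes values comparable between arrays.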

When the method of the present invention is carried out using a neural network, the neural network itself is well known and a commercially available one can be used. Although a commercial product can be used for the neural network itself, the feature of the present invention lies in the data that the neural network is made to learn, and ingenuity is needed in deciding which data to train on so that both sensitivity (true positive rate) and specificity (true negative rate) reach 80% or more (described later).

An optimal classification prediction model using a neural network can be constructed, for example, by the method detailed in the examples below. Briefly, the optimal model can be determined as follows. First, the expression levels of various genes are measured using samples collected from a large number of schizophrenia patients, bipolar disorder patients, and healthy individuals. The expression levels can be measured using a DNA microarray as described above. In the examples below, a commercially available DNA microarray carrying DNA probes for about 55,000 human genes was used.

Next, data cleansing is performed on the expression levels measured using the DNA microarray. The data cleansing can be performed, for example, by excluding probes for genes whose expression level is below the 30th percentile or at or above the 98th percentile of all expression levels.
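One possible reading of this percentile-based cleansing is sketched below in Python. The text does not specify how the per-probe expression level is summarized or how percentiles are computed, so both the single value per probe and the simple rank-based percentile used here are assumptions; the function and probe names are hypothetical.

```python
def percentile_filter(levels, low=30.0, high=98.0):
    """Keep probes whose rank percentile is >= low and < high.

    levels: dict mapping probe id -> one summary expression level per probe
    (how that summary is obtained is not specified in the text).
    The percentile of a probe is taken as 100 * rank / number_of_probes.
    """
    ranked = sorted(levels, key=levels.get)
    n = len(ranked)
    kept = {}
    for rank, probe in enumerate(ranked):
        percentile = 100.0 * rank / n
        if low <= percentile < high:
            kept[probe] = levels[probe]
    return kept


# Hypothetical data: 100 probes with expression levels 0, 1, ..., 99.
levels = {f"probe_{i}": float(i) for i in range(100)}
kept = percentile_filter(levels)
print(len(kept), min(kept.values()), max(kept.values()))
```

The lower bound removes weakly expressed probes that are close to background, and the upper bound removes the few probes near signal saturation.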

The DNA microarray data of a large number of schizophrenia patients and bipolar disorder patients are divided into learning examples and test examples that are independent of the learning examples. The model is constructed by the hold-out cross-validation method, in which a model is built on the learning examples until the target sensitivity and specificity are achieved, and its sensitivity and specificity are then calculated using the test examples. The classification prediction model was constructed by changing the parameters of the neural network and repeating the verification while ensuring the independence of the samples assigned to the learning examples and test examples, and the model with the best results was adopted.

First, the learning data for the neural network are prepared. In the examples below, probes whose Quality Flag is not “Good”, probes for genes located on the Y chromosome, probes set distally from the mRNA 3′ end, and the like are excluded, narrowing about 55,000 probes down to 10,498 probes. Here, a Quality Flag of “Good” means that the measured expression level is larger than 1.5 SD of the background around the spot and can be trusted as a measured value. Genes located on the Y chromosome are present only in males and were excluded because they might lower the sensitivity and/or specificity of detection when females are tested. Probes set distally from the mRNA 3′ end were excluded because they are susceptible to bias during cRNA preparation, which is a major cause of variation in the measured values. Furthermore, based on preliminary analysis, probes with 25% or more missing values, probes with a large expression difference between men and women, and probes with a large difference between array production batches were excluded.

Next, before the expression levels of the genes from which the hybridized RNA is derived, measured for each probe selected in this way, are input to the neural network, a two-group significance test (t-test) is performed, that is, a test between the schizophrenia (unmedicated) group and the bipolar disorder group of the learning examples. In the examples below, significance tests were performed between the schizophrenia group and the bipolar disorder group, between the schizophrenia group and the healthy subject group, and between the bipolar disorder group and the healthy subject group, narrowing the set down to a total of 216 probes showing significant differences. In the examples below, the samples were also narrowed down. That is, the median of 56 healthy subjects was calculated for each probe, the correlation of each sample with this data set was examined, and samples whose approximate-curve parameters and signal intensity ratios deviated greatly were excluded from the analysis.
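The two-group screening step can be illustrated with the short Python sketch below. It uses only the standard library, so instead of computing a p-value it compares the absolute t statistic with a critical value supplied by the caller. Which t-test variant was used in the examples is not stated; the pooled-variance (Student) form here is an assumption, and the probe names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group1, group2):
    """Student's two-sample t statistic with pooled variance (assumed variant)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled * (1 / n1 + 1 / n2))


def select_probes(scz, bd, t_critical):
    """Keep probes whose |t| exceeds the critical value for the chosen significance level.

    scz, bd: dicts mapping probe id -> list of normalized expression levels per group.
    """
    return [p for p in scz if abs(t_statistic(scz[p], bd[p])) > t_critical]


# Hypothetical normalized expression levels, four samples per group.
scz = {"probe_a": [2.0, 2.2, 1.9, 2.1], "probe_b": [1.0, 1.2, 0.9, 1.1]}
bd = {"probe_a": [1.0, 1.1, 0.9, 1.2], "probe_b": [1.1, 0.9, 1.0, 1.2]}

# 2.447 is the two-sided 5% critical value of the t distribution with 6 degrees of freedom.
selected = select_probes(scz, bd, t_critical=2.447)
print(selected)
```

In practice the critical value depends on the group sizes, so a statistics package that returns p-values directly is the more convenient tool for thousands of probes.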

Probes for which a significant difference was found by the significance test are then selected by the variable increment method (forward selection). The variable increment method itself is well known: explanatory variables (the measurement results of each probe) are added one by one to find a combination that correlates highly with the objective variable (correct answer rate). The variable increment method is carried out using a neural network installed on a computer, and the number of probes giving the highest correct answer rate is selected. At this time, it is preferable to combine the selection with N-fold cross-validation. The cross-validation method itself is also well known. In N-fold cross-validation, the data (the measurement results of each probe) are divided into N subsets of approximately equal size, and the neural network is trained a total of N times, each time leaving out one subset. In the examples below, the learning data set is divided into three subsets, and the neural network performs classification prediction using the probes with significant differences one at a time while rotating the data sets, thereby identifying the optimal combination of probes.
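The combination of forward selection and N-fold cross-validation can be sketched as follows. This is an illustration of the general technique only: a nearest-centroid classifier is swapped in for the neural network to keep the example short and dependency-free, the data are synthetic, and all function names are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_x, train_y, test_x, features):
    """Stand-in classifier (not a neural network): pick the class with the closest mean profile."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    predictions = []
    for x in test_x:
        def dist(label):
            return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))
        predictions.append(min(sorted(centroids), key=dist))
    return predictions


def cv_accuracy(xs, ys, features, n_folds=3):
    """N-fold cross-validation: each subset is left out once while the rest is used for training."""
    correct = 0
    for k in range(n_folds):
        test_idx = set(range(k, len(xs), n_folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        test_x = [xs[i] for i in sorted(test_idx)]
        test_y = [ys[i] for i in sorted(test_idx)]
        preds = nearest_centroid_predict(train_x, train_y, test_x, features)
        correct += sum(p == t for p, t in zip(preds, test_y))
    return correct / len(xs)


def forward_selection(xs, ys, n_features, max_features):
    """Add one feature at a time, always the one giving the best cross-validated accuracy."""
    selected, history = [], []
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        best = max(candidates, key=lambda f: cv_accuracy(xs, ys, selected + [f]))
        selected.append(best)
        history.append(cv_accuracy(xs, ys, selected))
    return selected, history


# Synthetic data: feature 0 separates the two classes, features 1 and 2 are pure noise.
rng = random.Random(0)
ys = [i % 2 for i in range(30)]
xs = [[y * 5.0 + rng.gauss(0, 0.5), rng.gauss(0, 1), rng.gauss(0, 1)] for y in ys]
selected, history = forward_selection(xs, ys, n_features=3, max_features=2)
print(selected, history)
```

The `history` list records the cross-validated accuracy after each added feature, which is the kind of curve one would inspect to choose the number of probes with the highest correct answer rate.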

Here, the inventors of the present application produced, as a low-cost microarray for practical use, a practical array in which the above 216 probes are mounted on a substrate. The practical array also carried probes used for global normalization and management probes (for alignment) (see the examples below for details). The probes for global normalization were selected from those showing small variation between arrays. In this practical array, a plurality of chambers (16 in the examples below) can be formed on a single substrate, so that 16 samples can be tested simultaneously on a single array. A DNA microarray carrying about 55,000 probes is expensive, and only one sample can be processed per microarray. With the practical array, however, the cost of preparing the array and the cost and labor of testing can be greatly reduced.

Using the values measured with the above practical array, the combination of probes giving the highest correct answer rate was determined with the neural network installed on the computer, using the cross-validation method and the variable increment method as described above. As a result, the above 12 genes were identified.

When the measured values of the probes for the above 12 genes were input to the trained neural network as explanatory variables and the sensitivity and specificity were calculated for the test examples, both were 80% or more for both schizophrenia and bipolar disorder, confirming that schizophrenia and bipolar disorder can be discriminated with high sensitivity and high specificity.

In addition, when multiple regression analysis was performed using the measured values of the probes for the above 12 genes as explanatory variables and the sensitivity and specificity were calculated for the test examples, both sensitivity and specificity exceeded 80% for both schizophrenia and bipolar disorder. Multiple regression analysis thus also confirmed that schizophrenia and bipolar disorder can be distinguished with high sensitivity and high specificity using the expression levels of the above 12 genes.

The method of the present invention can preferably be carried out on a patient suspected of having a psychiatric disorder, in particular schizophrenia or bipolar disorder, to examine which of the two the patient has. For example, when the method of the present invention is used in combination with a method for detecting schizophrenia (that is, a method for determining whether a subject has schizophrenia or is healthy), a more accurate diagnosis is possible. Various detection methods for schizophrenia are known (see, for example, the patent documents cited above).

In the present invention, “having a base sequence” means that the bases are arranged in such an order. Therefore, for example, “an oligonucleotide probe having the base sequence represented by SEQ ID NO: 18” means an oligonucleotide probe having a base sequence of attttgcctt cacataccag acatgagaca represented by SEQ ID NO: 18 and having a size of 30 bases.

Hereinafter, the present invention will be described more specifically based on examples.

1. Narrowing down of probes
Blood sampling and sample storage
Blood was collected from 58 schizophrenia patients not treated with antipsychotics (untreated group), 56 healthy subjects, and 41 bipolar disorder patients, and RNA was extracted using the PAXgene Blood RNA Kit (Qiagen, Valencia, CA, USA). Two PAXgene Blood RNA Tubes of 2.5 ml each were collected per subject, mixed by inversion, frozen, and transported to the laboratory. The tubes were stored at -80 °C.

RNA extraction
PAXgene Blood RNA Tubes stored at -80 °C were thawed at room temperature, and total RNA was extracted according to the manufacturer's instructions. The extracted total RNA was stored at -80 °C.

Confirmation of concentration and quality of extracted RNA
The extracted total RNA was diluted 50-fold with 10 mM Tris-HCl (pH 7.5), the absorbance at 230, 260, and 280 nm was measured, and the concentration of total RNA was determined. The quality of the extracted RNA was confirmed with an Agilent 2100 bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA).

Preparation of cRNA
cRNA was prepared from 0.5 μg of the extracted total RNA. Biotin-labeled cRNA was prepared using an iExpress kit (GE Healthcare Bioscience, Chandler, CA, USA) according to the manufacturer's instructions.

The prepared cRNA was quantified and its quality confirmed in the same manner as for the extracted total RNA. That is, the absorbance at 230, 260, and 280 nm of a 50-fold diluted cRNA solution was measured, and the concentration of cRNA was determined. The quality of the cRNA was confirmed using an Agilent 2100 bioanalyzer.

Hybridization to the array and washing
The CodeLink™ 55K Bioarray (GE Healthcare Bioscience) was used as the microarray. In the CodeLink™ 55K Bioarray, the surface of a glass slide is coated with specially chemically modified acrylamide, and 30-mer probes are immobilized three-dimensionally. It is an excellent microarray on which probes corresponding to about 55,000 human genes are immobilized.

10 μg of cRNA was adjusted with RNase-free H₂O to a final volume of 20 μl, 5 μl of the 5× Fragmentation Buffer of the iExpress kit was added, and the mixture was incubated at 94 °C for 20 minutes to fragment the cRNA.

10 μg of fragmented cRNA (25 μl), 78 μl of iExpress kit Hybridization Buffer A, and 130 μl of iExpress kit Hybridization Buffer B were mixed to prepare a total of 260 μl. The mixture was incubated at 90 °C for 5 minutes and then on ice for 5-30 minutes.

250 μl of the hybridization solution was injected into the chamber of the CodeLink™ 55K Bioarray (GE Healthcare Bioscience, Chandler, CA, USA), and the array was incubated for 18-24 hours at 37 °C with swirling at 300 rpm using a CodeLink™ INNOVA shaker (GE Healthcare Bioscience, Chandler, CA, USA).

The array was fixed using the Hybridization Removal Tool, the hybridization chamber was peeled off, and the array was set on a Bioarray Rack. The Bioarray Rack holding the array was transferred to a Large Reagent reservoir containing 0.75× TNT Buffer at 46 °C and incubated at 46 °C for 1 hour.

The Bioarray Rack was transferred to a Small Reagent reservoir filled with 3.4 ml of diluted Streptavidin-Cy5 solution and incubated at room temperature for 30 minutes. After staining, the Bioarray Rack was transferred to a Large Reagent reservoir filled with 240 ml of 1× TNT Buffer and washed by incubating at room temperature for 5 minutes, repeated 4 times. Next, the Bioarray Rack was transferred to a Large Reagent reservoir filled with 0.1× SSC / 0.05% Tween-20 and washed for 30 seconds; the array was then dried by centrifugation and stored in the dark until scanning.

Array scanning
The washed and dried arrays were scanned with an Agilent Scanner (Agilent Technologies, Santa Clara, CA, USA). The scanner settings were Red PMT [%]: 70% and Dye Channel: Red (Red corresponds to Cy5); all other settings were left at their defaults. The scanned array data were saved as TIF files and digitized.

Digitization of array data
According to the manufacturer's instructions, CodeLink™ Expression Analysis was used to digitize the array data saved as TIF files and to normalize them by global normalization.

Narrowing down probes
From the experimental results obtained above, probes whose Quality Flag was not “Good”, probes for genes located on the Y chromosome, probes set distally from the mRNA 3′ end, and the like were excluded. Furthermore, based on preliminary analysis, probes with 25% or more missing values, probes with a large expression difference between men and women, and probes with a large difference between array production batches were also excluded. These steps narrowed about 55,000 probes down to 10,498 probes.

2. Statistical processing
Based on the above conditions, the following probes were extracted: probes showing a statistically significant difference between 43 samples of the antipsychotic-untreated schizophrenia patient group and 38 samples of healthy subjects; probes showing a statistically significant difference between 32 samples of bipolar disorder patients and 40 samples of the antipsychotic-untreated schizophrenia patient group; and probes showing a statistically significant difference between 32 samples of bipolar disorder patients and 38 samples of healthy subjects. In total, 216 probes were extracted. The sequences of these probes are shown in SEQ ID NOs: 13 to 228 in the sequence listing. Table 3 below shows the gene name and GenBank Accession No. from which each probe is derived.

3. Designing a practical array
The CodeLink™ 55K Bioarray is very expensive, and only one sample can be processed and analyzed per array. For practical use, a microarray that allows analysis at lower cost is required. Therefore, a practical array was designed based on the CodeLink™ 16-Assay Bioarray (Applied Microarrays), which has the same surface treatment as the CodeLink™ 55K Bioarray and is divided into 16 chambers so that up to 16 samples can be processed and analyzed at a time. In addition to the 216 gene probes, probes used for global normalization (SEQ ID NOs: 229 to 527) and the manufacturer's management probes were added to design the following array.

CodeLink™ 16-Assay Bioarray probe breakdown
(Total 1714 spots / chamber)
- Classification prediction candidate probes: 216 probes x 4 spots
- Additional probes for normalization: 299 probes x 2 spots
- Management probes by the manufacturer: 96 probes x 1 spot each
- Standard reserved probes: Grid (32), Positive Control (60), Negative Control (64)
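The per-chamber total follows directly from the breakdown above, as this short Python check shows; the dictionary keys are just descriptive labels chosen for the sketch.

```python
# Spots per chamber, taken from the probe breakdown (probes x replicate spots).
spots = {
    "classification prediction candidate probes": 216 * 4,
    "additional normalization probes": 299 * 2,
    "manufacturer management probes": 96 * 1,
    "grid": 32,
    "positive control": 60,
    "negative control": 64,
}

total_spots = sum(spots.values())
print(total_spots)  # 864 + 598 + 96 + 32 + 60 + 64
```

The sum agrees with the stated total of 1714 spots per chamber.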

4. Classification prediction based on CodeLink™ 16-Assay Bioarray measurement results (neural network)
Based on the gene expression information of 60 untreated schizophrenia patients (antipsychotic-untreated schizophrenia patient group) and 48 bipolar disorder patients, an attempt was made to construct a classification prediction model using a neural network, which is well suited to classification prediction.

In constructing the classification prediction model, validation was performed by the hold-out cross-validation method: 40 untreated schizophrenia patients and 32 bipolar disorder patients were used as learning examples to construct the model, and the remaining samples, which were not involved in model construction (20 untreated schizophrenia patients and 16 bipolar disorder patients), were used as test examples to evaluate the model.
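The hold-out split described above can be sketched in Python as follows. The group sizes (40/20 and 32/16) come from the text, but the sample identifiers, the random assignment, and the seed are assumptions made for illustration; the text does not say how samples were allocated to the two sets.

```python
import random


def hold_out_split(sample_ids, n_learning, seed=0):
    """Shuffle one diagnostic group and split it into learning and test examples."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_learning], shuffled[n_learning:]


# Hypothetical sample identifiers: 60 untreated schizophrenia and 48 bipolar disorder patients.
scz_ids = [f"SCZ{i:02d}" for i in range(1, 61)]
bd_ids = [f"BD{i:02d}" for i in range(1, 49)]

# Split each group separately so both diagnoses are represented in both sets.
scz_learn, scz_test = hold_out_split(scz_ids, 40)
bd_learn, bd_test = hold_out_split(bd_ids, 32)

learning_examples = scz_learn + bd_learn
test_examples = scz_test + bd_test
print(len(learning_examples), len(test_examples))
```

Splitting within each diagnostic group keeps the learning and test sets disjoint, which is the independence requirement the hold-out method relies on.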

(1) Acquisition of gene expression information
Using the above practical array carrying the 216 probes, gene expression information was acquired for 60 untreated schizophrenia patients and 48 bipolar disorder patients. Blood collection, RNA extraction, cRNA preparation, array hybridization, and scanning of the fluorescent signal (fluorescent dye Cy5) were performed as described above. The image data of the 1714 spots read by the scanner were digitized and normalized by global normalization using the CodeLink™ Expression Analysis software.

(2) Construction of classification prediction model by neural network
The normalized data were analyzed using the neural network implemented in the commercially available analysis software ArrayAssist (registered trademark) (STRATAGENE) in an attempt to construct a classification prediction algorithm. A classification prediction algorithm is one of a class of algorithms that, by being given a data set whose attributes are known in advance and undergoing “learning and training”, become able to output an optimal solution. Because of its high learning effect, it is said that an algorithm with high classification accuracy can be constructed.

Part of the normalized data (40 untreated schizophrenia patients and 32 bipolar disorder patients) was input to the ArrayAssist neural network as learning examples, and an algorithm was constructed. Feature Selection was performed by the variable increment method (Forward Selection), and the classification prediction algorithm was constructed by cross-validation in which the learning data were divided into three sets (N-fold cross validation, N = 3). Specifically, the data set of the learning examples was divided into three parts, and prediction was performed using the 216 probes with significant differences one at a time while rotating the data sets. The probe that gave the best prediction was adopted as a probe to be used for classification prediction, and further probes were added one by one by repeating this procedure. In this way, the number of probes used was gradually increased, and learning was terminated when the proportion of correctly classified samples (Number of Class Accuracy (%)) reached a plateau.

Next, the data set of the test examples was analyzed using the algorithm trained in this way. The normalized data for the probe set in use at the point where Number of Class Accuracy reached the plateau were input to the trained algorithm to verify how well the classification of the test examples matched the clinical diagnosis.

Numerous algorithms were constructed by varying the parameters of the neural network (learning efficiency, momentum, number of repetitions, number of layers, and number of neurons), and for each algorithm the learning accuracy was verified by the cross-validation described above and by using the test examples.

As a result, it was possible to classify the test examples most correctly in the algorithm in which the parameters were set as follows.
Learning efficiency: 0.45
Momentum: 0.3
Number of repetitions: 150
Number of layers: 1
Number of neurons: 3
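To show how these five settings map onto a network, the following is a minimal sketch of a feed-forward network with one hidden layer of three neurons, trained by per-sample gradient descent with momentum. It makes several assumptions: "learning efficiency" is read as the learning rate, "number of repetitions" as the number of passes over the learning examples, and sigmoid units with a squared-error loss are used. The internals of the ArrayAssist neural network are not described in the text, so this is not a reproduction of it, and the toy data are invented.

```python
import math
import random

# Parameter values reported in the text.
LEARNING_RATE = 0.45   # "learning efficiency" (assumed to mean learning rate)
MOMENTUM = 0.3
EPOCHS = 150           # "number of repetitions" (assumed to mean epochs)
HIDDEN_NEURONS = 3     # one hidden layer

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_h, w_o, x):
    """Return (inputs with bias, hidden activations with bias, output)."""
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h] + [1.0]
    return xb, hb, sigmoid(sum(w * v for w, v in zip(w_o, hb)))

def train_mlp(samples, n_inputs, seed=0):
    """Train on (features, target) pairs with target 0.0 or 1.0; return (w_h, w_o)."""
    rng = random.Random(seed)
    # The last weight in every row is the bias.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
           for _ in range(HIDDEN_NEURONS)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_NEURONS + 1)]
    v_h = [[0.0] * (n_inputs + 1) for _ in range(HIDDEN_NEURONS)]
    v_o = [0.0] * (HIDDEN_NEURONS + 1)
    for _ in range(EPOCHS):
        for x, target in samples:
            xb, hb, out = forward(w_h, w_o, x)
            # Backpropagation of the squared error.
            d_out = (out - target) * out * (1.0 - out)
            d_hid = [d_out * w_o[j] * hb[j] * (1.0 - hb[j])
                     for j in range(HIDDEN_NEURONS)]
            # Momentum update: velocity = momentum * velocity - rate * gradient.
            for j in range(HIDDEN_NEURONS + 1):
                v_o[j] = MOMENTUM * v_o[j] - LEARNING_RATE * d_out * hb[j]
                w_o[j] += v_o[j]
            for j in range(HIDDEN_NEURONS):
                for i in range(n_inputs + 1):
                    v_h[j][i] = MOMENTUM * v_h[j][i] - LEARNING_RATE * d_hid[j] * xb[i]
                    w_h[j][i] += v_h[j][i]
    return w_h, w_o

def predict(model, x):
    """Return the network output, a value between 0 and 1."""
    w_h, w_o = model
    return forward(w_h, w_o, x)[2]

# Invented two-feature toy data: target 1.0 for one class, 0.0 for the other.
toy = [([1.0, 0.5], 1.0), ([0.8, 0.9], 1.0), ([-1.0, -0.4], 0.0), ([-0.7, -0.9], 0.0)]
model = train_mlp(toy, n_inputs=2)
scores = [predict(model, x) for x, _ in toy]
```

In a parameter search like the one described, `train_mlp` would be called once per parameter combination and each resulting model scored by cross validation and on the test examples.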

Table 4 and Table 5 show the prediction results for learning examples and test examples based on this algorithm. Further, FIG. 1 shows the result of Forward Selection for this algorithm.

As shown in FIG. 1, according to the constructed algorithm, schizophrenia and bipolar disorder can be classified with high accuracy using expression data from 12 probes (Table 2, supra).

5. Classification prediction based on CodeLink (trade name) 16-Assay Bioarray measurement results (multiple regression analysis)
As above, classification prediction by multiple regression analysis was attempted using the gene expression data of the 60 untreated schizophrenia patients and 48 bipolar disorder patients. Using the expression data from the 12 probes as explanatory variables, multiple regression analysis was performed on the learning examples with commercially available software (SPSS) to construct a prediction formula. The analysis was set up so that the dependent variable takes higher values in schizophrenia patients. The dependent variable was then calculated for the test examples using the constructed prediction formula. The obtained prediction formula is as follows.
Y = (A₁X₁ + A₂X₂ + A₃X₃ + A₄X₄ + A₅X₅ + A₆X₆ + A₇X₇ + A₈X₈ + A₉X₉ + A₁₀X₁₀ + A₁₁X₁₁ + A₁₂X₁₂ + C) × 100
where X₁ to X₁₂ are the gene expression levels of the following genes:
X₁: FLJ21881 (GE492524, SEQ ID NO: 18)
X₂: DLGAP3 (GE54859, SEQ ID NO: 44)
X₃: FAM20A (GE56606, SEQ ID NO: 61)
X₄: MAX (GE59858, SEQ ID NO: 90)
X₅: ZNF74 (GE60153, SEQ ID NO: 97)
X₆: DIAPH2 (GE62680, SEQ ID NO: 120)
X₇: CR1 (GE62914, SEQ ID NO: 125)
X₈: RAD54B (GE79729, SEQ ID NO: 161)
X₉: GPR30 (GE80129, SEQ ID NO: 167)
X₁₀: SCD5 (GE82995, SEQ ID NO: 195)
X₁₁: IMAGE: 5785888 (GE878897, SEQ ID NO: 218)
X₁₂: INSL3 (GE88024, SEQ ID NO: 220)
The coefficients by which the expression levels of the respective genes are multiplied, and the constant, are as follows:
A₁ = -0.166864749248881
A₂ = 0.578595208826776
A₃ = -0.251720387137894
A₄ = 0.285088434152454
A₅ = 0.149134281702735
A₆ = -0.581754365047968
A₇ = 0.965099433225613
A₈ = -0.237927470298808
A₉ = 0.724852821407317
A₁₀ = 0.467110697687733
A₁₁ = 1.55443576023811
A₁₂ = -0.0695649776248728
C = -1.20827077326351
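As an illustration, the prediction formula can be evaluated as in the following sketch, which uses the coefficients and constant listed above. The function names, the dictionary layout, and the choice to treat a value exactly equal to the cutoff as schizophrenia are assumptions made for this example; the input is assumed to be normalized expression levels on the same scale as the data used to fit the formula.

```python
# Coefficients A1..A12 and constant C as listed in the text, keyed by gene name.
COEFFICIENTS = {
    "FLJ21881": -0.166864749248881,
    "DLGAP3": 0.578595208826776,
    "FAM20A": -0.251720387137894,
    "MAX": 0.285088434152454,
    "ZNF74": 0.149134281702735,
    "DIAPH2": -0.581754365047968,
    "CR1": 0.965099433225613,
    "RAD54B": -0.237927470298808,
    "GPR30": 0.724852821407317,
    "SCD5": 0.467110697687733,
    "IMAGE: 5785888": 1.55443576023811,
    "INSL3": -0.0695649776248728,
}
CONSTANT = -1.20827077326351
CUTOFF = 50.0  # cutoff value of the dependent variable used in the text

def dependent_variable(expression):
    """Return Y = (sum of A_i * X_i + C) * 100 for a dict of gene -> expression level."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise ValueError(f"missing expression levels: {sorted(missing)}")
    total = sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())
    return (total + CONSTANT) * 100.0

def classify(expression):
    """Higher Y indicates schizophrenia; handling of Y == CUTOFF is an assumption."""
    if dependent_variable(expression) >= CUTOFF:
        return "schizophrenia"
    return "bipolar disorder"

# Invented input values, used only to exercise the arithmetic.
all_zero = {gene: 0.0 for gene in COEFFICIENTS}
example = dict(all_zero, **{"CR1": 1.0, "IMAGE: 5785888": 1.0})
y_zero = dependent_variable(all_zero)
y_example = dependent_variable(example)
```

With all expression levels set to zero the formula reduces to C × 100, which is a quick way to check that the constant has been entered correctly.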

Tables 6 and 7 show the results tabulated with a dependent variable value of 50 as the cutoff. The dependent variable calculated for each sample of the learning examples and the test examples is shown in FIG.

Learning examples: Sensitivity: 95.0% (38/40), Specificity: 87.5% (28/32), Correct answer rate: 91.7% (66/72)

Test examples: Sensitivity: 85.0% (17/20), Specificity: 81.3% (13/16), Correct answer rate: 83.3% (30/36)

As described above, with a cutoff value of 50, both the sensitivity and the specificity for the test examples exceeded 80%. Thus multiple regression analysis was also able to classify schizophrenia and bipolar disorder with high accuracy using the expression levels of the above 12 genes.
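The reported percentages follow directly from the counts given above, with schizophrenia treated as the positive class and bipolar disorder as the negative class. The short sketch below recomputes them; the helper name `pct` is hypothetical, and half-up rounding to one decimal place is assumed because 13/16 = 81.25% is reported as 81.3%.

```python
from decimal import Decimal, ROUND_HALF_UP

def pct(numerator, denominator):
    """Percentage rounded half-up to one decimal place."""
    value = Decimal(numerator) * 100 / Decimal(denominator)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

def summarize(true_pos, n_pos, true_neg, n_neg):
    """Return sensitivity, specificity and correct answer rate as percentages."""
    return {
        "sensitivity": pct(true_pos, n_pos),
        "specificity": pct(true_neg, n_neg),
        "correct answer rate": pct(true_pos + true_neg, n_pos + n_neg),
    }

# Counts taken from the text: positives are schizophrenia, negatives are bipolar disorder.
learning = summarize(true_pos=38, n_pos=40, true_neg=28, n_neg=32)
test_set = summarize(true_pos=17, n_pos=20, true_neg=13, n_neg=16)
```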

Claims

1. A method for discriminating between bipolar disorder and schizophrenia, using as an index the expression levels of the following gene group (1) to (12) in a sample isolated from a living body:
(1) FLJ21881 (SEQ ID NO: 1)
(2) DLGAP3 (SEQ ID NO: 2)
(3) FAM20A (SEQ ID NO: 3)
(4) MAX (SEQ ID NO: 4)
(5) ZNF74 (SEQ ID NO: 5)
(6) DIAPH2 (SEQ ID NO: 6)
(7) CR1 (SEQ ID NO: 7)
(8) RAD54B (SEQ ID NO: 8)
(9) GPR30 (SEQ ID NO: 9)
(10) SCD5 (SEQ ID NO: 10)
(11) IMAGE: 5785888 (SEQ ID NO: 11)
(12) INSL3 (SEQ ID NO: 12)
2. The method according to claim 1, wherein only the expression levels of the gene group (1) to (12) are used as an index.
3. The method according to claim 1 or 2, wherein the expression levels of the gene group are measured by oligonucleotide probes having the base sequences represented by SEQ ID NO: 18, SEQ ID NO: 44, SEQ ID NO: 61, SEQ ID NO: 90, SEQ ID NO: 97, SEQ ID NO: 120, SEQ ID NO: 125, SEQ ID NO: 161, SEQ ID NO: 167, SEQ ID NO: 195, SEQ ID NO: 218 and SEQ ID NO: 220.
4. The method according to claim 1, further comprising a step of comparing the expression levels of the gene group in the sample with the expression levels of the gene group in known bipolar disorder patients and schizophrenia patients.
5. The method according to claim 4, wherein the comparison is performed by a neural network trained by a variable increment method using the expression levels of the gene group in known bipolar disorder patients and schizophrenia patients.
6. The method according to claim 4, wherein the comparison is performed by multiple regression analysis using the expression levels of the gene group as explanatory variables.
7. The method according to claim 6, comprising comparing the dependent variable calculated for the sample with a cutoff value determined based on the dependent variables calculated for known bipolar disorder patients and schizophrenia patients.
8. The method according to any one of claims 3 to 7, wherein the expression levels of the gene group are obtained by normalizing, by the global normalization method, the signal intensities of gene expression measured with an array spotted with oligonucleotide probes having the base sequences shown in SEQ ID NOs: 13 to 228.
9. The method according to claim 8, wherein the array further comprises oligonucleotide probes having the base sequences shown in SEQ ID NOs: 229 to 527.
10. The method according to any one of claims 1 to 9, wherein the sample is blood.