JP4985480B2

JP4985480B2 - Method for classifying cancer cells, apparatus for classifying cancer cells, and program for classifying cancer cells

Info

Publication number: JP4985480B2
Application number: JP2008054904A
Authority: JP
Inventors: 浩嗣松野; 功典佐々木; 智子近藤; 裕教北風; 信彦池田; 秀実松岡
Original assignee: NATIONAL UNIVERSITY CORPORATION YAMAGUCHI UNIVERSITY; Institute of National Colleges of Technologies Japan
Current assignee: NATIONAL UNIVERSITY CORPORATION YAMAGUCHI UNIVERSITY; Institute of National Colleges of Technologies Japan
Priority date: 2008-03-05
Filing date: 2008-03-05
Publication date: 2012-07-25
Anticipated expiration: 2028-03-05
Also published as: JP2009210465A

Description

The present invention relates to a method for classifying cancer cells, an apparatus for classifying cancer cells, and a program for classifying cancer cells, and more particularly to a method, an apparatus, and a program that classify cancer cells by applying a self-organizing map (SOM) to protein data acquired by a laser scanning cytometer (LSC).

In medicine, pharmacy, and biology, cancer tissue and cells are excised from patients and analyzed at the tissue and cell level for cancer testing, for deciding on a course of treatment, and for developing therapies and therapeutic drugs. Cancers can be classified to some extent by differences in morphology (determination of histological type). However, even cancers of the same histological type excised from the same site often differ in their speed of progression and in their response to therapeutic drugs, so the medical field needs a classification finer than the histological classification determined by morphological features.

Techniques for examining and classifying malignant cells such as cancer cells are disclosed in the following documents.

Patent Document 1 describes a system and method in which a classification device detects malignant cells from images obtained with a microscope and a CCD camera. This makes it possible to detect cancer cells from visual information such as cell shape, but it cannot distinguish between cancer cells of the same shape.

Patent Document 2 describes acquiring the b coordinate value of the Lab color space from captured cell image data and using that value as a feature parameter related to cell color in order to classify cells. This method classifies cells as tissue according to their color and shape, but it cannot distinguish between cells of the same shape.

Patent Document 3 describes using a self-organizing map as a method for assessing a state of health and displaying the state of health on the self-organizing map, but it does not classify cells.

Patent Document 4 describes using a self-organizing map so that a desired image among images acquired by MRI can be searched for and accessed easily. However, it does not deal with images at the level of individual cells, and it cannot extract the features of cancer cells and classify them.

Cytometry is a method for quantifying intracellular substances. Sample cells are stained with a fluorescent dye alone or with a molecular probe, such as an antibody or DNA labeled with a fluorescent dye, and are irradiated with a laser; the size of each cell and the amount of intracellular substance are then determined by measuring the emitted fluorescence. In recent years the laser scanning cytometer (LSC) has been developed as one such cytometry technique, and it has also become possible to measure the degree of aggregation of intracellular substances. Attempts have been made to acquire data on a wide variety of cancer cells with the LSC, to analyze the amount of protein and its degree of aggregation, and thereby to classify cancer cells that are difficult to classify under a microscope, but no effective classification method has been found so far.

Patent Document 1: JP 2001-512824 A (published Japanese translation of a PCT application)
Patent Document 2: JP 2004-340738 A
Patent Document 3: JP 2003-263502 A
Patent Document 4: JP 2006-235971 A

In conventional cancer cell classification using images captured with a CCD camera, only visual information about the cancer cells is obtained; cancer cells can be detected, but feature extraction is limited. Moreover, cancer cells of almost the same shape cannot be classified in any further detail. There has therefore been a demand for acquiring more precise information based on the characteristics of cancer cells, so that even cancer cells of the same shape can be classified in greater detail according to those characteristics.

The present invention has been made to solve the problem described above. The method for classifying cancer cells according to the present invention classifies cancer cells by processing data on cancer cells extracted by a laser scanning cytometer, and comprises: cutting out, as the necessary data, the protein amount and the protein aggregation degree from the data acquired for the cancer cells by the laser scanning cytometer, and creating a scatter diagram; reducing the dimensionality of the data representing the scatter diagram to create input data; inputting the input data to the input layer of a network having an input layer and a map layer, each of which is a set of neurons, and learning the connection weights between the input layer and the map layer so as to perform topological mapping, thereby forming a self-organizing map; and discriminating the characteristics of the cancer cells from the distribution pattern of the proteins in the formed self-organizing map, thereby classifying the cancer cells.

The apparatus for classifying cancer cells according to the present invention classifies cancer cells by processing data on cancer cells extracted by a laser scanning cytometer, and comprises: scatter diagram creation means that cuts out, as the necessary data, the protein amount and the protein aggregation degree from the data acquired for the cancer cells by the laser scanning cytometer and creates a scatter diagram; input data creation means that reduces the dimensionality of the data representing the scatter diagram to create input data; and self-organizing map formation means that has a network consisting of an input layer and a map layer, each of which is a set of neurons, and that learns the connection weights between the input layer and the map layer so as to perform topological mapping, thereby forming a self-organizing map.

The program for classifying cancer cells according to the present invention causes a computer to classify cancer cells by processing data on cancer cells extracted by a laser scanning cytometer. The program cuts out, as the necessary data, the protein amount and the protein aggregation degree from the data acquired for the cancer cells by the laser scanning cytometer and creates a scatter diagram; reduces the dimensionality of the data representing the scatter diagram to create input data; and inputs the input data to the input layer of a network having an input layer and a map layer, each of which is a set of neurons, and learns the connection weights between the input layer and the map layer so as to perform topological mapping, thereby forming a self-organizing map.

In the present invention, cancer cell data acquired by a laser scanning cytometer is mapped through learning with a self-organizing map, so that the features of cancer cells can be extracted and the cells classified. Even cancer cells of the same shape can thus be classified in finer detail on the basis of cell characteristics.

In the classification of cancer cells according to the present invention, a self-organizing map (SOM) is used as the data-analysis technique for classifying cells. The SOM is an unsupervised learning model among neural networks: through learning it automatically discovers similarities in the input data and forms a topological map in which similar inputs are placed close to one another on the map layer. To form the topological map, image data representing the relationship between the amount of protein and its degree of aggregation is created from cancer cell data extracted by a laser scanning cytometer (LSC), and the distance information is mapped through learning to classify the cells.

The classification of cancer cells according to the present invention classifies cells by analyzing data acquired from a cell sample through detection with a laser scanning cytometer. The laser scanning cytometer is therefore described first.
[Laser scanning cytometer]
A laser scanning cytometer (hereinafter, LSC) is a device that irradiates a cell sample with a laser and detects the scattered light and fluorescence emitted from the cells. The sample consists of cells cultured on or attached to a glass slide. The cells are fluorescently labeled (stained) by a method suited to the intracellular substance to be measured (DNA, protein, non-peptide substance, drug, and so on). The cells are scanned with the laser, the intensity of the fluorescence emitted from the cells is detected with a photosensor, and the detected signal is converted to measure cell size, the amount of intracellular substance, its localization (position information), and the like.

The LSC measures the amount of fluorescence in units of very small pixels and, while measuring the amount and area of the fluorescent substance, also acquires position information. The fluorescence of all pixels in the region recognized as one cell is summed to give the fluorescence amount of that cell (Integral), and the pixel with the highest fluorescence in the region is the maximum pixel (MaxPixel). The maximum pixel is, in other words, an index of the degree of aggregation of the fluorescent substance, that is, of the intracellular substance being measured. The cell cycle of a cancer cell can be identified easily from the value of the maximum pixel.
[Cell samples]
Antibodies react specifically with antigens. This property allows intracellular substances to be stained by fluorescent immunostaining through an antigen-antibody reaction. The LSC measures cells attached to, or cultured on, a cell array such as the one shown in FIG. 1. The cell array is a glass slide carrying a large number (for example, 50) of concave spots about 2 mm in diameter; the hatched portion in the figure is raised by printing with non-fluorescent ink. The area in which the spots are arranged measures about 16 mm by 30 mm. One spot of the cell array usually contains about 100 to 500 cancer cells. Conventionally, one kind of cell was measured per glass slide, but with a cell array, data on many cancer cells can be acquired at once.
[LSC data]
In the present invention, an LSC2 (manufactured by Olympus Corporation) is used as the LSC. The data obtained by LSC measurement is in text format; an example is shown in FIG. 2. In outline, the first part of the data records the LSC instrument settings, such as the type of laser and the region of the cell array that contains cells. The instrument settings are followed by data for each individual cell: number (No), position information, diameter, area, perimeter, DNA index (DI), and so on.
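The Integral and MaxPixel quantities defined for the LSC measurement can be illustrated with a small sketch. It assumes, purely for illustration, that the fluorescence readings are available as a 2D list `pixels` and that the region recognized as one cell is given as a boolean mask `cell_mask` of the same shape; neither name nor this data layout comes from the LSC itself.

```python
def integral_and_max_pixel(pixels, cell_mask):
    """Return (Integral, MaxPixel) for one cell.

    pixels    -- 2D list of per-pixel fluorescence values (hypothetical layout)
    cell_mask -- 2D list of booleans, True where the pixel belongs to the cell
    """
    values = [
        value
        for pixel_row, mask_row in zip(pixels, cell_mask)
        for value, inside in zip(pixel_row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("the cell region contains no pixels")
    # Integral: summed fluorescence of the cell; MaxPixel: its brightest pixel.
    return sum(values), max(values)


pixels = [
    [0, 3, 1],
    [2, 9, 4],
    [0, 5, 0],
]
cell_mask = [
    [False, True, False],
    [True, True, True],
    [False, True, False],
]
integral, max_pixel = integral_and_max_pixel(pixels, cell_mask)
```

Two cells with the same Integral can have very different MaxPixel values, which is why the pair carries more information than the protein amount alone.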

If the default values are used for the instrument settings when performing the analysis for cell classification, the instrument-setting data need not be used and only the per-cell data has to be considered. The per-cell data also contains unwanted records (referred to as garbage), for example where cells overlap one another; these records are deleted and only the necessary data is retained.

To obtain the necessary data, a data extraction program (for example AWC: AnyWhereCyte) is used to select and extract the required items, such as the protein amount, from the text-format data. The data is cut out for each spot of the cell array and stored in csv format. An example of the csv data is shown in FIG. 3.

Of the data shown in FIG. 3, the protein amount (Integral) and the protein aggregation degree (MaxPixel) are used for cell classification. The protein amount is the amount of protein present in the cancer cell, and the aggregation degree indicates how the protein is distributed within the cancer cell. Measuring the aggregation degree of a protein became possible only with the LSC.
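Selecting the two quantities from the stored csv data might look like the following sketch. The column names `Integral` and `MaxPixel` and the sample rows are assumptions made for illustration; the actual layout of the csv file in FIG. 3 is not reproduced in this text.

```python
import csv
import io


def read_points(csv_text, x_col="Integral", y_col="MaxPixel"):
    """Return a list of (protein amount, aggregation degree) pairs.

    The column names are hypothetical. Rows whose values cannot be parsed
    as numbers are skipped; a missing column raises KeyError.
    """
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            points.append((float(row[x_col]), float(row[y_col])))
        except (TypeError, ValueError):
            continue  # unusable record, leave it out
    return points


sample = (
    "No,Area,Integral,MaxPixel\n"
    "1,120,5400,210\n"
    "2,95,abc,180\n"
    "3,130,7200,250\n"
)
points = read_points(sample)
```

Each resulting pair corresponds to one cell and becomes one point of the scatter diagram for that spot.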

Next, a scatter diagram is created from the extracted data of the kind shown in FIG. 3. The data is extracted for each spot of the cell array, and the scatter diagram shows, for a specific protein, the distribution with the protein amount and the aggregation degree as parameters; examples are shown in FIGS. 4(a) to 4(c). Each point in a diagram represents the data for a cancer cell. The shape, slope, density, and so on of the cloud of points differ from protein to protein, but the picture is varied: different proteins can give similar shapes, and the same kind of cancer cell with the same kind of protein can give different shapes. The data therefore cannot be classified simply by looking at the scatter diagrams, so an SOM is used to classify it.
[Self-organizing map (SOM)]
(1) Structure of the SOM network
FIG. 5 shows the structure of the SOM network. The network consists of two layers, an input layer and a map layer, each of which is a set of neurons. There are no connections between neurons within a layer, and the neurons of the input layer and the map layer are fully connected. In the map layer the neurons are usually arranged in two dimensions so that the output can be inspected visually.

When an input vector
$x(t) = [x_1(t), \ldots, x_i(t), \ldots, x_n(t)]$
is given to the input layer at time $t$, each neuron $j$ of the map layer receives the input from the input layer through its connection weight vector
$w_j(t) = [w_{j1}(t), \ldots, w_{ji}(t), \ldots, w_{jn}(t)]$
(where $w_{ji}$ is the weight between the $i$-th neuron of the input layer and the $j$-th neuron of the map layer) and repeats the learning described below in accordance with the learning algorithm. As a result, neurons close to one another on the map layer come to respond to similar inputs. That is, the network performs topological mapping.
(2) Learning algorithm of the SOM
Topological mapping is achieved by learning the connection weights between the neurons of the input layer and the map layer. The learning algorithm is as follows.
a. Initialization of the network
The initial values of the connection weights between the input layer and the map layer are set with random numbers.
b. Presentation of the input vector
The input vector $x = (x_1, \ldots, x_i, \ldots, x_n)$ is given to the input layer.
c. Calculation of the distance between the input vector and the connection weight vectors
The distance between the input vector and the connection weight vector of each neuron of the map layer is calculated. The distance $d_j$ between the input vector and the connection weight vector of the $j$-th neuron of the map layer is given, in the standard SOM formulation, by the Euclidean distance
$d_j = \sqrt{\sum_{i=1}^{n} (x_i - w_{ji})^2}$ (1)
d. Determination of the winner neuron
The neuron for which the distance $d_j$ is smallest, that is, the map-layer neuron whose connection weight vector is closest to the input vector, is selected. This neuron is called the winner neuron and is denoted $j^*$.
e. Update of the connection weights and parameters
The connection weights of the winner neuron and of all neurons in its neighborhood are updated according to
$\Delta w_{ji} = \alpha \, h(j, j^*) \, (x_i - w_{ji})$ (2)
The neighborhood function $h(j, j^*)$ is defined as a function of the distance $|j - j^*|$ between neurons $j$ and $j^*$ on the map with width parameter $\sigma$, typically the Gaussian
$h(j, j^*) = \exp\left(-\frac{|j - j^*|^2}{\sigma^2}\right)$
Here $\alpha$ is the learning-rate coefficient and is decreased as learning progresses. Likewise, $\sigma$ is decreased as learning progresses.
f. Iteration
Return to step b and repeat steps b to e. The data acquired with the LSC for the cancer cells and proteins is turned into scatter diagram data and processed into input data, on which SOM learning is performed; the result is a map that represents the characteristics of the cancer cells.
[System configuration]
FIG. 6 is a flowchart of the process in which the data acquired by the LSC is processed into input data and learned by the SOM. To reduce the amount of data, the scatter diagram data obtained by processing the LSC data is reduced in dimension while its features are retained: smoothing and grayscale conversion are applied, followed by sampling. This processing lowers the dimension to, for example, 10 × 10 = 100, and yields the input vector. Each element of the input vector corresponds to one block of the sampled scatter diagram image and takes a value from 0 to 255.
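The reduction from a scatter diagram to a 100-element input vector can be sketched as follows. This is only one plausible reading of the smoothing, grayscale conversion, and sampling steps: the 50 × 50 intermediate grid, the 3 × 3 mean filter, block averaging as the sampling step, scaling by the brightest block, and the axis limits `x_max` and `y_max` are all assumptions, not details taken from the source.

```python
def scatter_to_input_vector(points, x_max, y_max, fine=50, coarse=10):
    """Turn (protein amount, aggregation degree) points into coarse*coarse
    values in 0..255. All processing choices here are illustrative."""
    if fine % coarse != 0:
        raise ValueError("fine must be a multiple of coarse")
    # 1. Rasterize: count the points falling into each cell of a fine grid.
    grid = [[0.0] * fine for _ in range(fine)]
    for x, y in points:
        col = min(max(int(x / x_max * fine), 0), fine - 1)
        row = min(max(int(y / y_max * fine), 0), fine - 1)
        grid[row][col] += 1.0
    # 2. Smooth with a 3x3 mean filter (edges use the available neighbors).
    smooth = [[0.0] * fine for _ in range(fine)]
    for r in range(fine):
        for c in range(fine):
            cells = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, fine))
                for cc in range(max(c - 1, 0), min(c + 2, fine))
            ]
            smooth[r][c] = sum(cells) / len(cells)
    # 3. Sample: average each block of the fine grid into one coarse cell.
    block = fine // coarse
    blocks = []
    for br in range(coarse):
        for bc in range(coarse):
            total = sum(
                smooth[r][c]
                for r in range(br * block, (br + 1) * block)
                for c in range(bc * block, (bc + 1) * block)
            )
            blocks.append(total / (block * block))
    # 4. Grayscale: scale so that the brightest block becomes 255.
    peak = max(blocks)
    if peak == 0:
        return [0] * len(blocks)
    return [round(255 * value / peak) for value in blocks]


vector = scatter_to_input_vector(
    [(11.0, 11.0)] * 5 + [(91.0, 91.0)], x_max=100.0, y_max=100.0
)
```

With the default sizes the result has 100 elements, one per block, matching the 10 × 10 = 100 dimensions given as an example in the text.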

The map layer of the SOM network is, for example, a two-dimensional map of 20 × 20 = 400 neurons, and each neuron of the map layer has a 100-dimensional connection weight vector, which corresponds to the whole sampled scatter diagram image.
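Putting learning steps a to f together with a map of this size gives the following minimal sketch. Several details are assumptions rather than the patented settings: the Euclidean distance, the Gaussian neighborhood over map coordinates, the linear decay of the learning rate and of σ, the floor of 0.5 on σ, and the default values of `steps`, `alpha0`, and `sigma0` (the parameters of Table 1 are not reproduced in this text).

```python
import math
import random


def distance(x, w_j):
    """Euclidean distance between input x and weight vector w_j (step c)."""
    return math.sqrt(sum((x_i - w_ji) ** 2 for x_i, w_ji in zip(x, w_j)))


def winner_index(x, weights):
    """Index j* of the neuron whose weight vector is closest to x (step d)."""
    return min(range(len(weights)), key=lambda j: distance(x, weights[j]))


def train_som(inputs, rows=20, cols=20, steps=1000,
              alpha0=0.5, sigma0=10.0, seed=0):
    """Train a rows x cols map; returns (weights, positions)."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    # Step a: random initial weights in the 0..255 range of the inputs.
    weights = [[rng.uniform(0.0, 255.0) for _ in range(dim)]
               for _ in positions]
    for t in range(steps):
        remaining = 1.0 - t / steps
        alpha = alpha0 * remaining              # assumed linear decay
        sigma = max(sigma0 * remaining, 0.5)    # assumed linear decay, floored
        x = inputs[t % len(inputs)]             # step b
        win_r, win_c = positions[winner_index(x, weights)]  # steps c and d
        for (r, c), w_j in zip(positions, weights):         # step e
            h = math.exp(-((r - win_r) ** 2 + (c - win_c) ** 2) / sigma ** 2)
            for i, x_i in enumerate(x):
                w_j[i] += alpha * h * (x_i - w_j[i])
    return weights, positions                   # step f is the loop itself
```

After training, `positions[winner_index(v, weights)]` gives the map cell on which an input vector `v` lands. Note that a Gaussian neighborhood updates every neuron with a weight that falls off with map distance, rather than cutting off at a fixed radius.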

FIG. 7 shows the configuration of an apparatus that processes the data acquired by the LSC into input data, performs SOM learning, and classifies cancer cells. In outline, the apparatus comprises an input data generation unit 10, which processes the text data acquired by the LSC into input data for creating the SOM, and an SOM creation unit 20, which performs learning on the input data.

The input data generation unit 10 comprises a data cutout unit that cuts out the necessary data from the text data acquired by the LSC, a data conversion unit that reduces the cut-out data and converts it into csv data, a scatter diagram creation unit that creates a scatter diagram from the converted csv data, and an input data generation unit that reduces the dimensionality of the scatter diagram data and samples it to produce the input signal.

The SOM creation unit 20 has the configuration of a neural network that receives the input data generated by the input data generation unit 10 and performs learning. It consists of an input layer and a map layer, each a set of neurons, and forms the SOM by learning the connection weights between the input layer and the map layer so as to perform topological mapping.
[Analysis example]
The results of SOM learning for five kinds of cancer cells and four kinds of proteins are shown below. The cancer cells are KURAMOCHI, MKN1, MS-1, SW-13, and WiDr, and the proteins measured for each cancer cell are L-26, LCA, RET, and p27. For each kind of protein, SOM learning is performed 15 times on six scatter diagram images; this constitutes one cycle, and 30 cycles are carried out. Learning for one kind of protein is called sub-learning, and the number of times it is performed is called the sub-learning count. The learning parameters and learning counts were set as shown in Table 1.
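One possible reading of this learning schedule is sketched below. It assumes that a cycle visits the four proteins in turn, that the sub-learning for each protein consists of 15 passes over its six scatter diagram images, and that the whole cycle is repeated 30 times; the exact nesting is an interpretation, and the function name is hypothetical.

```python
def presentation_schedule(proteins, images_per_protein=6,
                          sub_learning_count=15, cycles=30):
    """Yield (cycle, protein, repeat, image number) in presentation order.

    The nesting order is an assumed reading of the schedule, not a detail
    confirmed by the source.
    """
    for cycle in range(cycles):
        for protein in proteins:
            for repeat in range(sub_learning_count):
                for image in range(1, images_per_protein + 1):
                    yield cycle, protein, repeat, image


schedule = list(presentation_schedule(["L-26", "LCA", "RET", "p27"]))
total_presentations = len(schedule)  # 30 * 4 * 15 * 6
```

Under this reading each scatter diagram image is presented 15 × 30 = 450 times over the whole run.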

The maps obtained after learning are shown in FIGS. 8 to 12. Each map shows, for one kind of cancer cell, the winner neuron for each item of data of the four kinds of proteins. The four proteins are denoted A (L-26), B (LCA), C (RET), and D (p27), and the six items of data for each protein are numbered 1 to 6. In the figures, X marks a neuron that is the winner for two or more kinds of proteins; A1, for example, marks the winner neuron for the first item of data for protein L-26.
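The labeling convention of these maps can be sketched as follows, assuming the winner neuron of each item of data is already known as a (row, column) cell. The dictionary layout, the "." marker for empty cells, and the "/" separator used when several items of the same protein share a cell are assumptions for illustration.

```python
def label_map(winners, rows=20, cols=20):
    """Build a grid of labels from a dict such as {"A1": (row, col), ...}.

    A cell won by data of two or more different proteins is marked "X";
    the protein is taken from the first letter of each label.
    """
    labels_by_cell = {}
    for label, cell in winners.items():
        labels_by_cell.setdefault(cell, []).append(label)
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    for (r, c), labels in labels_by_cell.items():
        proteins = {label[0] for label in labels}
        if len(proteins) >= 2:
            grid[r][c] = "X"
        else:
            grid[r][c] = "/".join(sorted(labels))
    return grid


example = label_map(
    {"A1": (0, 0), "B1": (1, 1), "C1": (1, 1), "D1": (2, 2), "D2": (2, 2)},
    rows=3,
    cols=3,
)
```

Here B1 and C1 share a cell and are shown as X, while A1 keeps its own label.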

X appears in every map produced by the mapping. An X indicates that the scatter diagram data of two or more kinds of proteins are similar, and data mapped away from X therefore reveals characteristic proteins. In FIG. 8, for example, A1 to A4, D1, and D2 are mapped to locations different from X, which shows that the scatter diagrams differ even for data from the same kind of cancer cell.
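Finding the items of data that are mapped away from every X cell can be sketched in the same style. The input layout and the function name are hypothetical, and the function only reports which labels sit apart from X; whether the corresponding protein counts as characteristic remains a judgment made on the maps.

```python
def labels_away_from_x(winners):
    """Return the labels whose winner cell is not shared with another protein.

    winners -- dict such as {"A1": (row, col), ...}; the first letter of a
               label identifies the protein (assumed layout).
    """
    labels_by_cell = {}
    for label, cell in winners.items():
        labels_by_cell.setdefault(cell, set()).add(label)
    result = []
    for labels in labels_by_cell.values():
        if len({label[0] for label in labels}) == 1:
            result.extend(labels)
    return sorted(result)


separate = labels_away_from_x(
    {"A1": (0, 0), "A2": (0, 1), "B1": (5, 5), "C1": (5, 5), "D1": (9, 9)}
)
```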

When cancer cell data is acquired, the cell cycle and the time needed to acquire the protein data may differ. From this, L-26 (A1 to A4) and p27 (D1, D2) are considered to change over time in the cancer cell KURAMOCHI and to be characteristic protein data that can be distinguished from the rest.

Table 2 lists, on the basis of the results for the five kinds of cancer cells shown in FIGS. 8 to 12, the characteristic proteins that change over time in each cancer cell.

Thus, the maps obtained by SOM learning on data acquired by the LSC from cancer cell samples show that each cancer cell has characteristic proteins that change over time and are thereby distinguished from the others, which indicates that cancer cells of the same kind can be classified by their characteristic proteins.

The present invention is a method for classifying cancer cells in which a self-organizing map is formed as the result of SOM learning on cancer cell data acquired by an LSC. It is also an apparatus for classifying cancer cells that comprises an input data generation unit, which processes text data acquired by the LSC into input data for creating the SOM, and an SOM creation unit, which performs learning on the input data.

Furthermore, the processing of cancer cell data acquired by the LSC followed by learning with a self-organizing map can be implemented not only as an apparatus with dedicated data processing means but also on a general-purpose computer. The program for that purpose cuts out, as the necessary data, the protein amount and the protein aggregation degree from the data acquired for the cancer cells by the laser scanning cytometer and creates a scatter diagram; reduces the dimensionality of the data representing the scatter diagram to create input data; and inputs the input data to the input layer of a network having an input layer and a map layer, each of which is a set of neurons, and learns the connection weights between the input layer and the map layer so as to perform topological mapping, thereby forming a self-organizing map.

FIG. 1 is a top view of the cell array.
FIG. 2 shows an example of data from the LSC.
FIG. 3 shows an example of the data of FIG. 2 cut out and converted into csv format.
FIG. 4 shows examples of scatter diagrams.
FIG. 5 shows the structure of the SOM network.
FIG. 6 is a flowchart of the process of forming an SOM map from LSC data according to the present invention.
FIG. 7 schematically shows the configuration of an apparatus that forms an SOM map from LSC data according to the present invention.
FIGS. 8 to 12 each show an example of a created SOM map.

Explanation of symbols

10 Input data generation unit
20 SOM creation unit

Claims

A method for classifying cancer cells by processing data on cancer cells extracted by a laser scanning cytometer, the method comprising:
cutting out, as necessary data, the protein amount and the protein aggregation degree from the data acquired for the cancer cells by the laser scanning cytometer, and creating a scatter diagram;
reducing the dimensionality of the data representing the scatter diagram to create input data;
inputting the input data to the input layer of a network having an input layer and a map layer, each of which is a set of neurons, and learning the connection weights between the input layer and the map layer so as to perform topological mapping, thereby forming a self-organizing map; and
discriminating the characteristics of the cancer cells from the distribution pattern of the proteins in the formed self-organizing map, thereby classifying the cancer cells.

An apparatus for classifying cancer cells by processing data on cancer cells extracted by a laser scanning cytometer, the apparatus comprising:
scatter diagram creation means for cutting out, as necessary data, the protein amount and the protein aggregation degree from the data acquired for the cancer cells by the laser scanning cytometer, and creating a scatter diagram;
input data creation means for reducing the dimensionality of the data representing the scatter diagram to create input data; and
self-organizing map formation means having a network consisting of an input layer and a map layer, each of which is a set of neurons, for learning the connection weights between the input layer and the map layer so as to perform topological mapping, thereby forming a self-organizing map.

A program for causing a computer to classify cancer cells by processing data on cancer cells extracted by a laser scanning cytometer, the program causing the computer to: cut out, as necessary data, the protein amount and the protein aggregation degree from the data acquired for the cancer cells by the laser scanning cytometer, and create a scatter diagram; reduce the dimensionality of the data representing the scatter diagram to create input data; and input the input data to the input layer of a network having an input layer and a map layer, each of which is a set of neurons, and learn the connection weights between the input layer and the map layer so as to perform topological mapping, thereby forming a self-organizing map.