JP2017026482A

JP2017026482A - Data processor, determination tree generation method, identification device, and program

Info

Publication number: JP2017026482A
Application number: JP2015145634A
Authority: JP
Inventors: 朝春喜友名; Asaharu Kiyuna; 上條　憲一; Kenichi Kamijo; 憲一上條; 亨宇坂元; Yukitaka Sakamoto; 明典橋口; Akinori Hashiguchi; 時也阿部; Tokiya Abe
Original assignee: NEC Corp; Keio University
Current assignee: NEC Corp; Keio University
Priority date: 2015-07-23
Filing date: 2015-07-23
Publication date: 2017-02-02
Anticipated expiration: 2035-07-23
Also published as: JP6665999B2

Abstract

PROBLEM TO BE SOLVED: To provide a data processor that generates an identification rule enabling a doctor or the like to easily understand the basis of a determination result obtained from the feature values of a cell image. SOLUTION: A data processor includes an input part and a decision tree generation part. The input part inputs learning data in which a label given to a cell image and a plurality of feature values extracted from the cell image form one set. The decision tree generation part generates, on the basis of the learning data, a decision tree for identifying information equivalent to the label of a sample. SELECTED DRAWING: Figure 1

Description

本発明は、データ処理装置、決定木生成方法、識別装置及びプログラムに関する。特に、細胞画像から生成された特徴量ベクトルを処理するデータ処理装置、決定木生成方法、識別装置及びプログラムに関する。 The present invention relates to a data processing device, a decision tree generation method, an identification device, and a program. In particular, the present invention relates to a data processing apparatus, a decision tree generation method, an identification apparatus, and a program that process a feature vector generated from a cell image.

In pathological diagnosis, a part of an organ is collected from a patient as a specimen, and a thinly cut cross section of the specimen is observed under a microscope. For example, hepatocytes are collected from a patient suspected of having liver cancer, and a doctor examines the pathological image (liver pathological image) obtained by photographing the hepatocytes to determine the malignancy (grade) of the cancer.

しかし、上記のような病理診断自体の作業量は膨大であり、医師の負担が大きいものとなっている。そのため、医師の負担を軽減することを目的とした画像処理技術、情報処理技術等の技術開発が行われている。 However, the amount of work for the pathological diagnosis itself as described above is enormous, which places a heavy burden on the doctor. For this reason, technical developments such as image processing technology and information processing technology for reducing the burden on doctors have been carried out.

For example, Patent Document 1 discloses a technique that extracts, from a pathological image, sub-images centered on cell nuclei, vacancies, cytoplasm, stroma and the like, simultaneously extracts color information of the cell nuclei, and stores both as feature candidates, thereby determining the presence or absence of a tumor and whether the tumor is benign or malignant with higher accuracy. Non-Patent Document 1 discloses details of a decision tree generation method.

Patent Document 1: JP 2006-153742 A

Non-Patent Document 1: Roman Timofeev, "Classification and Regression Trees (CART) Theory and Applications", December 20, 2004, [online], [retrieved June 26, 2015], Internet <URL: http://edoc.hu-berlin.de/master/timofeev-roman-2004-12-20/PDF/timofeev.pdf>

なお、上記先行技術文献の各開示を、本書に引用をもって繰り込むものとする。以下の分析は、本発明者らによってなされたものである。 Each disclosure of the above prior art document is incorporated herein by reference. The following analysis was made by the present inventors.

For example, as disclosed in Patent Document 1, there are pathological image analysis techniques that support pathological diagnosis by doctors. However, such a technique stops at determining whether or not the patient has cancer (cancer versus non-cancer) and does not actively disclose the reason or basis for that determination. In other words, a doctor or the like cannot easily understand for what reason and by what process the determination result was derived from the various features appearing in the pathological image (cell image). If the relationship between the features appearing in the cell image and the determination result is difficult for a doctor or the like to understand, he or she cannot tell how far the result can be trusted, and it is hard to say that technology intended to reduce the burden on doctors is being fully utilized.

An object of the present invention is to provide a data processing device, a decision tree generation method, an identification device, and a program that generate an identification rule from which the basis of a determination result obtained from the feature amounts of a cell image can be easily understood.

According to a first aspect of the present invention, there is provided a data processing device comprising: an input unit that inputs learning data in which a label given to a cell image and a plurality of feature amounts extracted from the cell image form one set; and a decision tree generation unit that generates, based on the learning data, a decision tree for identifying information corresponding to the label of a sample.

According to a second aspect of the present invention, there is provided a decision tree generation method including: a step of inputting learning data in which a label given to a cell image and a plurality of feature amounts extracted from the cell image form one set; and a step of generating, based on the learning data, a decision tree for identifying information corresponding to the label of a sample.

本発明の第３の視点によれば、上記の決定木生成方法により生成された決定木を用いて、サンプルの識別を行う識別装置が提供される。 According to a third aspect of the present invention, there is provided an identification device for identifying a sample using the decision tree generated by the above-described decision tree generation method.

According to a fourth aspect of the present invention, there is provided a program that causes a computer mounted on a data processing device to execute: a process of inputting learning data in which a label given to a cell image and a plurality of feature amounts extracted from the cell image form one set; and a process of generating, based on the learning data, a decision tree for identifying information corresponding to the label of a sample.
This program can be recorded on a computer-readable storage medium. The storage medium may be a non-transient medium such as a semiconductor memory, a hard disk, a magnetic recording medium, or an optical recording medium. The present invention can also be embodied as a computer program product.

According to each aspect of the present invention, there are provided a data processing device, a decision tree generation method, an identification device, and a program that contribute to generating an identification rule from which the basis of a determination result obtained from the feature amounts of a cell image can be easily understood.

FIG. 1 is a diagram for explaining the outline of one embodiment.
FIG. 2 is a diagram showing an example of the configuration of the pathological image processing system according to the first embodiment.
FIG. 3 is a diagram showing an example of the internal configuration of the learning data generation device.
FIG. 4 is a diagram showing an example of gaze area image data and an example of label information.
FIG. 5 is a diagram for explaining the feature amounts generated by the feature vector generation unit.
FIG. 6 is a diagram showing an example of a cell nucleus region.
FIG. 7 is a diagram for explaining the statistical processing applied to the feature amounts by the feature vector generation unit.
FIG. 8 is a diagram showing an example of the learning data generated by the learning data generation device.
FIG. 9 is a diagram showing an example of the internal configuration of the data processing device.
FIG. 10 is a diagram showing an example of the first selection policy referred to by the feature amount selection unit.
FIG. 11 is a diagram showing an example of the learning data resulting from the first selection process.
FIG. 12 is a diagram showing an example of the second selection policy referred to by the feature amount selection unit.
FIG. 13 is a flowchart showing an example of the second selection process of the feature amount selection unit.
FIG. 14 is a diagram showing an example of a decision tree generated by the feature amount selection unit.
FIG. 15 is a diagram for explaining the quality of a feature amount.
FIG. 16 is a diagram for explaining the narrowing down of feature amounts by the feature amount selection unit.
FIG. 17 is a diagram showing an example of the learning data resulting from the second selection process.
FIG. 18 is a flowchart showing an example of the operation of the data processing device.
FIG. 19 is a diagram showing an example of the learning data used for generating a decision tree.
FIG. 20 is a diagram showing an example of the decision tree obtained from the learning data shown in FIG. 19.
FIG. 21 is a diagram showing an example of a grading result by a decision tree.
FIG. 22 is a diagram showing an example of the configuration of the pathological image processing system according to the second embodiment.
FIG. 23 is a diagram showing an example of accompanying information.
FIG. 24 is a diagram showing an example of the internal configuration of the data processing device according to the second embodiment.
FIG. 25 is a diagram showing an example of a classification result by a decision tree.
FIG. 26 is a diagram showing an example of an analysis result by the analysis unit of the second embodiment.
FIG. 27 is a diagram for explaining the effectiveness of an anticancer agent for each classification result by a decision tree.
FIG. 28 is a diagram showing another example of accompanying information.
FIG. 29 is a graph of the analysis result of the analysis unit according to the second embodiment.
FIG. 30 is a diagram showing an example of another internal configuration of the data processing device.
FIG. 31 is a diagram showing an example of label information in the case where cancer recurrence information of the patient corresponding to a gaze area ID is used as the label.

初めに、一実施形態の概要について説明する。なお、この概要に付記した図面参照符号は、理解を助けるための一例として各要素に便宜上付記したものであり、この概要の記載はなんらの限定を意図するものではない。 First, an outline of one embodiment will be described. Note that the reference numerals of the drawings attached to the outline are attached to the respective elements for convenience as an example for facilitating understanding, and the description of the outline is not intended to be any limitation.

一実施形態に係るデータ処理装置１００は、入力部１０１と、決定木生成部１０２と、を備える。入力部１０１は、細胞画像に与えられたラベルと、細胞画像から抽出された複数の特徴量と、を１組とする学習データを入力する。決定木生成部１０２は、サンプルのラベルに相当する情報を識別するための決定木を、学習データに基づいて生成する。 The data processing apparatus 100 according to an embodiment includes an input unit 101 and a decision tree generation unit 102. The input unit 101 inputs learning data including a label given to the cell image and a plurality of feature amounts extracted from the cell image as a set. The decision tree generation unit 102 generates a decision tree for identifying information corresponding to the sample label based on the learning data.

The data processing apparatus 100 accepts a plurality of feature amounts (a feature vector) that characterize a cell image. Using these feature amounts, the data processing apparatus 100 generates a decision tree for identifying the label given to the cell image (for example, the grade of the cancer cells). A decision tree has a tree structure in which the leaves represent classifications (class labels) and the branches show the basis for arriving at each classification. Therefore, by generating the identification rule used for grading cell images and the like as a decision tree, a doctor or the like can easily understand the basis of the determination and prediction results produced by that rule.

Specific embodiments are described below in more detail with reference to the drawings. In each embodiment, the same components are denoted by the same reference numerals, and their description is omitted.

[First Embodiment]
The first embodiment will be described in more detail with reference to the drawings.

図２は、第１の実施形態に係る病理画像処理システムの構成の一例を示す図である。図２を参照すると、病理画像処理システムには、学習データ生成装置１０と、データ処理装置２０と、識別装置３０と、が含まれる。 FIG. 2 is a diagram illustrating an example of the configuration of the pathological image processing system according to the first embodiment. Referring to FIG. 2, the pathological image processing system includes a learning data generation device 10, a data processing device 20, and an identification device 30.

In the first embodiment, cell images obtained from cells collected from a patient's liver are described as the target of the system. This is not intended to limit the cells or the organ, however; cells collected from other organs may of course be used.

The learning data generation device 10 receives, as input, image data relating to a gaze area (ROI; Region Of Interest) extracted from a cell image (hereinafter referred to as gaze area image data) and label information including the grade corresponding to that gaze area.

Gaze area image data is an image obtained by capturing part of a cell image acquired by a doctor or the like with a CCD (Charge Coupled Device) camera mounted on a microscope.

The grade of the gaze area image data included in the label information (the label given to the gaze area image data) is determined by a doctor, who checks each gaze area image data and, based on his or her knowledge, assigns a grade between grade 0 (G0) and grade 4 (G4). The first embodiment is described using integer grades as an example, but the grade does not have to be an integer and may be, for example, grade 2.5. In that case, classification can be carried out by the same procedure as for integer grades by using a regression decision tree. Alternatively, when the grade given by the doctor is not an integer, the learning data generation device 10 may convert the grade to an integer by rounding the fractional part up, rounding it down, rounding it to the nearest integer, or the like.
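The three rounding options mentioned above can be sketched as follows. This is only an illustration: the function name `to_integer_grade`, the `mode` strings, and the clamping to the G0 to G4 range are assumptions made for the example and are not taken from the source.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def to_integer_grade(grade: float, mode: str = "half_up") -> int:
    """Convert a non-integer grade (e.g. 2.5) to an integer grade."""
    if mode == "up":          # round the fractional part up
        value = math.ceil(grade)
    elif mode == "down":      # round the fractional part down
        value = math.floor(grade)
    elif mode == "half_up":   # conventional rounding (0.5 goes up)
        # Python's built-in round() rounds halves to the nearest even
        # integer, so Decimal is used to get half-up behaviour.
        value = int(Decimal(str(grade)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Assumption: keep the result inside the G0..G4 range used in the text.
    return min(4, max(0, value))


print(to_integer_grade(2.5, "up"), to_integer_grade(2.5, "down"), to_integer_grade(2.5))
```

Which option is appropriate depends on how conservative the grading should be; the source leaves the choice open.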

学習データ生成装置１０は、複数の注視領域画像データと、当該複数の注視領域画像データそれぞれに対応するラベル情報と、を入力する。 The learning data generation device 10 inputs a plurality of gaze area image data and label information corresponding to each of the plurality of gaze area image data.

学習データ生成装置１０は、注視領域画像データとラベル情報に基づいて学習データを生成し、データ処理装置２０に出力する。 The learning data generation device 10 generates learning data based on the gaze area image data and the label information, and outputs the learning data to the data processing device 20.

データ処理装置２０は、入力した学習データに基づき、肝細胞のグレーディング（格付け）を行うための識別規則（識別モデル、識別ルール又は識別関数）を生成する。より具体的には、データ処理装置２０は、入力した学習データに基づき、決定木を生成する。データ処理装置２０が生成した決定木（識別規則）は識別装置３０に提供される。 The data processing device 20 generates an identification rule (identification model, identification rule or identification function) for grading (rating) hepatocytes based on the input learning data. More specifically, the data processing device 20 generates a decision tree based on the input learning data. The decision tree (identification rule) generated by the data processing device 20 is provided to the identification device 30.
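As a rough illustration of how a decision tree might be generated from learning data, the sketch below builds a small tree with CART-style splits chosen by Gini impurity. The source cites a CART reference but does not state the split criterion used by the data processing device 20, so the criterion, the `max_depth` limit, the function names, and the toy data are all assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split until nodes are pure or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        # Leaf: the class label is the majority grade in this node.
        return {"label": Counter(labels).most_common(1)[0][0]}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


# Toy learning data (hypothetical values): two feature amounts per gaze area.
rows = [[10.0, 0.9], [12.0, 0.8], [30.0, 0.5], [33.0, 0.4]]
labels = ["G1", "G1", "G3", "G3"]
tree = build_tree(rows, labels)
print(tree)
```

Each internal node holds one feature index and one threshold, which is what makes the resulting rule readable as a chain of simple comparisons.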

識別装置３０は、グレーディングが行われていないサンプルの特徴量（特徴量ベクトル）を入力する。識別装置３０は、データ処理装置２０から提供された決定木を予測モデルとして用いて、上記入力した特徴量に対する応答（決定木の葉に付されたクラスラベル）を出力する（識別結果を出力する）。即ち、データ処理装置２０は、サンプルのグレード（ラベルに相当する情報）を識別するための決定木を学習データに基づき生成する。また、識別装置３０は、データ処理装置２０が生成した決定木を用いて、サンプルデータのグレーディングを行う。 The identification device 30 inputs a feature amount (feature amount vector) of a sample that has not been graded. The identification device 30 uses the decision tree provided from the data processing device 20 as a prediction model and outputs a response (class label attached to the leaf of the decision tree) to the input feature amount (outputs the identification result). That is, the data processing device 20 generates a decision tree for identifying the grade of the sample (information corresponding to the label) based on the learning data. Further, the identification device 30 performs grading of sample data using the decision tree generated by the data processing device 20.
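The identification step can be pictured as a walk from the root of the tree to a leaf. The sketch below returns both the class label and the sequence of comparisons that led to it, which is the kind of rationale the text says a doctor can inspect. The tree layout, the feature names `F1-3` and `F3-3`, and the thresholds are hypothetical values chosen for illustration.

```python
def classify(tree, features, names):
    """Walk the tree; return the leaf label and the tests that led to it."""
    path = []
    node = tree
    while "label" not in node:
        f, t = node["feature"], node["threshold"]
        if features[f] <= t:
            path.append(f"{names[f]} <= {t}")
            node = node["left"]
        else:
            path.append(f"{names[f]} > {t}")
            node = node["right"]
    return node["label"], path


# Hypothetical decision tree: internal nodes test one feature amount each.
example_tree = {
    "feature": 0, "threshold": 21.0,
    "left": {"label": "G1"},
    "right": {
        "feature": 1, "threshold": 0.6,
        "left": {"label": "G3"},
        "right": {"label": "G2"},
    },
}
feature_names = ["F1-3", "F3-3"]  # hypothetical: median nucleus size, median circularity

label, rationale = classify(example_tree, [28.0, 0.7], feature_names)
print(label, rationale)
```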

図３は、学習データ生成装置１０の内部構成の一例を示す図である。図３を参照すると、学習データ生成装置１０は、入力部１１と、特徴量ベクトル生成部１２と、学習データ出力部１３と、ＨＤＤ（Hard Disk Drive）等からなる記憶部１４と、を備える。なお、学習データ生成装置１０を操作するための操作デバイス（キーボード、マウス等）や表示デバイスの図示は省略している。また、入力部１１を初めとする各部は、記憶部１４にアクセスし、データの書き込み、読み出しが可能に構成されている。 FIG. 3 is a diagram illustrating an example of an internal configuration of the learning data generation device 10. Referring to FIG. 3, the learning data generation apparatus 10 includes an input unit 11, a feature vector generation unit 12, a learning data output unit 13, and a storage unit 14 including an HDD (Hard Disk Drive). Note that illustrations of operation devices (keyboard, mouse, etc.) and display devices for operating the learning data generation apparatus 10 are omitted. Each unit including the input unit 11 is configured to access the storage unit 14 and write and read data.

The input unit 11 is means for inputting the gaze area image data and label information described above. Each gaze area image data is given an identifier (ID), and the input unit 11 inputs the gaze area image data together with the identifier that identifies it (hereinafter referred to as the gaze area ID). For example, the input unit 11 inputs a plurality of gaze area image data as shown in FIG. 4(a). The gaze area image data input by the input unit 11 may be a grayscale image or a color image, and the image format (gradation, color format, and so on) is not limited.

The label information is input as table information in which a gaze area ID and the grade determined by a doctor or the like form one set. For example, as shown in FIG. 4(b), the input unit 11 inputs label information containing grades that are associated, via the gaze area ID, with each of the plurality of gaze area image data.

The input unit 11 passes the plurality of input gaze area image data and the corresponding label information to the feature vector generation unit 12.

The feature vector generation unit 12 calculates a feature vector that characterizes the gaze area image data. The feature vector generation unit 12 generates a plurality of types of feature amounts from one gaze area image data and, by applying statistical processing to each type of feature amount, generates a feature vector consisting of a plurality of feature amounts. In the first embodiment, the feature vector generation unit 12 is assumed to generate the 12 types of feature amounts shown in FIG. 5.

初めに、特徴量ベクトル生成部１２は、入力した注視領域画像データに含まれる細胞核の領域（以下、細胞核領域と表記する）を抽出する。例えば、図４を参照すると、特徴量ベクトル生成部１２は、細胞核領域２０１、２０２のような領域を順次抽出する。その際、特徴量ベクトル生成部１２は、細胞核領域とそれ以外の領域との間の輝度差（コントラスト）等を利用して細胞核領域を抽出する。 First, the feature vector generation unit 12 extracts a cell nucleus region (hereinafter referred to as a cell nucleus region) included in the input gaze region image data. For example, referring to FIG. 4, the feature vector generation unit 12 sequentially extracts regions such as cell nucleus regions 201 and 202. At this time, the feature vector generation unit 12 extracts a cell nucleus region using a luminance difference (contrast) between the cell nucleus region and other regions.
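One simple way to realize luminance-based extraction is to threshold the image and group neighbouring pixels into regions. The sketch below does this with a fixed threshold and 4-connected flood fill. The source only says that the luminance difference is used, so the fixed threshold, the assumption that nuclei are darker than their surroundings, and the function name are all illustrative choices.

```python
def extract_nucleus_regions(image, threshold):
    """Group pixels darker than `threshold` into 4-connected regions.

    `image` is a list of rows of luminance values. Returns a list of
    regions, each a list of (row, column) pixel coordinates.
    """
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if image[y][x] < threshold and not seen[y][x]:
                stack, region = [(y, x)], []
                seen[y][x] = True
                while stack:  # iterative flood fill
                    cy, cx = stack.pop()
                    region.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and image[ny][nx] < threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                regions.append(region)
    return regions


# Toy luminance image: two dark blobs on a bright background.
image = [
    [200, 200, 200, 200, 200],
    [200,  50,  60, 200, 200],
    [200,  55, 200, 200,  40],
    [200, 200, 200, 200,  45],
]
regions = extract_nucleus_regions(image, threshold=100)
print([len(r) for r in regions])
```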

Next, the feature vector generation unit 12 calculates various feature amounts by applying feature amount calculation processing to the extracted cell nucleus region. Here, assume for example that a cell nucleus region as shown in FIG. 6 has been extracted. In this case, to calculate the size of the cell nucleus (area of the cell nucleus; feature amount F1, see FIG. 5), the feature vector generation unit 12 counts the number of pixels constituting the gray region (cell nucleus region) shown in FIG. 6. The feature vector generation unit 12 then multiplies the pixel count by a predetermined constant (the cell size corresponding to the area of one pixel) and takes the result as the feature amount F1. Alternatively, the feature vector generation unit 12 may use the number of pixels constituting the cell nucleus region itself as the feature amount F1.
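The area calculation described here amounts to counting the pixels of a binary mask and scaling the count. A minimal sketch follows; the mask representation and the parameter name `area_per_pixel` are assumptions for illustration.

```python
def nucleus_area(mask, area_per_pixel=1.0):
    """Feature F1: pixel count of the nucleus region times a per-pixel area.

    `mask` is a list of rows where truthy values mark nucleus pixels.
    With area_per_pixel=1.0 the result is simply the pixel count.
    """
    pixel_count = sum(sum(1 for v in row if v) for row in mask)
    return pixel_count * area_per_pixel


mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
print(nucleus_area(mask))                       # pixel count
print(nucleus_area(mask, area_per_pixel=0.25))  # scaled by a hypothetical constant
```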

Further, the feature vector generation unit 12 counts the number of pixels that form the boundary of the cell nucleus region (the pixels on the boundary line 211 shown in FIG. 6) and calculates the perimeter of the cell nucleus (feature amount F2) based on the result.
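Counting boundary pixels can be sketched as below, where a nucleus pixel is treated as a boundary pixel if at least one of its four neighbours is background or lies outside the image. The source does not define the neighbourhood or how the count is converted to a length, so the 4-neighbour rule and the function name are assumptions.

```python
def boundary_pixel_count(mask):
    """Count nucleus pixels that touch background (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])

    def is_background(y, x):
        # Pixels outside the image are treated as background.
        return not (0 <= y < h and 0 <= x < w) or not mask[y][x]

    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and any(
                is_background(y + dy, x + dx)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
            ):
                count += 1
    return count


# A 3x3 nucleus: the eight outer pixels are boundary, the centre is interior.
square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(boundary_pixel_count(square))
```

Multiplying the count by the physical length of one pixel would give a perimeter estimate in physical units.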

Once the size (area) of the cell nucleus and its perimeter have been obtained, the feature vector generation unit 12 can calculate the circularity of the cell nucleus (feature amount F3) by the following equation (1).

Here, S is the area of the cell nucleus and L is the perimeter of the cell nucleus.
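Equation (1) itself is not reproduced in this text. A widely used definition of circularity that is consistent with the symbols S and L is given below; treating it as the form of equation (1) is an assumption, not something the source confirms.

```latex
C = \frac{4\pi S}{L^{2}}
% Check for an ideal circle of radius r:
%   S = \pi r^{2}, \quad L = 2\pi r
%   \Rightarrow C = \frac{4\pi \cdot \pi r^{2}}{(2\pi r)^{2}} = 1
```

Under this definition a perfect circle has circularity 1, and the value decreases toward 0 as the outline becomes more elongated or irregular.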

特徴量ベクトル生成部１２は、細胞核領域を楕円形状と扱い、その長軸（例えば、図６に示す長軸２１２）をなす画素数を計数し、その結果から細胞核の楕円長軸長（特徴量Ｆ４）を算出できる。また、特徴量ベクトル生成部１２は、楕円形状の短軸（図６に示す短軸２１３）をなす画素数を計数し、その結果から細胞核の楕円短軸長（特徴量Ｆ５）を算出できる。さらに、特徴量ベクトル生成部１２は、細胞核の楕円長軸長に対する楕円短軸長の比を算出することで、特徴量Ｆ６を算出する。 The feature vector generation unit 12 treats the cell nucleus region as an elliptical shape, counts the number of pixels forming the major axis (for example, the major axis 212 shown in FIG. 6), and based on the result, calculates the elliptical major axis length (feature amount) of the cell nucleus. F4) can be calculated. Further, the feature quantity vector generation unit 12 counts the number of pixels forming the elliptical short axis (short axis 213 shown in FIG. 6), and can calculate the elliptic short axis length (feature quantity F5) of the cell nucleus from the result. Further, the feature quantity vector generation unit 12 calculates the feature quantity F6 by calculating the ratio of the elliptical minor axis length to the elliptical major axis length of the cell nucleus.

The feature vector generation unit 12 calculates the feature amounts F7 to F11 using the pixel values (density, luminance values) of the cell nucleus region and its surroundings. For example, when the cell nucleus is stained, the feature vector generation unit 12 calculates the threshold that most efficiently separates the fluorescent region and the non-fluorescent region of the cell nucleus region, and uses that threshold as the feature amount F7.

Further, the feature vector generation unit 12 calculates a gray level co-occurrence matrix (GLCM) from the pixel values of the cell nucleus region and, from the GLCM values, can calculate feature amounts of the cell nucleus region such as the angular second moment (ASM; feature amount F8), contrast (feature amount F9), uniformity (feature amount F10), and entropy (ENT; feature amount F11). In addition, the feature vector generation unit 12 can calculate a feature amount (F12) by computing the nuclear density (NDens) of the cell nucleus region.
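The GLCM-based features can be illustrated with the standard textbook definitions. The sketch below builds a normalized co-occurrence table for a single pixel offset and derives four statistics from it. The offset, the use of a non-symmetric matrix, and the reading of "uniformity" (F10) as the homogeneity (inverse difference) statistic are assumptions; the source does not give the exact formulas.

```python
import math
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Normalized co-occurrence probabilities for pixel offset (dx, dy)."""
    counts = Counter()
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[(image[y][x], image[y2][x2])] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def texture_features(p):
    """Textbook GLCM statistics computed from probabilities p[(i, j)]."""
    return {
        "asm": sum(v * v for v in p.values()),
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
        "entropy": -sum(v * math.log2(v) for v in p.values()),
    }


# Toy 3x3 image with three gray levels (0, 1, 2).
image = [
    [0, 0, 1],
    [0, 0, 1],
    [2, 2, 2],
]
features = texture_features(glcm(image))
print(features)
```

In practice the matrix is often averaged over several offsets and directions; a single horizontal offset is used here only to keep the example short.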

The feature vector generation unit 12 generates, as feature amounts characterizing the gaze area image data, at least the size of the cell nucleus (feature amount F1), the circularity of the cell nucleus (feature amount F3), the contrast of the cell nucleus (feature amount F9), and the uniformity of the cell nucleus (feature amount F10).

特徴量ベクトル生成部１２は、注視領域画像データに含まれる全ての細胞核（細胞核領域）について、注視領域画像データを特徴付ける特徴量Ｆ１〜Ｆ１２を算出する。その結果、例えば、１枚の注視領域画像データに１００個の細胞核領域が含まれていれば、特徴量Ｆ１〜Ｆ１２のそれぞれについて１００個の特徴量が算出される。 The feature quantity vector generation unit 12 calculates feature quantities F1 to F12 that characterize the gaze area image data for all cell nuclei (cell nucleus areas) included in the gaze area image data. As a result, for example, if 100 cell nucleus regions are included in one gaze region image data, 100 feature amounts are calculated for each of the feature amounts F1 to F12.

特徴量ベクトル生成部１２は、１枚の注視領域画像データから算出した複数の特徴量それぞれについて統計処理を施すことで、当該特徴量を代表する複数の指標を算出する。なお、以降の説明において、特定の特徴量Ｆを代表する統計値（指標）をハイフンと数字を用いて表記する。例えば、図５を参照すると、細胞核の大きさに係る特徴量Ｆ１を例に取ると、細胞核の大きさは、Ｆ１−１〜Ｆ１−５により代表される。なお、各特徴量から算出される複数の統計値もまた、細胞核の特徴を特徴付ける値に相違はないので、特徴量と表記する。例えば、５つの特徴量Ｆ１−１〜Ｆ１−５は、特徴量Ｆ１を代表する統計値である。 The feature vector generation unit 12 performs statistical processing on each of a plurality of feature amounts calculated from a single gaze area image data, thereby calculating a plurality of indexes representing the feature amounts. In the following description, a statistical value (index) representing a specific feature amount F is described using a hyphen and a number. For example, referring to FIG. 5, taking the feature amount F1 related to the size of the cell nucleus as an example, the size of the cell nucleus is represented by F1-1 to F1-5. A plurality of statistical values calculated from each feature amount are also referred to as feature amounts because there is no difference in values that characterize the features of cell nuclei. For example, the five feature amounts F1-1 to F1-5 are statistical values representing the feature amount F1.

For example, the feature vector generation unit 12 generates a frequency distribution (histogram) of the feature amount F1 calculated as described above. Here, assume that a frequency distribution as shown in FIG. 7(a) has been obtained. Next, the feature vector generation unit 12 generates a cumulative distribution from the frequency distribution (see FIG. 7(b)) and calculates percentile values from the cumulative distribution, thereby obtaining the feature amounts F1-1 to F1-5 relating to the size of the cell nucleus.
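Reading percentile values off a cumulative distribution can be sketched with the nearest-rank method, as below. This passage does not say which five percentiles correspond to F1-1 to F1-5, so the values 10, 25, 50, 75, and 90 are placeholders, and the function name is hypothetical.

```python
def percentile_features(values, percentiles=(10, 25, 50, 75, 90)):
    """Nearest-rank percentiles of the empirical cumulative distribution.

    For each percentile p, return the smallest observed value whose
    cumulative share of the samples is at least p percent.
    """
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for p in percentiles:
        rank = max(1, -(-p * n // 100))  # integer ceiling of p * n / 100
        result.append(ordered[rank - 1])
    return result


# Hypothetical nucleus sizes (feature F1) measured in one gaze area image.
nucleus_sizes = list(range(1, 101))
print(percentile_features(nucleus_sizes))
```

Summarizing each per-nucleus feature by a fixed number of percentiles gives every image a feature vector of the same length regardless of how many nuclei it contains.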

他の特徴量Ｆ２〜Ｆ１２に関しても、個別の特徴量を算出した後、当該特徴量の度数分布、累積分布を生成することで、各特徴量を代表する複数の特徴量が生成される。特徴量ベクトル生成部１２は、上記のような処理を繰り返すことで、１枚の注視領域画像データから６０（１２×５）個の特徴量を算出する。即ち、特徴量ベクトル生成部１２は、各注視領域画像データを特徴付ける特徴量ベクトルを算出する。 Also for the other feature amounts F2 to F12, after calculating individual feature amounts, a frequency distribution and a cumulative distribution of the feature amounts are generated, thereby generating a plurality of feature amounts representing each feature amount. The feature quantity vector generation unit 12 calculates 60 (12 × 5) feature quantities from one gaze area image data by repeating the above processing. That is, the feature vector generation unit 12 calculates a feature vector that characterizes each gaze area image data.

特徴量ベクトル生成部１２は、入力部１１から取得したラベル情報と、注視領域画像データごとに算出した複数の特徴量と、を学習データ出力部１３に引き渡す。 The feature vector generation unit 12 delivers the label information acquired from the input unit 11 and a plurality of feature amounts calculated for each gaze area image data to the learning data output unit 13.

The learning data output unit 13 generates learning data based on the information acquired from the feature vector generation unit 12. Specifically, the learning data output unit 13 generates, as learning data, the information obtained by combining the gaze area ID, the label information (grade of the gaze area image data), and the feature vector (60 feature amounts) (see FIG. 8). That is, the learning data output unit 13 generates and outputs learning data in which the identifier of the gaze area image data (gaze area ID), the label given to each gaze area image data (grade of the cell image), and the plurality of feature amounts extracted from the gaze area image data (feature vector) form one set.
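One row of such learning data could be represented as below. The record type, field names, and the ID format `ROI-001` are hypothetical; only the naming scheme F1-1 to F12-5 and the count of 60 feature amounts follow the text.

```python
from dataclasses import dataclass


@dataclass
class LearningRecord:
    roi_id: str     # gaze area ID
    grade: int      # label given by the doctor
    features: dict  # "F1-1" .. "F12-5" -> value


def feature_names(n_types=12, n_stats=5):
    """Names in the 'F<type>-<statistic>' style used in the text."""
    return [f"F{i}-{j}" for i in range(1, n_types + 1) for j in range(1, n_stats + 1)]


def make_record(roi_id, grades, stats_by_type):
    """Join the label table and per-type statistics on the gaze area ID."""
    names = feature_names(len(stats_by_type), len(stats_by_type[0]))
    flat = [v for stats in stats_by_type for v in stats]
    return LearningRecord(roi_id, grades[roi_id], dict(zip(names, flat)))


# Hypothetical inputs: a label table and 12 x 5 statistics for one image.
grades = {"ROI-001": 2}
stats = [[float(i * 10 + j) for j in range(5)] for i in range(12)]
record = make_record("ROI-001", grades, stats)
print(record.roi_id, record.grade, len(record.features))
```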

学習データ出力部１３は、生成した学習データをデータ処理装置２０に出力する。なお、学習データ生成装置１０からデータ処理装置２０への学習データの入出力は、ＵＳＢ（Universal Serial Bus）メモリ等の外部記憶装置を用いても良いし、ネットワーク、データベースサーバ等を用いても良い。 The learning data output unit 13 outputs the generated learning data to the data processing device 20. Note that the learning data may be transferred from the learning data generation device 10 to the data processing device 20 via an external storage device such as a USB (Universal Serial Bus) memory, or via a network, a database server, or the like.

図９は、データ処理装置２０の内部構成の一例を示す図である。図９を参照すると、データ処理装置２０は、入力部２１と、特徴量選択部２２と、決定木生成部２３と、出力部２４と、ＨＤＤ等からなる記憶部２５と、を備える。なお、データ処理装置２０を操作するための操作デバイス（キーボード、マウス等）や表示デバイスの図示は省略している。また、入力部２１を初めとする各部は、記憶部２５にアクセスし、データの書き込み、読み出しが可能に構成されている。 FIG. 9 is a diagram illustrating an example of an internal configuration of the data processing device 20. Referring to FIG. 9, the data processing device 20 includes an input unit 21, a feature amount selection unit 22, a decision tree generation unit 23, an output unit 24, and a storage unit 25 including an HDD or the like. Note that illustrations of operation devices (keyboard, mouse, etc.) and display devices for operating the data processing device 20 are omitted. Each unit including the input unit 21 is configured to access the storage unit 25 and write and read data.

入力部２１は、学習データ生成装置１０が出力する学習データを入力する手段である。入力部２１は、取得した学習データを特徴量選択部２２に引き渡す。 The input unit 21 is means for inputting learning data output from the learning data generation device 10. The input unit 21 passes the acquired learning data to the feature amount selection unit 22.

特徴量選択部２２は、取得した学習データに含まれる特徴量ベクトル（複数の特徴量；上記の例では６０個の特徴量）から、決定木生成部２３による決定木の生成に用いられる特徴量を選択する手段である。具体的には、特徴量選択部２２は、第１の選択処理と、第２の選択処理と、を実行し、最終的に決定木生成部２３が利用する特徴量を絞り込む。 The feature quantity selection unit 22 is means for selecting, from the feature quantity vector (a plurality of feature quantities; 60 feature quantities in the above example) included in the acquired learning data, the feature quantities to be used by the decision tree generation unit 23 to generate a decision tree. Specifically, the feature quantity selection unit 22 executes a first selection process and a second selection process, and finally narrows down the feature quantities used by the decision tree generation unit 23.

特徴量選択部２２は、記憶部２５に格納された第１の選択ポリシを参照しつつ、第１の選択処理を実行する。例えば、第１の選択ポリシとして図１０に示すような情報が、記憶部２５に格納されている。 The feature quantity selection unit 22 executes the first selection process while referring to the first selection policy stored in the storage unit 25. For example, information as shown in FIG. 10 is stored in the storage unit 25 as the first selection policy.

図１０を参照すると、第１の選択ポリシには利用する特徴量の種別は特徴量Ｆ１〜Ｆ１１であることが示されているので、特徴量選択部２２は、学習データの特徴量ベクトルに含まれる特徴量Ｆ１〜Ｆ１２のうち、特徴量Ｆ１２を除外した特徴量Ｆ１〜Ｆ１１を選択する。 Referring to FIG. 10, the first selection policy indicates that the types of feature quantities to be used are the feature quantities F1 to F11. Therefore, of the feature quantities F1 to F12 included in the feature quantity vector of the learning data, the feature quantity selection unit 22 selects the feature quantities F1 to F11, excluding the feature quantity F12.

さらに、第１の選択ポリシには、特徴量を代表する複数の統計値のうち、いずれを採用するかに関する情報が含まれ、当該情報は「中央値（メディアン）」と記載されているので、特徴量選択部２２は、中央値に該当する特徴量を選択する。具体的には、図８を参照すると、細胞核の大きさに関する特徴量Ｆ１を代表する特徴量Ｆ１−１〜Ｆ１−５のうち、特徴量Ｆ１−３が中央値に該当（図７（ｂ）参照）するので、特徴量選択部２２は、細胞核の大きさに関する特徴量Ｆ１を代表する特徴量として特徴量Ｆ１−３を選択する。このように、特徴量選択部２２は、第１の選択ポリシに従い、各特徴量を代表する複数の特徴量（統計値）から１つの特徴量を選択する。 Further, the first selection policy includes information on which of the plurality of statistical values representing a feature quantity is to be adopted. Since this information is given as “median”, the feature quantity selection unit 22 selects the feature quantity corresponding to the median. Specifically, referring to FIG. 8, among the feature quantities F1-1 to F1-5 representing the feature quantity F1 related to the size of the cell nucleus, the feature quantity F1-3 corresponds to the median (see FIG. 7(b)). Therefore, the feature quantity selection unit 22 selects the feature quantity F1-3 as the feature quantity representing the feature quantity F1 related to the size of the cell nucleus. In this way, the feature quantity selection unit 22 selects one feature quantity from the plurality of feature quantities (statistical values) representing each feature quantity, in accordance with the first selection policy.

特徴量選択部２２が、図８に示す学習データに対して、第１の選択処理を実行したが結果が図１１に示されている。図１１に示すように、特徴量選択部２２は、第１の選択処理を実行することで、６０個の特徴量から１１個の特徴量に絞り込みを行っている。 FIG. 11 shows the result of the feature quantity selection unit 22 executing the first selection process on the learning data shown in FIG. 8. As illustrated in FIG. 11, by executing the first selection process, the feature quantity selection unit 22 narrows the 60 feature quantities down to 11 feature quantities.
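
The first selection process (keep F1 to F11, and for each keep only the median statistic, index 3) can be sketched as below. The function name `first_selection` and the flat ordering of the 60 values (F1-1..F1-5, F2-1.., F12-5) are assumptions made for illustration.

```python
def first_selection(features, base_ids=range(1, 12), stat_index=3, num_stats=5):
    """Pick one representative statistic per selected base feature.

    features   : flat list of 60 values, assumed ordered F1-1..F1-5, F2-1.., F12-5
    base_ids   : base features to keep (F1..F11 per the first selection policy)
    stat_index : 1-based index of the statistic to keep (3 = median in the text)
    """
    selected = {}
    for base in base_ids:
        pos = (base - 1) * num_stats + (stat_index - 1)
        selected[f"F{base}-{stat_index}"] = features[pos]
    return selected
```

Applied to a 60-value vector, this returns 11 entries keyed F1-3 to F11-3, mirroring the reduction from 60 to 11 feature quantities described for FIG. 11.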

特徴量選択部２２は、第１の選択処理が終了した学習データに対し、記憶部２５に格納された第２の選択ポリシに従い、第２の選択処理を実行する。 The feature quantity selection unit 22 executes the second selection process on the learning data for which the first selection process has been completed, according to the second selection policy stored in the storage unit 25.

例えば、第２の選択ポリシとして図１２に示すような情報が、記憶部２５に格納されている。第２の選択ポリシの「分析モデル＝決定木」は、学習データ（例えば、図１１に示す学習データ）に対し、ラベル（グレード）を目的変数（被説明変数）とし、特徴量を説明変数として決定木による分析モデルを生成することを意味する。そして、当該決定木による分析モデルを用いて、特徴量（説明変数）の重要度を評価し、重要度が低い２つの特徴量を削除し、最終的に特徴量を４つに絞り込むことを、第２の選択ポリシは示す。 For example, information as illustrated in FIG. 12 is stored in the storage unit 25 as the second selection policy. “Analysis model = decision tree” in the second selection policy means that a decision tree analysis model is generated for the learning data (for example, the learning data shown in FIG. 11), with the label (grade) as the objective variable (explained variable) and the feature quantities as explanatory variables. The second selection policy further indicates that the importance of each feature quantity (explanatory variable) is evaluated using this decision tree analysis model, the two feature quantities with the lowest importance are deleted, and the feature quantities are finally narrowed down to four.

ここでは、図１３を参照しつつ、特徴量選択部２２による上記第２の選択処理について説明する。 Here, the second selection process by the feature quantity selection unit 22 will be described with reference to FIG. 13.

初めに、特徴量選択部２２は記憶部２５に格納された第２の選択ポリシを参照する（ステップＳ１０１）。 First, the feature quantity selection unit 22 refers to the second selection policy stored in the storage unit 25 (step S101).

次に、特徴量選択部２２は、第２の選択ポリシに記載された「分析モデル＝決定木」に従い、第１の選択処理が終了した学習データに基づき決定木の生成を行う（ステップＳ１０２）。なお、特徴量選択部２２や後述する決定木生成部２３による決定木の生成には、ＣＡＲＴ（Classification And Regression Trees）アルゴリズムやＩＤ（Iterative Dichotomiser）３等のアルゴリズムを使用することができる。また、ジニ係数やエントロピーを計算することで決定木の分岐条件を生成することができる。 Next, in accordance with “analysis model = decision tree” described in the second selection policy, the feature quantity selection unit 22 generates a decision tree based on the learning data for which the first selection process has been completed (step S102). An algorithm such as CART (Classification And Regression Trees) or ID3 (Iterative Dichotomiser 3) can be used for the generation of a decision tree by the feature quantity selection unit 22 or by the decision tree generation unit 23 described later. The branch conditions of the decision tree can be generated by calculating the Gini coefficient or the entropy.

ここでは、例えば、図１４に示すような決定木が得られたものとする。なお、図１４に示す決定木の分岐条件３０１〜３０７において、変数Ｘ、Ｙ、Ｚは特徴量Ｆ１−３〜Ｆ１１−３のいずれかである。 Here, for example, it is assumed that a decision tree as shown in FIG. 14 is obtained. In the decision tree branch conditions 301 to 307 shown in FIG. 14, the variables X, Y, and Z are any one of the feature amounts F1-3 to F11-3.

次に、特徴量選択部２２は、生成された決定木の分岐条件（図１４の例では、分岐条件３０１〜３０７）に含まれる各説明変数（特徴量）それぞれの品質（Quality）を算出する（ステップＳ１０３）。具体的には、特徴量選択部２２は、下記の式（２）を用いて、各説明変数のジニ係数Ｇを算出する。 Next, the feature quantity selection unit 22 calculates the quality of each explanatory variable (feature quantity) included in the branch conditions of the generated decision tree (branch conditions 301 to 307 in the example of FIG. 14) (step S103). Specifically, the feature quantity selection unit 22 calculates the Gini coefficient G of each explanatory variable using the following equation (2).

但し、式（２）のＰｉはクラスｉの確率を示す。 In equation (2), Pi represents the probability of class i.
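
Equation (2) was an image in the source and does not survive in the text. For orientation only, the Gini impurity commonly used with CART, which is consistent with the statement that Pi is the probability of class i, is shown below. Whether the patent's equation (2) has exactly this form is an assumption.

```latex
G = 1 - \sum_{i} P_i^{2}
```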

ジニ係数Ｇを計算した特徴量選択部２２は、ジニ係数Ｇが最小となる変数Ｘの値を最適な分割点（Best Split Point）Ｘｓと定める。あるいは、特徴量選択部２２は、例えば、図１４に示すような決定木生成の際にジニ係数を利用していれば、算出したジニ係数を利用して分割点Ｘｓを特定してもよい。 The feature quantity selection unit 22 that has calculated the Gini coefficient G determines the value of the variable X that minimizes the Gini coefficient G as the best split point (Best Split Point) Xs. Alternatively, for example, if the Gini coefficient is used when generating a decision tree as illustrated in FIG. 14, the feature amount selection unit 22 may specify the division point Xs using the calculated Gini coefficient.
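
A minimal sketch of choosing the split point Xs that minimizes the Gini coefficient. It assumes the usual CART convention, namely that the score of a candidate split is the size-weighted Gini impurity of the two child nodes and that candidates are midpoints between adjacent sorted values. Neither convention is stated in the text, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_i^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_point(values, labels):
    """Return (threshold, weighted_gini) with the smallest weighted child impurity.

    Samples with value <= threshold go to the left child (assumed convention).
    Returns (None, inf) if all values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for k in range(1, n):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [c for _, c in pairs[:k]]
        right = [c for _, c in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best
```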

特徴量選択部２２は、下記の式（３）を用いて、分割点Ｘｓにおける変数Ｘの品質Ｑ（Ｘ、Ｘｓ）を算出する。 The feature quantity selection unit 22 calculates the quality Q(X, Xs) of the variable X at the split point Xs using the following equation (3).

・・・（３） ... (3)

式（３）のＮは変数の総数、Ｎ_Ｌは左側の子ノードに分類される変数の数、Ｎ_Ｒは右側の子ノードに分類される変数の数を示す。式（３）のＩ_｛Ａ｝は指示関数（Indicator Function）を表し、条件Ａが成立する場合に「１」、それ以外（条件Ａが不成立）の場合に「０」を出力する関数である。 In equation (3), N represents the total number of variables, N_L the number of variables classified into the left child node, and N_R the number of variables classified into the right child node. I_{A} in equation (3) denotes an indicator function, which outputs “1” when condition A holds and “0” otherwise (when condition A does not hold).

式（３）に示されたＩ_{｛Ｃｉ＝Ｎｅｇ｝}は、クラス（カテゴリ）ＣｉがＮｅｇａｔｉｖｅの場合に「１」、それ以外の場合に「０」を出力する指示関数である。また、Ｉ_{｛Ｘｉ＞Ｘｓ｝}は、ＸｉがＸｓよりも大きい場合に「１」、それ以外の場合に「０」を出力する指示関数である。式（３）の記載、Ｉ_｛Ａ｝Ｉ_｛Ｂ｝は条件Ａと条件Ｂが同時に成立する場合に「１」を出力し、それ以外の場合には「０」を出力することを意味する。従って、式（３）のＩ_{｛Ｃｉ＝Ｎｅｇ｝}Ｉ_{｛Ｘｉ≦Ｘｓ｝}の総和は、Ｎｅｇａｔｉｖｅクラスに属し、且つ、その特徴量がＸｓ以下のデータに関する和となる。式（３）に示される他の指示関数の積も同様の意味を有する。例えば、式（３）のＩ_{｛Ｃｉ＝Ｐｏｓ｝}Ｉ_{｛Ｘｉ≦Ｘｓ｝}の総和は、Ｐｏｓｉｔｉｖｅクラスに属し、且つ、その特徴量がＸｓ以下のデータに関する和となる。 I_{Ci = Neg} in equation (3) is an indicator function that outputs “1” when the class (category) Ci is Negative and “0” otherwise. Likewise, I_{Xi > Xs} is an indicator function that outputs “1” when Xi is larger than Xs and “0” otherwise. The notation I_{A} I_{B} in equation (3) means that “1” is output when conditions A and B hold simultaneously and “0” is output otherwise. Therefore, the sum of I_{Ci = Neg} I_{Xi ≦ Xs} in equation (3) is a sum over the data that belong to the Negative class and whose feature quantity is Xs or less. The other products of indicator functions in equation (3) have the same meaning. For example, the sum of I_{Ci = Pos} I_{Xi ≦ Xs} in equation (3) is a sum over the data that belong to the Positive class and whose feature quantity is Xs or less.
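
Equation (3) itself was an image and is missing from the text. The form below is a reconstruction from the symbol definitions above and from the worked example in the text, Q(X, Xs) = (4^2 + 1^2)/5 + (0^2 + 5^2)/5. It should be read as an assumption rather than as the patent's exact expression. In particular, assigning Xi ≦ Xs to the left child (N_L) is assumed.

```latex
Q(X, X_s) =
  \frac{1}{N_L}\left[
    \Bigl(\sum_{i=1}^{N} I_{\{C_i=\mathrm{Neg}\}}\, I_{\{X_i \le X_s\}}\Bigr)^{2}
  + \Bigl(\sum_{i=1}^{N} I_{\{C_i=\mathrm{Pos}\}}\, I_{\{X_i \le X_s\}}\Bigr)^{2}
  \right]
+ \frac{1}{N_R}\left[
    \Bigl(\sum_{i=1}^{N} I_{\{C_i=\mathrm{Neg}\}}\, I_{\{X_i > X_s\}}\Bigr)^{2}
  + \Bigl(\sum_{i=1}^{N} I_{\{C_i=\mathrm{Pos}\}}\, I_{\{X_i > X_s\}}\Bigr)^{2}
  \right]
```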

例えば、図１５に示すように、変数Ｘが分割点Ｘｓにより最適に分割されているものとする。この場合、Ｎ＝１０、ＮＬ＝５、ＮＲ＝５であるので、式（３）を適用すると、分割点Ｘｓにおける変数Ｘの品質Ｑ（Ｘ、Ｘｓ）は、Ｑ（Ｘ、Ｘｓ）＝（４^２＋１^２）／５＋（０^２＋５^２）／５＝８．４と計算される。 For example, assume that the variable X is optimally split at the split point Xs as shown in FIG. 15. In this case, N = 10, N_L = 5, and N_R = 5, so applying equation (3) gives the quality of the variable X at the split point Xs as Q(X, Xs) = (4^2 + 1^2)/5 + (0^2 + 5^2)/5 = 8.4.
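
The arithmetic of this worked example can be checked with a short sketch. The function `split_quality` is hypothetical. It computes, for each child node, the sum of squared per-class counts divided by the node size, which matches the numbers in the example. The sample values and the choice of Xs = 5 are made up so that the left child holds 4 Negative and 1 Positive and the right child holds 0 Negative and 5 Positive.

```python
from collections import Counter


def split_quality(values, classes, split_point):
    """Quality of a split: sum over children of (sum of squared class counts) / child size.

    Samples with value <= split_point are assumed to go to the left child.
    """
    left = [c for x, c in zip(values, classes) if x <= split_point]
    right = [c for x, c in zip(values, classes) if x > split_point]
    quality = 0.0
    for side in (left, right):
        if side:  # skip an empty child to avoid dividing by zero
            counts = Counter(side)
            quality += sum(n * n for n in counts.values()) / len(side)
    return quality


# Made-up data reproducing the counts of the example: N = 10, N_L = 5, N_R = 5.
values = list(range(1, 11))
classes = ["Neg"] * 4 + ["Pos"] * 6
print(round(split_quality(values, classes, 5), 1))  # (16 + 1)/5 + (0 + 25)/5
```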

図１４に示す例では、分岐条件３０１〜３０７それぞれに用いられている変数（Ｘ、Ｙ、Ｚ）の品質が算出される。なお、図１４では、各分岐条件での品質Ｑを、当該分岐条件にて用いられている変数とその符号により、分岐条件内に併記している。例えば、分岐条件３０１では、変数Ｘが用いられているので、分岐条件３０１における品質ＱをＱ（Ｘ、３０１）と表記している。 In the example illustrated in FIG. 14, the quality of the variable (X, Y, or Z) used in each of the branch conditions 301 to 307 is calculated. In FIG. 14, the quality Q at each branch condition is written inside the branch condition, using the variable used in that branch condition and the reference numeral of the branch condition. For example, since the variable X is used in the branch condition 301, the quality Q at the branch condition 301 is written as Q(X, 301).

次に、特徴量選択部２２は、品質が算出された特徴量それぞれの重要度（Importance）を算出する（ステップＳ１０４）。具体的には、決定木の分岐条件それぞれの品質の総和に対する各変数（特徴量）の品質の割合から特徴量の重要度が算出される。例えば、図１４に示す例では、変数Ｘの重要度は式（４）、変数Ｙの重要度は式（５）、変数Ｚの重要度は式（６）によりそれぞれ算出できる。

Next, the feature quantity selection unit 22 calculates the importance (Importance) of each feature quantity whose quality has been calculated (step S104). Specifically, the importance of the feature quantity is calculated from the ratio of the quality of each variable (feature quantity) to the sum of the quality of each branch condition of the decision tree. For example, in the example shown in FIG. 14, the importance of the variable X can be calculated by Expression (4), the importance of the variable Y can be calculated by Expression (5), and the importance of the variable Z can be calculated by Expression (6).
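
Equations (4) to (6) were images and are not available in the text, so the following is only a sketch of the stated rule: the importance of a variable is its share of the summed quality over all branch conditions. The function name and the example assignment of variables to branch conditions are hypothetical. FIG. 14's actual assignment is not given in the text.

```python
def feature_importance(branch_qualities):
    """Importance of each variable as its share of the total branch quality.

    branch_qualities: list of (variable_name, quality), one entry per branch condition.
    """
    total = sum(q for _, q in branch_qualities)
    if total == 0:
        raise ValueError("total quality is zero; importance is undefined")
    per_variable = {}
    for name, q in branch_qualities:
        per_variable[name] = per_variable.get(name, 0.0) + q
    return {name: q / total for name, q in per_variable.items()}


# Hypothetical qualities for three branch conditions using variables X and Y.
print(feature_importance([("X", 2.0), ("Y", 1.0), ("X", 1.0)]))
```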

次に、特徴量選択部２２は、第２の選択ポリシに含まれる「絞り込み方法」に従い、特徴量の絞り込みを行う（ステップＳ１０５）。例えば、特徴量選択部２２が図１１に示す学習データに対して決定木を作成し、各変数の重要度を降順（重要度が高い順）に並べた結果が図１６（ａ）のとおりであるとすると、下位２つの特徴量Ｆ２−３、Ｆ１１−３が削除される。なお、図１６において、灰色にて色づけされた行は、特徴量選択部２２による絞り込みにより削除される行である。このように、特徴量選択部２２は、先のステップにて算出された重要度に基づき、学習データに含まれる複数の特徴量から所定の数の特徴量を削除して新たな学習データを生成する。 Next, the feature quantity selection unit 22 narrows down the feature quantities according to the “narrowing method” included in the second selection policy (step S105). For example, if the feature quantity selection unit 22 creates a decision tree for the learning data shown in FIG. 11 and the importance of each variable, arranged in descending order, is as shown in FIG. 16(a), the two lowest-ranked feature quantities F2-3 and F11-3 are deleted. In FIG. 16, rows shaded in gray are rows deleted by the narrowing performed by the feature quantity selection unit 22. In this way, the feature quantity selection unit 22 generates new learning data by deleting a predetermined number of feature quantities from the plurality of feature quantities included in the learning data, based on the importance calculated in the preceding step.

次に、特徴量選択部２２は、第２の選択ポリシに含まれる「終了条件」に、上記の新たな学習データが合致するか否かを判定する（ステップＳ１０６）。ここでは、「終了条件＝特徴量の数が４」であるので、特徴量選択部２２は、特徴量の数が４つにまで絞り込めているか否かを判定する。 Next, the feature amount selection unit 22 determines whether or not the new learning data matches the “end condition” included in the second selection policy (step S106). Here, since “end condition = number of feature quantities is four”, the feature quantity selection unit 22 determines whether the number of feature quantities has been narrowed down to four.

新たな学習データが終了条件を満たしていなければ（ステップＳ１０６、Ｎｏ分岐）、特徴量選択部２２は、ステップＳ１０２に戻り処理を継続する。即ち、特徴量選択部２２は、特徴量が絞り込まれた新たな学習データを使って、再び決定木を作成し、当該決定木の分岐条件をなす変数の品質、重要度を算出し、重要度の低い特徴量を削除する。 If the new learning data does not satisfy the end condition (step S106, No branch), the feature quantity selection unit 22 returns to step S102 and continues the processing. That is, the feature quantity selection unit 22 creates a decision tree again using the new learning data with the narrowed-down feature quantities, calculates the quality and importance of the variables forming the branch conditions of that decision tree, and deletes the feature quantities with low importance.

新たな学習データが終了条件を満たしていれば（ステップＳ１０６、Ｙｅｓ分岐）、特徴量選択部２２は処理を終了する。 If the new learning data satisfies the end condition (step S106, Yes branch), the feature amount selection unit 22 ends the process.

上記のような絞り込みの結果、図１６（ａ）に示す特徴量は、図１６（ｂ）、図１６（ｃ）のように絞り込まれていき、最終的に図１６（ｄ）に示す特徴量（上から４つの特徴量）となる。 As a result of the narrowing down as described above, the feature quantity shown in FIG. 16A is narrowed down as shown in FIGS. 16B and 16C, and finally the feature quantity shown in FIG. 16D. (Four feature values from the top).
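
The loop of steps S102 to S106 (build a tree, score the feature quantities, drop the least important ones, repeat until the end condition is met) can be sketched as follows. The function name, the `importance_fn` callback that stands in for tree generation and scoring, and the cap that prevents dropping below the target count are all assumptions made for this illustration.

```python
def narrow_features(feature_names, importance_fn, drop_per_round=2, target_count=4):
    """Iteratively remove the least important features until target_count remain.

    importance_fn(names) must return {name: importance} for the given names; in the
    described flow it would regenerate a decision tree on the remaining features
    (S102) and compute quality and importance (S103, S104).
    """
    remaining = list(feature_names)
    while len(remaining) > target_count:  # S106: end condition not yet met
        scores = importance_fn(remaining)
        # Features absent from the tree get importance 0 (assumed handling).
        ranked = sorted(remaining, key=lambda f: scores.get(f, 0.0), reverse=True)
        # S105: drop the lowest-ranked features, without undershooting the target.
        n_drop = min(drop_per_round, len(ranked) - target_count)
        remaining = ranked[: len(ranked) - n_drop]
    return remaining
```

Because the importance is recomputed in every round, a feature quantity ranked low at first can rise later, which is the behaviour the text attributes to F3-3 between FIG. 16(a) and FIG. 16(d).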

特徴量選択部２２により２段階の特徴量の絞り込みが行われた結果の学習データは、図１７のとおりとなる。特徴量選択部２２は、第１の選択処理及び第２の選択処理の実施により特徴量が絞り込まれた学習データ（例えば、図１７に示す学習データ）を、決定木生成部２３に引き渡す。 The learning data obtained as a result of narrowing down the two-stage feature quantity by the feature quantity selection unit 22 is as shown in FIG. The feature quantity selection unit 22 hands over the learning data (for example, the learning data shown in FIG. 17) whose feature quantity has been narrowed down by performing the first selection process and the second selection process to the decision tree generation unit 23.

決定木生成部２３は、取得した学習データに基づき、識別規則を生成する。具体的には、決定木生成部２３は、図１７に示す学習データに基づき、決定木を生成する。決定木生成部２３は、決定木を生成する際、不純度が「０」となるまで、あるいは、予め定めた深さに決定木の分岐が到達するまで、分割する変数の選択と、データの部分集合の分割と、を繰り返す。決定木生成部２３は、生成した決定木を出力部２４に引き渡す。 The decision tree generation unit 23 generates an identification rule based on the acquired learning data. Specifically, the decision tree generation unit 23 generates a decision tree based on the learning data shown in FIG. 17. When generating the decision tree, the decision tree generation unit 23 repeats the selection of the variable to split on and the splitting of the data subset until the impurity becomes “0” or until the branching of the decision tree reaches a predetermined depth. The decision tree generation unit 23 delivers the generated decision tree to the output unit 24.
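
The two stopping rules (impurity 0, or a predetermined depth) can be illustrated with a small recursive builder. This is a simplified CART-style sketch, not the patent's implementation. The nested-dict tree format, the use of observed values as thresholds, and the majority-vote leaf label are all assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, max_depth, depth=0):
    """rows: list of {feature_name: value}; returns a nested dict tree."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure (impurity 0) or the depth limit is reached.
    if gini(labels) == 0.0 or depth >= max_depth:
        return {"label": majority}
    best = None
    for feat in rows[0]:
        # Every observed value except the largest keeps both children non-empty.
        for thr in sorted({r[feat] for r in rows})[:-1]:
            left = [c for r, c in zip(rows, labels) if r[feat] <= thr]
            right = [c for r, c in zip(rows, labels) if r[feat] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feat, thr)
    if best is None:  # no feature can separate the samples
        return {"label": majority}
    _, feat, thr = best
    yes = [(r, c) for r, c in zip(rows, labels) if r[feat] <= thr]
    no = [(r, c) for r, c in zip(rows, labels) if r[feat] > thr]
    return {
        "feature": feat,
        "threshold": thr,
        "yes": build_tree([r for r, _ in yes], [c for _, c in yes], max_depth, depth + 1),
        "no": build_tree([r for r, _ in no], [c for _, c in no], max_depth, depth + 1),
    }
```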

出力部２４は、例えば、取得した決定木を「Ｉｆ−Ｔｈｅｎ」の形式にて外部装置（例えば、識別装置３０）に出力する。あるいは、出力部２４は、「Ｉｆ−Ｔｈｅｎ」の形式を、例えば図１４のように可視化し、画像データとして出力してもよい。 For example, the output unit 24 outputs the acquired decision tree to an external device (for example, the identification device 30) in the form of “If-Then”. Alternatively, the output unit 24 may visualize the “If-Then” format as shown in FIG. 14, for example, and output the image data.
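
One way to emit a tree in “If-Then” form is to walk from the root to each leaf and join the branch conditions met along the way. The nested-dict tree format, the rule wording, and the threshold value in the usage example are hypothetical. The text does not specify the output syntax.

```python
def tree_to_rules(node, conditions=()):
    """Return one If-Then rule string per leaf of a nested-dict decision tree.

    Internal node: {"feature": str, "threshold": number, "yes": node, "no": node}
    Leaf node:     {"label": str}
    """
    if "label" in node:
        premise = " and ".join(conditions) if conditions else "True"
        return [f"If {premise} Then grade = {node['label']}"]
    satisfied = f"{node['feature']} <= {node['threshold']}"
    not_satisfied = f"{node['feature']} > {node['threshold']}"
    return (tree_to_rules(node["yes"], conditions + (satisfied,))
            + tree_to_rules(node["no"], conditions + (not_satisfied,)))


# Made-up single-split tree; the threshold 0.8 is illustrative only.
example = {"feature": "F3-3", "threshold": 0.8,
           "yes": {"label": "G3"}, "no": {"label": "G1"}}
for rule in tree_to_rules(example):
    print(rule)
```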

なお、第１の選択ポリシや第２の選択ポリシは、ユーザが任意にその内容を変更可能に構成されていることが望ましい。決定木生成部２３による決定木の生成の際に利用する特徴量が異なると、分岐条件（グレーディングの根拠、理由）や分類結果（識別結果、グレーディング）もまた異なるものとなる。そのため、同じ細胞画像から抽出された特徴量を含む学習データ（例えば、６０個の特徴量を含む学習データ）をデータ処理装置２０に入力したとしても、決定木の生成に利用する特徴量を変更することで、学習データの基礎となったサンプル（注視領域画像データを抽出したサンプル）に対する多角的、多面的な研究、解析が実現可能となる。 The first selection policy and the second selection policy are preferably configured so that the user can arbitrarily change the contents thereof. If the feature quantity used when the decision tree is generated by the decision tree generator 23 is different, the branching conditions (grading grounds and reasons) and the classification results (identification results, grading) also differ. Therefore, even if learning data including feature amounts extracted from the same cell image (for example, learning data including 60 feature amounts) is input to the data processing device 20, the feature amount used for generation of the decision tree is changed. By doing this, it becomes possible to realize multi-faceted and multi-faceted research and analysis for the sample that is the basis of the learning data (sample from which the gaze area image data is extracted).

上述のデータ処理装置２０の動作をまとめると図１８に示すとおりとなる。 The operations of the data processing apparatus 20 described above are summarized as shown in FIG.

ステップＳ０１において、データ処理装置２０は、学習データを学習データ生成装置１０から入力する。 In step S01, the data processing device 20 inputs learning data from the learning data generation device 10.

ステップＳ０２において、データ処理装置２０は、第１及び第２の選択処理の実行することにより、入力した学習データに含まれる特徴量の絞り込みを行う。 In step S02, the data processing device 20 narrows down the feature amount included in the input learning data by executing the first and second selection processes.

ステップＳ０３において、データ処理装置２０は、絞り込まれた特徴量を含む学習データを用いて、決定木を生成する。 In step S03, the data processing device 20 generates a decision tree using the learning data including the narrowed feature amount.

ステップＳ０４において、データ処理装置２０は、決定木を外部に出力する。 In step S04, the data processing device 20 outputs the decision tree to the outside.

［適用例］ [Application example]
次に、第１の実施形態にて説明した決定木の生成方法を適用した場合の例について説明する。ここでは、１１０５人の患者の肝細胞から生成した注視領域画像データ（細胞画像の一部）から特徴量ベクトルを生成し、最終的に４つの特徴量に絞り込んだ学習データ（図１９参照）から決定木を生成した場合を説明する。なお、図１９において、細胞核の大きさに関する特徴量Ｆ１−３は、細胞核領域をなす画素数を用いている。 Next, an example in which the decision tree generation method described in the first embodiment is applied will be described. Here, a case is described in which feature vectors are generated from gaze area image data (parts of cell images) generated from the hepatocytes of 1105 patients, and a decision tree is generated from learning data finally narrowed down to four feature quantities (see FIG. 19). In FIG. 19, the feature quantity F1-3 regarding the size of the cell nucleus uses the number of pixels forming the cell nucleus region.

図２０は、図１９に示す学習データから得られる決定木の一例を示す図である。なお、決定木の算出にあたり、決定木の深さを「４」としている。また、図２０以降に示す決定木において、分岐条件を満たす場合には左側に分岐し、満たさない場合には右側に分岐するものとする。さらに、同じグレードであっても異なる分類結果に振り分けられることがあるので、分類結果のクラスラベル（グレードＧ０〜Ｇ４）を区別する目的でアルファベットを付与している。例えば、同じグレードＧ２であっても、分類結果４０１〜４０５に分類され得るので、これらを区別するためにＧ２ａ〜Ｇ２ｅを分類結果に表記している。 FIG. 20 is a diagram illustrating an example of a decision tree obtained from the learning data illustrated in FIG. 19. In calculating the decision tree, the depth of the decision tree is set to “4”. In the decision trees shown in FIG. 20 and subsequent figures, a branch goes to the left when the branch condition is satisfied and to the right when it is not. Furthermore, since data of the same grade may be assigned to different classification results, letters are appended to distinguish the class labels (grades G0 to G4) of the classification results. For example, data of the same grade G2 can fall into the classification results 401 to 405, so these are written as G2a to G2e to distinguish them.

図２０を参照すると、グレードがＧ３未満か否かは、ルートノードからの最初の分岐条件にて用いられる細胞核の円形度（特徴量Ｆ３−３）に大きく依存することが分かる。また、上述のように同じグレードＧ２であっても、５種類の分類結果に振り分けられることが分かる。換言するならば、同じグレードであっても、異なる分類結果に属する注視領域画像データは異なる特徴を有すると言える。 Referring to FIG. 20, it can be seen that whether or not the grade is less than G3 largely depends on the circularity (feature amount F3-3) of the cell nucleus used in the first branch condition from the root node. Further, as described above, it can be seen that even the same grade G2 is distributed to five types of classification results. In other words, it can be said that gaze area image data belonging to different classification results have different characteristics even with the same grade.

このように、決定木により示される識別規則は「Ｉｆ−Ｔｈｅｎ」の形式により表現されるので、図２０に示すような可視化が容易である。そのため、医師等が可視化された決定木を参照することで、グレーディングの理由や根拠を容易に理解できる。例えば、図２０に接した医師等は、円形度が高いのでグレードが低く与えられている、細胞核が大きいので高いグレードが与えられている、と言ったグレーディングの根拠、理由を得ることができる。あるいは、葉（クラスラベル）のノードからルートノードに向けて分岐条件を確認（決定木の流れを遡るように確認）することで、医師等は、各クラスラベルの特徴を把握することができる。 In this way, the identification rule indicated by the decision tree is expressed in the “If-Then” format, so visualization as shown in FIG. 20 is easy. Therefore, by referring to the visualized decision tree, a doctor or the like can easily understand the reasons and grounds for the grading. For example, a doctor or the like looking at FIG. 20 can obtain grounds and reasons for the grading such as: a low grade is given because the circularity is high, or a high grade is given because the cell nucleus is large. Alternatively, by checking the branch conditions from a leaf (class label) node toward the root node (tracing the flow of the decision tree backward), the doctor or the like can grasp the characteristics of each class label.

決定木生成部２３が生成する決定木の深さは、深いほど分類の精度は高くなる。図２１は、図１９に示す学習データから、決定木の深さを２０まで許容した場合のグレーディング結果（図２１（ａ））と、決定木の深さを４まで許容した場合のグレーディング結果（図２１（ｂ））と、を示す図である。図２１に示すグレードＧ０ｔ〜Ｇ４ｔは医師による判断（ラベル；真値、True）を示し、グレードＧ０ｐ〜Ｇ４ｐは生成された決定木を適用することで得られるグレードの予測値（Prediction）を示す。 The deeper the decision tree generated by the decision tree generation unit 23, the higher the classification accuracy. FIG. 21 shows, for the learning data shown in FIG. 19, the grading result when the decision tree depth is allowed up to 20 (FIG. 21(a)) and the grading result when the decision tree depth is allowed up to 4 (FIG. 21(b)). Grades G0t to G4t shown in FIG. 21 indicate the judgments made by a doctor (label; true value, True), and grades G0p to G4p indicate the predicted grades (Prediction) obtained by applying the generated decision tree.

図２１の縦と横のグレーディングが交差する箇所（図の灰色の箇所）は、医師による判定と決定木による予測が一致していることを示し、当該交差箇所に含まれる数が多いほど当該決定木によるグレーディングの精度が高いことを示す。具体的には、決定木の深さを「２０」に設定した場合には、その精度は９６．２％となる。一方、決定木の深さを「４」に設定した場合には、その精度は５６．７％となる。 The cells where the vertical and horizontal gradings of FIG. 21 intersect (the gray cells in the figure) indicate that the doctor's determination and the decision tree's prediction match; the larger the counts in those cells, the higher the grading accuracy of the decision tree. Specifically, when the depth of the decision tree is set to “20”, the accuracy is 96.2%. On the other hand, when the depth of the decision tree is set to “4”, the accuracy is 56.7%.
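
The accuracy figure described here is the share of samples on the diagonal of the true-versus-predicted table. A minimal sketch follows. The function name is hypothetical and the 2 x 2 counts in the usage line are invented for illustration; they are not the values of FIG. 21.

```python
def grading_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix.

    confusion[i][j] = number of samples with true grade i predicted as grade j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Invented counts: 17 of 20 samples fall on the diagonal.
print(grading_accuracy([[8, 2], [1, 9]]))
```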

このように、決定木の深さを深くするほどグレーディングの精度は向上するが、生成された決定木の深さが深ければ深いほど、決定木によるグレーディングの根拠は医師等にとって理解しがたいものとなる。つまり、決定木によるグレーディングの精度と、決定木によるグレーディングの根拠、理由の理解容易性には、トレードオフの関係が存在する。従って、精度と理解容易性の関係が最適となるような深さにより決定木を生成することが望ましい。 In this way, the grading accuracy improves as the decision tree is made deeper, but the deeper the generated decision tree, the harder it becomes for doctors and the like to understand the basis of the grading given by the decision tree. In other words, there is a trade-off between the accuracy of grading by a decision tree and the ease of understanding the grounds and reasons for that grading. It is therefore desirable to generate the decision tree with a depth at which the balance between accuracy and ease of understanding is optimal.

以上のように、第１の実施形態に係るデータ処理装置２０は、識別規則の生成に利用する特徴量の影響度（重要度）を把握する目的で決定木を利用している。また、データ処理装置２０は、複数の特徴量のうち、グレーディング結果に大きな影響を与える特徴量を残しつつ、影響の小さい特徴量を削除することで、最終的に利用する特徴量を絞り込んでいる。特徴量を絞り込むことで、決定木生成部２３が生成する決定木のサイズを小さくし、グレーディングの根拠や理由に対する理解容易性を高めている。 As described above, the data processing device 20 according to the first embodiment uses a decision tree to grasp the degree of influence (importance) of the feature quantities used for generating the identification rule. In addition, the data processing device 20 narrows down the feature quantities that are finally used by deleting the feature quantities with a small influence while keeping those that strongly influence the grading result. Narrowing down the feature quantities reduces the size of the decision tree generated by the decision tree generation unit 23 and makes the grounds and reasons for the grading easier to understand.

また、データ処理装置２０は、決定木の生成、特徴量の評価、特徴量の絞り込みという手順を１度に限り行うのではなく、同じ手順を複数回行うことで特徴量の絞り込みを行っている。このような複数回の絞り込みを行う理由は、特徴量の間に存在する複雑な関係の影響を可能な限り排除し、グレーディングの精度を高めるためである。例えば、特徴量Ａと特徴量Ｂが、細胞核の同じ特徴を表現する場合には、これらの特徴量を同時に決定木の生成に利用する必要性は低い。例えば、特徴量Ａを優先的に利用するとすれば、特徴量Ｂの結果に対する影響は低くなり、特徴量Ｂは削除しても影響は少ない。対して、特徴量Ａと特徴量Ｂが同時に利用されることで、分類の精度が高くなることもある。この場合、特徴量Ａが利用される場合には特徴量Ｂの影響度も高くなるが、特徴量Ａが利用されなければ特徴量Ｂの利用価値（結果に対する影響度）も低くなる。このように、特徴量の重要性は他の特徴量の存在に左右されるため、特徴量の組み合わせごとに各特徴量の重要度は変化する。例えば、図１６（ａ）を参照すると、特徴量Ｆ３−３の重要度は５番目となっている。一方、特徴量を順次絞り込んでいった結果の図１６（ｄ）では、特徴量Ｆ３-３の重要度は１番目となっている。つまり、使用する特徴量の数が少ない場合には特徴量Ｆ３−３の影響は大きいと言える。図１６（ａ）の段階で重要度の高い４つの特徴量を選択すると、特徴量Ｆ３−３は除外され、少数の特徴量にて影響度の高い特徴量Ｆ３−３が用いられないという不都合が生じる。このような不都合を回避するため、データ処理装置２０では、決定木の生成、特徴量の評価、特徴量の絞り込みという手順を繰り返しているのである。 In addition, the data processing device 20 does not perform the procedure of decision tree generation, feature quantity evaluation, and feature quantity narrowing only once; it narrows down the feature quantities by performing the same procedure multiple times. The reason for narrowing down multiple times is to eliminate, as far as possible, the influence of complicated relationships among the feature quantities and thereby improve the grading accuracy. For example, when feature quantity A and feature quantity B express the same characteristic of the cell nucleus, there is little need to use both of them for generating a decision tree. If feature quantity A is used preferentially, the influence of feature quantity B on the result becomes low, and deleting feature quantity B has little effect. In contrast, there are also cases where using feature quantity A and feature quantity B together raises the classification accuracy. In such a case, the influence of feature quantity B is high when feature quantity A is used, but if feature quantity A is not used, the utility value of feature quantity B (its influence on the result) also becomes low. Because the importance of a feature quantity thus depends on the presence of other feature quantities, the importance of each feature quantity changes with each combination of feature quantities. For example, referring to FIG. 16(a), the feature quantity F3-3 ranks fifth in importance. On the other hand, in FIG. 16(d), obtained after successively narrowing down the feature quantities, the feature quantity F3-3 ranks first. That is, the influence of the feature quantity F3-3 can be said to be large when the number of feature quantities used is small. If the four most important feature quantities were selected at the stage of FIG. 16(a), the feature quantity F3-3 would be excluded, with the drawback that the feature quantity F3-3, which is highly influential among a small number of feature quantities, would not be used. To avoid this drawback, the data processing device 20 repeats the procedure of decision tree generation, feature quantity evaluation, and feature quantity narrowing.

決定木には、目的変数を非線形に分離可能であり、決定木の深さを十分にとれば高い精度が得られる利点がある。また、決定木による識別規則は容易に可視化が可能であり、分類結果に対する根拠、理由の理解が容易という利点もある。これらの利点は、他の分析モデル、学習モデル（例えば、サポートベクターマシン（ＳＶＭ；Support Vector Machine））には存在しない、又は希薄なものである。第１の実施形態に係るデータ処理装置２０は、提供される学習データに基づき、決定木を識別規則として生成することで、分類の精度と理解容易性の両立をなしている。 The decision tree has the advantage that the objective variable can be separated non-linearly, and high accuracy can be obtained if the depth of the decision tree is sufficiently large. In addition, the identification rule based on the decision tree can be easily visualized, and there is an advantage that it is easy to understand the basis and reason for the classification result. These advantages are absent or sparse in other analytical models, learning models (eg, Support Vector Machine (SVM)). The data processing device 20 according to the first embodiment generates a decision tree as an identification rule based on the provided learning data, thereby achieving both classification accuracy and ease of understanding.

［第２の実施形態］ [Second Embodiment]
続いて、第２の実施形態について図面を参照して詳細に説明する。 Next, a second embodiment will be described in detail with reference to the drawings.

第１の実施形態では、学習データから決定木を生成することを説明したが、第２の実施形態では、上記決定木のさらなる活用について説明する。 In the first embodiment, generation of a decision tree from learning data has been described. In the second embodiment, further utilization of the decision tree will be described.

図２２は、第２の実施形態に係る病理画像処理システムの構成の一例を示す図である。図２２を参照すると、学習データ生成装置１０ａは、注視領域ＩＤにより関連付けられた注視領域画像データの付随情報を、注視領域画像データ及びラベル情報に加えて、入力する。学習データ生成装置１０ａは、第１の実施形態にて説明した方法により学習データを生成し、データ処理装置２０ａに出力する。学習データ生成装置１０ａが取得した付随情報は、学習データと共にデータ処理装置２０ａに提供される。 FIG. 22 is a diagram illustrating an example of a configuration of a pathological image processing system according to the second embodiment. Referring to FIG. 22, the learning data generation device 10a inputs the accompanying information of the gaze area image data associated with the gaze area ID in addition to the gaze area image data and the label information. The learning data generation device 10a generates learning data by the method described in the first embodiment and outputs the learning data to the data processing device 20a. The accompanying information acquired by the learning data generation device 10a is provided to the data processing device 20a together with the learning data.

データ処理装置２０ａは、第１の実施形態にて説明した方法により、決定木を生成する。データ処理装置２０ａは、生成された決定木による分類結果それぞれが有する特徴を、付随情報に基づき解析する機能を有する。具体的には、データ処理装置２０ａは、決定木、その分類結果及び付随情報を利用して、種々の解析データや解析画像を解析結果として生成し、出力する。 The data processing device 20a generates a decision tree by the method described in the first embodiment. The data processing device 20a has a function of analyzing the characteristics of each classification result based on the generated decision tree based on the accompanying information. Specifically, the data processing device 20a generates and outputs various analysis data and analysis images as analysis results using the decision tree, its classification result, and accompanying information.

FIG. 23 is a diagram illustrating an example of the accompanying information. For ease of understanding, FIG. 23 also shows the labels. The accompanying information shown in FIG. 23 includes information on the anticancer drugs administered to the patient from whom the liver pathological image underlying the gaze area image data associated with each gaze area ID was obtained, and on the effect of each anticancer drug (+ means effective, − means not effective). The learning data output unit 13 of the learning data generation device 10a attaches the accompanying information to the learning data and outputs both to the data processing device 20a. Note that the anticancer drugs A to D and their effects shown in FIG. 23 and subsequent figures are hypothetical cases (data) used to explain the operation of the data processing device 20a.

FIG. 24 is a diagram illustrating an example of the internal configuration of the data processing device 20a according to the second embodiment. The data processing device 20a differs from the data processing device 20 according to the first embodiment in three respects: each unit of the data processing device 20a is configured to handle the accompanying information; the device includes an analysis unit 26; and the classification results produced by the generated decision tree are passed to the analysis unit 26.

FIG. 25 is a diagram illustrating an example of the classification results produced by the decision tree generated by the decision tree generation unit 23 of the data processing device 20a. As shown in FIG. 25, the decision tree generation unit 23 passes to the analysis unit 26, as the classification result, a list of the gaze area IDs belonging to each classification result (each class label) of the generated decision tree. Taking FIG. 23 and FIG. 25 together, the gaze area ID links the gaze area image data belonging to each classification result with the effects of the anticancer drugs administered to the patient who provided that image data. For example, FIGS. 23 and 25 show that the gaze area image data acquired from the patient corresponding to gaze area ID = 1 is classified as grade "G2a", and that, of the anticancer drugs administered to that patient, at least anticancer drugs A, B, and D are effective.

The analysis unit 26 is a means for analyzing the characteristics of each classification result of the decision tree, based on the above information (the decision tree, the classification results, and the accompanying information). For example, the analysis unit 26 analyzes the effectiveness of the various anticancer drugs for the patients corresponding to the gaze area IDs assigned to each classification result. Specifically, the analysis unit 26 analyzes the effectiveness of the anticancer drugs by the following procedure.

First, the analysis unit 26 acquires the gaze area IDs included in each classification result. Next, the analysis unit 26 refers to the accompanying information and acquires the effect of each anticancer drug for each acquired gaze area ID. Then, for each combination of classification result (grade; class label) and anticancer drug, the analysis unit 26 calculates the proportion of data indicating that the drug is effective, and determines that the drug is effective if the proportion is at or above a threshold (for example, 50% or more; a majority decision).

Take grade G2a in the classification results of FIG. 25 as an example. This grade includes at least the gaze area image data identified by gaze area ID = 1, 2, and 3. Referring next to the accompanying information shown in FIG. 23 gives the results of anticancer drug administration for gaze area ID = 1, 2, and 3. For anticancer drug A, the drug was effective for two of the three patients (the patients corresponding to gaze area ID = 1 to 3; there are two + marks), so the proportion of data indicating that anticancer drug A is effective is calculated as 66.6%. Therefore, anticancer drug A is determined to be effective for the patients identified by the gaze area IDs belonging to grade G2a.
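The procedure and the grade G2a example above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `effectiveness_by_grade`, the dictionary layout, and all values other than the two-out-of-three result for drug A are assumptions made for the example, not data or an implementation taken from the publication.

```python
from collections import defaultdict


def effectiveness_by_grade(classification, accompanying, threshold=0.5):
    """Return {grade: {drug: (ratio, is_effective)}}.

    classification: {grade: [gaze_area_id, ...]}    (cf. FIG. 25)
    accompanying:   {gaze_area_id: {drug: "+"/"-"}} (cf. FIG. 23)
    """
    result = {}
    for grade, region_ids in classification.items():
        counts = defaultdict(lambda: [0, 0])  # drug -> [effective, total]
        for region_id in region_ids:
            for drug, effect in accompanying.get(region_id, {}).items():
                counts[drug][1] += 1
                if effect == "+":
                    counts[drug][0] += 1
        result[grade] = {
            drug: (n_eff / n_total, n_eff / n_total >= threshold)
            for drug, (n_eff, n_total) in counts.items()
        }
    return result


# Hypothetical inputs; only drug A's 2-of-3 outcome mirrors the text.
classification = {"G2a": [1, 2, 3]}
accompanying = {
    1: {"A": "+", "B": "+"},
    2: {"A": "+", "B": "-"},
    3: {"A": "-", "B": "-"},
}
result = effectiveness_by_grade(classification, accompanying)
```

With these inputs, drug A reaches a ratio of two thirds and is judged effective for grade G2a, while the invented values for drug B fall below the 50% threshold.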

Note that the threshold used by the analysis unit 26 to judge the effectiveness of an anticancer drug (50% in the above example) may be common to all anticancer drugs, or a separate threshold may be set for each drug. For example, when the effectiveness of anticancer drug A should be judged cautiously, its threshold may be set higher (for example, 80%). Alternatively, when the proportion of data indicating that an anticancer drug is effective, calculated for each grade, falls within a predetermined range (for example, 40% to 60%), the analysis unit 26 may treat the effect of that drug as "unknown".
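The threshold variations described here could be expressed as follows. The function `judge`, the `PER_DRUG_THRESHOLD` table, and the returned strings are hypothetical names chosen for this sketch; the 50%, 80%, and 40% to 60% figures are the examples given in the text.

```python
def judge(ratio, threshold=0.5, unknown_band=None):
    """Classify an effectiveness ratio as effective / not effective / unknown.

    unknown_band: optional (low, high) range in which the result is
    reported as "unknown" instead of being forced to a yes/no answer.
    """
    if unknown_band is not None:
        low, high = unknown_band
        if low <= ratio <= high:
            return "unknown"
    return "effective" if ratio >= threshold else "not effective"


# Hypothetical per-drug thresholds: drug A is judged more cautiously.
PER_DRUG_THRESHOLD = {"A": 0.8}


def judge_drug(drug, ratio, default_threshold=0.5, unknown_band=None):
    """Look up the drug's own threshold, falling back to the common one."""
    threshold = PER_DRUG_THRESHOLD.get(drug, default_threshold)
    return judge(ratio, threshold, unknown_band)
```

Under these assumptions, a ratio of two thirds counts as effective with the common 50% threshold but not with the stricter 80% threshold set for drug A.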

The analysis unit 26 performs the above determination for each grade of the classification results and for each anticancer drug, and obtains an analysis result such as that shown in FIG. 26. The analysis unit 26 passes the analysis result and the decision tree to the output unit 24.

Using the decision tree and the analysis result, the output unit 24 generates data indicating the effectiveness of the anticancer drugs for the patients assigned to each classification result (grade) of the decision tree (the set of patients associated through the gaze area IDs), and outputs the data to an external device or a display device.

For example, the output unit 24 generates image data such as that shown in FIG. 27 and outputs it externally. For ease of understanding, FIG. 27 shows the effectiveness of the anticancer drugs only for grades G2a and G2b. FIG. 27 confirms that, even among patients assigned to the same grade G2, the effectiveness of the anticancer drugs differs markedly between G2a and G2b. A doctor or other user presented with information such as that in FIG. 27 can conclude, for example, that anticancer drug B is not effective for patients assigned to grade G2a and that anticancer drug D is effective for patients assigned to grade G2b.

In this way, when the accompanying information provided to the data processing device 20a is a result concerning the effectiveness of anticancer drugs for the patients corresponding to the gaze area IDs, the data processing device 20a can output an analysis result indicating the effectiveness of the anticancer drugs for the patients corresponding to the gaze area IDs included in each classification result.

Note that data analysis by the data processing device 20a is not limited to the effectiveness of anticancer drugs. Other analyses can be performed by changing the content of the accompanying information. For example, suppose that the information shown in FIG. 28 is input to the data processing device 20a as the accompanying information. The accompanying information shown in FIG. 28 includes the number of days until cancer recurred in the patient associated with each gaze area ID.

For each grade of the classification results, the analysis unit 26 calculates the probability of cancer recurrence over time and produces it as the analysis result. Specifically, the analysis unit 26 calculates the data for a graph such as that shown in FIG. 29 as the analysis result and passes it to the output unit 24.

FIG. 29 shows that, within the range of the collected samples, patients assigned to grade "G1a" have no likelihood of cancer recurrence. It also shows that the recurrence trends of patients assigned to grades "G2a" and "G2b" differ from each other. Specifically, up to 1000 days there is no marked difference between the recurrence rates of patients assigned to grades G2a and G2b, but beyond 1000 days a marked difference in recurrence rate appears between the two.
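One simple way to produce per-grade curves of the kind discussed for FIG. 29 is an empirical cumulative recurrence fraction. The publication does not state which estimator is used, so the sketch below is an assumption: it ignores censoring (patients whose follow-up ended before recurrence could be observed), for which a survival estimator such as Kaplan-Meier would normally be preferred. The function name and the sample values are hypothetical.

```python
def recurrence_curve(days_to_recurrence, time_points):
    """Fraction of patients whose cancer has recurred by each time point.

    days_to_recurrence: one entry per patient in a grade; the number of
    days until recurrence, or None if no recurrence was observed.
    """
    n = len(days_to_recurrence)
    if n == 0:
        return [0.0 for _ in time_points]
    return [
        sum(1 for d in days_to_recurrence if d is not None and d <= t) / n
        for t in time_points
    ]


# Hypothetical per-grade data (days until recurrence, None = no recurrence).
by_grade = {
    "G1a": [None, None],
    "G2a": [300, 1200, None, 900],
}
time_points = [500, 1000, 1500]
curves = {g: recurrence_curve(d, time_points) for g, d in by_grade.items()}
```

Plotting each list in `curves` against `time_points` gives one line per grade, which is enough to compare how the grades diverge over time.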

In this way, when the accompanying information provided to the data processing device 20a is information on the period until cancer recurs in the patients corresponding to the gaze area IDs, the trend in cancer recurrence of the patients corresponding to the gaze area IDs included in each classification result can be output as the analysis result.

以上のように、第２の実施形態に係る病理画像処理システムでは、付随情報を解析することで、決定木による分類結果（グレード）それぞれに顕著な特徴を示す情報を、医師等に提供できる。 As described above, in the pathological image processing system according to the second embodiment, by analyzing the accompanying information, it is possible to provide a doctor or the like with information that shows remarkable features in each classification result (grade) based on the decision tree.

Note that the configuration of the pathological image processing system described in the above embodiments is an example and is not intended to limit the system configuration. For example, some of the functions of the data processing device 20 may be incorporated in the learning data generation device 10. For example, all or part of the feature amount narrowing performed by the data processing device 20 as described in the first embodiment may be executed by the learning data generation device 10. Alternatively, in place of the learning data generation device 10, a device that extracts feature amount vectors from the gaze area image data may be provided, the label information may be input directly to the data processing device 20, and the learning data may be generated inside the data processing device 20. Alternatively, the functions of the identification device 30 may be included in the data processing device 20. In this case, as shown in FIG. 30, the data processing device 20b includes an identification unit 27 that predicts sample data using the decision tree generated by the decision tree generation unit 23. In addition, the input unit 21 inputs the sample data, and the output unit 24 outputs the identification result.

In the above embodiments, the grade of the gaze area image data is used as the label information, but the label is not limited to the grade of the gaze area image data (cell image). For example, information on the patient's cancer recurrence may be used as the label. For example, as shown in FIG. 31, the cancer recurrence information (no long-term recurrence, early recurrence) of the patient corresponding to each gaze area ID may be used as the label. In this case, through the feature amount extraction, feature amount narrowing, and decision tree creation described in the first embodiment, a decision tree (prediction model) for cancer recurrence can be obtained whose branch conditions are features of the cell nuclei in the gaze area image data (for example, nucleus size and circularity); that is, a decision tree corresponding to FIG. 20 can be obtained. In addition, by following the same procedure as in the second embodiment and using this label together with the number of days until the patient's cancer recurs as the accompanying information, information on the recurrence trend of the patients included in each classification result of the decision tree can be obtained (a graph corresponding to FIG. 29 can be obtained).

The above embodiments describe a system configuration (FIG. 2 and FIG. 22) that includes a learning data generation device which calculates feature amount vectors from the gaze area image data and generates learning data. However, calculation of the feature amount vectors is not limited to calculation by a learning data generation device (information processing device, computer). For example, feature amounts (feature amount vectors) calculated by a doctor or the like may be used, or feature amounts calculated by the device may be combined with feature amounts calculated by a doctor or the like. That is, as long as the learning data provided to the data processing device 20 includes a feature amount vector characterizing each of a plurality of samples, the feature amount vectors may be generated by any method.

また、識別装置３０にて利用する決定木（識別規則）もデータ処理装置２０が生成する決定木に限定されるものではない。即ち、上記実施形態にて説明した手法、手順により生成された決定木であれば、その生成主体は情報処理装置（コンピュータ）に限定されずどのようなものであってもよい。即ち、学習データ（細胞画像の識別子、ラベル、特徴量を含むデータ）を用意し、当該学習データから生成された決定木を用いることで、サンプルのグレーディングを行うことができる。 Further, the decision tree (identification rule) used in the identification device 30 is not limited to the decision tree generated by the data processing device 20. That is, as long as the decision tree is generated by the method and procedure described in the above embodiment, the generation subject is not limited to the information processing apparatus (computer), and may be anything. That is, it is possible to perform sample grading by preparing learning data (data including cell image identifiers, labels, and feature amounts) and using a decision tree generated from the learning data.
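As an illustration of how a decision tree of this kind grades a sample, the sketch below walks a small tree from the root to a leaf. The tree structure, the feature names, every threshold, and the leaf labels are hypothetical values invented for the example; only the idea of using nucleus features such as circularity as branch conditions comes from the text.

```python
def classify(node, sample):
    """Follow branch conditions from the root until a leaf label is reached."""
    while "label" not in node:
        # Branch condition: is the feature value at or below the threshold?
        branch = "yes" if sample[node["feature"]] <= node["threshold"] else "no"
        node = node[branch]
    return node["label"]


# Hypothetical tree; thresholds and labels are NOT taken from the publication.
tree = {
    "feature": "circularity",
    "threshold": 0.8,
    "yes": {
        "feature": "nucleus_size",
        "threshold": 100.0,
        "yes": {"label": "G2a"},
        "no": {"label": "G2b"},
    },
    "no": {"label": "G1a"},
}

grade_round = classify(tree, {"circularity": 0.9, "nucleus_size": 50.0})
grade_large = classify(tree, {"circularity": 0.7, "nucleus_size": 120.0})
```

Because each step is a plain comparison on a named feature, the path taken for a sample can be read back directly as the basis for the assigned grade.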

The above embodiments describe the case where twelve types of feature amounts are calculated, but this is not intended to limit the types of feature amounts to be calculated. For example, contrast (feature amount F9) and uniformity (feature amount F10) are given as feature amounts indicating the texture of the cell nucleus region, but feature amounts obtained by texture analysis based on a Fourier transform, a wavelet transform, or the like may also be used.

The above embodiments describe the case where the feature amount vector generation unit 12 calculates percentile values, obtained from the cumulative distribution of a plurality of feature amounts, as statistical values representing those feature amounts. Other statistical values can of course be used. For example, statistical values such as the variance or the mode obtained from the plurality of feature amounts may be used. In addition, although the case has been described where the feature amount selection unit 22 of the data processing device 20 selects one feature amount (statistical value) from feature amounts of the same type, a plurality of feature amounts may be selected from feature amounts of the same type. For example, both the intermediate value of the feature amount relating to cell nucleus size (feature amount F1-3) and the variance of the cell nucleus size may be selected.
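The statistical values mentioned here can be computed from the per-nucleus measurements of one image, for instance as follows. The nearest-rank percentile definition, the function names, and the sample values are assumptions for this sketch; the publication does not specify how its percentiles are interpolated.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(values, percentiles=(10, 50, 90)):
    """Collapse many per-nucleus measurements into representative statistics."""
    summary = {f"p{p}": percentile(values, p) for p in percentiles}
    summary["variance"] = statistics.pvariance(values)  # population variance
    summary["mode"] = statistics.mode(values)
    return summary


# Hypothetical nucleus sizes measured in one gaze area image.
nucleus_sizes = [10, 12, 12, 15, 20]
summary = summarize(nucleus_sizes)
```

Each key of `summary` could then serve as one element of the feature amount vector for that image.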

The processing performed by each unit, such as the feature amount vector generation unit 12 of the learning data generation device 10 and the feature amount selection unit 22 and the decision tree generation unit 23 of the data processing device 20, can be realized by a computer program that causes a computer installed in these devices (the learning data generation device 10 and the data processing device 20) to execute each of the processes described above using its hardware. In other words, it suffices that there is some means of executing the functions of the above units in hardware and/or software.

さらに、コンピュータの記憶部に、コンピュータプログラムをインストールすることにより、コンピュータを学習データ生成装置１０、データ処理装置２０、識別装置３０として機能させることができる。さらにまた、上述したコンピュータプログラムをコンピュータに実行させることにより、コンピュータにより学習データ生成方法、決定木生成方法、決定木による予測方法等を実行することができる。また、そのプログラムは、ネットワークを介してダウンロードするか、或いは、プログラムを記憶した記憶媒体を用いて、更新することができる。 Furthermore, by installing a computer program in the storage unit of the computer, the computer can function as the learning data generation device 10, the data processing device 20, and the identification device 30. Furthermore, by causing a computer to execute the above-described computer program, a learning data generation method, a decision tree generation method, a prediction method using a decision tree, and the like can be executed by the computer. The program can be downloaded via a network or updated using a storage medium storing the program.

上記の実施形態の一部又は全部は、以下の付記のようにも記載され得るが、以下には限られない。 A part or all of the above embodiments can be described as in the following supplementary notes, but is not limited thereto.

[Appendix 1]
The data processing apparatus according to the first aspect described above.
[Appendix 2]
The data processing apparatus according to Appendix 1, further comprising a feature amount selection unit that selects, from among the plurality of feature amounts, the feature amounts used by the decision tree generation unit to generate the decision tree.
[Appendix 3]
The data processing apparatus according to Appendix 2, wherein the feature amount selection unit selects the feature amounts used by the decision tree generation unit to generate the decision tree by repeating:
generating a decision tree based on the learning data;
calculating the quality of each feature amount included in the branch conditions of the generated decision tree;
calculating the importance of each feature amount for which the quality has been calculated;
generating new learning data by deleting a predetermined number of feature amounts from the plurality of feature amounts included in the learning data, based on the calculated importance; and
determining whether the feature amounts included in the new learning data satisfy a predetermined condition.
[Appendix 4]
The data processing apparatus according to any one of Appendices 1 to 3, wherein the input unit inputs, together with the learning data, accompanying information associated with the cell image by an identifier that identifies the cell image,
the apparatus further comprising an analysis unit that analyzes, based on the accompanying information, the characteristics of each classification result produced by the decision tree generated by the decision tree generation unit.
[Appendix 5]
The data processing apparatus according to Appendix 4, wherein, when the accompanying information is a result concerning the effectiveness of an anticancer drug for the patient corresponding to the identifier of the cell image, the analysis unit outputs, as an analysis result, the effectiveness of the anticancer drug for the patients corresponding to the identifiers of the cell images included in each of the classification results.
[Appendix 6]
The data processing apparatus according to Appendix 4, wherein, when the accompanying information is information on the period until cancer recurs in the patient corresponding to the identifier of the cell image, the analysis unit outputs, as an analysis result, the trend in cancer recurrence of the patients corresponding to the identifiers of the cell images included in each of the classification results.
[Appendix 7]
The data processing apparatus according to any one of Appendices 1 to 6, wherein the branch conditions of the decision tree generated by the decision tree generation unit include at least one of the size, circularity, uniformity, and contrast of the cell nuclei included in the cell image.
[Appendix 8]
The data processing apparatus according to any one of Appendices 1 to 7, wherein the first branch condition from the root node of the decision tree generated by the decision tree generation unit includes the circularity of the cell nuclei included in the cell image.
[Appendix 9]
The data processing apparatus according to any one of Appendices 1 to 8, wherein the cell image is an image obtained from hepatocytes, and the label given to the cell image is a grade relating to cancer of the hepatocytes or information relating to cancer recurrence of a patient.
[Appendix 10]
The decision tree generation method according to the second aspect described above.
[Appendix 11]
The identification device according to the third aspect described above.
[Appendix 12]
The program according to the fourth aspect described above.
Note that the forms of Appendices 10 to 12 can be expanded into the forms of Appendices 2 to 9, in the same way as the form of Appendix 1.
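The repeated selection described in Appendix 3 has the shape of an iterative feature elimination loop, which can be sketched as below. This is an assumption-laden outline, not the publication's implementation: tree generation and the quality and importance calculations are hidden behind a caller-supplied `importance_fn`, the "predetermined condition" is taken to be a target feature count, and all names and numbers are hypothetical.

```python
def select_features(features, importance_fn, drop_per_round=1, target_count=3):
    """Repeatedly drop the least important features until few enough remain.

    importance_fn(features) stands in for: generate a decision tree from the
    learning data restricted to `features`, score the quality of each feature
    used in its branch conditions, and return {feature: importance}.
    """
    current = list(features)
    while len(current) > target_count:  # assumed stopping condition
        importance = importance_fn(current)
        # Features absent from the tree's branch conditions count as 0.
        ranked = sorted(current, key=lambda f: importance.get(f, 0.0))
        n_drop = min(drop_per_round, len(current) - target_count)
        dropped = set(ranked[:n_drop])
        current = [f for f in current if f not in dropped]
    return current


# Hypothetical fixed importances standing in for a real tree-based score.
toy_importance = {"F1": 0.5, "F2": 0.1, "F3": 0.3, "F4": 0.05}
selected = select_features(
    ["F1", "F2", "F3", "F4"],
    lambda feats: {f: toy_importance[f] for f in feats},
    drop_per_round=1,
    target_count=2,
)
```

In a real setting `importance_fn` would rebuild the tree on every round, so the ranking can change as features are removed; the fixed table here only keeps the example deterministic.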

なお、引用した上記の特許文献等の各開示は、本書に引用をもって繰り込むものとする。本発明の全開示（請求の範囲を含む）の枠内において、さらにその基本的技術思想に基づいて、実施形態ないし実施例の変更・調整が可能である。また、本発明の全開示の枠内において種々の開示要素（各請求項の各要素、各実施形態ないし実施例の各要素、各図面の各要素等を含む）の多様な組み合わせ、ないし、選択が可能である。すなわち、本発明は、請求の範囲を含む全開示、技術的思想にしたがって当業者であればなし得るであろう各種変形、修正を含むことは勿論である。特に、本書に記載した数値範囲については、当該範囲内に含まれる任意の数値ないし小範囲が、別段の記載のない場合でも具体的に記載されているものと解釈されるべきである。 Each disclosure of the cited patent documents and the like cited above is incorporated herein by reference. Within the scope of the entire disclosure (including claims) of the present invention, the embodiments and examples can be changed and adjusted based on the basic technical concept. In addition, various combinations or selections of various disclosed elements (including each element in each claim, each element in each embodiment or example, each element in each drawing, etc.) within the scope of the entire disclosure of the present invention. Is possible. That is, the present invention of course includes various variations and modifications that could be made by those skilled in the art according to the entire disclosure including the claims and the technical idea. In particular, with respect to the numerical ranges described in this document, any numerical value or small range included in the range should be construed as being specifically described even if there is no specific description.

10, 10a Learning data generation device
11, 21, 101 Input unit
12 Feature amount vector generation unit
13 Learning data output unit
14, 25, 34 Storage unit
20, 20a, 20b Data processing device
22 Feature amount selection unit
23, 102 Decision tree generation unit
24 Output unit
26 Analysis unit
27 Identification unit
30 Identification device
100 Data processing device
201, 202 Cell nucleus region
211 Boundary line
212 Major axis length
213 Minor axis length
301 to 307 Branch condition
401 to 405 Classification result

Claims

A data processing apparatus comprising:
an input unit that inputs learning data in which a label given to a cell image and a plurality of feature amounts extracted from the cell image form one set; and
a decision tree generation unit that generates, based on the learning data, a decision tree for identifying information corresponding to the label of a sample.

The data processing apparatus according to claim 1, further comprising a feature quantity selection unit that selects a feature quantity used for generation of a decision tree by the decision tree generation unit from the plurality of feature quantities.

The data processing apparatus according to claim 2, wherein the feature amount selection unit selects the feature amounts used by the decision tree generation unit to generate the decision tree by repeating:
generating a decision tree based on the learning data;
calculating the quality of each feature amount included in the branch conditions of the generated decision tree;
calculating the importance of each feature amount for which the quality has been calculated;
generating new learning data by deleting a predetermined number of feature amounts from the plurality of feature amounts included in the learning data, based on the calculated importance; and
determining whether the feature amounts included in the new learning data satisfy a predetermined condition.

The data processing apparatus according to any one of claims 1 to 3, wherein the input unit inputs, together with the learning data, accompanying information associated with the cell image by an identifier that identifies the cell image,
the apparatus further comprising an analysis unit that analyzes, based on the accompanying information, the characteristics of each classification result produced by the decision tree generated by the decision tree generation unit.

The data processing apparatus according to claim 4, wherein, when the accompanying information is a result concerning the effectiveness of an anticancer drug for the patient corresponding to the identifier of the cell image, the analysis unit outputs, as an analysis result, the effectiveness of the anticancer drug for the patients corresponding to the identifiers of the cell images included in each of the classification results.

The data processing apparatus according to claim 4, wherein, when the accompanying information is information on the period until cancer recurs in the patient corresponding to the identifier of the cell image, the analysis unit outputs, as an analysis result, the trend in cancer recurrence of the patients corresponding to the identifiers of the cell images included in each of the classification results.

The data processing apparatus according to any one of claims 1 to 6, wherein the branch conditions of the decision tree generated by the decision tree generation unit include at least one of the size, circularity, uniformity, and contrast of the cell nuclei included in the cell image.

The data processing apparatus according to any one of claims 1 to 7, wherein the first branch condition from the root node of the decision tree generated by the decision tree generation unit includes the circularity of the cell nuclei included in the cell image.

The data processing apparatus according to any one of claims 1 to 8, wherein the cell image is an image obtained from hepatocytes, and the label given to the cell image is a grade relating to cancer of the hepatocytes or information relating to cancer recurrence of a patient.

A decision tree generation method comprising:
inputting learning data in which a label given to a cell image and a plurality of feature amounts extracted from the cell image form one set; and
generating, based on the learning data, a decision tree for identifying information corresponding to the label of a sample.

An identification apparatus that identifies a sample using the decision tree generated by the decision tree generation method according to claim 10.

A program that causes a computer installed in a data processing apparatus to execute:
a process of inputting learning data in which a label given to a cell image and a plurality of feature amounts extracted from the cell image form one set; and
a process of generating, based on the learning data, a decision tree for identifying information corresponding to the label of a sample.