JPWO2019073962A1

JPWO2019073962A1 - Image processing apparatus and program

Info

Publication number: JPWO2019073962A1
Application number: JP2019517853A
Authority: JP
Inventors: 亮朝岡; 博史村田; 正樹谷戸; 直人柴田
Original assignee: QUEUE INC.; University of Tokyo NUC
Current assignee: QUEUE INC.; University of Tokyo NUC
Priority date: 2017-10-10
Filing date: 2018-10-09
Publication date: 2019-11-14
Anticipated expiration: 2038-10-09
Also published as: JP6734475B2; WO2019073962A1

Abstract

An image processing apparatus that holds a machine learning result in a state of having learned the relationship between fundus photograph image data and information on eye symptoms, using training data that includes information associating fundus photograph image data with information on the eye symptoms corresponding to each fundus photograph; accepts image data of a fundus photograph to be processed; and, using input data based on the accepted image data together with the machine learning result, estimates information on the eye symptoms for the eye shown in the fundus photograph being processed.

Description

The present invention relates to an image processing apparatus and a program for processing images used in ophthalmic medicine.

Diseases that involve irreversible loss of visual function, such as glaucoma, call for early detection. In glaucoma, however, the examinations needed for a definitive diagnosis are generally time-consuming and burdensome, so there is a need for a simple screening method that detects in advance whether glaucoma may be present.

An apparatus that provides diagnostic information using three-dimensional measurement results of the fundus is disclosed in Patent Document 1.

Patent Document 1: JP 2017-74325 A

However, with apparatuses such as the conventional example above, it is difficult to set the conditions for detecting the presence or absence of glaucoma symptoms from fundus information. This is because a physician's diagnosis rests on a comprehensive judgment that draws on the relationship between the color tones inside and around the optic disc, rim thinning, deepening of the cup, the laminar dot sign, nasal displacement of the disc vessels, PPA (peripapillary chorioretinal atrophy), disc margin hemorrhage, retinal nerve fiber layer defects, and similar findings ("Guidelines for Judging Glaucomatous Changes of the Optic Disc and Retinal Nerve Fiber Layer", Journal of the Japanese Ophthalmological Society (日眼会誌), vol. 110, No. 10, p. 810-, 2006). In addition, although some recent devices take fundus photographs without mydriasis, it is difficult to obtain three-dimensional information with such devices, so three-dimensional information is not always available.

Moreover, the coloring, blood vessel shapes, and other features of fundus photographs vary greatly with the imaging conditions and with individual differences among subjects. For this reason, segmentation based simply on pixel values, for example, is not a practical way to detect image regions that are useful for diagnosing glaucoma, such as the optic disc cup margin.

The present invention has been made in view of the above circumstances, and one of its objects is to provide an image processing apparatus and a program that can detect, relatively simply, whether glaucoma may be present based on a fundus image.

The present invention, which solves the problems of the conventional example above, is an image processing apparatus comprising: holding means for holding a machine learning result in a state of having machine-learned the relationship between fundus photograph image data and information on eye symptoms, using training data that includes information associating fundus photograph image data with information on the eye symptoms corresponding to each fundus photograph; accepting means for accepting image data of a fundus photograph to be processed; estimating means for estimating, using input data based on the accepted image data and the machine learning result, the information on eye symptoms for the eye shown in the fundus photograph being processed; and means for outputting the result of the estimation.

According to the present invention, whether glaucoma may be present can be detected relatively simply based on a fundus image.

FIG. 1 is a block diagram showing a configuration example of an image processing apparatus according to an embodiment of the present invention. FIG. 2 is an explanatory diagram showing an example of fundus photograph image data processed by the image processing apparatus according to the embodiment. FIG. 3 is a functional block diagram showing an example of the image processing apparatus according to the embodiment. FIG. 4 is an explanatory diagram showing an example of an image output by the image processing apparatus according to the embodiment.

An embodiment of the present invention will be described with reference to the drawings. As illustrated in FIG. 1, an image processing apparatus 1 according to the embodiment includes a control unit 11, a storage unit 12, an operation unit 13, a display unit 14, and an input/output unit 15.

The control unit 11 is a program-controlled device such as a CPU and operates according to a program stored in the storage unit 12. In this example of the embodiment, the control unit 11 performs processing using a machine learning result that is in a state of having learned the relationship between fundus photograph image data and information on eye symptoms, using training data that includes information in which fundus photograph image data and information on the eye symptoms corresponding to each fundus photograph are associated with each other.

Note that not every item of the training data needs to associate fundus photograph image data with information on the corresponding eye symptoms; the training data may partly include fundus photograph image data with which no such information is associated.

The machine learning result may be obtained by applying, for example, a neural network, an SVM (Support Vector Machine), a Bayesian method, or a tree-structure-based method, or it may be a model obtained by semi-supervised learning.

Specifically, the control unit 11 accepts image data of a fundus photograph to be processed and estimates information on a predetermined symptom for the eye shown in that photograph using the input data and the machine learning result, for example by obtaining the output of the neural network when input data based on the accepted image data is supplied to it. The control unit 11 then outputs the result of the estimation. The operation of the control unit 11 is described in detail later.

The storage unit 12 is a disk device, a memory device, or the like, and holds the program executed by the control unit 11. The program may be provided on a computer-readable, non-transitory recording medium and then stored in the storage unit 12. In this embodiment, the storage unit 12 also functions as holding means for holding a machine learning result that is in a state of having learned the relationship between fundus photograph image data and information on eye symptoms, using training data in which fundus photograph image data and information on the eye symptoms corresponding to each fundus photograph are associated with each other. The machine learning result is also described in detail later.

The operation unit 13 is a keyboard, a mouse, or the like; it accepts the user's instructions and outputs them to the control unit 11. The display unit 14 is a display or the like and displays information according to instructions input from the control unit 11.

The input/output unit 15 is an input/output interface such as USB (Universal Serial Bus). In one example of this embodiment, it accepts image data of a fundus photograph to be processed from an external device (for example, an imaging device or a card reader) and outputs it to the control unit 11.

Next, the machine learning result used by the control unit 11 of this embodiment is described. In one example of this embodiment, the machine learning result used by the control unit 11 is a neural network. The neural network in this example has been put into a state of having machine-learned the relationship between fundus photograph image data and information on eye symptoms, using training data in which fundus photograph image data and information on the eye symptoms corresponding to each fundus photograph are associated with each other.

As an example, this neural network is built using a residual network (ResNet: Kaiming He, et al., Deep Residual Learning for Image Recognition, https://arxiv.org/pdf/1512.03385v1.pdf). The training of this neural network can be carried out on a general-purpose computer.

Specifically, training is performed by inputting to the residual network the fundus photograph image data contained in the training data, as illustrated in FIG. 2(a) (image data known to be fundus photographs, which may, for example, have been collected manually in advance). The fundus photograph may be a two-dimensional fundus photograph taken with or without mydriasis, but it must include at least an image Y of the optic disc. The image Y of the optic disc is generally captured as a region of relatively higher brightness than the other parts of the image (the parts not corresponding to the optic disc). Many blood vessels B also appear in a fundus photograph.

As the information on eye symptoms included in the training data, information indicating whether an ophthalmologist who examined the fundus photograph suspected glaucoma is used. In this case, for each fundus photograph input as image data during training, the proportion of ophthalmologists who judged glaucoma to be suspected, among the several ophthalmologists who examined the photograph, may be used as the information on eye symptoms and associated with the image data of that photograph to form the training data.
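The proportion-based label described above can be illustrated with a minimal sketch. The function name `glaucoma_soft_label` and the representation of each ophthalmologist's judgment as a boolean are assumptions made for illustration; the source does not specify a data format.

```python
def glaucoma_soft_label(votes):
    """Return the fraction of graders who judged glaucoma to be suspected.

    `votes` is a hypothetical list of booleans, one per ophthalmologist
    who examined the same fundus photograph.
    """
    if not votes:
        raise ValueError("at least one grader is required")
    return sum(1 for v in votes if v) / len(votes)


# Three of four graders suspected glaucoma, so the label is 0.75.
label = glaucoma_soft_label([True, False, True, True])
```

A label of this kind lies between 0 and 1, which fits a network whose single output represents the degree of suspicion.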

In this example of the embodiment, the dimension N of the output vector of the neural network, such as a residual network, is set to 1, and the output is treated as a parameter representing the degree of suspicion of glaucoma. Training is then performed using the output of the neural network when the fundus photograph image data of each training item is input, together with the information indicating whether glaucoma is suspected that is associated with that image data. This training can be carried out by well-known processing such as backpropagation, so a detailed description is omitted here.

In one example of this embodiment, instead of directly judging whether glaucoma is suspected, information that supports that judgment may be presented. For example, the position of the optic disc cup margin is drawn and presented. In this case, information representing a curve that a physician (ophthalmologist) has manually drawn on each fundus photograph to mark the position of the optic disc cup margin (X in FIG. 2(a); the cup margin is generally a closed curve within the image Y of the optic disc) is used as the information on eye symptoms included in the training data. The information representing the curve may be a set of coordinates giving the positions of the pixels in the fundus photograph image data through which the curve passes.

The supporting information for judging whether glaucoma is suspected is not limited to a curve representing the position of the optic disc cup margin; it may indicate other characteristic parts of the image used in diagnosing glaucoma, such as the position of a wedge-shaped defect in the retinal nerve fiber layer.

In this example of the embodiment, the dimension N of the output vector of the residual network is, for example, made equal to the number of pixels N in the image data, and the residual network is trained with a target vector that is 1 for pixels through which the manually drawn closed curve representing the optic disc cup margin passes and 0 for all other pixels. This training can also be carried out by well-known processing such as backpropagation, so a detailed description is omitted here.
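The construction of the per-pixel target vector can be sketched as follows. This is an illustration only: the function name `curve_to_target`, the `(x, y)` coordinate convention, and the row-major ordering of the vector are assumptions, since the source does not state how pixels are ordered.

```python
def curve_to_target(curve_pixels, width, height):
    """Build a flat 0/1 target vector with one entry per image pixel.

    `curve_pixels` holds (x, y) coordinates of pixels on the manually drawn
    closed curve. Row-major ordering (index = y * width + x) is assumed.
    """
    target = [0] * (width * height)
    for x, y in curve_pixels:
        if 0 <= x < width and 0 <= y < height:
            target[y * width + x] = 1
    return target


# A tiny diamond-shaped "closed curve" on a 3 x 3 image.
target = curve_to_target([(1, 0), (0, 1), (2, 1), (1, 2)], width=3, height=3)
```

The resulting vector has the same length as the number of pixels, matching the output dimension N described in the text.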

The neural network of this embodiment that has learned the relationship between fundus photograph image data and information on eye symptoms, using training data in which fundus photograph image data and information on the eye symptoms corresponding to each fundus photograph are associated with each other, is not limited to a residual network.

The neural network may be an ordinary convolutional neural network (CNN). The final layer of an ordinary convolutional network may be replaced with an SVM (Support Vector Machine). An ordinary convolutional network whose activation function is Leaky ReLU (activation function φ(x) = max(0.01x, x), where max(a, b) is the larger of a and b) may also be used, as may DenseNet (Gao Huang, et al., Densely Connected Convolutional Networks, arXiv:1608.06993).
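The Leaky ReLU formula quoted above, φ(x) = max(0.01x, x), translates directly into code. The sketch below is a scalar illustration; the function name and the `slope` parameter are illustrative, with the default taken from the 0.01 coefficient in the text.

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU: phi(x) = max(slope * x, x), with slope = 0.01 as in the text."""
    return max(slope * x, x)


positive = leaky_relu(2.0)    # positive inputs pass through unchanged
negative = leaky_relu(-3.0)   # negative inputs are scaled by the slope
```

For positive inputs the function is the identity; for negative inputs it keeps a small non-zero gradient, in contrast to a plain ReLU, which outputs zero there.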

Furthermore, in this embodiment, the network may be one obtained by so-called distillation: the neural network formed as described above is used as a reference network, and another neural network with relatively few layers, for example two fully connected layers, is trained using as its target the output of the reference network for the same training data.

Furthermore, in this embodiment, for a single training item A, image data obtained by flipping or rotating the image data Pa of that item, or image data obtained by adding noise (such as random dots) to Pa, may be input as image data distinct from Pa.

In this case, the information indicating whether glaucoma is suspected, which serves as the target, is carried over unchanged from the original training item, both when the flipped or rotated image data is used as training data and when noise is added.

When image data obtained by flipping or rotating the image data Pa of training item A, or by adding noise (such as random dots) to Pa, is input as image data distinct from Pa in this way, and the manually drawn closed curve representing the optic disc cup margin is used as the information on eye symptoms, the following applies.

That is, when image data obtained by flipping or rotating Pa is used as training data, the manually drawn closed curve serving as the target is given the same transformation (flip, rotation, etc.) as Pa; when noise is added, the information from the original training item is used unchanged.
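The rule for augmented targets can be sketched as follows: geometric transformations are applied to both the image and the curve coordinates, while noise leaves the curve untouched. The function names, the list-of-rows image representation, and the choice of a horizontal flip and a 90-degree clockwise rotation are assumptions for illustration; the source does not specify which flips or rotation angles are used.

```python
import random


def flip_horizontal(image, curve):
    """Mirror a list-of-rows image and its (x, y) curve points left to right."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    return flipped, [(width - 1 - x, y) for x, y in curve]


def rotate_90_clockwise(image, curve):
    """Rotate image and curve by 90 degrees clockwise."""
    height = len(image)
    rotated = [list(col) for col in zip(*image[::-1])]
    return rotated, [(height - 1 - y, x) for x, y in curve]


def add_dot_noise(image, curve, n_dots, rng, max_value=255):
    """Overwrite n_dots random pixels; the curve is reused unchanged."""
    noisy = [row[:] for row in image]
    height, width = len(image), len(image[0])
    for _ in range(n_dots):
        noisy[rng.randrange(height)][rng.randrange(width)] = rng.randint(0, max_value)
    return noisy, list(curve)


image = [[1, 2, 3], [4, 5, 6]]
curve = [(0, 0), (2, 1)]
flipped, flipped_curve = flip_horizontal(image, curve)
rotated, rotated_curve = rotate_90_clockwise(image, curve)
noisy, noisy_curve = add_dot_noise(image, curve, n_dots=2, rng=random.Random(0))
```

In each geometric case the transformed curve points still land on the same pixel values as before, which is the property the text requires of the target.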

In this embodiment, information on the neural network in its post-training state is stored in the storage unit 12. This information is provided, for example, over a network or on a computer-readable, non-transitory recording medium, and is then stored in the storage unit 12.

[Other examples of machine learning results]
The machine learning result information of this embodiment is not limited to a neural network. It may instead be obtained by machine learning that applies, for example, an SVM (Support Vector Machine), Bayesian posterior distribution information, or a tree-structure-based method.

When the machine learning result is obtained by applying an SVM, Bayesian posterior distribution information, or a tree-structure-based method, the input is either a vector in which the pixel values of the fundus photograph image data are arranged or predetermined feature values obtained from the fundus photograph image data (for example, the area enclosed by the optic disc cup margin), and the information on a predetermined symptom for the eye shown in the fundus photograph is information indicating whether glaucoma is suspected.

Using this information as training data, machine learning with an SVM, for example, yields information specifying the decision boundary that discriminates whether glaucoma is suspected. Likewise, machine learning is performed on the parameters for obtaining a Bayesian posterior distribution as the probability that glaucoma is suspected, or on a clustering model that determines, by means of a tree structure, whether glaucoma is suspected.

[Semi-supervised learning]
In one example of this embodiment, the training data may partly include fundus photograph image data that is not associated with information on a predetermined symptom for the eye shown in the photograph.

Machine learning with such training data is widely known as semi-supervised learning, so a detailed description is omitted here.

[Operation of the control unit]
Next, the operation of the control unit 11 of this embodiment is described. As illustrated in FIG. 3, the control unit 11 functionally includes an accepting unit 21, an estimation unit 22, and an output unit 23.

The accepting unit 21 accepts image data of a fundus photograph to be processed and outputs it to the estimation unit 22. The fundus photograph accepted here may likewise be a two-dimensional fundus photograph taken with or without mydriasis, but it must include at least an image of the optic disc. As long as it includes at least an image of the optic disc, the fundus photograph need not have been taken with a dedicated medical camera; it may have been taken with an ordinary camera (including the camera of a smartphone or similar device).

The estimation unit 22 obtains the output produced with the machine learning result stored in the storage unit 12 when input data based on the image data accepted by the accepting unit 21 is supplied. Based on that output, the estimation unit 22 estimates information on a predetermined symptom for the eye shown in the fundus photograph being processed.

As an example, when the machine learning result is a neural network, as in the example already described, and the information on the predetermined symptom is a curve representing the position of the optic disc cup margin, the estimation unit 22 estimates the group of pixels through which that curve passes.

The output unit 23 outputs the result of the estimation. Specifically, when the estimation unit 22 has estimated the group of pixels through which the curve representing the position of the optic disc cup margin passes, the output unit 23 displays the fundus photograph image data accepted by the accepting unit 21 with a superimposed image in which each pixel in the estimated group is highlighted (FIG. 4). FIG. 4 shows an example in which the optic disc region is enlarged.
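The overlay step can be illustrated with a short sketch. Several details here are assumptions not stated in the source: the network output is taken to be a flat, row-major vector of per-pixel scores, a score of at least 0.5 is taken to mean "on the curve", the photograph is represented as a grayscale list of rows, and the highlight color is red. The function names are hypothetical.

```python
def output_to_pixels(output, width, threshold=0.5):
    """Convert a flat per-pixel score vector to (x, y) pixels above threshold."""
    return [(i % width, i // width) for i, v in enumerate(output) if v >= threshold]


def overlay_curve(gray_image, curve_pixels, highlight=(255, 0, 0)):
    """Return an RGB copy of a grayscale image with curve pixels highlighted."""
    rgb = [[(v, v, v) for v in row] for row in gray_image]
    for x, y in curve_pixels:
        if 0 <= y < len(rgb) and 0 <= x < len(rgb[0]):
            rgb[y][x] = highlight
    return rgb


scores = [0.1, 0.9, 0.2, 0.8]          # hypothetical network output, 2 x 2 image
pixels = output_to_pixels(scores, width=2)
shown = overlay_curve([[10, 20], [30, 40]], pixels)
```

The original pixel values are preserved everywhere except on the estimated curve, so the user still sees the underlying photograph.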

When the estimation unit 22 instead estimates, based on the output produced with the machine learning result, information representing the probability of a glaucoma diagnosis as the information on the predetermined symptom for the eye shown in the fundus photograph being processed, the output unit 23 may simply output the numerical value resulting from the estimation.

[Operation]
The image processing apparatus 1 of this embodiment basically has the configuration described above and operates as follows. In one example, the storage unit 12 stores a neural network, built on a residual network, that has been trained with training data consisting of the fundus photograph image data illustrated in FIG. 2(a) and data giving the position of each pixel through which the closed curve manually drawn on that image data to represent the optic disc cup margin passes, so that when fundus photograph image data is input, information representing the position of each pixel of the closed curve is obtained as an output vector.

When the user inputs into the image processing apparatus 1 the image data of a fundus photograph to be subjected to estimation (that is, estimation of the presence or absence of glaucoma symptoms), the apparatus accepts the image data and obtains the output of the neural network stored in the storage unit 12 when input data based on the accepted image data is supplied to it.

From this neural network output, the image processing apparatus 1 obtains an estimate of the group of pixels through which the curve representing the position of the optic disc cup margin passes, for the eye shown in the fundus photograph being processed. The apparatus then displays the accepted fundus photograph image data with a superimposed image in which each pixel in the estimated group is highlighted (FIG. 4).

From the shape of the optic disc cup margin shown in the displayed image, the user judges whether symptoms of glaucoma are present.

According to this example of the embodiment, whether glaucoma may be present can be detected relatively simply based on a two-dimensional fundus image.

[Preprocessing]
The machine learning result held in the storage unit 12 of the image processing apparatus 1 may also have been trained as follows. That is, as illustrated in FIG. 2(b), the training data may consist of partial image data obtained by identifying the range of the image corresponding to the optic disc within the fundus photograph image data and extracting a portion that contains the identified range.

Specifically, because the optic disc appears in a fundus photograph as a region of relatively higher brightness than the other parts of the image (the parts not corresponding to the optic disc), the computer performing the training compares the pixel values of each pair of adjacent pixels and finds the pairs whose difference exceeds a predetermined threshold (so-called contour detection). The computer may then detect, from each such pair, the pixel on the lower-luminance side as part of the outline of the optic disc.
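The adjacent-pixel comparison described above can be sketched as follows. The function name `detect_disc_outline`, the grayscale list-of-rows representation, and the use of horizontal and vertical neighbors only are assumptions for illustration; the source does not say which neighbors count as adjacent or how the threshold is chosen.

```python
def detect_disc_outline(image, threshold):
    """Return the set of (x, y) pixels on the darker side of each strong edge.

    Each pixel is compared with its right and lower neighbor; when the
    absolute difference exceeds `threshold`, the darker pixel of the pair
    is recorded as an outline pixel.
    """
    height, width = len(image), len(image[0])
    outline = set()
    for y in range(height):
        for x in range(width):
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if nx >= width or ny >= height:
                    continue
                a, b = image[y][x], image[ny][nx]
                if abs(a - b) > threshold:
                    outline.add((x, y) if a < b else (nx, ny))
    return outline


# A bright 2 x 2 "disc" on a dark background.
image = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
outline = detect_disc_outline(image, threshold=100)
```

On this toy image the outline consists of the eight dark pixels that directly border the bright block. Real fundus photographs contain vessels and other edges, so a practical implementation would likely need additional filtering that the text does not describe.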

The computer performing the training generates a square circumscribing the detected outline of the optic disc and scales the image data down or up so that the image portion within this square has a predetermined size (for example, 32 × 32 pixels). Next, it cuts out a square region larger than that predetermined size (for example, 64 × 64 pixels) centered on the center of the generated square. If the region to be cut out includes parts not contained in the original image data, those parts are padded with black pixels, and the result is used as the input data (FIG. 2(b)).
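The scale, crop, and pad steps can be sketched in one pass using inverse mapping: for each pixel of the 64 × 64 output, the code looks up which source pixel lands there after scaling. This is a sketch under several assumptions: nearest-neighbor sampling is used (the source does not name an interpolation method), the image is a grayscale list of rows, black is the value 0, and the function name `crop_around_disc` and its return values are hypothetical.

```python
import math


def crop_around_disc(image, outline, inner=32, outer=64):
    """Scale so the disc's bounding square is inner x inner, then cut an
    outer x outer window around its center, padding with black (0).

    `outline` is a list or set of (x, y) pixels on the detected disc outline.
    """
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    # Side of the square circumscribing the outline, in source pixels.
    side = max(max(xs) - min(xs), max(ys) - min(ys)) + 1
    # Center of that square in continuous pixel coordinates.
    cx = (min(xs) + max(xs) + 1) / 2.0
    cy = (min(ys) + max(ys) + 1) / 2.0
    scale = inner / side
    height, width = len(image), len(image[0])

    cropped = []
    for v in range(outer):
        row = []
        for u in range(outer):
            # Inverse mapping: source pixel that lands on output pixel (u, v).
            sx = math.floor(cx + (u + 0.5 - outer / 2) / scale)
            sy = math.floor(cy + (v + 0.5 - outer / 2) / scale)
            inside = 0 <= sx < width and 0 <= sy < height
            row.append(image[sy][sx] if inside else 0)  # 0 = black padding
        cropped.append(row)
    return cropped, scale, (cx, cy)


source = [[100] * 8 for _ in range(8)]
patch, scale, center = crop_around_disc(source, [(0, 0), (3, 0), (0, 3), (3, 3)])
```

In this toy case the disc's bounding square is 4 pixels wide, so the scale factor is 8 and the 64 × 64 window extends past the top-left corner of the source image, where it is filled with black. Returning the scale and center makes it possible to apply the same transformation to annotation coordinates.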

To normalize contrast, the computer that performs the learning process may further compute the mean pixel value of the converted image data (the image data before padding) and subtract it from the value of each pixel of the converted image data.
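As a minimal sketch of this mean subtraction, assuming a grayscale image held as a list of rows (the function name `subtract_mean` is hypothetical):

```python
def subtract_mean(image):
    """Subtract the mean pixel value from every pixel of the image.

    image: list of rows of numeric pixel values (the scaled image
    before black padding is added).
    """
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    return [[value - mean for value in row] for row in image]
```

Computing the mean before padding keeps the black padding pixels from pulling the mean down.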

The computer that performs the learning process also obtains information on a predetermined symptom of the eye shown in the fundus photograph, such as information indicating whether glaucoma is suspected for the image data of the fundus photograph processed as above, or information on a manually drawn closed curve representing the optic disc cup margin.

When the information on the predetermined symptom of the eye shown in the fundus photograph is information that can be drawn overlaid on the image data of the fundus photograph, such as a manually drawn closed curve representing the optic disc cup margin, the same conversion as the scaling and cropping of the fundus photograph image data is also applied to that symptom information. For example, the data representing the position of each pixel through which the manually drawn closed curve passes is subjected to the same scaling and cropping as the image data, and is thereby converted into the positions of the corresponding pixels in the converted image data. Because this conversion uses the widely known methods of scaling and cropping, a detailed description is omitted here.
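A sketch of applying the same scaling and cropping to the curve's pixel positions follows. It assumes the crop is defined by the centre of the circumscribed square and a scale factor, with the patch centre at index (patch_size - 1) / 2; the name `transform_curve_points` and its parameters are hypothetical.

```python
import math


def transform_curve_points(points, center_r, center_c, scale, patch_size=64):
    """Map (row, col) curve points from the original image into the patch.

    center_r, center_c: centre of the circumscribed square in the original.
    scale: patch pixels per original pixel (e.g. 32 / side of the square).
    Points that fall outside the patch after the conversion are dropped.
    """
    half = (patch_size - 1) / 2.0
    transformed = []
    for r, c in points:
        new_r = int(math.floor((r - center_r) * scale + half + 0.5))
        new_c = int(math.floor((c - center_c) * scale + half + 0.5))
        if 0 <= new_r < patch_size and 0 <= new_c < patch_size:
            transformed.append((new_r, new_c))
    return transformed
```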

The computer that performs the learning process then executes the learning process. For example, it inputs the partial image data containing the image corresponding to the optic disc, prepared as input data, into a residual network that takes 64 × 64-dimensional data as input and outputs a 32 × 32-dimensional vector, and trains the network's output against the information on the closed curve representing the optic disc cup margin that corresponds to the input data. This learning process, too, can be carried out by widely known processing such as backpropagation, depending on the form of machine learning.

The residual network obtained by the learning process in this example, when given partial image data containing the image corresponding to the optic disc, outputs an estimate of the pixels within that partial image data through which the optic disc cup margin passes.

In this example, the control unit 11 operates as follows in the processing of the estimation unit 22. The estimation unit 22 in this example of the present embodiment identifies, in the image data received by the receiving unit 21, the range of the image corresponding to the optic disc; extracts partial image data, which is a part of the received image data that includes the identified range; and inputs the extracted partial image data as input data into the neural network stored in the storage unit 12 to obtain an estimate of the information on the closed curve representing the optic disc cup margin.

At this time, the partial image data is extracted so that the identified image range lies at a predetermined position within the image.

That is, the estimation unit 22 performs contour detection on the image data of the fundus photograph received by the receiving unit 21: it finds pairs of mutually adjacent pixels whose pixel values differ by more than a predetermined threshold, and detects, from each pair found, the pixel with the relatively lower luminance as a pixel on the contour of the optic disc.

The estimation unit 22 then generates information on a square circumscribing the detected contour of the optic disc, and scales the image data down or up so that the image portion inside this square has a predetermined size (for example, 32 × 32 pixels). Next, it cuts out a square range larger than the predetermined size (for example, a range of 64 × 64 pixels) centered on the center of the generated square. If the range to be cut out includes a portion not contained in the original image data, that portion is padded with black pixels, and the result is used as the input data (as in FIG. 2(b)). In this way, partial image data is extracted that includes the range of the image corresponding to the optic disc, with that range at a predetermined position within the image.

The estimation unit 22 inputs the partial image data extracted here into the neural network stored in the storage unit 12 and obtains its output. Based on this output, the estimation unit 22 estimates information on the predetermined symptom of the eye shown in the fundus photograph being processed. In this example the information on the predetermined symptom is a curve representing the position of the optic disc cup margin, so the estimation unit 22 estimates the group of pixels through which that curve passes. The output unit 23 outputs the result of the estimation. Specifically, the output unit 23 displays the image data of the fundus photograph received by the receiving unit 21 with an overlaid image that highlights each pixel in the estimated group (FIG. 4).

A computer that performs the learning process using information representing the probability of a glaucoma diagnosis, instead of information on the closed curve representing the optic disc cup margin, executes the learning process as follows, for example. It inputs the partial image data containing the image corresponding to the optic disc, prepared as input data, into a neural network that takes 64 × 64-dimensional data as input and outputs a one-dimensional scalar, and trains the network's output against information representing the probability of a glaucoma diagnosis for that input data (for example, the proportion of ophthalmologists who diagnosed glaucoma when the fundus photograph serving as the input data was presented to several ophthalmologists). This learning process, too, can be carried out by widely known processing such as backpropagation, depending on the form of machine learning.
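The training target described above, the proportion of ophthalmologists who diagnosed glaucoma, can be sketched as follows. The text does not name a loss function; the cross-entropy against a soft label shown here is an assumption chosen for illustration, and both function names are hypothetical.

```python
import math


def diagnosis_fraction(votes):
    """Fraction of ophthalmologists who diagnosed glaucoma.

    votes: list of booleans, True where that doctor diagnosed glaucoma.
    """
    return sum(1 for vote in votes if vote) / len(votes)


def soft_label_cross_entropy(predicted, target, eps=1e-7):
    """Cross-entropy between a predicted probability and a soft label.

    Assumed loss for illustration; minimised when predicted == target.
    """
    p = min(max(predicted, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
```

For example, if two of four doctors diagnose glaucoma, the target for that photograph is 0.5, and the loss is smallest when the network's scalar output is also 0.5.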

The neural network obtained by the learning process in this example, when given partial image data containing the image corresponding to the optic disc, outputs an estimate of the probability of a glaucoma diagnosis.

In this example, the control unit 11 operates as follows in the processing of the estimation unit 22. The estimation unit 22 in this example of the present embodiment identifies, in the image data received by the receiving unit 21, the range of the image corresponding to the optic disc; extracts partial image data, which is a part of the received image data that includes the identified range; and inputs the extracted partial image data as input data into the neural network stored in the storage unit 12 to obtain an estimate of the probability of a glaucoma diagnosis.

In this case as well, the partial image data is extracted so that the identified image range lies at a predetermined position within the image.

The estimation unit 22 inputs the partial image data extracted here into the neural network stored in the storage unit 12 and obtains its output. Based on this output, the estimation unit 22 estimates information on the predetermined symptom of the eye shown in the fundus photograph being processed. In this example the information on the predetermined symptom is the probability of a glaucoma diagnosis, so the estimation unit 22 estimates that probability. The output unit 23 displays the numerical value that results from the estimation.

[Other examples of preprocessing]
Instead of identifying the range of the image corresponding to the optic disc as the learning data, a range that includes both the optic disc and the macula may be identified, and partial image data extracted so as to include that range may be used as both the learning data and the input data to be processed.

In general, the changes in the fundus associated with glaucoma spread from the optic disc toward the macula. By cutting out a range that includes both the optic disc and the macula for the learning process, and by inputting image data cut out over the same range into the machine learning result (such as the trained neural network) for the estimation process, estimation based on more information becomes possible.

Furthermore, in one example of the present embodiment, processing that emphasizes blood vessel portions may be applied to both the image data in the learning data and the image data of the input data to be processed. The blood vessel portions are emphasized, for example, by extracting line segments from continuous contour lines. When image data with emphasized blood vessels is used as learning data and input data, information on the two-dimensional shape of the blood vessels near the optic disc cup (the shape of the blood vessel image projected onto a plane) contributes to estimating the probability of a glaucoma diagnosis and the optic disc cup margin, which can improve learning efficiency and the accuracy of the estimation results.

[Example using three-dimensional fundus photographs]
In the description so far, two-dimensional image data has been used as the image data of the fundus photograph, but the present embodiment is not limited to this.

That is, in one example of the present embodiment, the learning data for machine learning may include three-dimensional information on the fundus (such as layer thickness) together with the image data of the fundus photograph. In this example, the image data of the fundus photograph and the three-dimensional information on the fundus are input into a neural network, and the neural network is trained using as the correct answer the probability that an ophthalmologist who consulted that image data and three-dimensional information diagnoses glaucoma on that basis (for example, the proportion of ophthalmologists, among several, who diagnosed glaucoma).

[Preprocessing in the receiving unit]
The image processing apparatus 1 according to one example of the present embodiment may also determine whether the image data subject to estimation is image data for which estimation cannot be performed, such as image data without sufficient quality for estimation or image data in which fundus structures such as the optic disc are not captured at all. When it determines that estimation cannot be performed, it may present the user with information indicating that sufficient estimation is not possible, either without performing the estimation process or together with the result of performing it.

As one example, in the processing of the receiving unit 21, the image processing apparatus 1 of the present embodiment receives the input of image data to be processed and determines, by a clustering process, whether the image quality is sufficient, whether fundus structures such as the optic disc are captured, and so on. When it determines that sufficient estimation is possible (for instance, the image quality is sufficient and the image is a fundus photograph), it accepts the input image data as image data of a fundus photograph and outputs it to the estimation unit 22.

When it determines that sufficient estimation is not possible, the receiving unit 21 outputs information indicating that estimation cannot be performed.

Here, image quality can be judged by measuring the S/N ratio, the proportion of the image occupied by near-white regions after binarization, and the like. Specifically, the S/N ratio is, for example, the PSNR (Peak Signal-to-Noise Ratio). The receiving unit 21 computes the mean squared error between the input image data and an image obtained by applying predetermined noise removal to that image data, and obtains the PSNR as the common logarithm (or a constant multiple of it) of the square of the maximum pixel value (255 if the image is represented in 256 levels from 0 to 255) divided by this mean squared error (the specific computation is widely known, so a detailed description is omitted here). The receiving unit 21 determines that the image quality with respect to the S/N ratio is sufficient if the mean squared error is 0 or the PSNR exceeds a predetermined threshold (for example, a value corresponding to a common logarithm of 0.8 or more).
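The PSNR check above can be sketched as follows. The factor of 10 (decibels) is the conventional choice for the "constant multiple" and is an assumption here, as are the function names; the noise-removed image is taken as given because the text does not specify the noise removal method.

```python
import math


def psnr(image, denoised, max_value=255):
    """Peak signal-to-noise ratio between an image and its denoised copy.

    Both arguments are lists of rows of grayscale values of equal shape.
    Returns math.inf when the two images are identical (MSE of 0).
    """
    total = 0
    count = 0
    for row_a, row_b in zip(image, denoised):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def snr_quality_ok(image, denoised, threshold):
    """True when the mean squared error is 0 or the PSNR exceeds threshold."""
    return psnr(image, denoised) > threshold
```

Returning infinity for a zero error lets a single comparison cover both acceptance conditions named in the text.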

The proportion occupied by near-white regions after binarization is the proportion of pixels whose luminance is too high. The receiving unit 21 converts the input image data to grayscale by a known method and, using α times the maximum pixel value (0 < α < 1) as a threshold, sets the pixel value of every pixel above the threshold (closer to white) to the maximum pixel value (white). Choosing α relatively close to 1, for example greater than 0.95, ensures that only near-white regions are set to white. Every pixel below the threshold is set to the minimum pixel value (black). The receiving unit 21 divides the number of pixels set to white in this binarization result by the total number of pixels in the image data to obtain the proportion occupied by near-white regions, and determines that the image quality is sufficient when this proportion is below a predetermined threshold.
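A minimal sketch of this overexposure check, assuming the image has already been converted to grayscale; the default α of 0.96 is just one value satisfying "greater than 0.95", and the function names and the `max_white_ratio` parameter are hypothetical.

```python
def white_ratio(gray_image, alpha=0.96, max_value=255):
    """Fraction of pixels brighter than alpha * max_value.

    gray_image: list of rows of grayscale values in 0..max_value.
    Counting the pixels above the threshold is equivalent to binarizing
    the image and counting the pixels set to white.
    """
    threshold = alpha * max_value
    pixels = [value for row in gray_image for value in row]
    white = sum(1 for value in pixels if value > threshold)
    return white / len(pixels)


def exposure_ok(gray_image, max_white_ratio, alpha=0.96):
    """True when the near-white proportion is below the allowed ratio."""
    return white_ratio(gray_image, alpha) < max_white_ratio
```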

The receiving unit 21 may also perform the above processing after applying a correction that normalizes the color tone of the input image data. The normalization is performed, for example, by converting pixel values so that, after conversion to grayscale (by a known method), the pixel value closest to the maximum pixel value (white) maps to the maximum pixel value and the pixel value closest to the minimum pixel value (black) maps to the minimum pixel value. This color correction method is widely known, so a detailed description is omitted.
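One common way to realize this normalization is a linear stretch of the grayscale levels, sketched below. The linear mapping is an assumption (the text only fixes where the brightest and darkest values end up), and the name `stretch_levels` is hypothetical.

```python
def stretch_levels(gray_image, max_value=255):
    """Linearly stretch grayscale levels to span 0..max_value.

    The brightest pixel value maps to max_value and the darkest to 0.
    A flat image (all pixels equal) is returned unchanged.
    """
    pixels = [value for row in gray_image for value in row]
    low, high = min(pixels), max(pixels)
    if high == low:
        return [list(row) for row in gray_image]
    span = high - low
    return [
        [int(round((value - low) * max_value / span)) for value in row]
        for row in gray_image
    ]
```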

Whether fundus structures such as the optic disc are captured can be judged, for example, by whether an image of the optic disc is contained in the image data. Specifically, the receiving unit 21 applies contour extraction to the input image data and then detects circles in the extracted contour image using a method such as the Hough transform. Widely known methods can be used for the contour extraction and the circle detection.

The receiving unit 21 checks whether the number and size of the detected circles satisfy predetermined conditions. Specifically, when exactly one circle is detected (only a circle regarded as the optic disc) and its size (for example, the short side of its circumscribed rectangle, that is, the minor axis of the circle) is within a predetermined range, the receiving unit 21 determines that an image of the optic disc is included. Alternatively, when two circles are detected (one regarded as the optic disc and one regarded as the boundary of the whole field of view), the smaller circle lies inside the larger one, and the ratio of the size of the smaller circle to the size of the larger circle (each measured, for example, as the short side of the circumscribed rectangle, that is, the minor axis) is within a predetermined range, the receiving unit 21 determines that an image of the optic disc is included (that fundus structures such as the optic disc are captured).
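The two acceptance conditions can be sketched as a single predicate over the detected circles. The circle representation `(cx, cy, radius)`, the use of the diameter as the size measure, and the names `contains_optic_disc`, `size_range`, and `ratio_range` are assumptions made for illustration; the circle detector itself is not shown.

```python
import math


def contains_optic_disc(circles, size_range, ratio_range):
    """Decide whether the detected circles indicate an optic disc.

    circles: list of (cx, cy, radius) tuples from a circle detector.
    size_range: (min, max) allowed diameter when one circle is found.
    ratio_range: (min, max) allowed small/large size ratio for two circles.
    """
    if len(circles) == 1:
        diameter = 2 * circles[0][2]
        return size_range[0] <= diameter <= size_range[1]
    if len(circles) == 2:
        small, large = sorted(circles, key=lambda circle: circle[2])
        distance = math.hypot(small[0] - large[0], small[1] - large[1])
        # The small circle is enclosed when its farthest point from the
        # large circle's centre is still within the large radius.
        enclosed = distance + small[2] <= large[2]
        ratio = small[2] / large[2]
        return enclosed and ratio_range[0] <= ratio <= ratio_range[1]
    return False
```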

Alternatively, the receiving unit 21 may judge whether fundus structures such as the optic disc are captured by whether an image of blood vessels can be detected in the input image data. The blood vessel image can be detected, for example, by the method of Kazuaki Sugio et al., "Research on blood vessel analysis in fundus photographs: extraction of blood vessels and their crossings," 医用画像情報学会雑誌, Vol. 16, No. 3 (1999). In the present embodiment, the receiving unit 21 attempts to extract a blood vessel image from the input image data by a widely known method such as the one above. If the number of significant pixels in the resulting image (pixels judged to be part of a blood vessel image) is, as a proportion of the total number of pixels, within a predetermined range, the receiving unit 21 determines that blood vessels are detectable and infers that fundus structures such as the optic disc are captured.

Further, the receiving unit 21 may use a neural network trained by machine learning to discriminate whether fundus structures such as the optic disc are captured (hereinafter, the pre-determination neural network). As one example, the pre-determination neural network is implemented using a CNN (convolutional neural network), a residual network, or the like. It is trained by supervised machine learning: multiple fundus photograph images in which fundus structures such as the optic disc are captured, and multiple images in which they are not (for example, images that are not fundus photographs), are input, and the network is trained to output an indication that the structures are captured in the former case and that they are not captured in the latter (widely known methods can be used for this machine learning, so a detailed description is omitted). The receiving unit 21 then converts the input image data into data that can be input to the pre-determination neural network (by resizing, for example), inputs the converted data into the network, and refers to its output. When the output indicates that fundus structures such as the optic disc are captured, the receiving unit 21 may determine that the input image data is a fundus photograph and accept it.

Although the case described here uses a pre-determination neural network separate from the neural network used by the estimation unit 22, the neural network used by the estimation unit 22 may also serve as the pre-determination neural network.

In this case, the neural network used by the estimation unit 22 is trained in advance by inputting multiple images known to be fundus photographs, with teacher data consisting of, for each image, the probability that the eye has the predetermined symptom (for example, the proportion of doctors who diagnose glaucoma) and the probability that it does not (for example, the proportion of doctors who diagnose no glaucoma). The neural network then estimates, for input image data that is a fundus photograph of an eye, both the probability that the predetermined symptom is present and the probability that it is absent.

In this example, the receiving unit 21 outputs the input image data as is to the estimation unit 22 and refers to the information output by the estimation unit 22: the probability Pp that the predetermined symptom is present and the probability Pn that it is absent. When these satisfy a predetermined condition, for example when both Pp and Pn are below a predetermined threshold (such as both being below 40%) or when the absolute difference |Pp - Pn| is below a predetermined threshold, the receiving unit 21 may determine that estimation could not be performed and output information to that effect.
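The rejection condition on Pp and Pn can be sketched as a small predicate. The 40% floor comes from the example in the text; the default gap of 0.2 is a hypothetical value, since the text only says "a predetermined threshold", and the function name is likewise hypothetical.

```python
def estimation_unreliable(p_present, p_absent, min_prob=0.4, min_gap=0.2):
    """True when the two estimated probabilities suggest a failed estimate.

    p_present: estimated probability Pp that the symptom is present.
    p_absent: estimated probability Pn that the symptom is absent.
    min_prob: both probabilities below this value means rejection.
    min_gap: |Pp - Pn| below this value means rejection (assumed default).
    """
    both_low = p_present < min_prob and p_absent < min_prob
    too_close = abs(p_present - p_absent) < min_gap
    return both_low or too_close
```

Because the network estimates Pp and Pn separately, they need not sum to 1, which is why both being low is a possible outcome for an image that is not a usable fundus photograph.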

When this predetermined condition is not satisfied (when the estimation can be judged to have succeeded), the output of the estimation unit 22 may be passed to the output unit 23.

The receiving unit 21 may also combine these determinations. For example, the receiving unit 21 examines the S/N ratio of the input image data and, when the S/N ratio is judged to be greater than a predetermined value (relatively little noise), further attempts to extract the optic disc from the image data. When it judges that the optic disc was extracted, the receiving unit 21 may output the image data to the estimation unit 22.

Thus, in one example of the present embodiment, before the estimate from the estimation unit 22 is output, it is determined whether the estimation unit 22 received image data for which sufficient estimation is possible, and the estimation result is output when that is judged to be the case. This prevents estimation results from being output for image data that cannot be adequately estimated.

DESCRIPTION OF SYMBOLS: 1 image processing apparatus, 11 control unit, 12 storage unit, 13 operation unit, 14 display unit, 15 input/output unit, 21 receiving unit, 22 estimation unit, 23 output unit.

Claims

An image processing apparatus comprising:
holding means for holding a machine learning result obtained by machine learning, using learning data that includes information associating image data of fundus photographs with information on eye symptoms corresponding to each fundus photograph, the relationship between image data of fundus photographs and information on eye symptoms;
receiving means for receiving image data of a fundus photograph to be processed;
estimating means for estimating, using input data based on the received image data and the machine learning result, information on eye symptoms for the eye shown in the fundus photograph being processed; and
means for outputting the result of the estimation.

The image processing apparatus according to claim 1,
wherein the estimating means identifies, in the image data received by the receiving means, the range of the image corresponding to the optic disc, extracts partial image data that is a part of the received image data and includes the identified range, and estimates the information on eye symptoms for the eye shown in the fundus photograph being processed using the extracted partial image data and the machine learning result.

The image processing apparatus according to claim 2,
wherein the estimating means extracts the partial image data so that the identified image range lies at a predetermined position within the image.

The image processing apparatus according to any one of claims 1 to 3,
wherein the information on eye symptoms is information on a curve representing the optic disc cup margin in the fundus photograph being processed.

The image processing apparatus according to any one of claims 1 to 4,
wherein the receiving means determines whether input image data satisfies a predetermined condition, and accepts image data that satisfies the condition as the image data of the fundus photograph to be processed.

A program for causing a computer to function as:
holding means for holding a machine learning result obtained by machine learning, using learning data that includes information associating image data of fundus photographs with information on eye symptoms corresponding to each fundus photograph, the relationship between image data of fundus photographs and information on eye symptoms;
receiving means for receiving image data of a fundus photograph to be processed;
estimating means for estimating, using input data based on the received image data and the machine learning result, information on eye symptoms for the eye shown in the fundus photograph being processed; and
means for outputting the result of the estimation.