JP2021069667A

JP2021069667A - Image processing device, image processing method and program

Info

Publication number: JP2021069667A
Application number: JP2019198016A
Authority: JP
Inventors: Kazuhiko Takahashi (高橋 和彦); Ritsuya Tomita (富田 律也)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2021-05-06

Abstract

To effectively reduce noise in a medical image, such as a tomographic image or a front image, obtained by OCT or the like. SOLUTION: An image processing device comprises: means for detecting a position of noise in a first medical image obtained by imaging a subject; means for generating a second medical image, in which a pixel value at the position of the noise detected in the first medical image is reduced, and an estimated image, in which the noise in the first medical image is estimated to be reduced; means for changing the estimated image using an objective function including a first term relating to a difference between the second medical image and the estimated image and a second term relating to the estimated image; and means for outputting, as a noise-reduced image, the changed estimated image obtained so that the objective function satisfies a predetermined condition. SELECTED DRAWING: Figure 4

Description

The disclosed technique relates to an image processing device, an image processing method, and a program.

Optical coherence tomography (OCT) captures tomographic images using the interference of light, and the subject may be any material that transmits light. In ophthalmic OCT the subject is the eye; because the apparatus allows the state inside the retinal layers to be observed three-dimensionally, it is called a tomographic imaging apparatus. Such apparatuses have attracted attention in recent years because they help diagnose disease more accurately. One form of OCT is TD-OCT (time-domain OCT), which combines a broadband light source with a Michelson interferometer. TD-OCT measures the interference between the reference light and the backscattered light produced when light enters the eye to be examined, and obtains information in the depth direction of the eye by exploiting the fact that the reflection intensity increases when the optical path length of the reference light matches that of the backscattered light. A single TD-OCT measurement yields information for only one point in the depth direction. TD-OCT therefore has to acquire information for multiple depth points while changing the optical path length of the reference light by mechanically moving the reference mirror that reflects it. FD-OCT (Fourier-domain OCT) was developed as a faster way to acquire images. With FD-OCT, a single measurement yields all of the information in the depth direction, so mechanical scanning in the depth direction is unnecessary and measurement is faster than with TD-OCT. Two types of FD-OCT are known. One is SD-OCT (spectral-domain OCT), which uses a broadband light source, acquires an interferogram with a spectrometer, and obtains information in Fourier space. The other is SS-OCT (swept-source OCT), which obtains the interference of the light waves in Fourier space by rapidly changing the emission wavelength of the light source. An FD-OCT tomographic image is the absolute-value component of the complex data obtained by Fourier-transforming the observed signal. This absolute-value signal is called a luminance image or an intensity image.

One method of reducing noise in OCT images uses a convolution filter (for example, applying a 3x3 Gaussian filter to the image). This method applies noise reduction to a local image region; the whole image is denoised by applying the filter in turn to each local part of the image. Another known method of reducing noise in OCT images is total variation noise reduction (total variation denoising) (Patent Document 1). This method reduces noise while taking into account the amount of variation in the image as a whole: noise reduction is performed so that the total amount of variation in the denoised image reaches a certain value. Here, the variation component is the absolute value of the difference from the adjacent pixel. For example, the total variation of an image is zero when all of its pixels have the same value. The strength of the noise reduction is determined by controlling the total variation of the image. When noise reduction is performed so that the amount of variation becomes small, variation components such as edges and random noise are reduced and the output image looks flat. Conversely, when noise reduction is performed so that the amount of variation stays large, edges and random noise are not reduced, so the noise reduction effect is weak.
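The variation component defined above can be made concrete with a short sketch. The function name `total_variation` and the nested-list image representation are assumptions for illustration; the source only defines the variation component as the absolute difference from the adjacent pixel, and this sketch assumes that both the right-hand and the lower neighbour are counted.

```python
def total_variation(image):
    """Sum of absolute differences between horizontally and vertically adjacent pixels."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:  # difference from the right-hand neighbour
                tv += abs(image[y][x + 1] - image[y][x])
            if y + 1 < rows:  # difference from the neighbour below
                tv += abs(image[y + 1][x] - image[y][x])
    return tv

flat = [[5, 5, 5], [5, 5, 5]]    # every pixel equal: no variation
noisy = [[5, 9, 5], [5, 5, 1]]   # two outlier pixels add variation
print(total_variation(flat), total_variation(noisy))
```

An image whose pixels are all equal gives zero, matching the example in the text, while isolated outliers raise the total.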

Patent Document 1: Published Japanese Translation of PCT Application (特表) No. 2018-531773

However, it was found that applying the conventional total variation noise reduction method as-is to an OCT image may fail to reduce noise effectively, because black dot noise is difficult to reduce. Black dot noise is illustrated in FIG. 7(b). FIG. 7(a) is an OCT tomographic image, and FIG. 7(b) is an enlarged view of part of it. As FIG. 7(b) shows, many black points (hereinafter, black dots) are scattered across the image, and they are one of the factors that degrade image quality. In particular, black dots lie on the retinal layers, which are the structure of interest, and get in the way when a physician observes those layers, which can make diagnosis difficult.

One object of the disclosed technique, which was made in view of the above problem, is to effectively reduce noise in medical images such as tomographic images and front images obtained by OCT or the like.

The object is not limited to the above. Achieving effects that are derived from the configurations shown in the embodiments described later, and that cannot be obtained with conventional techniques, can also be regarded as another object of the present disclosure.

One of the disclosed image processing devices includes:
means for detecting the position of noise in a first medical image obtained by imaging a subject;
means for generating a second medical image, in which the pixel value at the detected noise position in the first medical image is reduced, and an estimated image, in which the noise in the first medical image is estimated to be reduced;
means for changing the estimated image using an objective function that includes a first term relating to the difference between the second medical image and the estimated image and a second term relating to the estimated image; and
means for outputting, as a noise-reduced image, the estimated image obtained by changing it so that the objective function satisfies a predetermined condition.

According to one of the disclosed techniques, noise in medical images such as tomographic images and front images obtained by OCT or the like can be reduced effectively.

FIG. 1: Configuration of the image processing system.
FIG. 2: Structure of the eye, a tomographic image, and a fundus image.
FIG. 3: Flowchart of the processing flow in the image processing system.
FIG. 4: Block diagram of the noise reduction unit in the image processing device.
FIG. 5: Example of a GUI for changing the ratio λ.
FIG. 6: Block diagram of the noise reduction unit when black dot positions are predicted with random numbers.
FIG. 7: Illustration of black dot noise.
FIG. 8: Example of image quality enhancement processing.
FIG. 9: Example configuration of a neural network used as a machine learning model.
FIG. 10: Example configuration of a neural network used as a machine learning model.

(First Embodiment)
An image processing system that includes the image processing device according to the first embodiment, the structure of the eye, and the images of the eye acquired by the image processing system are described in detail below with reference to the drawings.

First, FIG. 2 shows the structure of the eye and the images acquired by the image processing system. FIG. 2(a) is a schematic view of the eyeball, in which C denotes the cornea, CL the crystalline lens, V the vitreous body, M the macula (the center of the macula is the fovea), and D the optic nerve head. This embodiment mainly describes the case in which the tomographic imaging apparatus 200 images the posterior pole of the retina, which includes the vitreous body, the macula, and the optic nerve head. Although not described in this embodiment, the tomographic imaging apparatus 200 can also image the anterior segment, including the cornea and the crystalline lens.

FIG. 1 shows the configuration of an image processing system 100 that includes an image processing device 300 according to this embodiment. As shown in FIG. 1, the image processing system 100 is configured by connecting the image processing device 300, via interfaces, to a tomographic imaging apparatus (also called OCT) 200, a fundus imaging apparatus 400, an external storage unit 500, a display unit 600, and an input unit 700.

The tomographic imaging apparatus 200 images an eye, which is the subject (not shown), and generates a tomographic image of the retinal layers. The tomographic image is an example of a first medical image obtained by imaging a subject. The tomographic imaging apparatus 200 is configured to receive interference light formed by the return light from the eye irradiated with the measurement light and the reference light corresponding to the measurement light. A Fourier-domain OCT apparatus is preferably used as the tomographic imaging apparatus 200, for example an SD-OCT in which the light receiving means for the interference light is a line sensor or the like, or an SS-OCT that uses a wavelength-swept light source. The tomographic imaging apparatus 200 is communicably connected to the image processing device 300, but the image processing device 300 may instead be incorporated inside the tomographic imaging apparatus 200. Because the tomographic imaging apparatus 200 is a known apparatus, a detailed description is omitted; the following describes tomographic imaging performed in response to instructions from the image processing device 300.

In FIG. 1, the galvanometer mirror 201, which is an example of scanning means, scans the measurement light over the fundus and defines the fundus imaging range of the OCT. The drive control unit 202 controls the drive range and speed of the galvanometer mirror 201, thereby defining the imaging range in the planar direction on the fundus and the number of scan lines (the scanning speed in the planar direction). The drive control unit 202 also controls the scanning means so that the measurement light scans the same position of the eye to be examined. For simplicity the galvanometer mirror is shown as a single unit, but it actually consists of two mirrors, one for X scanning and one for Y scanning, and can scan a desired range on the fundus with the measurement light.

The focus 203 brings the measurement light into focus on the retinal layers of the fundus through the anterior segment of the eye, which is the subject. The measurement light is focused on the retinal layers of the fundus, through the anterior segment, by a focus lens (not shown). The measurement light that irradiates the fundus is reflected and scattered by each retinal layer and returns. To observe the vitreous body in detail, the focus is moved from the retinal layers toward the anterior segment so that it lies on the vitreous body.

The internal fixation lamp 204 consists of a display unit 241 and a lens 242. The display unit 241 is an array of light-emitting diodes (LDs) arranged in a matrix. Under the control of the drive control unit 202, the lighting position of the light-emitting diodes is changed according to the site to be imaged. Light from the display unit 241 is guided to the eye to be examined through the lens 242. The light emitted from the display unit 241 is 520 nm, and the drive control unit 202 causes a desired pattern to be displayed.

The coherence gate stage 205 is controlled by the drive control unit 202 to accommodate, for example, differences in the axial length of the eye to be examined. The coherence gate is the position at which the optical distances of the measurement light and the reference light in the OCT are equal. In addition, controlling the position of the coherence gate selects, as the imaging method, whether imaging is performed with the gate on the retinal layer side or on the side deeper than the retinal layers. The tomographic image acquired by the tomographic imaging apparatus 200 is described next with reference to FIG. 2(b). In FIG. 2(b), V denotes the vitreous body, M the macula, and D the optic nerve head. L1 denotes the boundary between the inner limiting membrane (ILM) and the nerve fiber layer (NFL), L2 the boundary between the nerve fiber layer and the ganglion cell layer (GCL), L3 the photoreceptor inner segment/outer segment junction (ISOS), L4 the retinal pigment epithelium (RPE), L5 Bruch's membrane (BM), and L6 the choroid. In the tomographic image, the horizontal axis (the main scanning direction of the OCT) is the x-axis and the vertical axis (the depth direction) is the z-axis.

The fundus imaging apparatus 400 captures fundus images of the eye; examples include a fundus camera and an SLO (Scanning Laser Ophthalmoscope). FIG. 2(c) shows a fundus image of the eye, in which M denotes the macula, D the optic nerve head, and the thick curves the retinal blood vessels. In the fundus image, the horizontal axis (the main scanning direction of the OCT) is the x-axis and the vertical axis (the sub-scanning direction of the OCT) is the y-axis. The tomographic imaging apparatus 200 and the fundus imaging apparatus 400 may be integrated into one apparatus or provided separately.

The image processing device 300 includes an image acquisition unit 301, a storage unit 302, an image processing unit 303, an instruction unit 304, and a display control unit 305. The image acquisition unit 301 includes a tomographic image generation unit 311; it acquires the signal data of the tomographic image captured by the tomographic imaging apparatus 200 and generates the tomographic image by signal processing. It also acquires the fundus image data captured by the fundus imaging apparatus 400. The generated tomographic image and the fundus image are then stored in the storage unit 302. The image processing unit 303 includes an alignment unit 331, a detection unit 332, a calculation unit 333, and a noise reduction unit 334. The alignment unit 331 aligns multiple tomographic images with one another and aligns tomographic images with the fundus image. The detection unit 332 detects the vitreous boundary and the vitreous region. The calculation unit 333 quantifies features of the region defined by the vitreous boundary and the upper retinal layer. The noise reduction unit 334 specifies the region on which the calculation unit 333 performs its calculation.

The external storage unit 500 holds information about the eye to be examined (the patient's name, age, sex, and so on) in association with the captured image data, the imaging parameters, the image analysis parameters, and the parameters set by the operator.

The input unit 700 is, for example, a mouse, a keyboard, or a touch screen. Through the input unit 700, the operator issues instructions to the image processing device 300, the tomographic imaging apparatus 200, and the fundus imaging apparatus 400.

Next, the processing procedure of the image processing device 300 of this embodiment is described with reference to FIGS. 3 and 4. FIG. 3 is a flowchart showing the flow of operation of the entire system in this embodiment. FIG. 4 is a block diagram of the noise reduction unit 334 of the image processing unit 303 in this embodiment.

<Step S301>
In step S301, an examined-eye information acquisition unit (not shown) acquires a subject identification number from outside as information that identifies the eye to be examined. Based on the subject identification number, it then acquires the information about that eye held in the external storage unit 500 and stores it in the storage unit 302.

<Step S302>
In step S302, the eye to be examined is scanned and imaged. When the operator selects a scan start control (not shown), the tomographic imaging apparatus 200 controls the drive control unit 202 and operates the galvanometer mirror 201 to scan the eye. The galvanometer mirror 201 consists of an X scanner for the horizontal direction and a Y scanner for the vertical direction. Changing the orientation of each scanner therefore scans in the horizontal direction (X) and the vertical direction (Y), respectively, of the apparatus coordinate system. Changing the orientations of both scanners at the same time scans in a direction that combines the horizontal and vertical directions, so scanning is possible in any direction on the fundus plane.

Various imaging parameters are adjusted before imaging. Specifically, the position of the internal fixation lamp, the scan range, the scan pattern, the coherence gate position, and the focus are set. The drive control unit 202 controls the light-emitting diodes of the display unit 241 to set the position of the internal fixation lamp 204 so that imaging is centered on the macula or on the optic nerve head. The scan pattern is chosen from patterns such as a raster scan that captures a three-dimensional volume, a radial scan, and a cross scan. In this embodiment, a C scan is used. The coherence gate is positioned on the vitreous side, and the focus is also set on the vitreous body for imaging.

After these imaging parameters have been adjusted, the operator selects an imaging start control (not shown) to perform imaging. Once imaging starts, an interference signal is output for each A-scan.

<Step S303>
In step S303, a tomographic image, which is an example of the first medical image obtained by imaging the subject, is generated. The tomographic image generation unit 311 generates the tomographic image by applying general reconstruction processing to each interference signal.

First, the tomographic image generation unit 311 removes fixed pattern noise from the interference signal. The fixed pattern noise is extracted by averaging the multiple detected A-scan signals and is then subtracted from the input interference signal.
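The averaging-and-subtraction step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `remove_fixed_pattern_noise` and the representation of the A-scan signals as a list of equal-length lists are hypothetical and not taken from the source.

```python
def remove_fixed_pattern_noise(a_scans):
    """Subtract the mean A-scan (the fixed-pattern estimate) from every A-scan."""
    n = len(a_scans)
    length = len(a_scans[0])
    # The component common to all A-scans is taken as the fixed pattern noise.
    fixed_pattern = [sum(scan[k] for scan in a_scans) / n for k in range(length)]
    return [[scan[k] - fixed_pattern[k] for k in range(length)] for scan in a_scans]

signals = [[10.0, 2.0, 3.0], [10.0, 4.0, 1.0]]
cleaned = remove_fixed_pattern_noise(signals)
print(cleaned)
```

The first sample, which is identical in both A-scans, is removed entirely, while the samples that differ between A-scans survive as deviations from the mean.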

Next, the tomographic image generation unit 311 applies a desired window function in order to optimize the depth resolution and the dynamic range, which trade off against each other when the Fourier transform is taken over a finite interval. It then applies an FFT and takes the absolute value of the result to obtain an absolute-value tomographic image. Taking the logarithm of this absolute-value tomographic image gives the OCT tomographic image, which is called a luminance image or an intensity image.
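The chain of window function, Fourier transform, absolute value, and logarithm can be sketched for a single A-scan as below. Several details are assumptions made for illustration and are not specified in the source: the Hann window, the naive DFT standing in for the FFT, the `20 * log10` scaling, and the name `reconstruct_a_scan`.

```python
import cmath
import math

def reconstruct_a_scan(interference, floor=1e-12):
    """Window, Fourier-transform, take the magnitude, then take the log of one A-scan."""
    n = len(interference)
    # Hann window: an assumed choice; the source only says "a desired window function".
    windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1)))
                for k, s in enumerate(interference)]
    # Naive DFT standing in for the FFT; keep the non-negative-frequency half.
    spectrum = [sum(windowed[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
                for m in range(n // 2)]
    magnitude = [abs(c) for c in spectrum]
    # Log scaling (an assumed convention); the floor avoids log(0).
    return [20.0 * math.log10(max(v, floor)) for v in magnitude]

# A single reflector gives a cosine fringe; its depth bin should dominate the profile.
fringe = [math.cos(2.0 * math.pi * 5 * k / 64) for k in range(64)]
profile = reconstruct_a_scan(fringe)
peak_bin = profile.index(max(profile))
print(peak_bin)
```

A stronger window suppresses side lobes (better dynamic range) at the cost of a wider main lobe (coarser depth resolution), which is the trade-off the text refers to.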

<Step S304>
In step S304, the noise reduction unit 334 reduces the noise in the tomographic image. FIG. 4 is a block diagram of the noise reduction unit 334. The noise reduction unit uses a total variation noise reduction method (total variation denoising) that takes black dots into account. A black dot is a pixel that is extremely dark compared with the surrounding pixels.
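To illustrate the definition of a black dot, the sketch below flags pixels that are much darker than their neighbours. The detection rule is purely an assumption made for illustration (a pixel darker than `ratio` times the median of its neighbours); the text at this point does not specify how black dots are detected, and `detect_black_dots` and `ratio` are hypothetical names.

```python
from statistics import median

def detect_black_dots(image, ratio=0.5):
    """Return (x, y) positions whose value is below ratio * median of the neighbours."""
    rows, cols = len(image), len(image[0])
    dots = []
    for y in range(rows):
        for x in range(cols):
            # Up to 8 surrounding pixels, clipped at the image border.
            neighbours = [image[j][i]
                          for j in range(max(0, y - 1), min(rows, y + 2))
                          for i in range(max(0, x - 1), min(cols, x + 2))
                          if (j, i) != (y, x)]
            if image[y][x] < ratio * median(neighbours):
                dots.append((x, y))
    return dots

patch = [[100, 100, 100],
         [100,   5, 100],
         [100, 100, 100]]
print(detect_black_dots(patch))
```

The sketch assumes an image of at least two pixels, so that every pixel has at least one neighbour.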

The conventional total variation noise reduction method is described first. It is expressed by the following equation (Equation 1).
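The equation itself is not reproduced in this text. One form consistent with the description that follows (a squared-difference first term, a derivative-magnitude second term weighted by λ, and a sum over all pixels) would be the following. Whether the original combines the x and y derivatives as a sum of absolute values, as assumed here, or as a Euclidean norm cannot be recovered from the text.

```latex
V = \sum_{x,y} \bigl( g(x,y) - f(x,y) \bigr)^{2}
  + \lambda \sum_{x,y} \left( \left| \frac{d f(x,y)}{dx} \right|
                            + \left| \frac{d f(x,y)}{dy} \right| \right)
```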

In Equation 1, V is the evaluation function, g(x, y) is the input image, and f(x, y) is the output image. The output image f(x, y) is the input image g(x, y) with its noise reduced, and the two images have the same size. Σ() evaluates the expression in parentheses at each pixel and sums it over the whole image, d/dx is the derivative in the x direction, and d/dy is the derivative in the y direction. λ is the ratio that sets the balance between the first and second terms. In the evaluation function V, f(x, y) is the variable and everything else is fixed. The value of V is zero or greater.

Given an input image g(x, y), the total variation noise reduction method finds the output image f(x, y) that minimizes the evaluation function V. The meaning of V is as follows. The first term is the square of the difference between the input and output images, so it is small when the output image resembles the input image. The second term is the magnitude of the derivative of the output image, so it is small when the output image is smooth. Making V small therefore demands that the output image resemble the input image and, in addition, be smooth. The balance between the two terms is set by λ. When λ is small, the first term dominates, which strongly demands that the output image resemble the input image. When λ is large, the second term dominates, which strongly demands that the output image be smooth.

It might seem that the second term alone would suffice for removing noise. However, if the evaluation function consisted only of the second term, the solution would be an image in which every pixel of f(x, y) has the same value, such as f(x, y) = 5 or f(x, y) = 7. This is because the second term is zero when all pixels are constant, which makes the evaluation function zero, its minimum. An image whose pixels are all constant is meaningless as a noise-reduced version of the input image, so both terms are needed. The first term is called the error term and the second term the total variation regularization term. λ is the parameter that changes the ratio between the total variation regularization term and the error term.
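The roles of the two terms can be checked numerically with a small sketch. It assumes forward differences for the derivatives and a sum of absolute values for the regularization term, neither of which is confirmed by the source; `evaluation_function` is a hypothetical name.

```python
def evaluation_function(g, f, lam):
    """Return (V, error_term, tv_term) for input image g and candidate output f."""
    rows, cols = len(g), len(g[0])
    # Error term: squared difference between input and candidate output.
    error_term = sum((g[y][x] - f[y][x]) ** 2 for y in range(rows) for x in range(cols))
    # Total variation regularization term: absolute forward differences in x and y.
    tv_term = 0.0
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                tv_term += abs(f[y][x + 1] - f[y][x])
            if y + 1 < rows:
                tv_term += abs(f[y + 1][x] - f[y][x])
    return error_term + lam * tv_term, error_term, tv_term

g = [[1, 9], [1, 1]]
copy_of_input = [row[:] for row in g]   # resembles g perfectly but is not smooth
constant = [[5, 5], [5, 5]]             # perfectly smooth but ignores g
print(evaluation_function(g, copy_of_input, lam=1.0))
print(evaluation_function(g, constant, lam=1.0))
```

The constant image drives the regularization term to zero but pays for it in the error term, which is why an evaluation function built from the second term alone would accept it.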

Known algorithms for finding the output image f(x, y) include the steepest descent method, ADMM (Alternating Direction Method of Multipliers), the primal-dual proximal splitting method, and the Iteratively Reweighted Least Squares method. All of these are iterative methods that minimize the evaluation function gradually. The output image f(x, y) at an intermediate stage on the way to the minimum is called the estimated image of the noise-reduced image, and the output image f(x, y) at the minimum is called the noise-reduced image. The estimated image of the noise-reduced image is generated by the objective function minimization unit 3344 in FIG. 4. Minimization is one example of a predetermined condition that the objective function should satisfy; in that case, the estimated image obtained by changing it until the objective function satisfies the predetermined condition is output as the noise-reduced image.
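As an illustration of the iterative idea, the following is a minimal steepest-descent sketch. It is not the patent's implementation: the absolute value in the regularization term is replaced by the smooth approximation sqrt(d² + eps) so that a gradient exists everywhere, and the step size is derived from a bound on the curvature of that smoothed function. The names `tv_denoise` and `eps`, the iteration count, and the default values are all assumptions.

```python
import math

def tv_denoise(g, lam=1.0, iterations=200, eps=1.0):
    """Steepest descent on sum (g-f)^2 + lam * sum sqrt(d^2 + eps) over neighbour differences d."""
    rows, cols = len(g), len(g[0])
    # Conservative step: 1 / (upper bound on the curvature of the smoothed objective).
    step = 1.0 / (2.0 + 8.0 * lam / math.sqrt(eps))
    f = [[float(v) for v in row] for row in g]   # initial estimated image: the input itself
    for _ in range(iterations):
        # Gradient of the error term.
        grad = [[2.0 * (f[y][x] - g[y][x]) for x in range(cols)] for y in range(rows)]
        # Gradient of the smoothed total variation term, one neighbour pair at a time.
        for y in range(rows):
            for x in range(cols):
                for dy, dx in ((0, 1), (1, 0)):
                    yy, xx = y + dy, x + dx
                    if yy < rows and xx < cols:
                        d = f[yy][xx] - f[y][x]
                        w = lam * d / math.sqrt(d * d + eps)
                        grad[yy][xx] += w
                        grad[y][x] -= w
        # Update the estimated image.
        f = [[f[y][x] - step * grad[y][x] for x in range(cols)] for y in range(rows)]
    return f

spike = [[10, 10, 10], [10, 30, 10], [10, 10, 10]]
denoised = tv_denoise(spike, lam=5.0)
print(round(denoised[1][1], 2))
```

Each pass of the loop produces a new estimated image; the image returned after the final pass plays the role of the noise-reduced image. The value of `eps` should be chosen relative to the pixel value range, since it controls how closely the smooth term approximates the absolute value.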

The total variation noise reduction method works well for reducing noise in portraits and landscape photographs. However, when a tomographic image is given as the input image g(x, y) in Equation 1 and the output image f(x, y) is solved for, noise is not sufficiently removed from the output image. The reason is that a tomographic image contains not only Gaussian noise but also many black spots. The first term requires the output image to be similar to the input image, so a black spot in the input image demands a black spot at the same position in the output image. It is therefore preferable to adopt the following evaluation function V1, which is an improvement on Equation 1. The present embodiment uses V1 as the objective function.

Here, h(x, y) is an image obtained from the first medical image (for example, a tomographic image or a frontal image obtained by imaging a subject with an OCT apparatus) by dropping (reducing) the pixel values at the noise positions in the first medical image; it is an example of a second medical image (missing-data image). In the present embodiment, it is the image obtained by setting the pixels at the black spot positions in g(x, y) to zero. A(x, y) is an image in which pixels at black spot positions are zero and all other pixels are 1; it is a mask image that masks the pixel values at the noise positions. In the present embodiment, A(x, y) is called the observation matrix.
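The definitions of A(x, y) and h(x, y) can be sketched with NumPy as follows. The function names and the small example image are hypothetical, and positions are given as (row, column) pairs in NumPy order rather than the (x, y) order used in the text.

```python
import numpy as np

def make_observation_matrix(shape, black_spot_positions):
    """Mask image A: 0 at black spot positions, 1 everywhere else."""
    a = np.ones(shape, dtype=float)
    for row, col in black_spot_positions:
        a[row, col] = 0.0
    return a

def make_missing_image(g, a):
    """Missing-data image h: the input g with black spot pixels set to zero."""
    return a * np.asarray(g, dtype=float)

# Hypothetical 3x3 input with two dark pixels treated as black spots.
g = np.array([[10.0, 12.0, 11.0],
              [ 9.0,  0.5, 10.0],
              [11.0, 10.0,  0.2]])
spots = [(1, 1), (2, 2)]

A = make_observation_matrix(g.shape, spots)
h = make_missing_image(g, A)
```

Because A contains only zeros and ones, the element-wise product leaves every pixel outside the black spots unchanged.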

Equation 2 can equivalently be written as the following equation.

Introducing A(x, y) keeps the first term of the evaluation function V1 from growing when the output image has no black spot at the position of a black spot in the input image. In other words, at the positions of detected noise (black spots), the pixel-by-pixel difference between the input image and the output image is not taken into account. As a result, black spots disappear from the output image or their number is reduced, and the noise reduction effect on tomographic images is higher than with the conventional total variation noise reduction method. That is, it becomes possible to generate an output image in which the pixel values at noise (black spot) positions are improved: pixel values away from the noise positions resemble the input, while pixel values at the noise positions do not. A(x, y) is generated by the observation matrix generation unit 3342, an internal block of the noise reduction unit 334.
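The effect of the mask on the error term can be illustrated with a small numerical sketch. It assumes the error term is a sum of squared differences between h and A·f; the exact norm used in V1 is not reproduced in this text, so this form and the function names are assumptions. The second function uses the form A·(g − f), which gives the same value because h = A·g.

```python
import numpy as np

def error_term(f, h, a):
    """Assumed error term: squared difference between h and the masked f."""
    return float(np.sum((h - a * f) ** 2))

def error_term_alt(f, g, a):
    """Equivalent form written with the input image g instead of h."""
    return float(np.sum((a * (g - f)) ** 2))

# Input image with one black spot in the centre.
g = np.full((3, 3), 10.0)
g[1, 1] = 0.0
A = np.ones_like(g)
A[1, 1] = 0.0          # mask out the black spot
h = A * g

# Candidate output image without the black spot.
f = np.full((3, 3), 10.0)

masked_error = error_term(f, h, A)                   # mask ignores the spot
unmasked_error = error_term(f, g, np.ones_like(g))   # no mask: spot is penalized
alt_error = error_term_alt(f, g, A)
```

With the mask, an output image that fills in the black spot incurs no penalty, whereas without the mask the same output is penalized for differing from the input at that pixel.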

In the present embodiment, the output image f(x, y) is assumed to be an image obtained by taking the absolute value of the result of the FFT processing in step S303 and then applying a logarithmic operation; however, an image to which the logarithmic operation has not been applied may be given instead. In that case, the evaluation function V2 should be used.

In the evaluation function V2, δ is a constant used when performing the logarithmic operation. When the evaluation function V2 is used, the output image f(x, y) is first obtained by the solving algorithm, and a logarithmic operation is then applied to f(x, y) to obtain a logarithmic image, which serves as the noise-reduced tomographic image.

The image processing system 100 inputs the tomographic image generated in step S303 and the imaging parameters from step S302 to the noise reduction unit 334. The noise reduction unit 334 inputs the tomographic image and the imaging parameters to the noise detection unit 3341. The noise reduction unit 334 also inputs the tomographic image to the objective function minimization unit 3344.

The noise detection unit 3341 detects black spots, which are one type of noise, in the tomographic image and identifies their pixel positions. One way to detect black spots is to create an average image and compare it with the tomographic image. Specifically, a pixel whose value in the tomographic image is 70% or less of the corresponding pixel value in the average image is judged to be a black spot. For convenience, this figure of 70% is called number A. Number A is 70% in the present embodiment, but in other embodiments it may be larger or smaller than 70%. Because the occurrence of black spots is related to the imaging parameters, the noise detection unit 3341 may automatically increase number A when the imaging parameters indicate that many black spots will occur. For example, number A may have an initial value and be made larger than that initial value in such a case; this is a relative setting. Alternatively, the noise detection unit 3341 may set number A directly in such a case; this is an absolute setting. When the imaging parameters indicate that few black spots will occur, number A may be reduced automatically. In short, number A is a threshold for detecting black spots.
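The thresholding rule with number A can be sketched as below. How the average image is created is not specified in this passage, so the sketch takes it as an input; the function name and the example arrays are hypothetical.

```python
import numpy as np

def detect_black_spots(tomogram, average_image, number_a=0.7):
    """Return a boolean map that is True where the tomogram pixel is at most
    number_a times the corresponding pixel of the average image."""
    tomogram = np.asarray(tomogram, dtype=float)
    average_image = np.asarray(average_image, dtype=float)
    return tomogram <= number_a * average_image

# Hypothetical example: one pixel is much darker than the average image.
tomogram = np.full((3, 3), 10.0)
tomogram[1, 1] = 2.0
average_image = np.full((3, 3), 10.0)  # assumed to be supplied by an earlier step

black_spot_map = detect_black_spots(tomogram, average_image, number_a=0.7)
black_spot_positions = np.argwhere(black_spot_map)  # (row, column) pairs
```

Raising `number_a` marks more pixels as black spots, which matches the description of increasing number A when many black spots are expected.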

Alternatively, the noise detection unit 3341 may determine number A by search so that black spots account for 20% of all pixels. For convenience, this figure of 20% is called number B. A large number B means that many black spots are detected, and a small number B means that few are detected. Because fixing number A fixes the proportion of pixels judged to be black spots, determining number A by search means decreasing number A little by little from 100% until the proportion of black spots is approximately equal to number B (here, about 20%). Number B is 20% in the present embodiment, but in other embodiments it may be larger or smaller than 20%. As described above, the occurrence of black spots is related to the imaging parameters, so the noise detection unit 3341 may automatically increase number B when the imaging parameters indicate that many black spots will occur. In that case, number A is determined by search and becomes larger. In short, number B is also a threshold for detecting black spots.
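The search for number A can be sketched as a simple downward scan. The step size, the stopping rule (stop at the first number A whose black spot fraction is at or below number B), and the function name are assumptions made for illustration; the text only says that number A is decreased gradually from 100%.

```python
import numpy as np

def search_number_a(tomogram, average_image, number_b=0.2, step=0.01):
    """Decrease number A from 1.0 in fixed steps and return the first value
    whose black spot fraction is at or below number_b, with that fraction."""
    tomogram = np.asarray(tomogram, dtype=float)
    average_image = np.asarray(average_image, dtype=float)
    n_steps = int(round(1.0 / step))
    fraction = 1.0
    for k in range(n_steps + 1):
        number_a = 1.0 - k * step  # computed from k to avoid accumulated rounding
        fraction = float(np.mean(tomogram <= number_a * average_image))
        if fraction <= number_b:
            return number_a, fraction
    return 0.0, fraction

# Hypothetical data: pixel values spread uniformly around an average of 1.
rng = np.random.default_rng(0)
tomogram = rng.uniform(0.0, 2.0, size=(50, 50))
average_image = np.ones((50, 50))

number_a, fraction = search_number_a(tomogram, average_image, number_b=0.2)
```

A larger `number_b` stops the scan earlier, so the returned number A is larger, consistent with the statement that increasing number B increases number A.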

Once the noise detection unit 3341 has identified the pixel positions x, y of the black spots, it inputs those positions to the observation matrix generation unit 3342. The observation matrix generation unit 3342 uses the input black spot positions to generate A(x, y) for the evaluation functions V1 and V2. The value of A(x, y) is zero at black spot positions and 1 elsewhere. The noise reduction unit 334 inputs the A(x, y) generated by the observation matrix generation unit 3342 to the objective function minimization unit 3344.

In addition, the noise reduction unit 334 inputs the imaging parameters to the ratio λ generation unit 3343. The ratio λ generation unit 3343 is a block that generates the ratio λ of the evaluation functions V1 and V2. The ratio λ is a parameter that determines the strength of noise reduction; the larger λ is, the stronger the noise reduction. Because the amount of noise is related to the imaging parameters, the ratio λ generation unit 3343 generates the ratio λ according to the imaging parameters. In another embodiment, the generation of the ratio λ need not depend on the imaging parameters. The noise reduction unit 334 inputs the λ generated by the ratio λ generation unit 3343 to the objective function minimization unit 3344.

The objective function minimization unit 3344 is a block that minimizes the objective function. The objective function of the present embodiment is the evaluation function V1. The objective function minimization unit 3344 takes the input tomographic image as the input image g(x, y) and minimizes the objective function using the input A(x, y) and the input λ. The objective function minimization unit 3344 judges that the objective function has been minimized when it detects that the value of the objective function no longer changes with further iterations. The present embodiment uses ADMM as the minimization algorithm, but this is not required; any algorithm capable of minimizing the objective function may be used. The output image f(x, y) obtained by the minimization is the noise-reduced tomographic image. The noise reduction unit 334 outputs the noise-reduced tomographic image generated by the objective function minimization unit 3344 to the outside of the noise reduction unit 334. The image processing system 100 stores the noise-reduced tomographic image in the storage unit 302.
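The iteration and the stopping rule can be sketched as follows. This is not the ADMM procedure of the embodiment: for brevity it uses steepest descent with step halving, which the text lists as one of the known algorithms. It also assumes a squared masked error term, a smoothed total variation (constant `EPS`) so that the gradient exists everywhere, and λ multiplying the total variation term. All names and numeric settings are hypothetical.

```python
import numpy as np

EPS = 1e-2  # smoothing constant for the total variation (an assumption)

def objective(f, h, a, lam):
    """Assumed V1: squared masked error plus lam times a smoothed total variation."""
    dx = np.diff(f, axis=1)
    dy = np.diff(f, axis=0)
    tv = np.sum(np.sqrt(dx ** 2 + EPS)) + np.sum(np.sqrt(dy ** 2 + EPS))
    return float(np.sum((h - a * f) ** 2) + lam * tv)

def gradient(f, h, a, lam):
    """Gradient of the objective above with respect to every pixel of f."""
    grad = -2.0 * a * (h - a * f)
    dx = np.diff(f, axis=1)
    dy = np.diff(f, axis=0)
    wx = dx / np.sqrt(dx ** 2 + EPS)
    wy = dy / np.sqrt(dy ** 2 + EPS)
    grad[:, 1:] += lam * wx   # each difference pushes its right pixel up...
    grad[:, :-1] -= lam * wx  # ...and its left pixel down
    grad[1:, :] += lam * wy
    grad[:-1, :] -= lam * wy
    return grad

def minimize(g, a, lam, step=0.1, tol=1e-9, max_iter=5000):
    """Update the estimated image until the objective stops changing."""
    h = a * g
    f = h.copy()  # initial estimated image
    prev = objective(f, h, a, lam)
    for _ in range(max_iter):
        grad = gradient(f, h, a, lam)
        candidate = f - step * grad
        cur = objective(candidate, h, a, lam)
        while cur > prev and step > 1e-12:  # halve the step if the value rose
            step *= 0.5
            candidate = f - step * grad
            cur = objective(candidate, h, a, lam)
        f = candidate
        if prev - cur <= tol:  # objective no longer changes: treat as minimized
            break
        prev = cur
    return f

# Hypothetical input: flat image with one black spot, masked out by A.
g = np.full((8, 8), 10.0)
g[3, 3] = 0.0
A = np.ones_like(g)
A[3, 3] = 0.0
f = minimize(g, A, lam=0.5)
```

In this example the masked pixel is not constrained by the error term, so the regularization term pulls it toward its neighbours and the black spot is filled in.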

The ratio λ need not be determined automatically by the image processing system 100; a means may be provided for the person operating the image processing system to change it. FIG. 5 shows a GUI that combines the display unit and the input unit. This GUI has an image display unit 6001 and a ratio λ input unit 7001. The image display unit 6001 is an area that displays the noise-reduced tomographic image, and the ratio λ input unit 7001 is a slider for changing the value of the ratio λ continuously or in discrete steps. The present embodiment uses a slider, but the GUI may instead use buttons or direct numerical entry. FIG. 5 is an example of a GUI in which, when the ratio λ is changed with the ratio λ input unit, the noise reduction unit 334 operates accordingly and the noise-reduced tomographic image is displayed on the image display unit 6001.

In the present embodiment, the noise detection unit 3341 detects black spots. In another embodiment, the noise detection unit 3341 may detect the positions of overexposed pixels, and the observation matrix generation unit 3342 may generate an A(x, y) that masks those positions. Here, an overexposed pixel is a pixel whose signal level is extremely high compared with its neighbourhood. In yet another embodiment, the noise detection unit 3341 may detect the positions of both black spots and overexposed pixels, and the observation matrix generation unit 3342 may generate an A(x, y) that masks both.

<Step S305>
In step S305, the detection unit 332 detects the layer boundaries of the retinal layers from the tomographic image generated in step S303 or from the noise-reduced tomographic image generated in step S304, and outputs the layer boundary lines. The present embodiment uses the noise-reduced tomographic image generated in step S304. The detection unit 332 first applies an edge detection operation to the tomographic image noise-reduced in step S304 and then applies a line enhancement filter. An example of a line enhancement filter is an enhancement filter based on the eigenvalues of the Hessian matrix. The layer detection unit 332 determines the positions of the layer boundaries using the result of the edge detection operation and the result of the line enhancement filter.

<Step S307>
In step S307, the display control unit 305 causes the display unit 600 to display the noise-reduced tomographic image. That is, the display control unit 305 causes the display unit 600 to display the noise-reduced image. At this time, the layer boundary lines detected by the layer detection unit may be superimposed on the tomographic image and displayed on the display unit 600.

<Step S308>
In step S308, an instruction acquisition unit (not shown) acquires from outside an instruction as to whether to end the capture of tomographic images by the image processing system 100. This instruction is entered by the operator using the input unit 700. When an instruction to end the processing is acquired, the image processing system 100 ends the processing. If imaging is to continue without ending the processing, the processing returns to step S302 and imaging continues.

The processing of the image processing system 100 is performed as described above.

(Second Embodiment)
The second embodiment is described below. The image processing system including the image processing apparatus according to the present embodiment has parts in common with the first embodiment. Only the differences are described below.

In the first embodiment, a single tomographic image is scanned. In the second embodiment, the scan is repeated multiple times at the same location to obtain multiple tomographic images, which are averaged to reduce noise, in addition to the noise reduction by the noise reduction unit 334. The averaging is an example of compositing processing that combines a plurality of estimated images, and the averaged image obtained is an example of a composite image. First, in addition to step S302 of the first embodiment, the present embodiment also sets the number N (N ≥ 2) of images to be averaged when the imaging parameters are adjusted before imaging. As for the scan pattern of step S302, a scan pattern such as a raster scan for capturing a three-dimensional volume, a radial scan, or a cross scan is set as in the first embodiment, but the present embodiment differs in that each line is imaged repeatedly in each scan pattern. The number N to be averaged is the number of repeated acquisitions. In step S303 of the first embodiment, one C-scan tomographic image is generated, whereas the present embodiment generates N C-scan tomographic images. Furthermore, step S304 of the first embodiment performs noise reduction, whereas in the present embodiment, before noise reduction, the alignment unit 331 aligns the absolute-value tomographic images using the N (N ≥ 2) C-scan tomographic images generated in step S203. The present embodiment uses absolute-value tomographic images, but images obtained by applying a logarithmic operation to them may be used instead. In the alignment processing, for example, an evaluation function representing the similarity between two tomographic images is defined in advance, and the absolute-value tomographic image is deformed so that the value of this evaluation function is optimized. The evaluation function may be based on pixel values, for example by using a correlation coefficient.

Evaluating the similarity between absolute-value C-scan tomographic images takes a long time because of the large amount of data. To shorten the processing time, several representative B-scan tomographic images are extracted from each absolute-value C-scan tomographic image to form a set of absolute-value tomographic images, and the similarity is calculated for each set.

Equation 6 shows the expression used when the correlation coefficient is adopted as the evaluation function representing similarity.

In Equation 6, the region of the first absolute-value tomographic image is denoted f(x, z), and the region of the second absolute-value tomographic image is denoted g(x, z).

The mean terms in Equation 6 denote the averages of region f(x, z) and region g(x, z), respectively. Note that f and g in Equation 6 are different from f and g in Equations 1 to 4. Here, a region is an image area used for alignment. It is normally set to a size no larger than the tomographic image, and for a tomographic image of the eye it is desirable to set it so that it includes the retinal layer region. Examples of image deformation processing include translation and rotation using an affine transformation, and changing the magnification. Next, the image processing system 100 has the noise reduction unit 334 reduce noise one image at a time in the N tomographic images aligned by the alignment unit 331. The noise reduction unit 334 outputs N noise-reduced tomographic images. The image processing system 100 then averages the pixels at the same coordinates across the N noise-reduced tomographic images to generate an averaged tomographic image. These processes are executed for the multiple absolute-value tomographic images on each line. The averaging reduces noise and yields a high-quality image in which the signals of the vitreous body and the retina are emphasized.
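The similarity evaluation can be sketched with the standard (Pearson) correlation coefficient of two regions, assuming that this is the form of Equation 6, which is not reproduced in this text. The shift search stands in for the deformation step and tries only integer vertical shifts rather than a full affine transformation; the function names are hypothetical.

```python
import numpy as np

def correlation_coefficient(f, g):
    """Correlation coefficient between two equally sized regions,
    using the mean of each region (assumed form of Equation 6)."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    fd = f - f.mean()
    gd = g - g.mean()
    denom = np.sqrt(np.sum(fd ** 2) * np.sum(gd ** 2))
    if denom == 0.0:
        return 0.0  # a constant region has no defined correlation
    return float(np.sum(fd * gd) / denom)

def best_vertical_shift(reference, moving, max_shift=3):
    """Try integer shifts along the first axis and keep the most similar one.
    np.roll wraps around at the border, which is a simplification."""
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        score = correlation_coefficient(reference, np.roll(moving, shift, axis=0))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score

# Hypothetical regions: a bright band, and the same band displaced by two rows.
reference = np.zeros((8, 6))
reference[3, :] = 1.0
moving = np.roll(reference, 2, axis=0)

shift, score = best_vertical_shift(reference, moving)
```

The shift with the highest correlation undoes the displacement, after which pixels at the same coordinates can be averaged.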

According to the present embodiment, the noise reduction effect of the noise reduction unit 334 is obtained in addition to that of averaging. Therefore, the noise reduction that averaging alone would achieve with N scans can be obtained with fewer than N scans. Fewer scans shorten the examination time, which reduces the burden on the patient. In the present embodiment the scanning method is a C scan, but in other embodiments it may be a cross scan or a multi-cross scan. Also, in the present embodiment the noise reduction by the noise reduction unit 334 is performed after alignment, but the alignment and averaging may instead be performed after the noise reduction by the noise reduction unit 334.

In the present embodiment, the noise reduction unit 334 performs noise reduction on the N tomographic images corresponding to the N scans. However, an embodiment is also possible in which noise reduction is performed on only some of the N images rather than all of them. If the execution time of the noise reduction unit 334 is long, performing noise reduction on only some of the N images shortens the overall processing time compared with processing all of them. Also, in the present embodiment the noise detection unit 3341 detects the positions of black spots, but to save the time needed for detection, the positions of black spots may be predicted with random numbers. In that case, a randomly generated position is not necessarily a black spot, so random number generation and noise reduction are executed multiple times and the results are averaged and output as the noise-reduced tomographic image. FIG. 6 is a block diagram of this processing. Image 1 in FIG. 6 is the result of the first pass: the positions of black spots are determined by random number generation, and the evaluation function is minimized by the objective function minimization unit as described in the first embodiment. Image 2 in FIG. 6 is the result of the second pass, obtained in the same way as the first. This is repeated M times (multiple times) to obtain image 1 to image M. Finally, image 1 to image M are averaged and output as the noise-reduced tomographic image. In the present embodiment, the noise positions are predicted with random numbers, and the proportion of noise in the total number of pixels is a fixed value. In another embodiment, the proportion of noise in the total number of pixels may be changed according to the control parameters used by the OCT apparatus at the time of imaging.
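The repetition over M random masks can be sketched as below. The denoising step itself is passed in as a function, and the `fill_with_mean` stand-in used in the example is only a placeholder, not the objective function minimization of the first embodiment. The function names, the value of M, and the fixed noise proportion are assumptions.

```python
import numpy as np

def random_observation_matrix(shape, noise_ratio, rng):
    """Mask with a fixed proportion of randomly placed zeros (assumed noise)."""
    n_pixels = shape[0] * shape[1]
    n_noise = int(round(noise_ratio * n_pixels))
    a = np.ones(n_pixels, dtype=float)
    a[rng.choice(n_pixels, size=n_noise, replace=False)] = 0.0
    return a.reshape(shape)

def denoise_with_random_masks(g, denoise, m=8, noise_ratio=0.2, seed=0):
    """Run `denoise(g, a)` with M random masks and average image 1 to image M."""
    rng = np.random.default_rng(seed)
    images = []
    for _ in range(m):
        a = random_observation_matrix(g.shape, noise_ratio, rng)
        images.append(denoise(g, a))
    return np.mean(images, axis=0)

def fill_with_mean(g, a):
    """Placeholder denoiser: replace masked pixels by the mean of the others."""
    out = g.copy()
    out[a == 0] = g[a == 1].mean()
    return out

g = np.full((5, 5), 3.0)
result = denoise_with_random_masks(g, fill_with_mean, m=4, noise_ratio=0.2)
example_mask = random_observation_matrix((5, 5), 0.2, np.random.default_rng(1))
```

Because each pass masks a different random set of pixels, averaging the M results spreads the effect of wrongly masked pixels across the passes.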

(Third Embodiment)
The third embodiment is described below. The image processing system including the image processing apparatus according to the present embodiment has parts in common with the first and second embodiments. Only the differences are described below.

In the first and second embodiments, the objective function used by the objective function minimization unit 3344 is V1 of Equation 2. In the present embodiment, the output image f(x, y) is replaced by an image generated with a CAE (Convolutional Autoencoder), which is known in the field of deep learning. The input given to the CAE is the input image g(x, y). In other embodiments, any image generation function that takes an image as input and outputs an image may be used in place of the CAE; for example, an AE (Autoencoder) is conceivable. In the following description, the CAE and the image generation functions mentioned above are collectively written as the function h(Image, x, y).

The function h takes an image Image as input and generates an image of the same size as the input image g(x, y) of the evaluation function V1. If the generated image is called image h_out for convenience, the pixel value of image h_out at position x, y is h(Image, x, y). The pixel value of image h_out at position x, y depends on multiple pixel values of the image Image and is calculated by a nonlinear function of those pixels; this nonlinear function has multiple parameters that change the shape of the function. Thus the function h has multiple parameters, and changing the parameters changes image h_out. The objective function minimization unit of the present embodiment does not change the pixels of the output image f(x, y) directly as in the first and second embodiments; instead, it changes the parameters of the function h and thereby changes the pixels of the output image f(x, y) indirectly. The output image f(x, y) of Equations 2 to 4 stores a value for each position x, y, and the objective function minimization unit 3344 of the first and second embodiments minimizes the objective function by changing those stored values. In contrast, the present embodiment minimizes the objective function by changing the parameters of the function h. Minimization using a CAE is known in the field of deep learning and various methods exist; the present embodiment uses the steepest descent method. According to the present embodiment, using a CAE makes it possible to impose a non-negativity constraint on the output, that is, the condition h(Image, x, y) ≥ 0. This can be achieved, for example, by using a sigmoid function as the activation function of the final layer of the CAE. Since learning in deep learning means minimizing an objective function, the present embodiment can also be said to learn in that sense, but it differs in at least the following respects. First, the present embodiment does not learn from a large amount of training data, as is usual in deep learning. The only data corresponding to deep learning training data in the present embodiment is the input image g(x, y). Second, deep learning has a training step and an inference step, whereas the present embodiment has only the training step and no inference step. The present embodiment uses the evaluation function V1 as the objective function and generates the image with the function h(Image, x, y) in place of the output image f(x, y); however, a configuration that uses the evaluation function V2 of Equation 4 as the objective function and generates the image with the function h(Image, x, y) in place of the output image f(x, y) is also possible.

(Fourth Embodiment)
The noise reduction unit 334 in the image processing apparatus according to the present embodiment is an example of image quality improving means (noise reduction means) that improves the image quality of (reduces noise in) a tomographic image. Instead of the averaging (compositing) processing of the embodiments described above, it applies image quality improvement processing based on machine learning. The noise reduction unit 334 is also referred to as the image quality improving unit. The image quality improving unit inputs a low-quality tomographic image generated from a small number of tomographic images into a machine learning model, and thereby generates a tomographic image of high quality (low noise and high contrast) equivalent to one generated from a large number of tomographic images. Here, the machine learning model according to the present embodiment is a function generated by performing machine learning with training data composed of pairs of input data, which is a low-quality image acquired under predetermined imaging conditions assumed for the processing target, and output data (correct answer data), which is a high-quality image corresponding to the input data. The predetermined imaging conditions include the imaged site, the imaging method, the imaging angle of view, the image size, and the like.

Here, a low-quality tomographic image is acquired, for example, as follows. First, the operator operates the input unit 700 and presses the imaging start (Capture) button on the imaging screen (preview screen), which starts OCT imaging under the imaging conditions set according to the operator's instructions. At this time, the instruction unit 304 instructs the tomographic imaging apparatus 200 to perform OCT imaging based on the settings specified by the operator, and the tomographic imaging apparatus 200 acquires the corresponding OCT tomographic image. The tomographic imaging apparatus 200 also acquires SLO images and executes tracking processing based on the SLO moving image. The imaging conditions are set, for example, by 1) registering the Macular Disease examination set, 2) selecting the OCT scan mode, and 3) setting the following imaging parameters. The imaging parameters include, for example, 3-1) scan pattern: 300 A-scans (lines) × 300 B-scans (frames), 3-2) scan area size: 6 × 6 mm, and 3-3) main scanning direction: horizontal. They further include, for example, 3-4) scan interval: 0.01 mm, 3-5) fixation lamp position: macula (fovea), 3-6) coherence gate position: vitreous side, and 3-7) default display report type: single-eye examination report. The display control unit 305 then causes the display unit 600 to display the acquired tomographic image, information on the imaging conditions, and the like. In the present embodiment, the image quality improving unit performs image quality improvement processing on the tomographic image when the operator operates the input unit 700 and presses an image quality improvement button (not shown) on the display screen (the report screen for displaying analysis results and the like). That is, the image quality improvement button is a button for instructing execution of the image quality improvement processing. The image quality improvement button may instead be a button for instructing display of a high-quality image that was generated before the button was pressed.

In the present embodiment, the input data used as training data is a low-quality tomographic image generated from a single cluster containing a small number of tomographic images. The output data (correct answer data) used as training data is a high-quality tomographic image obtained by averaging a plurality of aligned tomographic images. The output data is not limited to this and may be, for example, a high-quality tomographic image generated from a single cluster composed of a large number of tomographic images. The output data may also be a high-quality tomographic image obtained by taking a tomographic image of higher resolution (higher magnification) than the input image and converting it to the same resolution (same magnification) as the input image. The pairs of input and output images used to train the machine learning model are not limited to the above, and any known combination of images may be used. For example, an image obtained by adding a first noise component to a tomographic image acquired by the tomographic imaging apparatus 200 or another apparatus may be used as the input image, and an image obtained by adding a second noise component (different from the first noise component) to the same tomographic image may be used as the output image. In short, the image quality improving unit may be of any kind, as long as it improves the quality of a tomographic image supplied as the input image by using a trained model for image quality improvement obtained by learning training data that includes tomographic images of the fundus.
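The two kinds of training pair described above can be sketched as follows. This is a minimal illustration, assuming that each frame is an already aligned B-scan flattened to a list of pixel values; the helper names (`average_frames`, `make_averaging_pair`, `make_noise_pair`), the Gaussian noise model, and the toy data are hypothetical and are not taken from the embodiment.

```python
import random

def average_frames(frames):
    """Pixel-wise mean of aligned frames (each frame is a flat list)."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]

def make_averaging_pair(frames, n_low=2):
    """Input: average of a few frames. Correct answer: average of all frames."""
    low_quality = average_frames(frames[:n_low])
    high_quality = average_frames(frames)
    return low_quality, high_quality

def make_noise_pair(image, sigma, rng):
    """Input and output: the same image with two independent noise draws."""
    first = [p + rng.gauss(0.0, sigma) for p in image]
    second = [p + rng.gauss(0.0, sigma) for p in image]
    return first, second

rng = random.Random(0)
profile = [0.1, 0.4, 0.9, 0.4, 0.1]  # hypothetical noise-free intensity profile
frames = [[p + rng.gauss(0.0, 0.2) for p in profile] for _ in range(32)]

low, high = make_averaging_pair(frames)
noisy_in, noisy_out = make_noise_pair(profile, 0.2, rng)
```

Averaging N frames with independent zero-mean noise reduces the noise standard deviation by roughly a factor of √N, which is why the many-frame average can serve as the correct answer data for the few-frame input.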

FIG. 8 shows a configuration example of the machine learning model in the image quality improving unit according to the present embodiment. The machine learning model is a convolutional neural network (CNN) composed of a plurality of layers that process a group of input values and output the result. The layer types in this configuration are the convolution layer, the downsampling layer, the upsampling layer, and the merger layer. The convolution layer performs convolution on the input values according to parameters such as the filter kernel size, the number of filters, the stride value, and the dilation value. The dimensionality of the filter kernel may be changed to match the dimensionality of the input image. The downsampling layer makes the number of output values smaller than the number of input values by thinning out or combining the input values; max pooling is one example. The upsampling layer makes the number of output values larger than the number of input values by duplicating input values or adding values interpolated from them; linear interpolation is one example. The merger layer takes groups of values, such as the output values of a layer or the pixel values of an image, from a plurality of sources and combines them by concatenation or addition. In this configuration, the values output after the pixel values of the input image 1301 pass through the convolution processing blocks are combined in the merger layer with the pixel values of the input image 1301 itself. The combined values are then shaped into the high-quality image 1302 by the final convolution layer. Although not illustrated, the CNN configuration may be modified, for example, by inserting a batch normalization layer or an activation layer using a rectified linear unit (ReLU) after a convolution layer. FIG. 8 treats the image to be processed as a two-dimensional image for simplicity, but the present invention is not limited to this. For example, the present invention also covers the case where a three-dimensional low-quality tomographic image is input to the image quality improving unit and a three-dimensional high-quality tomographic image is output.
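The four layer types and the way the input is merged back before the final convolution can be sketched in one dimension as follows. This is an illustrative toy under stated assumptions, not the network of FIG. 8: the signals are 1-D lists, the kernels are fixed rather than learned, the merger is an addition, and the function names are hypothetical. The ReLU activation is the optional modification mentioned above.

```python
def conv1d(x, kernel):
    """Stride-1 convolution layer with zero padding (odd kernel length)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]

def relu(x):
    return [max(0.0, v) for v in x]

def max_pool(x, size=2):
    """Downsampling layer: keep the maximum of each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def upsample_linear(x, factor=2):
    """Upsampling layer: linear interpolation; the last sample is held."""
    out = []
    for i, v in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else v
        for k in range(factor):
            out.append(v + (nxt - v) * k / factor)
    return out

def merge_add(a, b):
    """Merger layer: element-wise addition of two sources."""
    return [p + q for p, q in zip(a, b)]

def enhance(x, k1, k2, k_out):
    """Convolution blocks, then merge with the input, then a final convolution."""
    h = relu(conv1d(x, k1))
    h = max_pool(h)
    h = relu(conv1d(h, k2))
    h = upsample_linear(h)
    merged = merge_add(h, x)
    return conv1d(merged, k_out)

signal = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
smoothing = [0.25, 0.5, 0.25]
output = enhance(signal, smoothing, smoothing, smoothing)
```

Merging by concatenation instead of addition would double the number of channels entering the final convolution; addition is used here only to keep the sketch short.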

A GPU can compute efficiently by processing large amounts of data in parallel. GPU processing is therefore effective when training is repeated many times with a learning model, as in deep learning. In the present embodiment, the processing by the image processing unit 303, which is an example of a learning unit (not shown), uses a GPU in addition to a CPU. Specifically, when a learning program including the learning model is executed, the CPU and the GPU perform the computation cooperatively. The processing of the learning unit may instead be computed by the CPU alone or by the GPU alone. The image quality improving unit may also use a GPU, as the learning unit does. The learning unit may further include an error detection unit and an update unit (neither shown). The error detection unit obtains the error between the correct answer data and the output data that the output layer of the neural network produces in response to the input data supplied to the input layer. The error detection unit may use a loss function to calculate this error. Based on the error obtained by the error detection unit, the update unit updates the connection weighting coefficients between the nodes of the neural network and the like so that the error becomes smaller. The update unit performs this update by using, for example, error backpropagation, a technique that adjusts the connection weighting coefficients between the nodes of each neural network and the like so that the error becomes smaller.
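The roles of the error detection unit and the update unit can be sketched with a very small network. This is a minimal illustration, assuming a one-input network with a single tanh hidden node, a mean squared error loss, and plain gradient descent; the function names, the toy data, and the learning rate are hypothetical and chosen only to make the chain rule of error backpropagation visible.

```python
import math

def forward(params, x):
    """Network: y = w2 * tanh(w1 * x + b1) + b2. Returns (hidden, output)."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    return h, w2 * h + b2

def detect_error(params, inputs, targets):
    """Error detection: mean squared error against the correct answer data."""
    return sum((forward(params, x)[1] - t) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)

def backpropagate(params, inputs, targets):
    """Propagate the output error back to each weight via the chain rule."""
    w1, b1, w2, b2 = params
    grads = [0.0, 0.0, 0.0, 0.0]
    n = len(inputs)
    for x, t in zip(inputs, targets):
        h, y = forward(params, x)
        dy = 2.0 * (y - t) / n           # d(loss)/d(output)
        dz = dy * w2 * (1.0 - h * h)     # back through tanh
        grads[0] += dz * x               # d(loss)/d(w1)
        grads[1] += dz                   # d(loss)/d(b1)
        grads[2] += dy * h               # d(loss)/d(w2)
        grads[3] += dy                   # d(loss)/d(b2)
    return grads

def update(params, grads, lr):
    """Update unit: move each coefficient against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
targets = [0.8 * x for x in inputs]
params = [0.5, 0.0, 0.5, 0.0]

initial_error = detect_error(params, inputs, targets)
for _ in range(500):
    params = update(params, backpropagate(params, inputs, targets), lr=0.1)
final_error = detect_error(params, inputs, targets)
```

In a real CNN the same two steps are repeated over batches of image pairs, usually with an automatic differentiation library running on the GPU rather than hand-written gradients.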

The operator can also use the input unit 700 to instruct the start of OCT measurement processing (for example, the various kinds of measurement processing described later, such as analysis result generation, diagnosis result generation, object recognition, and segmentation). Because such measurement processing can then be executed, in response to the operator's instruction, on a tomographic image whose quality has been improved by the trained model for image quality improvement, the accuracy of the measurement processing can be improved.

The present embodiment has been described for tomographic images, but it is not limited to them. The images subject to display, image quality improvement, image analysis, and similar processing in the present embodiment may be front images, motion contrast images, or the like. They may also be images of other kinds, such as SLO images, fundus photographs, or fluorescence fundus photographs. In that case, the user interface for executing the image quality improvement processing may be one that instructs execution on a plurality of images of different types, or one that selects any image from a plurality of images of different types and instructs execution on it.

With this configuration, the display control unit 305 can display the image processed by the image quality improving unit according to the present embodiment on the display unit 600. As described above, when at least one of a plurality of conditions relating to the display of the high-quality image, the display of the analysis result, the depth range of the displayed front image, and the like is selected, the selected state may be maintained even when the display screen transitions.

Also as described above, when at least one of the plurality of conditions is selected, that selection may be maintained even if another condition is changed to the selected state. For example, when display of the analysis result is selected, the display control unit 305 may change the display of the analysis result of the low-quality image to the display of the analysis result of the high-quality image in response to an instruction from the examiner (for example, when the image quality improvement button, not shown, is designated). Likewise, when display of the analysis result is selected, the display control unit 305 may change the display of the analysis result of the high-quality image to the display of the analysis result of the low-quality image in response to an instruction from the examiner (for example, when the designation of the image quality improvement button, not shown, is cancelled).

When display of the high-quality image is not selected, the display control unit 305 may change the display of the analysis result of the low-quality image to the display of the low-quality image in response to an instruction from the examiner (for example, when the designation of analysis result display is cancelled). When display of the high-quality image is not selected, the display control unit 305 may also change the display of the low-quality image to the display of the analysis result of the low-quality image in response to an instruction from the examiner (for example, when analysis result display is designated). When display of the high-quality image is selected, the display control unit 305 may change the display of the analysis result of the high-quality image to the display of the high-quality image in response to an instruction from the examiner (for example, when the designation of analysis result display is cancelled). When display of the high-quality image is selected, the display control unit 305 may also change the display of the high-quality image to the display of the analysis result of the high-quality image in response to an instruction from the examiner (for example, when analysis result display is designated).

Consider next the case where display of the high-quality image is not selected and display of a first type of analysis result is selected. In this case, the display control unit 305 may change the display of the first type of analysis result of the low-quality image to the display of a second type of analysis result of the low-quality image in response to an instruction from the examiner (for example, when display of the second type of analysis result is designated). Consider also the case where display of the high-quality image is selected and display of the first type of analysis result is selected. In this case, the display control unit 305 may change the display of the first type of analysis result of the high-quality image to the display of the second type of analysis result of the high-quality image in response to an instruction from the examiner (for example, when display of the second type of analysis result is designated).
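The display transitions described above amount to two independent selections: whether the high-quality image is selected, and which type of analysis result, if any, is selected. A minimal sketch follows, assuming a hypothetical `DisplayState` record and helper functions that are not part of the embodiment; the analysis type names are illustrative only. It shows that changing one selection leaves the other intact.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class DisplayState:
    high_quality: bool = False       # is high-quality image display selected?
    analysis: Optional[str] = None   # selected analysis type, None = image only

def set_high_quality(state, selected):
    """Toggle image quality; the analysis selection is maintained."""
    return replace(state, high_quality=selected)

def set_analysis(state, kind):
    """Select, switch, or cancel (None) the analysis display."""
    return replace(state, analysis=kind)

def describe(state):
    """What the display control unit would show for a given state."""
    quality = "high-quality" if state.high_quality else "low-quality"
    if state.analysis is None:
        return f"{quality} image"
    return f"{state.analysis} analysis of {quality} image"

state = DisplayState(analysis="layer_thickness")
state = set_high_quality(state, True)            # analysis selection maintained
state = set_analysis(state, "vessel_density")    # quality selection maintained
```

Applying the same state to every image in a follow-up screen would give the batch behaviour in which one change is reflected on images from several examination dates.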

On the display screen for follow-up observation, as described above, these display changes may be applied collectively to a plurality of images obtained at different dates and times. The analysis result may be displayed by superimposing it on the image with an arbitrary transparency. In that case, changing to the display of the analysis result may mean, for example, changing to a state in which the analysis result is superimposed on the currently displayed image with an arbitrary transparency. Changing to the display of the analysis result may also mean, for example, changing to the display of an image (for example, a two-dimensional map) obtained by blending the analysis result and the image with an arbitrary transparency.

In the embodiments described above, the display control unit 305 can cause the display unit 600 to display whichever of the high-quality image generated by the image quality improving unit and the input image is selected in response to an instruction from the examiner. The display control unit 305 may also switch the display on the display unit 600 from the captured image (input image) to the high-quality image in response to an instruction from the examiner. That is, the display control unit 305 may change the display of the low-quality image to the display of the high-quality image in response to an instruction from the examiner. The display control unit 305 may also change the display of the high-quality image to the display of the low-quality image in response to an instruction from the examiner.

Furthermore, the image quality improving unit may start the image quality improvement processing by the image quality improvement engine (the trained model for image quality improvement), that is, input an image to the engine, in response to an instruction from the examiner, and the display control unit 305 may display the resulting high-quality image on the display unit 600. Alternatively, when an input image is captured by the imaging apparatus (tomographic imaging apparatus 200), the image quality improvement engine may automatically generate a high-quality image from the input image, and the display control unit 305 may display the high-quality image on the display unit 600 in response to an instruction from the examiner. Here, the image quality improvement engine includes a trained model that performs the image quality improvement processing described above.

The same processing can be applied to the output of analysis results. That is, the display control unit 305 may change the display of the analysis result of the low-quality image to the display of the analysis result of the high-quality image in response to an instruction from the examiner. The display control unit 305 may also change the display of the analysis result of the high-quality image to the display of the analysis result of the low-quality image in response to an instruction from the examiner. The display control unit 305 may of course change the display of the analysis result of the low-quality image to the display of the low-quality image, or the display of the low-quality image to the display of the analysis result of the low-quality image, in response to an instruction from the examiner. Similarly, the display control unit 305 may change the display of the analysis result of the high-quality image to the display of the high-quality image, or the display of the high-quality image to the display of the analysis result of the high-quality image, in response to an instruction from the examiner.

The display control unit 305 may also change the display of one type of analysis result of the low-quality image to the display of another type of analysis result of the low-quality image in response to an instruction from the examiner. Likewise, the display control unit 305 may change the display of one type of analysis result of the high-quality image to the display of another type of analysis result of the high-quality image in response to an instruction from the examiner.

Here, the analysis result of the high-quality image may be displayed by superimposing it on the high-quality image with an arbitrary transparency. The analysis result of the low-quality image may likewise be displayed by superimposing it on the low-quality image with an arbitrary transparency. In that case, changing to the display of the analysis result may mean, for example, changing to a state in which the analysis result is superimposed on the currently displayed image with an arbitrary transparency. Changing to the display of the analysis result may also mean, for example, changing to the display of an image (for example, a two-dimensional map) obtained by blending the analysis result and the image with an arbitrary transparency.
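Superimposing an analysis result on an image with an arbitrary transparency is commonly done by a per-pixel weighted sum. The sketch below assumes that the image and the analysis map are grayscale 2-D lists of the same size with values in [0, 1], and that `transparency` is the transparency of the overlay (1.0 leaves only the image, 0.0 leaves only the analysis map); the function name and this convention are assumptions, not taken from the embodiment.

```python
def blend(image, analysis_map, transparency):
    """Overlay analysis_map on image; transparency applies to the overlay."""
    if not 0.0 <= transparency <= 1.0:
        raise ValueError("transparency must be between 0 and 1")
    opacity = 1.0 - transparency
    return [[transparency * p + opacity * a
             for p, a in zip(image_row, map_row)]
            for image_row, map_row in zip(image, analysis_map)]

image = [[0.0, 1.0], [0.5, 0.5]]
thickness_map = [[1.0, 1.0], [0.0, 1.0]]
half_blended = blend(image, thickness_map, 0.5)
```

A color-coded analysis map would be blended the same way, channel by channel.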

(Modification 1)
In the various embodiments described above, the position of a black spot, which is one example of noise, may be detected or predicted by a machine learning model for object recognition or a machine learning model for segmentation. In this case, the noise position is detected or predicted from the captured medical image by a trained model obtained by learning training data that includes medical images of the subject. Here, the machine learning model according to this modification is a function generated by performing machine learning with training data composed of pairs of input data, which is a tomographic image (one example of a medical image), and data (correct answer data) in which the noise positions in that tomographic image are labeled (annotated). The various techniques described above, such as that of FIG. 8, can be applied to the machine learning model, the processing of the learning unit, and so on. The noise position may also be detected or predicted by the technique, described later, of detecting abnormal sites with a generative adversarial network (GAN) or an autoencoder.

(Modification 2)
The display control unit 305 in the various embodiments and modifications described above may display analysis results such as the thickness of a desired layer and various blood vessel densities on the report screen of the display screen. It may also display, as an analysis result, the values (distribution) of parameters relating to a site of interest that includes at least one of the optic nerve head, macula, blood vessel region, nerve fiber bundle, vitreous region, macular region, choroid region, sclera region, lamina cribrosa region, retinal layer boundary, retinal layer boundary edge, photoreceptor cells, blood cells, blood vessel wall, inner vessel wall boundary, outer vessel boundary, ganglion cells, corneal region, anterior chamber angle region, Schlemm's canal, and the like. Analyzing a medical image to which various artifact reduction processes have been applied allows accurate analysis results to be displayed. An artifact may be, for example, a false image region caused by light absorption in a blood vessel region or the like, a projection artifact, or a band-like artifact in a front image that arises along the main scanning direction of the measurement light depending on the state of the subject's eye (movement, blinking, and so on). An artifact may be anything, as long as it is an imaging failure region that appears at random, from one acquisition to the next, on a medical image of a predetermined site of the subject. The values (distribution) of parameters relating to a region containing at least one of these artifacts (imaging failure regions) may be displayed as an analysis result. The values (distribution) of parameters relating to a region containing at least one abnormal site, such as drusen, neovascularization, exudates (hard exudates), or pseudodrusen, may also be displayed as an analysis result.

The analysis result may be displayed as an analysis map, as sectors showing a statistical value for each divided region, or the like. The analysis result may also be generated using a trained model (an analysis result generation engine, a trained model for analysis result generation) obtained by learning analysis results of medical images as training data. In this case, the trained model may be obtained by training with training data that includes a medical image and the analysis result of that medical image, or training data that includes a medical image and the analysis result of a medical image of a different type. The trained model may also be obtained by training with training data whose input data is a set of multiple medical images of different types of a predetermined part, such as a luminance front image and a motion contrast front image. Here, the luminance front image corresponds to an En-Face image of a tomographic image, and the motion contrast front image corresponds to an En-Face image of OCTA. The device may also be configured to display an analysis result obtained from a high-quality image generated by a trained model for image quality improvement. The trained model for image quality improvement may be obtained by learning training data in which a first image is the input data and a second image of higher quality than the first image is the correct answer data. In this case, the second image may be, for example, a high-quality image whose contrast has been increased or whose noise has been reduced by superimposing multiple first images (for example, by averaging multiple first images after aligning them).
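The averaging used to build the higher-quality second image can be sketched as follows. This is a minimal illustration, assuming the first images are already aligned and are represented as nested lists of pixel values; the function names `average_frames` and `make_training_pair` are hypothetical and do not come from the publication.

```python
def average_frames(frames):
    """Pixel-wise mean of equally sized 2-D frames (lists of rows)."""
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same shape")
    count = len(frames)
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


def make_training_pair(aligned_frames):
    """Return (input, target): one noisy frame and the average of all frames."""
    return aligned_frames[0], average_frames(aligned_frames)
```

Averaging N aligned frames with independent noise lowers the noise standard deviation by roughly a factor of the square root of N, which is why the averaged image can serve as the correct answer data.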

The input data included in the training data may be a high-quality image generated by the trained model for image quality improvement, or may be a set of a low-quality image and a high-quality image. The training data may be, for example, data in which the input data is labeled (annotated) with correct answer data (for supervised learning) that includes at least one of the following: an analysis value obtained by analyzing an analysis region (for example, a mean or median), a table containing analysis values, an analysis map, and the position of an analysis region such as a sector in the image. The device may be configured to display the analysis result obtained by the trained model for analysis result generation in response to an instruction from the examiner. For example, the image processing unit 303 can use a trained model for analysis result generation (different from the trained model for image quality improvement) to generate, from the noise-reduced image described above, an analysis result related to that noise-reduced image. Also, for example, the display control unit 305 can cause the display unit 600 to display an analysis result related to the noise-reduced image, generated from that image using a trained model for analysis result generation obtained by learning training data that includes medical images of a subject.

The display control unit 305 in the various embodiments and modifications described above may also display various diagnosis results, such as glaucoma or age-related macular degeneration, on the report screen of the display screen. In this case, for example, analyzing a medical image to which the various artifact reduction processes described above have been applied makes it possible to display an accurate diagnosis result. As a diagnosis result, the position of an identified abnormal site may be displayed on the image, or the state of the abnormal site may be displayed as text or the like. A classification result for the abnormal site (for example, the Curtin classification) may also be displayed as a diagnosis result. As a classification result, for example, information indicating the likelihood of each abnormal site (for example, a numerical value indicating a ratio) may be displayed. Information that a physician needs in order to confirm the diagnosis may also be displayed as a diagnosis result. Such information may be, for example, advice to perform additional imaging. For example, when an abnormal site is detected in a blood vessel region of an OCTA image, a message may be displayed recommending additional fluorescence imaging with a contrast agent, which allows blood vessels to be observed in more detail than OCTA.

The diagnosis result may be generated using a trained model (a diagnosis result generation engine, a trained model for diagnosis result generation) obtained by learning diagnosis results of medical images as training data. The trained model may be obtained by training with training data that includes a medical image and the diagnosis result of that medical image, or training data that includes a medical image and the diagnosis result of a medical image of a different type. The device may also be configured to display a diagnosis result obtained from a high-quality image generated by a trained model for image quality improvement.

The input data included in the training data may be a high-quality image generated by the trained model for image quality improvement, or may be a set of a low-quality image and a high-quality image. The training data may be, for example, data in which the input data is labeled (annotated) with correct answer data (for supervised learning) that includes at least one of the following: a diagnosis name, the type and state (degree) of a lesion (abnormal site), the position of the lesion in the image, the position of the lesion relative to a region of interest, findings (such as image interpretation findings), grounds supporting the diagnosis name (such as affirmative medical support information), and grounds for rejecting the diagnosis name (negative medical support information). The device may be configured to display the diagnosis result obtained by the trained model for diagnosis result generation in response to an instruction from the examiner. For example, the image processing unit 303 can use a trained model for diagnosis result generation (different from the trained model for image quality improvement) to generate, from the noise-reduced image described above, a diagnosis result related to that noise-reduced image. Also, for example, the display control unit 305 can cause the display unit 600 to display a diagnosis result related to the noise-reduced image, generated from that image using a trained model for diagnosis result generation obtained by learning training data that includes medical images of a subject.

The display control unit 305 in the various embodiments and modifications described above may also display, on the report screen of the display screen, object recognition results (object detection results) or segmentation results for the sites of interest, artifacts, abnormal sites, and the like described above. In this case, for example, a rectangular frame or the like may be superimposed around an object on the image, or a color or the like may be superimposed on the object in the image. The object recognition results and segmentation results may be generated using a trained model (an object recognition engine, a trained model for object recognition, a segmentation engine, a trained model for segmentation) obtained by learning training data in which medical images are labeled (annotated) with information indicating object recognition or segmentation as correct answer data. The analysis result generation and diagnosis result generation described above may make use of these object recognition results or segmentation results. For example, analysis result generation or diagnosis result generation may be performed on a partial region, such as a site of interest, obtained by object recognition or segmentation. For example, the image processing unit 303 can use a trained model for object recognition (different from the trained model for image quality improvement) to generate, from the noise-reduced image described above, an object recognition result related to that noise-reduced image. Also, for example, the image processing unit 303 can use a trained model for segmentation (different from the trained model for image quality improvement) to generate, from the noise-reduced image, a segmentation result related to that noise-reduced image. Also, for example, the display control unit 305 can cause the display unit 600 to display an object recognition result generated from the noise-reduced image using a trained model for object recognition obtained by learning training data that includes medical images of a subject. Likewise, the display control unit 305 can cause the display unit 600 to display a segmentation result generated from the noise-reduced image using a trained model for segmentation obtained by learning training data that includes medical images of a subject. In this case, the object recognition result or segmentation result is, for example, a partial region detected from the noise-reduced image.

When detecting an abnormal site, a generative adversarial network (GAN: Generative Adversarial Networks) or a variational autoencoder (VAE: Variational Auto-Encoder) may be used. For example, a DCGAN (Deep Convolutional GAN), consisting of a generator obtained by learning to generate tomographic images and a discriminator obtained by learning to distinguish new tomographic images produced by the generator from real fundus front images, can be used as the machine learning model.

When a DCGAN is used, for example, the discriminator encodes the input tomographic image into a latent variable, and the generator generates a new tomographic image based on that latent variable. The difference between the input tomographic image and the newly generated tomographic image can then be extracted (detected) as an abnormal site. When a VAE is used, for example, the input tomographic image is encoded into a latent variable by the encoder, and the latent variable is decoded by the decoder to generate a new tomographic image. The difference between the input tomographic image and the newly generated tomographic image can then be extracted (detected) as an abnormal site. Although a tomographic image has been used as the example of input data, a fundus image, a front image of the anterior segment, or the like may be used instead.

Further, the information processing apparatus may detect an abnormal site using a convolutional autoencoder (CAE: Convolutional Auto-Encoder). When a CAE is used, the same image is learned as both input data and output data during training. As a result, when an image containing an abnormal site is input to the CAE at estimation time, an image without the abnormal site is output in accordance with the learned tendency. The difference between the image input to the CAE and the image output from the CAE can then be extracted (detected) as an abnormal site. In this case as well, not only tomographic images but also fundus images, front images of the anterior segment, and the like may be used as input data.
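The difference-based detection described for the DCGAN, VAE, and CAE can be sketched as below. This is only an illustration under stated assumptions: `reconstruct` stands for any trained model that returns an image without the abnormal site, `flatten_to_median` is a hypothetical stand-in used so the sketch runs without a trained network, and the threshold value is arbitrary.

```python
def anomaly_map(image, reconstruct, threshold):
    """Flag pixels whose reconstruction error exceeds `threshold`.

    `reconstruct` is any callable that maps an image to its reconstruction,
    for example a trained autoencoder; here it is a placeholder.
    """
    restored = reconstruct(image)
    return [
        [abs(p - q) > threshold for p, q in zip(row, restored_row)]
        for row, restored_row in zip(image, restored)
    ]


def flatten_to_median(image):
    """Stand-in for a trained model: replaces every pixel with the median."""
    values = sorted(p for row in image for p in row)
    median = values[len(values) // 2]
    return [[median for _ in row] for row in image]
```

For a 3x3 image that is uniform except for one bright pixel, `anomaly_map(image, flatten_to_median, threshold=20)` marks only that pixel, mirroring how the difference between the input and the model output isolates the abnormal site.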

In these cases, the information processing apparatus can generate, as information on an abnormal site, information on the difference between a medical image obtained using the generative adversarial network or autoencoder and the medical image input to that network or autoencoder. The information processing apparatus can thereby be expected to detect abnormal sites quickly and accurately. Here, autoencoders include the VAE, the CAE, and the like. For example, the image processing unit 303 can generate, as information on an abnormal site, information on the difference between the noise-reduced image described above and a medical image obtained from that noise-reduced image using a generative adversarial network or autoencoder. Also, for example, the display control unit 305 can cause the display unit 600 to display, as information on an abnormal site, information on the difference between the noise-reduced image and a medical image obtained from that noise-reduced image using a generative adversarial network or autoencoder.

In a diseased eye, image features differ depending on the type of disease. The trained models used in the various embodiments and modifications described above may therefore be generated and prepared for each type of disease or each abnormal site. In this case, for example, the image processing device 300 can select the trained model to use for processing according to an input (instruction) from the operator, such as the type of disease or the abnormal site of the eye to be inspected. The trained models prepared for each type of disease or abnormal site are not limited to those used for detecting retinal layers or generating region label images; they may be, for example, trained models used in an image evaluation engine or an analysis engine. The image processing device 300 may also identify the type of disease or the abnormal site of the eye to be inspected from an image using a separately prepared trained model. In this case, the image processing device 300 can automatically select the trained model to use for the above processing based on the type of disease or abnormal site identified by the separately prepared trained model. The trained model for identifying the type of disease or abnormal site of the eye to be inspected may be trained using pairs of training data in which tomographic images, fundus images, or the like are the input data and the type of disease or the abnormal site in those images is the output data. As the input data of the training data, a tomographic image, a fundus image, or the like may be used alone, or a combination of them may be used.
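The selection logic in this paragraph can be sketched as a simple lookup. The registry layout, the `"default"` key, and the function name `select_model` are assumptions made for illustration; the publication does not specify how the models are stored or keyed.

```python
def select_model(models, image, operator_choice=None, classifier=None):
    """Pick a model by the operator's disease label, else by a classifier's label.

    `models` maps a disease label to a callable; "default" is the fallback key.
    `classifier` plays the role of the separately prepared trained model.
    """
    label = operator_choice
    if label is None and classifier is not None:
        label = classifier(image)
    return models.get(label, models["default"])
```

An operator input takes precedence here; when none is given, the separately prepared classifier supplies the label, and an unknown label falls back to the default model.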

In particular, the trained model for diagnosis result generation may be a trained model obtained by training with training data whose input data is a set of multiple medical images of different types of a predetermined part of the subject. In this case, the input data included in the training data may be, for example, a set consisting of a motion contrast front image and a luminance front image (or a luminance tomographic image) of the fundus. The input data may also be, for example, a set consisting of a tomographic image (B-scan image) of the fundus and a color fundus image (or a fluorescence fundus image). The multiple medical images of different types may be any images, as long as they are acquired with different modalities, different optical systems, different principles, or the like.

In particular, the trained model for diagnosis result generation may also be a trained model obtained by training with training data whose input data is a set of multiple medical images of different parts of the subject. In this case, the input data included in the training data may be, for example, a set consisting of a tomographic image (B-scan image) of the fundus and a tomographic image (B-scan image) of the anterior segment. The input data may also be, for example, a set consisting of a three-dimensional OCT image (three-dimensional tomographic image) of the macula of the fundus and a circle scan (or raster scan) tomographic image of the optic nerve head of the fundus.

The input data included in the training data may be multiple medical images of different parts and different types of the subject. In this case, the input data may be, for example, a set consisting of a tomographic image of the anterior segment and a color fundus image. The trained model described above may also be a trained model obtained by training with training data whose input data is a set of multiple medical images of a predetermined part of the subject taken at different imaging angles of view. The input data included in the training data may also be an image, such as a panoramic image, obtained by stitching together multiple medical images acquired by dividing a predetermined part into multiple regions and imaging them at different times. Using a wide-angle image such as a panoramic image as training data may allow image features to be acquired more accurately, for reasons such as its larger amount of information compared with a narrow-angle image, and can therefore improve the result of each process. For example, when abnormal sites are detected at multiple positions in a wide-angle image at estimation (prediction) time, the device is configured so that an enlarged image of each abnormal site can be displayed in turn. This allows the abnormal sites at multiple positions to be checked efficiently, which improves, for example, convenience for the examiner. In this case, for example, the device may be configured so that the examiner can select each position on the wide-angle image at which an abnormal site was detected, and an enlarged image of the abnormal site at the selected position is displayed. The input data included in the training data may also be a set of multiple medical images of a predetermined part of the subject taken at different dates and times.

The display screen on which at least one of the analysis result, diagnosis result, object recognition result, and segmentation result described above is displayed is not limited to the report screen. Such a result may be displayed on at least one display screen such as an imaging confirmation screen, a display screen for follow-up observation, or a preview screen for various adjustments before imaging (a display screen on which various live moving images are displayed). For example, displaying at least one result obtained with the trained models described above on the imaging confirmation screen allows the examiner to check an accurate result even immediately after imaging. The change of display between the low-quality image and the high-quality image described above may be, for example, a change of display between the analysis result of the low-quality image and the analysis result of the high-quality image.

Here, the various trained models described above can be obtained by machine learning using training data. Machine learning includes, for example, deep learning, which uses a multi-layer neural network. A convolutional neural network (CNN: Convolutional Neural Network), for example, can be used as the machine learning model for at least part of the multi-layer neural network. Techniques relating to autoencoders may also be used for at least part of the multi-layer neural network. Techniques relating to backpropagation (the error backpropagation method) may be used for training. However, the machine learning is not limited to deep learning; any learning may be used as long as it uses a model that can itself extract (represent) the features of training data such as images through learning. Here, a machine learning model refers to a learning model based on a machine learning algorithm such as deep learning. A trained model is a machine learning model based on any machine learning algorithm that has been trained in advance using appropriate training data. A trained model is not, however, a model that undergoes no further training; additional training can also be performed on it. Training data consists of pairs of input data and output data (correct answer data). The training data as a whole is sometimes called teacher data, and in other cases the correct answer data alone is called teacher data.

A GPU can perform computation efficiently by processing large amounts of data in parallel. For this reason, when training is performed many times over using a learning model such as deep learning, it is effective to perform the processing on a GPU. In this modification, therefore, a GPU is used in addition to the CPU for the processing performed by the image processing unit 303, which is an example of a learning unit (not shown). Specifically, when a learning program including a learning model is executed, training is carried out by the CPU and the GPU computing in cooperation. The processing of the learning unit may instead be computed by the CPU alone or by the GPU alone. The processing unit (estimation unit) that executes processing using the various trained models described above may also use a GPU, as the learning unit does. The learning unit may include an error detection unit and an update unit (neither shown). The error detection unit obtains the error between the correct answer data and the output data that the output layer of the neural network produces in response to the input data given to the input layer. The error detection unit may calculate this error using a loss function. The update unit updates the connection weighting coefficients between the nodes of the neural network, and the like, based on the error obtained by the error detection unit so that the error becomes smaller. The update unit updates the connection weighting coefficients and the like using, for example, the error backpropagation method. The error backpropagation method is a technique that adjusts the connection weighting coefficients between the nodes of each neural network, and the like, so that the above error becomes smaller.
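The roles of the error detection unit and the update unit can be illustrated with a deliberately small model. The sketch below assumes a single linear unit `y = weight * x + bias` and a mean squared error loss, neither of which is specified in the publication; for such a one-layer model, backpropagation reduces to the two gradient expressions in `train_step`.

```python
def mean_squared_error(outputs, targets):
    """Loss function: the role of the error detection unit."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def train_step(weight, bias, inputs, targets, learning_rate):
    """One update: compute the error, then move the parameters to reduce it."""
    outputs = [weight * x + bias for x in inputs]
    n = len(inputs)
    # Gradients of the mean squared error with respect to weight and bias.
    grad_w = sum(2 * (o - t) * x for o, t, x in zip(outputs, targets, inputs)) / n
    grad_b = sum(2 * (o - t) for o, t in zip(outputs, targets)) / n
    loss = mean_squared_error(outputs, targets)  # loss before this update
    return weight - learning_rate * grad_w, bias - learning_rate * grad_b, loss
```

Calling `train_step` repeatedly on data generated by `y = 2x + 1` drives the loss toward zero and the parameters toward 2 and 1, which is the behavior the update unit is described as producing.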

As a machine learning model for image quality improvement, segmentation, and the like, a U-net type machine learning model can be applied, which has the function of an encoder consisting of multiple layers including multiple downsampling layers and the function of a decoder consisting of multiple layers including multiple upsampling layers. A U-net type machine learning model is configured (for example, using skip connections) so that position information (spatial information) that becomes ambiguous in the layers forming the encoder can be used in the layers of the same dimension (the mutually corresponding layers) among the layers forming the decoder.
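The way skip connections return spatial detail to the decoder can be shown with a toy one-dimensional pass. This sketch is structural only: pair averaging and sample repetition stand in for the learned downsampling and upsampling layers, and the function names are hypothetical.

```python
def downsample(values):
    """Halve the resolution by averaging neighbouring pairs (stand-in for a learned layer)."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]


def u_shaped_pass(signal, depth):
    """Return per-position feature tuples after an encoder/decoder pass with skips."""
    if len(signal) % (2 ** depth) != 0:
        raise ValueError("signal length must be divisible by 2**depth")
    skips = []
    current = list(signal)
    for _ in range(depth):
        skips.append(current)          # remember detail at this resolution
        current = downsample(current)  # encoder: position information gets blurred
    features = [(v,) for v in current]
    for skip in reversed(skips):
        # Decoder: double the resolution by repeating each feature tuple.
        upsampled = [f for f in features for _ in range(2)]
        # Skip connection: attach the encoder value of the same resolution.
        features = [f + (s,) for f, s in zip(upsampled, skip)]
    return features
```

For the input `[1, 3, 5, 7]` with `depth=2`, every output position carries the coarse value 4.0 plus the finer encoder values for its own location, so the first position yields `(4.0, 2.0, 1)` and the last yields `(4.0, 6.0, 7)`.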

As a machine learning model for image quality improvement, segmentation, and the like, an FCN (Fully Convolutional Network), SegNet, or the like can also be used. Depending on the desired configuration, a machine learning model that performs object recognition on a region-by-region basis may be used. As such a machine learning model, for example, R-CNN (Region CNN), Fast R-CNN, or Faster R-CNN can be used. Further, YOLO (You Only Look Once) or SSD (Single Shot Detector) can also be used as a machine learning model that performs object recognition on a region-by-region basis.

The machine learning model may also be, for example, a capsule network (Capsule Network; CapsNet). In a general neural network, each unit (each neuron) is configured to output a scalar value, so that, for example, spatial information about the spatial positional relationship (relative position) between features in an image is reduced. This makes it possible, for example, to perform learning in which the influence of local distortion or translation of the image is reduced. In a capsule network, by contrast, each unit (each capsule) is configured to output spatial information as a vector, so that, for example, the spatial information is retained. This makes it possible, for example, to perform learning that takes the spatial positional relationship between features in an image into account.

The image quality improvement engine (the trained model for image quality improvement) may be a trained model obtained by additional training on training data that includes at least one high-quality image generated by the image quality improvement engine. In this case, the device may be configured so that whether to use a high-quality image as training data for additional training can be selected by an instruction from the examiner. These configurations are applicable not only to the trained model for image quality improvement but also to the various trained models described above. A trained model for correct answer data generation, which generates correct answer data such as labels (annotations), may be used to generate the correct answer data used for training the various trained models described above. In this case, the trained model for correct answer data generation may be obtained by (sequentially) performing additional training on correct answer data that the examiner produced by labeling (annotation). That is, the trained model for correct answer data generation may be obtained by additional training on training data in which the data before labeling is the input data and the data after labeling is the output data. For multiple consecutive frames, such as those of a moving image, the device may be configured to correct the result of a frame whose result is judged to have low accuracy, taking into account the object recognition or segmentation results of the preceding and following frames. In this case, the device may be configured to perform additional training using the corrected result as correct answer data in response to an instruction from the examiner.

（変形例３）
上述した様々な実施形態及び変形例において、物体認識用の学習済モデルやセグメンテーション用の学習済モデルを用いて被検眼の領域（部分領域）を検出する場合には、検出した領域毎に所定の画像処理を施すこともできる。例えば、硝子体領域、網膜領域、及び脈絡膜領域のうちの少なくとも２つの領域を検出する場合を考える。この場合には、検出された少なくとも２つの領域に対してコントラスト調整等の画像処理を施す際に、それぞれ異なる画像処理のパラメータを用いることで、各領域に適した調整を行うことができる。各領域に適した調整が行われた画像を表示することで、操作者は領域毎の疾病等をより適切に診断することができる。なお、検出された領域毎に異なる画像処理のパラメータを用いる構成については、例えば、学習済モデルを用いずに検出された被検眼の領域について同様に適用されてもよい。 (Modification example 3)
In the various embodiments and modifications described above, when regions (partial regions) of the eye to be inspected are detected using a trained model for object recognition or a trained model for segmentation, predetermined image processing can also be applied to each detected region. For example, consider the case of detecting at least two of the vitreous region, the retinal region, and the choroid region. In this case, when image processing such as contrast adjustment is applied to the at least two detected regions, using different image processing parameters for each region allows an adjustment suited to each region. By displaying an image adjusted to suit each region, the operator can more appropriately diagnose a disease or the like in each region. Note that the configuration using different image processing parameters for each detected region may similarly be applied to regions of the eye to be inspected that were detected without using a trained model.
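
The per-region adjustment described in Modification 3 can be illustrated with a minimal sketch. It assumes a label map is already available from some detection step; the function name `adjust_regions`, the linear gain/offset form of the contrast adjustment, and all parameter values are hypothetical and are not taken from the disclosure.

```python
def adjust_regions(image, labels, params):
    """Apply a per-region linear contrast adjustment.

    image  : 2-D list of pixel values in [0, 255]
    labels : 2-D list of region names, same shape as image (assumed given)
    params : dict mapping region name -> (gain, offset); hypothetical values
    """
    out = []
    for img_row, lab_row in zip(image, labels):
        out_row = []
        for value, region in zip(img_row, lab_row):
            # Regions without an entry are passed through unchanged.
            gain, offset = params.get(region, (1.0, 0.0))
            adjusted = gain * value + offset
            # Clip to the valid pixel range.
            out_row.append(min(255.0, max(0.0, adjusted)))
        out.append(out_row)
    return out


image = [[10, 20], [100, 200]]
labels = [["vitreous", "vitreous"], ["retina", "choroid"]]
params = {"vitreous": (3.0, 0.0), "retina": (1.0, 10.0), "choroid": (1.5, 0.0)}
result = adjust_regions(image, labels, params)
```

Here the dim vitreous pixels receive a larger gain than the retinal pixels, which is one way the "different parameters per region" idea could be realized.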

（変形例４）
上述した様々な実施形態及び変形例におけるプレビュー画面において、ライブ動画像の少なくとも１つのフレーム毎に上述した学習済モデルが用いられるように構成されてもよい。このとき、プレビュー画面において、異なる部位や異なる種類の複数のライブ動画像が表示されている場合には、各ライブ動画像に対応する学習済モデルが用いられるように構成されてもよい。これにより、例えば、ライブ動画像であっても、処理時間を短縮することができるため、検者は撮影開始前に精度の高い情報を得ることができる。このため、例えば、再撮影の失敗等を低減することができるため、診断の精度や効率を向上させることができる。なお、複数のライブ動画像は、例えば、ＸＹＺ方向のアライメントのための前眼部の動画像、及び眼底観察光学系のフォーカス調整やＯＣＴフォーカス調整のための眼底の正面動画像であってよい。また、複数のライブ動画像は、例えば、ＯＣＴのコヒーレンスゲート調整（測定光路長と参照光路長との光路長差の調整）のための眼底の断層動画像等であってもよい。このとき、上述した物体認識用の学習済モデルやセグメンテーション用の学習済モデルを用いて検出された領域が所定の条件を満たすように、上述した各種調整が行われるように構成されてもよい。例えば、物体認識用の学習済モデルやセグメンテーション用の学習済モデルを用いて検出された硝子体領域やＲＰＥ等の所定の網膜層等に関する値（例えば、コントラスト値あるいは強度値）が閾値を超える（あるいはピーク値になる）ように、ＯＣＴフォーカス調整等の各種調整が行われるように構成されてもよい。また、例えば、物体認識用の学習済モデルやセグメンテーション用の学習済モデルを用いて検出された硝子体領域やＲＰＥ等の所定の網膜層が深さ方向における所定の位置になるように、ＯＣＴのコヒーレンスゲート調整が行われるように構成されてもよい。 (Modification example 4)
In the preview screens in the various embodiments and modifications described above, the trained models described above may be applied to every at least one frame of a live moving image. In this case, when a plurality of live moving images of different sites or different types are displayed on the preview screen, the trained model corresponding to each live moving image may be used. As a result, for example, the processing time can be shortened even for a live moving image, so the examiner can obtain highly accurate information before imaging starts. This can reduce, for example, failed re-imaging, which improves the accuracy and efficiency of diagnosis. The plurality of live moving images may be, for example, a moving image of the anterior segment for alignment in the XYZ directions and a frontal moving image of the fundus for focus adjustment of the fundus observation optical system or OCT focus adjustment. The plurality of live moving images may also be, for example, a tomographic moving image of the fundus for OCT coherence gate adjustment (adjustment of the optical path length difference between the measurement optical path length and the reference optical path length). In this case, the various adjustments described above may be performed so that a region detected using the trained model for object recognition or the trained model for segmentation described above satisfies a predetermined condition. For example, various adjustments such as OCT focus adjustment may be performed so that a value (for example, a contrast value or an intensity value) relating to the vitreous region or a predetermined retinal layer such as the RPE, detected using the trained model for object recognition or the trained model for segmentation, exceeds a threshold value (or reaches a peak value). 
Further, for example, the OCT coherence gate adjustment may be performed so that the vitreous region or a predetermined retinal layer such as the RPE, detected using the trained model for object recognition or the trained model for segmentation, comes to a predetermined position in the depth direction.

また、上述した学習済モデルを適用可能な動画像は、ライブ動画像に限らず、例えば、記憶部３０２に記憶（保存）された動画像であってもよい。このとき、例えば、記憶部３０２に記憶（保存）された眼底の断層動画像の少なくとも１つのフレーム毎に位置合わせして得た動画像が表示画面に表示されてもよい。例えば、硝子体領域を好適に観察したい場合には、まず、フレーム上に硝子体領域ができるだけ存在する等の条件を基準とする基準フレームを選択してもよい。このとき、各フレームは、ＸＺ方向の断層画像（Ｂスキャン像）である。そして、選択された基準フレームに対して他のフレームがＸＺ方向に位置合わせされた動画像が表示画面に表示されてもよい。このとき、例えば、動画像の少なくとも１つのフレーム毎に高画質化用の学習済モデルにより順次生成された高画質画像（高画質フレーム）を連続表示させるように構成してもよい。 Further, the moving image to which the above-described trained model can be applied is not limited to the live moving image, and may be, for example, a moving image stored (stored) in the storage unit 302. At this time, for example, the moving image obtained by aligning each frame of the tomographic moving image of the fundus stored (stored) in the storage unit 302 may be displayed on the display screen. For example, when it is desired to preferably observe the vitreous region, a reference frame may be selected based on conditions such as the presence of the vitreous region on the frame as much as possible. At this time, each frame is a tomographic image (B scan image) in the XZ direction. Then, a moving image in which another frame is aligned in the XZ direction with respect to the selected reference frame may be displayed on the display screen. At this time, for example, a high-quality image (high-quality frame) sequentially generated by the trained model for high image quality may be continuously displayed for each at least one frame of the moving image.

なお、上述したフレーム間の位置合わせの手法としては、Ｘ方向の位置合わせの手法とＺ方向（深度方向）の位置合わせの手法とは、同じ手法が適用されてもよいし、全て異なる手法が適用されてもよい。また、同一方向の位置合わせは、異なる手法で複数回行われてもよく、例えば、粗い位置合わせを行った後に、精密な位置合わせが行われてもよい。また、位置合わせの手法としては、例えば、断層画像（Ｂスキャン像）をセグメンテーション処理して得た網膜層境界を用いた（Ｚ方向の粗い）位置合わせ、断層画像を分割して得た複数の領域と基準画像との相関情報（類似度）を用いた（Ｘ方向やＺ方向の精密な）位置合わせ、断層画像（Ｂスキャン像）毎に生成した１次元投影像を用いた（Ｘ方向の）位置合わせ、２次元正面画像を用いた（Ｘ方向の）位置合わせ等がある。また、ピクセル単位で粗く位置合わせが行われてから、サブピクセル単位で精密な位置合わせが行われるように構成されてもよい。 As for the inter-frame alignment methods described above, the same method may be applied to alignment in the X direction and alignment in the Z direction (depth direction), or entirely different methods may be applied. Alignment in the same direction may also be performed multiple times using different methods; for example, coarse alignment may be followed by precise alignment. Alignment methods include, for example, (coarse, Z-direction) alignment using retinal layer boundaries obtained by segmenting a tomographic image (B scan image); (precise, X- or Z-direction) alignment using correlation information (similarity) between a reference image and a plurality of regions obtained by dividing the tomographic image; (X-direction) alignment using a one-dimensional projection image generated for each tomographic image (B scan image); and (X-direction) alignment using a two-dimensional front image. Coarse alignment in pixel units may also be followed by precise alignment in subpixel units.
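
The pixel-then-subpixel idea can be sketched on one-dimensional projection profiles. This is an illustration under assumptions, not the disclosed procedure: the similarity measure (a plain sum of products), the parabola fit used for the subpixel step, and the names `correlation` and `align` are all choices made here for the example.

```python
def correlation(ref, mov, shift):
    """Sum of products ref[i] * mov[i - shift] over the overlapping samples."""
    total = 0.0
    for i in range(len(ref)):
        j = i - shift
        if 0 <= j < len(mov):
            total += ref[i] * mov[j]
    return total


def align(ref, mov, max_shift):
    """Return (coarse_shift, fine_shift) that maps profile `mov` onto `ref`."""
    shifts = list(range(-max_shift, max_shift + 1))
    scores = [correlation(ref, mov, s) for s in shifts]

    # Coarse step: best integer (pixel-unit) shift.
    k = max(range(len(scores)), key=scores.__getitem__)
    coarse = shifts[k]

    # Fine step: fit a parabola through the peak and its two neighbours
    # and take the vertex as the subpixel estimate.
    fine = float(coarse)
    if 0 < k < len(scores) - 1:
        left, mid, right = scores[k - 1], scores[k], scores[k + 1]
        denom = left - 2.0 * mid + right
        if denom != 0.0:
            fine = coarse + 0.5 * (left - right) / denom
    return coarse, fine


ref = [0, 0, 1, 4, 1, 0, 0, 0]
mov = [0, 1, 4, 1, 0, 0, 0, 0]   # same profile, displaced by one sample
coarse, fine = align(ref, mov, max_shift=3)
```

For two-dimensional B-scans the same two-stage structure would apply per axis or per sub-region; only the similarity measure and search space change.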

ここで、各種の調整中では、被検眼の網膜等の撮影対象がまだ上手く撮像できていない可能性がある。このため、学習済モデルに入力される医用画像と学習データとして用いられた医用画像との違いが大きいために、精度良く高画質画像が得られない可能性がある。そこで、断層画像（Ｂスキャン）の画質評価等の評価値が閾値を超えたら、高画質動画像の表示（高画質フレームの連続表示）を自動的に開始するように構成してもよい。また、断層画像（Ｂスキャン）の画質評価等の評価値が閾値を超えたら、高画質化ボタンを検者が指定可能な状態（アクティブ状態）に変更するように構成されてもよい。なお、高画質化ボタンは、高画質化処理の実行を指定するためのボタンである。もちろん、高画質化ボタンは、高画質画像の表示を指示するためのボタンであってもよい。 Here, during various adjustments, it is possible that the imaged object such as the retina of the eye to be inspected has not yet been successfully imaged. Therefore, since there is a large difference between the medical image input to the trained model and the medical image used as the training data, there is a possibility that a high-quality image cannot be obtained with high accuracy. Therefore, when the evaluation value such as the image quality evaluation of the tomographic image (B scan) exceeds the threshold value, the display of the high-quality moving image (continuous display of the high-quality frame) may be automatically started. Further, when the evaluation value such as the image quality evaluation of the tomographic image (B scan) exceeds the threshold value, the image quality enhancement button may be configured to be changed to a state (active state) that can be specified by the examiner. The high image quality button is a button for designating the execution of the high image quality processing. Of course, the high image quality button may be a button for instructing the display of a high image quality image.

また、走査パターン等が異なる撮影モード毎に異なる高画質化用の学習済モデルを用意して、選択された撮影モードに対応する高画質化用の学習済モデルが選択されるように構成されてもよい。また、異なる撮影モードで得た様々な医用画像を含む学習データを学習して得た１つの高画質化用の学習済モデルが用いられてもよい。 Further, a different trained model for high image quality may be prepared for each imaging mode with a different scanning pattern or the like, and the trained model for high image quality corresponding to the selected imaging mode may be selected. Alternatively, a single trained model for high image quality obtained by training on learning data including various medical images acquired in different imaging modes may be used.

（変形例５）
上述した様々な実施形態及び変形例においては、学習済モデルが追加学習中である場合、追加学習中の学習済モデル自体を用いて出力（推論・予測）することが難しい可能性がある。このため、追加学習中の学習済モデルに対する医用画像の入力を禁止することがよい。また、追加学習中の学習済モデルと同じ学習済モデルをもう一つ予備の学習済モデルとして用意してもよい。このとき、追加学習中には、予備の学習済モデルに対して医用画像の入力が実行できるようにすることがよい。そして、追加学習が完了した後に、追加学習後の学習済モデルを評価し、問題なければ、予備の学習済モデルから追加学習後の学習済モデルに置き換えればよい。また、問題があれば、予備の学習済モデルが用いられるようにしてもよい。なお、学習済モデルの評価としては、例えば、高画質化用の学習済モデルで得た高画質画像を他の種類の画像と分類するための分類用の学習済モデルが用いられてもよい。分類用の学習済モデルは、例えば、高画質化用の学習済モデルで得た高画質画像と低画質画像とを含む複数の画像を入力データとし、これらの画像の種類がラベル付け（アノテーション）されたデータを正解データとして含む学習データを学習して得た学習済モデルであってもよい。このとき、推定時（予測時）の入力データの画像の種類が、学習時の正解データに含まれる画像の種類毎の確からしさを示す情報（例えば、割合を示す数値）と合わせて表示されてもよい。なお、分類用の学習済モデルの入力データとしては、上記の画像以外にも、複数の低画質画像の重ね合わせ処理（例えば、位置合わせして得た複数の低画質画像の平均化処理）等によって、高コントラスト化やノイズ低減等が行われたような高画質な画像が含まれても良い。 (Modification 5)
In the various embodiments and modifications described above, when a trained model is undergoing additional learning, it may be difficult to produce output (inference/prediction) using that model itself. It is therefore preferable to prohibit the input of medical images to a trained model that is undergoing additional learning. Alternatively, a copy identical to the trained model undergoing additional learning may be prepared as a spare trained model. In that case, it is preferable that medical images can still be input to the spare trained model during the additional learning. After the additional learning is completed, the additionally trained model is evaluated, and if there is no problem, the spare trained model is replaced with the additionally trained model. If there is a problem, the spare trained model may continue to be used. For the evaluation of a trained model, for example, a trained model for classification that distinguishes high-quality images obtained by the trained model for high image quality from other types of images may be used. The trained model for classification may be, for example, a trained model obtained by training on learning data in which a plurality of images, including high-quality images obtained by the trained model for high image quality and low-quality images, are the input data and data labeled (annotated) with the types of these images is the correct answer data. In this case, the image type of the input data at the time of estimation (prediction) may be displayed together with information (for example, a numerical value indicating a ratio) indicating the certainty for each image type included in the correct answer data at the time of learning. 
In addition to the above images, the input data of the trained model for classification may include high-quality images in which contrast has been increased, noise has been reduced, and so on, by superimposition processing of a plurality of low-quality images (for example, averaging a plurality of low-quality images obtained after alignment).
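
The spare-model arrangement in Modification 5 resembles a simple swap-after-evaluation pattern. The sketch below is only an illustration: the class `ModelSlot`, its method names, the scalar evaluation score, and the threshold are assumptions introduced here, and the models are stand-in callables rather than real networks.

```python
class ModelSlot:
    """Holds the model that serves inference while a copy is being retrained."""

    def __init__(self, model):
        self.active = model        # spare model: the only one that receives input
        self.training = False

    def begin_additional_training(self):
        # The candidate is trained elsewhere on a copy; it never receives
        # medical images for inference while training is in progress.
        self.training = True

    def infer(self, image):
        return self.active(image)

    def finish_additional_training(self, candidate, evaluate, threshold):
        """Swap in the candidate only if its evaluation score is acceptable."""
        self.training = False
        if evaluate(candidate) >= threshold:
            self.active = candidate
            return True
        return False               # keep using the spare model


slot = ModelSlot(lambda image: image + 1)          # stand-in "trained model"
slot.begin_additional_training()
during = slot.infer(10)                            # served by the spare model
swapped = slot.finish_additional_training(
    lambda image: image + 2,                       # stand-in retrained model
    evaluate=lambda model: model(0),               # hypothetical score
    threshold=2,
)
after = slot.infer(10)
```

In practice the `evaluate` callable could wrap the classification model mentioned in the text; that wiring is left out here.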

また、撮影部位毎に学習して得た学習済モデルを選択的に利用できるようにしてもよい。具体的には、第１の撮影部位（肺、被検眼等）を含む学習データを用いて得た第１の学習済モデルと、第１の撮影部位とは異なる第２の撮影部位を含む学習データを用いて得た第２の学習済モデルと、を含む複数の学習済モデルを用意することができる。そして、画像処理部３０３は、これら複数の学習済モデルのいずれかを選択する選択手段（不図示）を有してもよい。このとき、画像処理部３０３は、選択された学習済モデルに対して追加学習として実行する制御手段（不図示）を有してもよい。制御手段は、検者からの指示に応じて、選択された学習済モデルに対応する撮影部位と該撮影部位の撮影画像とがペアとなるデータを検索し、検索して得たデータを学習データとする学習を、選択された学習済モデルに対して追加学習として実行することができる。なお、選択された学習済モデルに対応する撮影部位は、データのヘッダの情報から取得したり、検者により手動入力されたりしたものであってよい。また、データの検索は、例えば、病院や研究所等の外部施設のサーバ等からネットワークを介して行われてよい。これにより、学習済モデルに対応する撮影部位の撮影画像を用いて、撮影部位毎に効率的に追加学習することができる。 In addition, trained models obtained by learning for each imaging site may be made selectively usable. Specifically, a plurality of trained models can be prepared, including a first trained model obtained using learning data that includes a first imaging site (lung, eye to be examined, etc.) and a second trained model obtained using learning data that includes a second imaging site different from the first. The image processing unit 303 may have a selection means (not shown) for selecting one of these trained models. The image processing unit 303 may also have a control means (not shown) that performs additional learning on the selected trained model. In response to an instruction from the examiner, the control means searches for data in which the imaging site corresponding to the selected trained model is paired with a captured image of that site, and can perform learning that uses the retrieved data as learning data as additional learning on the selected trained model. The imaging site corresponding to the selected trained model may be acquired from the header information of the data or entered manually by the examiner. The data search may be performed over a network, for example, on a server of an external facility such as a hospital or research institute. 
As a result, additional learning can be efficiently performed for each imaged part by using the photographed image of the imaged part corresponding to the trained model.

なお、選択手段及び制御手段は、画像処理部３０３のＣＰＵやＭＰＵ等のプロセッサによって実行されるソフトウェアモジュールにより構成されてよい。また、選択手段及び制御手段は、ＡＳＩＣ等の特定の機能を果たす回路や独立した装置等によって構成されてもよい。 The selection means and the control means may be composed of a software module executed by a processor such as a CPU or an MPU of the image processing unit 303. Further, the selection means and the control means may be composed of a circuit that performs a specific function such as an ASIC, an independent device, or the like.

また、追加学習用の学習データを、病院や研究所等の外部施設のサーバ等からネットワークを介して取得する際には、改ざんや、追加学習時のシステムトラブル等による信頼性低下を低減したい。そこで、デジタル署名やハッシュ化による一致性の確認を行うことで、追加学習用の学習データの正当性を検出してもよい。これにより、追加学習用の学習データを保護することができる。このとき、デジタル署名やハッシュ化による一致性の確認した結果として、追加学習用の学習データの正当性が検出できなかった場合には、その旨の警告を行い、その学習データによる追加学習を行わない。なお、サーバは、その設置場所を問わず、例えば、クラウドサーバ、フォグサーバ、エッジサーバ等のどのような形態でもよい。 In addition, when learning data for additional learning is acquired over a network from a server or the like of an external facility such as a hospital or research institute, it is desirable to reduce the loss of reliability caused by falsification, system trouble during additional learning, and the like. Therefore, the validity of the learning data for additional learning may be checked by confirming consistency using a digital signature or hashing. This protects the learning data for additional learning. If, as a result of the consistency check by digital signature or hashing, the validity of the learning data cannot be confirmed, a warning to that effect is issued and additional learning using that learning data is not performed. The server may take any form, for example, a cloud server, a fog server, or an edge server, regardless of where it is installed.
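
A hash-based consistency check of this kind might look like the following sketch. The choice of SHA-256, the function names, and the way the expected digest is obtained are assumptions for illustration; the disclosure does not name a specific algorithm.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex digest of the downloaded training data."""
    return hashlib.sha256(data).hexdigest()


def verify_training_data(data: bytes, expected_digest: str) -> bool:
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_hex(data), expected_digest)


def load_for_additional_training(data: bytes, expected_digest: str):
    """Return the data if it passes the check; otherwise warn and return None."""
    if not verify_training_data(data, expected_digest):
        print("warning: training data failed the integrity check; "
              "additional learning is skipped")
        return None
    return data


payload = b"example OCT training batch"      # hypothetical payload
digest = sha256_hex(payload)                 # assumed to arrive via a trusted channel
accepted = load_for_additional_training(payload, digest)
rejected = load_for_additional_training(payload + b"x", digest)
```

A bare hash only detects accidental or unauthenticated changes when the expected digest itself is trustworthy; protection against deliberate tampering requires the digest to be covered by a digital signature or delivered over a trusted channel.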

（変形例６）
上述した様々な実施形態及び変形例において、検者からの指示は、手動による指示（例えば、ユーザーインターフェース等を用いた指示）以外にも、音声等による指示であってもよい。このとき、例えば、機械学習により得た音声認識モデル（音声認識エンジン、音声認識用の学習済モデル）を含む機械学習モデルが用いられてもよい。また、手動による指示は、キーボードやタッチパネル等を用いた文字入力等による指示であってもよい。このとき、例えば、機械学習により得た文字認識モデル（文字認識エンジン、文字認識用の学習済モデル）を含む機械学習モデルが用いられてもよい。また、検者からの指示は、ジェスチャー等による指示であってもよい。このとき、機械学習により得たジェスチャー認識モデル（ジェスチャー認識エンジン、ジェスチャー認識用の学習済モデル）を含む機械学習モデルが用いられてもよい。 (Modification 6)
In the various embodiments and modifications described above, the instruction from the examiner may be an instruction by voice or the like in addition to a manual instruction (for example, an instruction using a user interface or the like). At this time, for example, a machine learning model including a voice recognition model (speech recognition engine, trained model for voice recognition) obtained by machine learning may be used. Further, the manual instruction may be an instruction by character input or the like using a keyboard, a touch panel, or the like. At this time, for example, a machine learning model including a character recognition model (character recognition engine, trained model for character recognition) obtained by machine learning may be used. Further, the instruction from the examiner may be an instruction by a gesture or the like. At this time, a machine learning model including a gesture recognition model (gesture recognition engine, learned model for gesture recognition) obtained by machine learning may be used.

また、検者からの指示は、表示部６００における表示画面上の検者の視線検出結果等であってもよい。視線検出結果は、例えば、表示部６００における表示画面の周辺から撮影して得た検者の動画像を用いた瞳孔検出結果であってもよい。このとき、動画像からの瞳孔検出は、上述したような物体認識エンジンを用いてもよい。また、検者からの指示は、脳波、体を流れる微弱な電気信号等による指示であってもよい。 Further, the instruction from the examiner may be the line-of-sight detection result of the examiner on the display screen of the display unit 600 or the like. The line-of-sight detection result may be, for example, a pupil detection result using a moving image of the examiner obtained by photographing from the periphery of the display screen on the display unit 600. At this time, the object recognition engine as described above may be used for the pupil detection from the moving image. Further, the instruction from the examiner may be an instruction by an electroencephalogram, a weak electric signal flowing through the body, or the like.

このような場合、例えば、学習データとしては、上述したような種々の学習済モデルの処理による結果の表示の指示を示す文字データ又は音声データ（波形データ）等を入力データとし、種々の学習済モデルの処理による結果等を実際に表示部に表示させるための実行命令を正解データとする学習データであってもよい。また、学習データとしては、例えば、高画質化用の学習済モデルで得た高画質画像の表示の指示を示す文字データ又は音声データ等を入力データとし、高画質画像の表示の実行命令及び高画質化ボタンをアクティブ状態に変更するための実行命令を正解データとする学習データであってもよい。もちろん、学習データとしては、例えば、文字データ又は音声データ等が示す指示内容と実行命令内容とが互いに対応するものであれば何でもよい。また、音響モデルや言語モデル等を用いて、音声データから文字データに変換してもよい。また、複数のマイクで得た波形データを用いて、音声データに重畳しているノイズデータを低減する処理を行ってもよい。また、文字又は音声等による指示と、マウス、タッチパネル等による指示とを、検者からの指示に応じて選択可能に構成されてもよい。また、文字又は音声等による指示のオン・オフを、検者からの指示に応じて選択可能に構成されてもよい。 In such a case, for example, as the training data, various trained data such as character data or voice data (waveform data) indicating an instruction for displaying the result by processing the various trained models as described above are used as input data. The training data may be training data in which the execution instruction for actually displaying the result of the model processing on the display unit is the correct answer data. Further, as the training data, for example, character data or audio data indicating an instruction for displaying a high-quality image obtained by a trained model for high image quality is used as input data, and an execution command for displaying a high-quality image and a high image quality are used. It may be learning data in which the execution instruction for changing the image quality button to the active state is the correct answer data. Of course, the learning data may be any data as long as the instruction content and the execution instruction content indicated by the character data, the voice data, or the like correspond to each other. Further, voice data may be converted into character data by using an acoustic model, a language model, or the like. Further, the waveform data obtained by the plurality of microphones may be used to perform a process of reducing the noise data superimposed on the audio data. 
Further, the instruction by characters or voice and the instruction by a mouse, a touch panel or the like may be configured to be selectable according to the instruction from the examiner. Further, the on / off of the instruction by characters or voice may be selectably configured according to the instruction from the examiner.

ここで、機械学習には、上述したような深層学習があり、また、多階層のニューラルネットワークの少なくとも一部には、例えば、再帰型ニューラルネットワーク（ＲＮＮ：ＲｅｃｕｒｒｅｒｎｔＮｅｕｒａｌＮｅｔｗｏｒｋ）を用いることができる。ここで、本変形例に係る機械学習モデルの一例として、時系列情報を扱うニューラルネットワークであるＲＮＮに関して、図９（ａ）及び（ｂ）を参照して説明する。また、ＲＮＮの一種であるＬｏｎｇｓｈｏｒｔ−ｔｅｒｍｍｅｍｏｒｙ（以下、ＬＳＴＭ）に関して、図１０（ａ）及び（ｂ）を参照して説明する。 Here, machine learning includes deep learning as described above, and for at least a part of a multi-layer neural network, for example, a recurrent neural network (RNN) can be used. Here, as an example of the machine learning model according to this modified example, RNN, which is a neural network that handles time series information, will be described with reference to FIGS. 9 (a) and 9 (b). Further, a Long short-term memory (hereinafter referred to as LSTM), which is a kind of RNN, will be described with reference to FIGS. 10 (a) and 10 (b).

図９（ａ）は、機械学習モデルであるＲＮＮの構造を示す。ＲＮＮ３５２０は、ネットワークにループ構造を持ち、時刻ｔにおいてデータｘ^ｔ３５１０を入力し、データｈ^ｔ３５３０を出力する。ＲＮＮ３５２０はネットワークにループ機能を持つため、現時刻の状態を次の状態に引き継ぐことが可能であるため、時系列情報を扱うことができる。図９（ｂ）には時刻ｔにおけるパラメータベクトルの入出力の一例を示す。データｘ^ｔ３５１０にはＮ個（Ｐａｒａｍｓ１〜ＰａｒａｍｓＮ）のデータが含まれる。また、ＲＮＮ３５２０より出力されるデータｈ^ｔ３５３０には入力データに対応するＮ個（Ｐａｒａｍｓ１〜ＰａｒａｍｓＮ）のデータが含まれる。 FIG. 9(a) shows the structure of an RNN, which is a machine learning model. The RNN 3520 has a loop structure in the network; at time t it receives data x^t 3510 as input and outputs data h^t 3530. Because the RNN 3520 has this loop in the network, the state at the current time can be carried over to the next state, so time-series information can be handled. FIG. 9(b) shows an example of the input and output of parameter vectors at time t. The data x^t 3510 contains N items of data (Params1 to ParamsN). The data h^t 3530 output from the RNN 3520 likewise contains N items of data (Params1 to ParamsN) corresponding to the input data.
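
The loop structure can be made concrete with a single recurrent step. The sketch assumes the common Elman form h^t = tanh(W_x x^t + W_h h^(t-1) + b); the disclosure does not state the update equation, and the weight names and values below are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step: the previous output h_prev is fed back in."""
    zx = matvec(w_x, x_t)
    zh = matvec(w_h, h_prev)
    return [math.tanh(a + c + d) for a, c, d in zip(zx, zh, b)]


# N = 2 parameters per time step; weights chosen only for illustration.
w_x = [[1.0, 0.0], [0.0, 1.0]]
w_h = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

h = [0.0, 0.0]                       # initial state
for x_t in ([0.5, -0.5], [1.0, 0.0]):
    h = rnn_step(x_t, h, w_x, w_h, b)   # state carried to the next time step
```

Each call receives N input values and returns N output values, matching the Params1 to ParamsN layout described for FIG. 9(b).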

しかし、ＲＮＮでは誤差逆伝搬時に長期時間の情報を扱うことができないため、ＬＳＴＭが用いられることがある。ＬＳＴＭは、忘却ゲート、入力ゲート、及び出力ゲートを備えることで長期時間の情報を学習することができる。ここで、図１０（ａ）にＬＳＴＭの構造を示す。ＬＳＴＭ３５４０において、ネットワークが次の時刻ｔに引き継ぐ情報は、セルと呼ばれるネットワークの内部状態ｃ^ｔ−１と出力データｈ^ｔ−１である。なお、図の小文字（ｃ、ｈ、ｘ）はベクトルを表している。 However, since an RNN cannot handle long-term information during error backpropagation, an LSTM may be used. An LSTM can learn long-term information by providing a forget gate, an input gate, and an output gate. FIG. 10(a) shows the structure of the LSTM. In the LSTM 3540, the information that the network carries over to the next time t is the internal state c^(t-1) of the network, called the cell, and the output data h^(t-1). The lowercase letters (c, h, x) in the figure represent vectors.

次に、図１０（ｂ）にＬＳＴＭ３５４０の詳細を示す。図１０（ｂ）において、ＦＧは忘却ゲートネットワーク、ＩＧは入力ゲートネットワーク、ＯＧは出力ゲートネットワークを示し、それぞれはシグモイド層である。そのため、各要素が０から１の値となるベクトルを出力する。忘却ゲートネットワークＦＧは過去の情報をどれだけ保持するかを決め、入力ゲートネットワークＩＧはどの値を更新するかを判定するものである。ＣＵは、セル更新候補ネットワークであり、活性化関数ｔａｎｈ層である。これは、セルに加えられる新たな候補値のベクトルを作成する。出力ゲートネットワークＯＧは、セル候補の要素を選択し次の時刻にどの程度の情報を伝えるか選択する。 Next, FIG. 10(b) shows the details of the LSTM 3540. In FIG. 10(b), FG denotes the forget gate network, IG the input gate network, and OG the output gate network, each of which is a sigmoid layer. Each therefore outputs a vector whose elements take values from 0 to 1. The forget gate network FG determines how much past information is retained, and the input gate network IG determines which values are updated. CU is the cell update candidate network, a layer with a tanh activation function. It creates a vector of new candidate values to be added to the cell. The output gate network OG selects elements of the cell candidate and selects how much information is passed on at the next time.
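
The roles of FG, IG, CU, and OG can be sketched as one step of a standard LSTM cell. The equations below are the widely used basic formulation and are an assumption here, since the text describes the gates only qualitatively; the parameter dictionary keys (`w_fg`, `b_fg`, and so on) are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """weights @ vector + bias on plain lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; returns (h_t, c_t)."""
    z = h_prev + x_t                                            # concatenate [h^(t-1), x^t]
    f = [sigmoid(v) for v in affine(p["w_fg"], z, p["b_fg"])]   # FG: how much of the cell to keep
    i = [sigmoid(v) for v in affine(p["w_ig"], z, p["b_ig"])]   # IG: which values to update
    g = [math.tanh(v) for v in affine(p["w_cu"], z, p["b_cu"])] # CU: candidate values
    o = [sigmoid(v) for v in affine(p["w_og"], z, p["b_og"])]   # OG: how much to output
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


hidden, inputs = 1, 2
zero_w = [[0.0] * (hidden + inputs) for _ in range(hidden)]
params = {name: zero_w for name in ("w_fg", "w_ig", "w_cu", "w_og")}
params.update({name: [0.0] * hidden for name in ("b_fg", "b_ig", "b_cu", "b_og")})

# With all-zero parameters every gate outputs 0.5 and the candidate is 0,
# so the cell state is simply halved.
h_t, c_t = lstm_step([0.3, -0.2], [0.0], [2.0], params)
```

The all-zero parameters are chosen only so the behaviour of each gate can be checked by hand.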

なお、上述したＬＳＴＭのモデルは基本形であるため、ここで示したネットワークに限らない。ネットワーク間の結合を変更してもよい。ＬＳＴＭではなく、ＱＲＮＮ（ＱｕａｓｉＲｅｃｕｒｒｅｎｔＮｅｕｒａｌＮｅｔｗｏｒｋ）を用いてもよい。さらに、機械学習モデルは、ニューラルネットワークに限定されるものではなく、ブースティングやサポートベクターマシン等が用いられてもよい。また、検者からの指示が文字又は音声等による入力の場合には、自然言語処理に関する技術（例えば、ＳｅｑｕｅｎｃｅｔｏＳｅｑｕｅｎｃｅ）が適用されてもよい。また、検者に対して文字又は音声等による出力で応答する対話エンジン（対話モデル、対話用の学習済モデル）が適用されてもよい。 Since the LSTM model described above is the basic form, the configuration is not limited to the network shown here. The coupling between the networks may be changed. A QRNN (Quasi-Recurrent Neural Network) may be used instead of an LSTM. Further, the machine learning model is not limited to a neural network, and boosting, a support vector machine, or the like may be used. When the instruction from the examiner is input by characters, voice, or the like, a technique related to natural language processing (for example, Sequence to Sequence) may be applied. A dialogue engine (dialogue model, trained model for dialogue) that responds to the examiner with output in characters, voice, or the like may also be applied.

（変形例７）
上述した様々な実施形態及び変形例において、高画質画像等は、検者からの指示に応じて記憶部３０２に保存されてもよい。このとき、高画質画像等を保存するための検者からの指示の後、ファイル名の登録の際に、推奨のファイル名として、ファイル名のいずれかの箇所（例えば、最初の箇所、最後の箇所）に、高画質化用の学習済モデルを用いた処理（高画質化処理）により生成された画像であることを示す情報（例えば、文字）を含むファイル名が、検者からの指示に応じて編集可能な状態で表示されてもよい。 (Modification 7)
In the various embodiments and modifications described above, high-quality images and the like may be saved in the storage unit 302 in response to an instruction from the examiner. In this case, after the examiner's instruction to save the high-quality image or the like, when the file name is registered, a recommended file name may be displayed in a state editable by the examiner; this file name includes, at some position (for example, at the beginning or the end), information (for example, characters) indicating that the image was generated by processing using a trained model for high image quality (high image quality processing).
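
A recommended file name of this kind could be generated as in the sketch below. The marker text `AI-enhanced`, the underscore separator, and the function name are hypothetical; the disclosure only requires that some identifying information appear somewhere in the name and that the examiner can edit it.

```python
from pathlib import PurePath


def recommended_name(original, marker="AI-enhanced", position="suffix"):
    """Build an editable default file name that flags model-generated images."""
    path = PurePath(original)
    if position == "prefix":
        stem = f"{marker}_{path.stem}"
    else:
        stem = f"{path.stem}_{marker}"
    return stem + path.suffix       # keep the original extension


default_name = recommended_name("scan_001.png")
prefixed_name = recommended_name("scan_001.png", position="prefix")
```

The returned string would be shown as the pre-filled value of the save dialog rather than being applied silently.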

また、レポート画面等の種々の表示画面において、表示部６００に高画質画像を表示させる際に、表示されている画像が高画質化用の学習済モデルを用いた処理により生成された高画質画像であることを示す表示が、高画質画像とともに表示されてもよい。この場合には、検者は、当該表示によって、表示された高画質画像が撮影によって取得した画像そのものではないことが容易に識別できるため、誤診断を低減させたり、診断効率を向上させたりすることができる。なお、高画質化用の学習済モデルを用いた処理により生成された高画質画像であることを示す表示は、入力画像と当該処理により生成された高画質画像とを識別可能な表示であればどのような態様のものでもよい。また、高画質化用の学習済モデルを用いた処理だけでなく、上述したような種々の学習済モデルを用いた処理についても、その種類の学習済モデルを用いた処理により生成された結果であることを示す表示が、その結果とともに表示されてもよい。 Further, when a high-quality image is displayed on the display unit 600 on any of various display screens such as a report screen, a display indicating that the displayed image is a high-quality image generated by processing using the trained model for high image quality may be shown together with the high-quality image. With this display, the examiner can easily recognize that the displayed high-quality image is not the image acquired by imaging itself, which can reduce misdiagnosis and improve diagnostic efficiency. The display indicating a high-quality image generated by processing using the trained model for high image quality may take any form as long as it allows the input image and the high-quality image generated by the processing to be distinguished. Likewise, not only for processing using the trained model for high image quality but also for processing using the various trained models described above, a display indicating that a result was generated by processing using that type of trained model may be shown together with the result.

このとき、レポート画面等の表示画面は、検者からの指示に応じて、画像データとして記憶部３０２に保存されてもよい。例えば、高画質画像等と、これらの画像が高画質化用の学習済モデルを用いた処理により生成された高画質画像であることを示す表示とが並んだ１つの画像としてレポート画面が記憶部３０２に保存されてもよい。 At this time, the display screen such as the report screen may be saved in the storage unit 302 as image data according to the instruction from the examiner. For example, the report screen is stored as one image in which a high-quality image or the like and a display indicating that these images are high-quality images generated by processing using a trained model for high image quality are arranged. It may be stored in 302.

また、高画質化用の学習済モデルを用いた処理により生成された高画質画像であることを示す表示について、高画質化用の学習済モデルがどのような学習データによって学習を行ったものであるかを示す表示が表示部に表示されてもよい。当該表示としては、学習データの入力データと正解データの種類の説明や、入力データと正解データに含まれる撮影部位等の正解データに関する任意の表示を含んでよい。なお、高画質化用の学習済モデルを用いた処理だけでなく、上述したような種々の学習済モデルを用いた処理についても、その種類の学習済モデルがどのような学習データによって学習を行ったものであるかを示す表示が表示部に表示されてもよい。 Further, regarding the display indicating that an image is a high-quality image generated by processing using the trained model for high image quality, a display indicating what kind of learning data the trained model for high image quality was trained with may be shown on the display unit. This display may include a description of the types of input data and correct answer data in the learning data, and any display relating to the correct answer data, such as the imaging site included in the input data and the correct answer data. Likewise, not only for processing using the trained model for high image quality but also for processing using the various trained models described above, a display indicating what kind of learning data that type of trained model was trained with may be shown on the display unit.

また、高画質化用の学習済モデルを用いた処理により生成された画像であることを示す情報（例えば、文字）を、高画質画像等に重畳した状態で表示又は保存されるように構成されてもよい。このとき、画像上に重畳する箇所は、撮影対象となる注目部位等が表示されている領域には重ならない領域（例えば、画像の端）であればどこでもよい。また、重ならない領域を判定し、判定された領域に重畳させてもよい。 In addition, information (for example, characters) indicating that the image is generated by processing using a trained model for high image quality is displayed or saved in a state of being superimposed on the high image quality image or the like. You may. At this time, the portion to be superimposed on the image may be any region (for example, the edge of the image) that does not overlap the region where the region of interest to be photographed is displayed. Further, the non-overlapping areas may be determined and superimposed on the determined areas.

また、レポート画面の初期表示画面として、高画質化ボタンがアクティブ状態（高画質化処理がオン）となるようにデフォルト設定されている場合には、検者からの指示に応じて、高画質画像等を含むレポート画面に対応するレポート画像が外部記憶部５００等のサーバに送信されるように構成されてもよい。また、高画質化ボタンがアクティブ状態となるようにデフォルト設定されている場合には、検査終了時（例えば、検者からの指示に応じて、撮影確認画面やプレビュー画面からレポート画面に変更された場合）に、高画質画像等を含むレポート画面に対応するレポート画像がサーバに（自動的に）送信されるように構成されてもよい。このとき、デフォルト設定における各種設定（例えば、レポート画面の初期表示画面におけるＥｎ−Ｆａｃｅ画像の生成のための深度範囲、解析マップの重畳の有無、高画質画像か否か、経過観察用の表示画面か否か等の少なくとも１つに関する設定）に基づいて生成されたレポート画像がサーバに送信されるように構成されもよい。 In addition, when the high image quality button is set to the active state (high image quality processing is on) by default as the initial display screen of the report screen, the high image quality image is instructed by the examiner. The report image corresponding to the report screen including the above may be configured to be transmitted to a server such as the external storage unit 500. In addition, when the high image quality button is set to the active state by default, it is changed from the shooting confirmation screen or the preview screen to the report screen at the end of the inspection (for example, according to the instruction from the inspector). In some cases), the report image corresponding to the report screen including the high-quality image and the like may be configured to be (automatically) transmitted to the server. At this time, various settings in the default settings (for example, the depth range for generating the En-Face image on the initial display screen of the report screen, the presence / absence of superimposition of the analysis map, whether or not the image is high quality, and the display screen for follow-up observation. The report image generated based on (settings related to at least one such as whether or not) may be configured to be transmitted to the server.

(Modification 8)
In the various embodiments and modifications described above, an image obtained with a first type of trained model among the various trained models described above (for example, a high-quality image, an image showing an analysis result such as an analysis map, an image showing an object recognition result, or an image showing a segmentation result) may be input to a second type of trained model different from the first type. In this case, a result of processing by the second type of trained model (for example, an analysis result, a diagnosis result, an object recognition result, or a segmentation result) may be generated.

In addition, among the various trained models described above, the result of processing by a first type of trained model (for example, an analysis result, a diagnosis result, an object recognition result, or a segmentation result) may be used to generate, from the image input to the first type of trained model, an image to be input to a second type of trained model different from the first type. The generated image is likely to be well suited to processing by the second type of trained model. Therefore, the accuracy of the image obtained by inputting the generated image to the second type of trained model (for example, a high-quality image, an image showing an analysis result such as an analysis map, an image showing an object recognition result, or an image showing a segmentation result) can be improved.

In addition, the various trained models described above may be trained models obtained by learning training data including two-dimensional medical images of a subject, or trained models obtained by learning training data including three-dimensional medical images of a subject.

In addition, a similar case image search using an external database stored on a server or the like may be performed with an analysis result, a diagnosis result, or the like obtained by the trained model processing described above as a search key. When the plurality of images stored in the database are already managed with their respective feature quantities attached as supplementary information by machine learning or the like, a similar case image search engine (a similar case image search model, a trained model for similar case image search) that uses the image itself as the search key may be used. For example, the image processing unit 303 can use a trained model for similar case image search (different from the trained model for image quality improvement) to search, from the noise reduction image described above, for similar case images related to that noise reduction image. Also, for example, the display control unit 305 can cause the display unit 600 to display similar case images related to the noise reduction image, obtained from the noise reduction image by using a trained model for similar case image search obtained by learning training data including medical images of a subject.

(Modification 9)
In the various embodiments and modifications described above, the various processes are not limited to configurations in which they are performed based on the luminance values of a tomographic image. The various processes may be applied to tomographic data including the interference signal acquired by the tomographic imaging apparatus 200, a signal obtained by applying a Fourier transform to the interference signal, a signal obtained by applying arbitrary processing to that signal, and tomographic images based on these. In these cases as well, the same effects as those of the above configuration can be obtained. For example, although a fiber optical system using an optical coupler is used as the splitting means, a spatial optical system using a collimator and a beam splitter may be used instead. The configuration of the tomographic imaging apparatus 200 is not limited to the above configuration, and part of the configuration included in the tomographic imaging apparatus 200 may be provided separately from it. In the above configuration, a Michelson interferometer is used as the interference optical system of the tomographic imaging apparatus 200, but the configuration of the interference optical system is not limited to this. For example, the interference optical system of the tomographic imaging apparatus 200 may have the configuration of a Mach-Zehnder interferometer. Although a spectral domain OCT (SD-OCT) apparatus using an SLD as the light source has been described as the OCT apparatus, the configuration of the OCT apparatus is not limited to this. For example, the present invention can be applied to any other type of OCT apparatus, such as a swept source OCT (SS-OCT) apparatus using a wavelength-swept light source capable of sweeping the wavelength of the emitted light. The present invention can also be applied to a Line-OCT apparatus (or SS-Line-OCT apparatus) using line light, and to a Full Field-OCT apparatus (or SS-Full Field-OCT apparatus) using area light. Further, although the image processing unit 303 acquires the interference signal acquired by the tomographic imaging apparatus 200, the three-dimensional tomographic image generated by the image processing unit, and the like, the configuration by which the image processing unit 303 acquires these signals and images is not limited to this. For example, the image processing unit 303 may acquire these signals from a server or an imaging apparatus connected via a LAN, a WAN, the Internet, or the like.

The trained model can be provided in the image processing unit 303. The trained model can be implemented, for example, as a software module executed by a processor such as a CPU. The trained model may also be provided on a separate server or the like connected to the image processing unit 303. In that case, the image processing unit 303 can perform image quality improvement processing using the trained model by connecting, via an arbitrary network such as the Internet, to the server that holds the trained model.

(Modification 10)
The images processed by the image processing apparatus or image processing method according to the various embodiments and modifications described above include medical images acquired using an arbitrary modality (imaging apparatus, imaging method). The medical images to be processed can include medical images acquired by an arbitrary imaging apparatus or the like, and images created by the image processing apparatus or image processing method according to the above embodiments and modifications.

Further, the medical image to be processed is an image of a predetermined site of the subject, and the image of the predetermined site includes at least part of that site. The medical image may also include other sites of the subject. The medical image may be a still image or a moving image, and may be a monochrome image or a color image. The medical image may be an image representing the structure (morphology) of the predetermined site or an image representing its function. Images representing function include, for example, images representing hemodynamics (blood flow volume, blood flow velocity, and the like), such as OCTA images, Doppler OCT images, fMRI images, and ultrasonic Doppler images. The predetermined site of the subject may be determined according to the imaging target, and includes any site such as organs including the human eye (eye to be inspected), brain, lungs, intestines, heart, pancreas, kidneys, and liver, as well as the head, chest, legs, and arms.

The medical image may be a tomographic image of the subject or a front image. Front images include, for example, a fundus front image, a front image of the anterior segment, a fluorescence-imaged fundus image, and an En-Face image generated from data acquired by OCT (three-dimensional OCT data) using data in at least a partial range in the depth direction of the imaging target. The En-Face image may be an OCTA En-Face image (motion contrast front image) generated from three-dimensional OCTA data (three-dimensional motion contrast data) using data in at least a partial range in the depth direction of the imaging target. Three-dimensional OCT data and three-dimensional motion contrast data are examples of three-dimensional medical image data.

Here, motion contrast data is data indicating changes among a plurality of volume data sets obtained by controlling the measurement light so that it scans the same region (same position) of the eye to be inspected a plurality of times. Each volume data set is composed of a plurality of tomographic images obtained at different positions. By obtaining, at each of the different positions, data indicating the change among the plurality of tomographic images obtained at substantially the same position, motion contrast data can be obtained as volume data. A motion contrast front image is also called an OCTA front image (OCTA En-Face image) relating to OCT angiography (OCTA), which measures the movement of blood flow, and motion contrast data is also called OCTA data. Motion contrast data can be obtained, for example, as a decorrelation value, a variance value, or a value obtained by dividing the maximum value by the minimum value (maximum value / minimum value) between two tomographic images or the corresponding interference signals, and may be obtained by any known method. The two tomographic images can be obtained, for example, by controlling the measurement light so that it scans the same region (same position) of the eye to be inspected a plurality of times.
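As a non-limiting illustration of the calculation described above, the following Python sketch computes a motion contrast value for each pixel from two tomographic images acquired at substantially the same position. The function name `motion_contrast`, the `method` argument, the small constant `eps`, and the specific decorrelation formula 1 - 2AB / (A^2 + B^2) are assumptions introduced for this example only; the description states merely that a decorrelation value, a variance value, or a maximum/minimum ratio may be used and that any known method is acceptable.

```python
import numpy as np


def motion_contrast(a, b, method="decorrelation", eps=1e-12):
    """Per-pixel motion contrast between two tomograms of the same position.

    a, b   : intensity tomograms of identical shape (hypothetical inputs).
    method : "decorrelation", "variance", or "ratio" (maximum / minimum).
    eps    : small constant that avoids division by zero (an assumption).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("the two tomograms must have the same shape")

    if method == "decorrelation":
        # One commonly used two-frame decorrelation; assumed here for illustration.
        return 1.0 - 2.0 * a * b / (a ** 2 + b ** 2 + eps)
    if method == "variance":
        # Population variance across the two repeated frames.
        return np.var(np.stack([a, b]), axis=0)
    if method == "ratio":
        # Maximum value divided by minimum value.
        return np.maximum(a, b) / (np.minimum(a, b) + eps)
    raise ValueError(f"unknown method: {method}")
```

Static tissue gives nearly identical values in the two frames, so all three measures stay near their minimum (0, 0, and 1 respectively), while pixels containing moving blood yield larger values.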

An En-Face image is, for example, a front image generated by projecting data in the range between two layer boundaries in the XY directions. In this case, the front image is generated by projecting or integrating, onto a two-dimensional plane, the data corresponding to a depth range that is at least part of the volume data (three-dimensional tomographic image) obtained using optical interference and that is determined based on two reference planes. The En-Face image is a front image generated by projecting, onto a two-dimensional plane, the portion of the volume data corresponding to a depth range determined based on the detected retinal layers. As a method of projecting data corresponding to the depth range determined based on the two reference planes onto a two-dimensional plane, for example, a representative value of the data within the depth range can be used as the pixel value on the two-dimensional plane. Here, the representative value can include a value such as the average, median, or maximum of the pixel values within the depth-direction range of the region bounded by the two reference planes. The depth range for the En-Face image may be, for example, a range extending a predetermined number of pixels in the deeper or shallower direction from one of the two layer boundaries of the detected retinal layers. The depth range for the En-Face image may also be, for example, a range changed (offset) in response to an operator's instruction from the range between the two layer boundaries of the detected retinal layers.
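As a non-limiting illustration of the projection described above, the following Python sketch generates an En-Face image by taking a representative value (average, median, or maximum) along the depth direction between two reference planes. The function name `en_face`, the (Z, Y, X) axis order, the integer-index representation of the reference planes, and the half-open range convention are all assumptions made for this example and are not specified in the description.

```python
import numpy as np


def en_face(volume, upper, lower, representative="mean"):
    """Project the depth range [upper, lower) of a volume onto the XY plane.

    volume         : 3-D array with assumed axis order (Z, Y, X), Z being depth.
    upper, lower   : depth indices of the two reference planes, either scalars
                     or (Y, X) arrays such as detected layer boundaries.
    representative : "mean", "median", or "max".
    """
    volume = np.asarray(volume, dtype=float)
    z = np.arange(volume.shape[0])[:, None, None]
    # Boolean mask of the voxels lying between the two reference planes.
    in_range = (z >= np.asarray(upper)[None]) & (z < np.asarray(lower)[None])
    # Voxels outside the depth range are ignored by the NaN-aware reducers.
    masked = np.where(in_range, volume, np.nan)
    reducers = {"mean": np.nanmean, "median": np.nanmedian, "max": np.nanmax}
    return reducers[representative](masked, axis=0)
```

An operator-specified offset of the depth range can be expressed in this sketch by adding the same number of pixels to `upper` and `lower` before the call. A-scans whose depth range is empty produce NaN (with a NumPy runtime warning), so a caller may wish to check that `upper < lower` everywhere.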

An imaging apparatus is an apparatus for capturing images used for diagnosis. Imaging apparatuses include, for example, apparatuses that obtain an image of a predetermined site by irradiating the predetermined site of the subject with light, radiation such as X-rays, electromagnetic waves, ultrasonic waves, or the like, and apparatuses that obtain an image of a predetermined site by detecting radiation emitted from the subject. More specifically, the imaging apparatuses according to the various embodiments and modifications described above include at least X-ray imaging apparatuses, CT apparatuses, MRI apparatuses, PET apparatuses, SPECT apparatuses, SLO apparatuses, OCT apparatuses, OCTA apparatuses, fundus cameras, and endoscopes.

The OCT apparatus may include a time domain OCT (TD-OCT) apparatus and a Fourier domain OCT (FD-OCT) apparatus. The Fourier domain OCT apparatus may include a spectral domain OCT (SD-OCT) apparatus and a swept source OCT (SS-OCT) apparatus. The SLO apparatus and the OCT apparatus may include an adaptive optics SLO (AO-SLO) apparatus and an adaptive optics OCT (AO-OCT) apparatus that use a wavefront compensation optical system. The SLO apparatus and the OCT apparatus may also include a polarization-sensitive SLO (PS-SLO) apparatus and a polarization-sensitive OCT (PS-OCT) apparatus for visualizing information on polarization phase difference and depolarization.

(Other embodiments)
Each of the above embodiments implements the present invention as an image processing apparatus. However, the present invention is not limited to an image processing apparatus. The present invention can also be implemented as software running on a computer. The CPU of the image processing apparatus controls the entire computer using computer programs and data stored in RAM or ROM. It also controls the execution of the software corresponding to each unit of the image processing apparatus to realize the function of each unit. The user interface, such as buttons, and the display layout are not limited to those shown above.

The present invention is also realized by executing the following processing: software (a program) that realizes one or more functions of the various embodiments and modifications described above is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads and executes the program.

The present invention can also be realized by processing in which software (a program) that realizes one or more functions of the various embodiments and modifications described above is supplied to a system or apparatus via a network or a storage medium, and a computer of that system or apparatus reads and executes the program. The computer has one or more processors or circuits, and may include a plurality of separate computers or a network of a plurality of separate processors or circuits in order to read and execute computer-executable instructions.

In this case, the processor or circuit may include a central processing unit (CPU), a microprocessing unit (MPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The processor or circuit may also include a digital signal processor (DSP), a data flow processor (DFP), or a neural processing unit (NPU).

Claims

An image processing apparatus comprising:
means for detecting the position of noise in a first medical image obtained by imaging a subject;
means for generating a second medical image in which the pixel value at the detected position of the noise in the first medical image is reduced, and an estimated image in which the noise in the first medical image is estimated to be reduced;
means for changing the estimated image by using an objective function including a first term relating to the difference between the second medical image and the estimated image and a second term relating to the estimated image; and
means for outputting, as a noise reduction image, the estimated image changed so that the objective function satisfies a predetermined condition.

The image processing apparatus according to claim 1, further comprising means for changing a threshold value for detecting the position of the noise according to a control parameter at the time of photographing of the OCT apparatus.

The image processing apparatus according to claim 1 or 2, wherein the means for detecting the position of the noise detects, as the position of the noise, at least one of a pixel having a signal level higher than that of its surrounding neighborhood in the first medical image and a pixel having a signal level lower than that of its surrounding neighborhood in the first medical image.

An image processing apparatus comprising:
means for predicting the position of noise in a first medical image obtained by imaging a subject;
means for generating a second medical image in which the pixel value at the predicted position of the noise in the first medical image is reduced, and an estimated image in which the noise in the first medical image is estimated to be reduced;
means for changing the estimated image by using an objective function including a first term relating to the difference between the second medical image and the estimated image and a second term relating to the estimated image; and
means for outputting, as a noise reduction image, an image obtained by combining a plurality of the estimated images, each of which is changed so that the objective function satisfies a predetermined condition and is obtained by predicting the position of the noise as random numbers.

The image processing apparatus according to claim 4, wherein, when the position of the noise is predicted as random numbers, the ratio of the number of noise pixels to the number of pixels in the first medical image is changed according to a control parameter at the time of imaging by an OCT apparatus.

The image processing apparatus according to any one of claims 1 to 5, wherein the objective function has a parameter for changing the ratio between an error term corresponding to the first term and a total variation regularization term corresponding to the second term.
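By way of a non-limiting illustration of an objective function of this kind, the following Python sketch changes an estimated image `u` by gradient descent until the decrease of the objective becomes small. Everything specific in it is an assumption made for the example rather than something recited here: the squared-error form of the first term, the smoothed isotropic total variation with constant `eps`, the weighting parameter `lam`, the gradient-descent update and its step size, the stopping tolerance `tol`, and the choice of the second medical image `f` as the initial estimated image.

```python
import numpy as np


def _forward_diff(u):
    """Forward differences; the last column/row difference is zero."""
    dx = np.diff(u, axis=1, append=u[:, -1:])
    dy = np.diff(u, axis=0, append=u[-1:, :])
    return dx, dy


def objective(u, f, lam, eps=1e-2):
    """Error term (first term) plus lam times a smoothed TV term (second term)."""
    dx, dy = _forward_diff(u)
    error_term = 0.5 * np.sum((u - f) ** 2)
    tv_term = np.sum(np.sqrt(dx ** 2 + dy ** 2 + eps))
    return error_term + lam * tv_term


def reduce_noise(f, lam=0.1, eps=1e-2, max_iter=500, tol=1e-6):
    """Change the estimated image until the objective stops decreasing."""
    f = np.asarray(f, dtype=float)
    u = f.copy()  # assumption: the estimated image starts from f
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    step = 1.0 / (1.0 + 8.0 * lam / np.sqrt(eps))
    prev = objective(u, f, lam, eps)
    for _ in range(max_iter):
        dx, dy = _forward_diff(u)
        mag = np.sqrt(dx ** 2 + dy ** 2 + eps)
        px, py = dx / mag, dy / mag
        # Divergence, i.e. the negative adjoint of the forward difference.
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u = u - step * ((u - f) - lam * div)
        cur = objective(u, f, lam, eps)
        if prev - cur < tol * max(prev, 1.0):  # assumed "predetermined condition"
            break
        prev = cur
    return u
```

In this sketch a larger `lam` weights the regularization term more heavily and yields a smoother result, while a smaller `lam` keeps the estimated image closer to `f`.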

The image processing apparatus according to claim 6, further comprising means for setting the parameter in response to an instruction from an operator.

The image processing apparatus according to claim 6 or 7, wherein the parameter is changed according to a control parameter at the time of imaging by an OCT apparatus.

The image processing device according to any one of claims 1 to 8, wherein the first medical image is a tomographic image or a frontal image obtained by imaging the subject by the OCT device.

The image processing device according to any one of claims 1 to 9, wherein the subject is an eye to be inspected.

The image processing apparatus according to any one of claims 1 to 10, further comprising means for determining that the objective function satisfies a predetermined condition when the objective function is minimized.

The image processing apparatus according to any one of claims 1 to 10, wherein the means for changing the estimated image changes the estimated image using an autoencoder.

The image processing apparatus according to any one of claims 1 to 12, wherein a plurality of the noise reduction images, obtained by using a plurality of the first medical images acquired using means for controlling scanning of measurement light so that the same portion of the subject is scanned a plurality of times, are aligned, and the plurality of aligned noise reduction images are averaged.

The image processing apparatus according to any one of claims 1 to 12, wherein a plurality of the first medical images, acquired using means for controlling scanning of measurement light so that the same portion of the subject is scanned a plurality of times, are aligned, and a plurality of the noise reduction images obtained by using the plurality of aligned first medical images are averaged.

The image processing apparatus according to any one of claims 1 to 12, wherein a high-quality image, obtained by applying image quality improvement processing to the noise reduction image, is generated from the noise reduction image by using a trained model for image quality improvement obtained by learning training data including medical images of a subject.

The image processing apparatus according to any one of claims 1 to 15, further comprising a display control means for displaying the noise reduction image on the display means.

The image processing apparatus according to claim 16, wherein the display control means causes the display means to display an analysis result relating to the noise reduction image, the analysis result being generated from the noise reduction image by using a trained model for analysis result generation obtained by learning training data including medical images of a subject.

The image processing apparatus according to claim 16 or 17, wherein the display control means causes the display means to display a diagnosis result relating to the noise reduction image, the diagnosis result being generated from the noise reduction image by using a trained model for diagnosis result generation obtained by learning training data including medical images of a subject.

The image processing apparatus according to any one of claims 16 to 18, wherein the display control means causes the display means to display a partial region detected from the noise reduction image by using a trained model for object recognition obtained by learning training data including medical images of a subject or a trained model for segmentation obtained by learning training data including medical images of a subject.

The image processing apparatus according to any one of claims 16 to 19, wherein the display control means causes the display means to display, as information on an abnormal site, information on the difference between the noise reduction image and a medical image obtained from the noise reduction image by using a generative adversarial network or an autoencoder.

The display control means uses a trained model for searching for a similar case image obtained by learning training data including a medical image of a subject to obtain a similar case image related to the noise reduction image obtained from the noise reduction image. The image processing apparatus according to any one of claims 16 to 20, which is displayed on the display means.

The image processing apparatus according to any one of claims 1 to 21, wherein the position of the noise is detected or predicted from the first medical image by using a trained model obtained by learning training data including medical images of a subject.

An image processing apparatus comprising:
means for acquiring a first medical image obtained by imaging a subject; and
means for detecting, from the acquired first medical image, the position of noise in the acquired first medical image by using a trained model obtained by learning training data including medical images of a subject.

A system comprising:
the image processing apparatus according to any one of claims 1 to 23; and
an OCT apparatus that acquires the first medical image as a tomographic image.

An image processing method comprising:
a step of detecting the position of noise in a first medical image obtained by imaging a subject;
a step of generating a second medical image in which the pixel value at the detected position of the noise in the first medical image is reduced, and an estimated image in which the noise in the first medical image is estimated to be reduced;
a step of changing the estimated image by using an objective function including a first term relating to the difference between the second medical image and the estimated image and a second term relating to the estimated image; and
a step of outputting, as a noise reduction image, the estimated image changed so that the objective function satisfies a predetermined condition.

An image processing method comprising:
a step of predicting the position of noise in a first medical image obtained by imaging a subject;
a step of generating a second medical image in which the pixel value at the predicted position of the noise in the first medical image is reduced, and an estimated image in which the noise in the first medical image is estimated to be reduced;
a step of changing the estimated image by using an objective function including a first term relating to the difference between the second medical image and the estimated image and a second term relating to the estimated image; and
a step of outputting, as a noise reduction image, an image obtained by combining a plurality of the estimated images, each of which is changed so that the objective function satisfies a predetermined condition and is obtained by predicting the position of the noise as random numbers.

A program that executes each step of the image processing method according to claim 25 or 26.