CN113543695A - Image processing apparatus, image processing method, and program - Google Patents


Info

Publication number
CN113543695A
Authority
CN
China
Prior art keywords
image
data
processing
display
learning
Legal status
Pending
Application number
CN201980093820.0A
Other languages
Chinese (zh)
Inventor
山添学
岩濑好彦
高桥理宇真
内田弘树
富田律也
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Application filed by Canon Inc
Priority claimed from PCT/JP2019/044244 (WO2020183791A1)
Publication of CN113543695A

Landscapes

  • Eye Examination Apparatus (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The image processing apparatus includes: a display control unit configured to display, on a display unit, a mixed image obtained by performing mixing processing, with a transmittance that is variable according to an instruction of an operator, using an Optical Coherence Tomography (OCT) image and an OCT angiography (OCTA) image of mutually corresponding regions in a subject acquired by OCT; a setting unit configured to set a region of interest in the displayed mixed image; and an execution unit configured to execute processing on the set region of interest in at least one of the OCT image and the OCTA image.

Description

Image processing apparatus, image processing method, and program
Technical Field
The present invention relates to an image processing apparatus and an image processing method that perform processing on a tomographic image of an object acquired by Optical Coherence Tomography (OCT).
Background
A medical tomographic image capturing apparatus such as an Optical Coherence Tomography (OCT) apparatus can three-dimensionally observe the state inside the retinal layers and can be used to diagnose ophthalmic retinal diseases such as age-related macular degeneration (AMD). Methods of acquiring images at high speed by OCT in clinical practice are roughly classified into two types: spectral domain OCT (SD-OCT), which acquires an interferogram with a spectroscope using a broadband light source, and swept source OCT (SS-OCT), which measures spectral interference with a single-channel photodetector by using a high-speed wavelength-swept light source. In OCT of these two types, OCT angiography (OCTA), which images blood vessels without using a contrast agent, has been attracting much attention in recent years. In OCTA, motion contrast data is generated from OCT images acquired by OCT. The motion contrast data is data indicating temporal variation of a measurement target, detected by repeatedly capturing images of the same cross section of the measurement target using OCT. Motion contrast data is calculated from differences, ratios, or correlations of temporal changes in the phase, vector, and intensity of the complex OCT signal.
In general, it is becoming common practice to display an OCTA image as an OCTA frontal image, which is converted into a two-dimensional image by projecting three-dimensional motion contrast data, calculated from an acquired three-dimensional OCT image, onto a two-dimensional plane. PTL 1 discusses a technique of generating a two-dimensional frontal image by specifying the range in the depth direction of the motion contrast data to be projected, so as to display an OCTA image.
Reference list
Patent document
PTL 1: Japanese Patent Application Laid-Open No. 2017-6179
Disclosure of Invention
Technical problem
However, the analytical processing of OCT data or motion contrast data can be improved in various aspects. For example, when setting a region of interest serving as an analysis processing target of OCT data or motion contrast data, it is sometimes difficult to make appropriate settings only by using a front image.
The present invention has been devised in view of the above problems, and an object thereof is to enable a region of interest serving as an analysis processing target to be set as desired.
Solution to the problem
An image processing apparatus according to an aspect of the present invention includes: a display control unit configured to display, on a display unit, a mixed image obtained by performing mixing processing, with a transmittance that is variable according to an instruction of an operator, using an Optical Coherence Tomography (OCT) image and an OCT angiography (OCTA) image of mutually corresponding regions in a subject acquired by OCT; a setting unit configured to set a region of interest in the displayed mixed image; and an execution unit configured to execute analysis or processing on the set region of interest in at least one of the OCT image and the OCTA image.
Advantageous effects of the invention
According to an aspect of the present invention, a region of interest serving as an analysis processing target can be set as desired.
Drawings
Fig. 1 is a block diagram illustrating a configuration of an image processing apparatus according to a first exemplary embodiment.
Fig. 2 is a diagram illustrating a tomographic image capturing apparatus according to a first exemplary embodiment.
Fig. 3A is a diagram illustrating a display screen displaying a frontal image of the optic nerve head.
Fig. 3B is a diagram illustrating a display screen displaying a mixed image obtained by the transmission processing.
Fig. 4 is a block diagram illustrating a configuration of an image processing apparatus according to a first exemplary embodiment.
Fig. 5 is a flowchart illustrating an analysis process according to the first exemplary embodiment.
Fig. 6 illustrates an example of a configuration of a neural network relating to an image quality improvement process according to a fourth exemplary embodiment.
Fig. 7 is a flowchart illustrating an example of an image processing flow according to the fourth exemplary embodiment.
Fig. 8 is a flowchart illustrating another example of an image processing flow according to the fourth exemplary embodiment.
Fig. 9A illustrates an example of a configuration of a neural network serving as a machine learning engine according to modification 6.
Fig. 9B illustrates an example of a configuration of a neural network serving as a machine learning engine according to modification 6.
Fig. 10A illustrates an example of a configuration of a neural network serving as a machine learning engine according to modification 6.
Fig. 10B illustrates an example of a configuration of a neural network serving as a machine learning engine according to modification 6.
Fig. 11 illustrates an example of a user interface according to the fifth exemplary embodiment.
Fig. 12A illustrates an example of a teaching image related to the image quality improvement processing.
Fig. 12B illustrates an example of a teaching image related to the image quality improvement processing.
Fig. 13A illustrates an example of a teaching image related to the image quality improvement processing.
Fig. 13B illustrates an example of a teaching image related to the image quality improvement processing.
Fig. 14A illustrates an example of a user interface according to the fifth exemplary embodiment.
Fig. 14B illustrates an example of a user interface according to the fifth exemplary embodiment.
Fig. 15 illustrates an example of a configuration of a neural network relating to an image quality improvement process according to a fourth exemplary embodiment.
Detailed Description
[ first exemplary embodiment ]
A description will be given of the following case: the image processing apparatus according to the present exemplary embodiment performs analysis processing while setting an analysis position and an analysis area, with reference to a frontal image of OCT data, for analyzing OCTA data. Hereinafter, an image processing system having an image processing apparatus according to the first exemplary embodiment of the present invention will be described with reference to the drawings.
(arrangement of image processing apparatus)
The configuration of the image processing apparatus 101 of the present exemplary embodiment and its connection with other devices will be described with reference to fig. 1. The image processing apparatus 101 is a Personal Computer (PC) connected to the tomographic image capturing apparatus 100. The functions of the functional blocks corresponding to the image acquisition unit 101-01, the imaging control unit 101-03, the image processing unit 101-04, and the display control unit 101-05 are realized by a Central Processing Unit (CPU) (not shown) executing software modules stored in the storage unit 101-02. It should be understood that the present invention is not limited to such a PC configuration. For example, the image processing unit 101-04 may be realized by dedicated hardware such as an Application Specific Integrated Circuit (ASIC), and the display control unit 101-05 may be realized by using a dedicated processor other than the CPU, such as a Graphics Processing Unit (GPU). Further, the tomographic image capturing apparatus 100 and the image processing apparatus 101 may be connected via a network, or the external storage unit 102 may also be placed on the network so that data can be shared by a plurality of image processing apparatuses.
The image acquisition unit 101-01 is a functional block that acquires signal data of a Scanning Laser Ophthalmoscope (SLO) fundus image or a tomographic image obtained by capturing an image of a subject using the tomographic image capturing apparatus 100 to generate an image. The image acquisition unit 101-01 includes a tomographic image generation unit 101-11 and a motion contrast data generation unit 101-12. The tomographic image generation unit 101-11 acquires signal data (interference signal) of a tomographic image captured by the tomographic image capturing apparatus 100 to generate a tomographic image by performing signal processing, and stores the generated tomographic image in the storage unit 101-02. The motion contrast data generation unit 101-12 generates motion contrast data based on a plurality of tomographic images of the same region (regions in the subject corresponding to each other) that have been generated by the tomographic image generation unit 101-11.
First, the tomographic image generation unit 101-11 generates the tomographic images for one cluster by performing frequency conversion, Fast Fourier Transform (FFT), and absolute value conversion (amplitude acquisition) on the interference signal acquired by the image acquisition unit 101-01.
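As an illustration of this step, the following is a minimal sketch in Python/NumPy of generating an amplitude tomogram from a spectral interference signal. The function name is hypothetical, the signal is assumed to be already resampled linearly in wavenumber, and background subtraction and dispersion compensation are omitted for brevity; this is a sketch under those assumptions, not the patent's implementation.

import numpy as np

def generate_tomogram(interference_signal, window=None):
    """Generate an amplitude tomogram (B-scan) from a spectral interference signal.

    interference_signal: 2-D array (A-scans x spectral samples), assumed to be
    already resampled linearly in wavenumber.
    """
    spectrum = np.asarray(interference_signal, dtype=np.float64)
    if window is not None:
        spectrum = spectrum * window      # optional spectral shaping window
    # Frequency conversion: FFT along the spectral axis gives complex A-scans
    complex_ascans = np.fft.fft(spectrum, axis=-1)
    # Absolute value conversion (amplitude acquisition)
    return np.abs(complex_ascans)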
Next, the position adjustment unit 101-41 aligns the positions of the tomographic images belonging to the same cluster, and performs superimposition processing. The image feature acquisition unit 101-44 acquires layer boundary data from the superimposed tomographic images. In the present exemplary embodiment, a variable shape model is used as the acquisition method of the layer boundary, but any known acquisition method of the layer boundary may be used. The acquisition processing of the layer boundary is not essential. For example, in a case where only a three-dimensional motion contrast image is to be generated and a two-dimensional motion contrast image projected in the depth direction is not to be generated, the acquisition process of the layer boundary may be omitted. The motion contrast data generation unit 101-12 calculates the motion contrast between adjacent tomographic images in the same cluster. In the present exemplary embodiment, the motion contrast data generation unit 101-12 obtains the decorrelation value Mxy as the motion contrast based on the following expression (1):
Mxy = 1 − (2 × Axy × Bxy) / (Axy² + Bxy²)   (1)
In expression (1), Axy represents the amplitude, at a position (x, y), of tomographic image data A (complex data that has been subjected to FFT processing), and Bxy represents the amplitude of tomographic image data B at the same position (x, y). The relationship 0 ≤ Mxy ≤ 1 is satisfied, and as the difference between the two amplitude values Axy and Bxy increases, the value of Mxy becomes closer to 1. An image whose pixel values are the average of the motion contrast values obtained by performing the decorrelation calculation of expression (1) between adjacent tomographic images belonging to the same cluster is generated as the final motion contrast image. The number of motion contrast values obtained is the number of tomographic images in each cluster minus 1.
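A minimal sketch of the decorrelation calculation of expression (1) and of the per-cluster averaging described above, in Python/NumPy; the function names and the small epsilon added to avoid division by zero are implementation assumptions rather than part of the patent.

import numpy as np

def decorrelation(a_xy, b_xy, eps=1e-12):
    """Decorrelation of two aligned amplitude tomograms of the same cross section.

    Values lie in [0, 1]; the larger the difference between the two amplitudes,
    the closer the value is to 1 (cf. expression (1)).
    """
    a = np.asarray(a_xy, dtype=np.float64)
    b = np.asarray(b_xy, dtype=np.float64)
    return 1.0 - (2.0 * a * b) / (a ** 2 + b ** 2 + eps)

def motion_contrast_for_cluster(tomograms):
    """Average the decorrelation maps of adjacent tomograms in one cluster.

    tomograms: list of N aligned amplitude images of the same cross section;
    the N - 1 decorrelation maps are averaged into the final motion contrast
    image (the median or maximum could be used instead, as noted below).
    """
    maps = [decorrelation(a, b) for a, b in zip(tomograms[:-1], tomograms[1:])]
    return np.mean(maps, axis=0)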
In this example, the motion contrast is calculated based on the amplitude of the complex data that has been subjected to the FFT processing, but the calculation method of the motion contrast is not limited to the above-described method. For example, the motion contrast may be calculated based on phase information of the complex data, or may be calculated based on information on both amplitude and phase. Alternatively, the motion contrast may be calculated based on the real and imaginary parts of the complex data.
In addition, in the present exemplary embodiment, the decorrelation value is calculated as the motion contrast, but the calculation method of the motion contrast is not limited thereto. For example, the motion contrast may be calculated based on a difference between two values, or the motion contrast may be calculated based on a ratio between two values.
Further, in the above description, the final motion contrast image is acquired by obtaining an average value of a plurality of acquired decorrelation values. However, the present invention is not limited thereto. For example, an image having a median value or a maximum value of a plurality of acquired decorrelation values as a pixel value may be generated as a final motion contrast image.
The imaging control unit 101-03 is a functional block that performs imaging control on the tomographic image capturing apparatus 100. The imaging control includes specifying settings of image capturing parameters of the tomographic image capturing apparatus 100 and issuing instructions to start or end image capturing. The image processing unit 101-04 is a functional block including a position adjusting unit 101-41, a synthesizing unit 101-42, a correcting unit 101-43, an image feature acquiring unit 101-44, a projecting unit 101-45, and an analyzing unit 101-46. The synthesizing unit 101-42 includes, for example, a synthesis method specification unit 101-421, a same-modality image synthesis unit 101-422, and a multi-modality image synthesis unit 101-423. The synthesizing unit 101-42 synthesizes a plurality of two-dimensional images into one image. Specifically, the synthesis method specification unit 101-421 specifies the type of the synthesis target images (tomographic images, motion contrast images, or a tomographic image and a motion contrast image) and the synthesis processing method (superimposition, combination, or collation display). The same-modality image synthesis unit 101-422 performs synthesis processing between tomographic images or between motion contrast images. The multi-modality image synthesis unit 101-423 performs synthesis processing between a tomographic image and a motion contrast image. The synthesizing unit 101-42 is an example of image quality improving means that improves the image quality of the motion contrast data. In addition, in the present exemplary embodiment, for example, in addition to the processing performed by the synthesizing unit 101-42, the image quality improvement processing using machine learning described in the fourth exemplary embodiment below may be applied as the processing to be performed by the image quality improving means. The correcting unit 101-43 performs processing for suppressing projection artifacts generated in the motion contrast image. A projection artifact is a phenomenon in which the motion contrast of blood vessels in the retinal surface layer appears on the deep-layer side (the deep layer of the retina, or the outer retina/choroid), so that a high decorrelation value is obtained in a region on the deep-layer side where no blood vessel exists. For example, the correcting unit performs processing for reducing projection artifacts in the synthesized motion contrast data. In other words, the correcting unit 101-43 is an example of a processing unit that performs processing for reducing projection artifacts on the synthesized motion contrast data. The projecting unit 101-45 projects the tomographic images or the motion contrast images over a depth range based on the boundary positions acquired by the image feature acquiring unit 101-44, and generates brightness frontal images (brightness tomographic images) or motion contrast frontal images. The projection may be performed over any depth range; in the present exemplary embodiment, however, two types of frontal synthesized motion contrast images are generated, for the depth ranges of the retinal surface layer and the retinal outer layer. As the projection method, either Maximum Intensity Projection (MIP) or Average Intensity Projection (AIP) may be selected. The projection range for generating the motion contrast frontal image can be changed by the operator selecting a depth range from a predetermined set of depth ranges displayed in a selection list (not shown).
Alternatively, the projection range may be changed by changing, from the user interface, the type and offset position of the layer boundary to be used for specifying the projection range, or by moving layer boundary data superimposed on the tomographic image by operation from the input unit 103. The motion contrast image to be displayed on the display unit 104 is not limited to the motion contrast frontal image, and a three-dimensionally rendered motion contrast image may be displayed. Further, the above-described projection method, and whether the projection artifact suppression processing is to be performed, may be changed from a user interface such as a context menu. For example, a motion contrast image that has been subjected to the projection artifact suppression processing may be displayed as a three-dimensional image on the display unit 104. The analyzing unit 101-46 is a functional block including an emphasis unit 101-461, an extraction unit 101-462, a measurement unit 101-463, and a comparison unit 101-464. The extraction unit 101-462 performs, for example, extraction of blood vessel regions, and the measurement unit 101-463 performs measurement, such as blood vessel density measurement, based on the extraction result.
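As a sketch of the frontal-image generation described above, the following Python/NumPy function projects a volume over a depth range defined by two layer boundaries using either AIP or MIP. The argument names and the per-pixel loop are illustrative assumptions, not the patent's implementation.

import numpy as np

def project_frontal_image(volume, upper_boundary, lower_boundary, method="AIP"):
    """Project a 3-D volume (z, y, x) into a frontal (en-face) image.

    upper_boundary, lower_boundary: (y, x) arrays of layer-boundary z indices
    defining the projection range for each A-scan position.
    method: "AIP" (average intensity) or "MIP" (maximum intensity).
    """
    z_size, y_size, x_size = volume.shape
    frontal = np.zeros((y_size, x_size), dtype=np.float64)
    for y in range(y_size):
        for x in range(x_size):
            z0 = int(upper_boundary[y, x])
            z1 = int(lower_boundary[y, x])
            column = volume[z0:z1, y, x]
            if column.size == 0:
                continue
            frontal[y, x] = column.max() if method == "MIP" else column.mean()
    return frontal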
The image processing apparatus 101 is connected with the tomographic image capturing apparatus 100, the external storage unit 102, the input unit 103, and the display unit 104 via interfaces. The image processing apparatus 101 performs control of the stage unit 100-2 and control of the alignment operation. The external storage unit 102 stores, in association with one another, a program for tomographic image capturing, information on the subject's eye (name, age, sex, and the like of the patient), captured images (tomographic images and SLO/OCTA images), composite images, image capturing parameters, image data and measurement data of past examinations, and parameters set by the operator.
The input unit 103 is, for example, a mouse, a keyboard, or a touch panel for giving instructions to a computer, and the operator gives instructions to the image processing apparatus 101 and the tomographic image capturing apparatus 100 via the input unit 103. The display unit 104 is, for example, a monitor, and may be provided with a touch User Interface (UI).
(arrangement of tomographic image capturing apparatus)
The tomographic image capturing apparatus 100 is an apparatus for capturing a tomographic image of an eye. The configuration of the measurement optical system and the spectroscope of the tomographic image capturing apparatus 100 according to the present exemplary embodiment will be described with reference to fig. 2.
In the present exemplary embodiment, spectral domain OCT (SD-OCT) is used as the tomographic image capturing apparatus 100. The tomographic image capturing apparatus 100 is not limited thereto. For example, swept source OCT (SS-OCT) may be used.
The measurement optical system 100-1 is an optical system for acquiring an anterior ocular segment image, SLO fundus image, and tomographic image of an eye of a subject. The stage unit 100-2 allows the measurement optical system 100-1 to move back and forth and left and right. The base unit 100-3 includes a spectroscope described below.
Now, the inside of the measurement optical system 100-1 will be described. An objective lens 201 is mounted to face an eye 200 of a subject, and a first dichroic mirror 202 and a second dichroic mirror 203 are arranged on an optical axis of the objective lens 201. These dichroic mirrors separate the optical paths of the respective wavelength bands into an optical path 250 for the OCT optical system, an optical path 251 for the SLO optical system and the fixation lamp, and an optical path 252 for anterior eye observation. The optical path 251 for the SLO optical system and the fixation lamp includes an SLO scanning unit 204, lenses 205 and 206, a reflecting mirror 207, a third dichroic mirror 208, an Avalanche Photodiode (APD)209, an SLO light source 210, and a fixation lamp 211. The mirror 207 is a prism obtained by evaporating a perforated mirror or a hollow mirror, and separates light into illumination light emitted by the SLO light source 210 and return light from the subject's eye. The third dichroic mirror 208 separates the optical path into the optical path of the SLO light source 210 and the optical path of the fixation lamp 211 by the wavelength band. The SLO scanning unit 204 scans the eye 200 of the subject with light emitted from the SLO light source 210, and includes an X scanner for scanning in the X direction and a Y scanner for scanning in the Y direction. In the present exemplary embodiment, the X scanner includes a polygon mirror that performs high-speed scanning, and the Y scanner includes a galvano mirror. The lens 205 is driven by a motor (not shown) for focusing the SLO optical system and the fixation lamp 211. SLO light source 210 emits light having a wavelength around 780 nm. The APD 209 detects return light from the subject eye. The fixation lamp 211 emits visible light and prompts fixation of the subject. Light emitted from the SLO light source 210 is reflected on the third dichroic mirror 208, passes through the reflecting mirror 207, passes through the lenses 206 and 205, and is used by the SLO scanning unit 204 to scan the eye 200 of the subject. After returning through the same path as the illumination light, the return light from the subject eye 200 is reflected by the mirror 207, guided to the APD 209, and an SLO fundus image is obtained. Light emitted from the fixation lamp 211 passes through the third dichroic mirror 208 and the reflecting mirror 207, passes through the lenses 206 and 205, forms a predetermined shape at an arbitrary position on the subject's eye 200 using the SLO scanning unit 204, and prompts subject fixation. On an optical path 252 for anterior ocular observation, lenses 212 and 213, a beam splitter prism 214, and a Charge Coupled Device (CCD)215 for anterior ocular segment observation that detects infrared light are arranged. The CCD 215 has sensitivity to light in the wavelength of light (not shown) emitted for anterior ocular segment observation. Specifically, the CCD 215 has sensitivity to light in a wavelength near 970 nm. The spectroscopic prism 214 is arranged at a position conjugate to the pupil of the subject eye 200, and the spectroscopic prism 214 can detect the distance in the Z-axis direction (optical axis direction) of the measurement optical system 100-1 with respect to the subject eye 200 as a spectroscopic image of the anterior segment. 
As described above, the optical path 250 constitutes the OCT optical system, and the optical path 250 is provided for capturing a tomographic image of the subject eye 200. More specifically, the optical path 250 is provided for acquiring an interference signal for forming a tomographic image. An XY scanner 216 for scanning the subject eye 200 with light is provided. Fig. 2 illustrates the XY scanner 216 as having one mirror; however, the XY scanner 216 performs scanning in both the X and Y directions using galvano mirrors. A lens 217, of the lenses 217 and 218, is driven by a motor (not shown) in order to focus light from the OCT light source 220, emitted from an optical fiber 224 connected to the optical coupler 219, onto the subject eye 200. By this focusing, the return light from the subject eye 200 enters the optical fiber 224 while forming a spot-shaped image at the leading end of the optical fiber 224. Now, the optical path from the OCT light source 220 and the configuration of the reference optical system and the spectroscope will be described. This part includes the OCT light source 220, a reference mirror 221, dispersion compensation glass 222, a lens 223, the optical coupler 219, single-mode optical fibers 224 to 227 integrally connected to the optical coupler, and a spectroscope 230. These components constitute a Michelson interferometer. The light emitted from the OCT light source 220 passes through the optical fiber 225 and is separated, via the optical coupler 219, into measurement light on the optical fiber 224 side and reference light on the optical fiber 226 side. The measurement light is emitted onto the subject eye 200 serving as an observation target through the optical path of the above-described OCT optical system, and reaches the optical coupler 219 through the same optical path as a result of reflection and scattering by the subject eye 200. In contrast, the reference light reaches the reference mirror 221 via the optical fiber 226, the lens 223, and the dispersion compensation glass 222 inserted to compensate for the wavelength dispersion of the measurement light and the reference light, and is reflected. Then, the reference light returns through the same optical path to reach the optical coupler 219. The measurement light and the reference light are combined by the optical coupler 219 to become interference light. Interference occurs when the optical path length of the measurement light and the optical path length of the reference light become substantially the same. The reference mirror 221 is held so as to be adjustable in the optical axis direction by a motor and a driving mechanism (not shown), and the optical path length of the reference light can be made to coincide with the optical path length of the measurement light. The interference light is guided to the spectroscope 230 via the optical fiber 227. In addition, polarization adjusting units 228 and 229 are provided in the optical fibers 224 and 226, respectively, and perform polarization adjustment. These polarization adjusting units each comprise several looped sections of optical fiber. By rotating these loop-shaped sections about the longitudinal direction of the optical fiber, the optical fiber is twisted, and the polarization states of the measurement light and the reference light can each be adjusted independently and matched to each other. The spectroscope 230 includes lenses 232 and 234, a diffraction grating 233, and a line sensor 231.
The interference light emitted from the optical fiber 227 is changed into parallel light via the lens 234, and then the parallel light is dispersed by the diffraction grating 233 and formed on the line sensor 231 by the lens 232. Next, the periphery of the OCT light source 220 will be described. The OCT light source 220 is a superluminescent diode (SLD) as a typical low coherence light source. The OCT light source 220 has a central wavelength of 855nm and a wavelength bandwidth of about 100 nm. The bandwidth affects the resolution of a tomographic image to be obtained in the optical axis direction, and therefore the bandwidth is an important parameter. In this example, SLD is chosen as the type of light source, but only the light source is required to be able to emit low coherent light, and Amplified Spontaneous Emission (ASE) may be used. Near infrared light is suitable for the central wavelength in view of the measuring eye. The center wavelength also affects the resolution in the lateral direction of the tomographic image to be obtained, and therefore it is desirable that the center wavelength is as short as possible. For both reasons, the center wavelength was set to 855 nm. In the present exemplary embodiment, a michelson interferometer is used as the interferometer, but a mach-zehnder interferometer may also be used. Depending on the light amount difference between the measurement light and the reference light, it is desirable to use a mach-zehnder interferometer in the case where the light amount difference is large, and to use a michelson interferometer in the case where the light amount difference is relatively small.
(analysis of OCTA data)
Hereinafter, the analysis processing for OCT motion contrast data will be described specifically. Terms to be used in the description of the exemplary embodiments will be briefly defined. First, information on three-dimensional volume data will be described as OCT data or OCTA data. Next, two-dimensional information that can be extracted from the volume data will be described as an OCT image or an occa image. In particular, an image created by projecting volume data in a specified range in the depth direction will be described as an OCT frontal image or an oca frontal image. In addition, two-dimensional information including data in the depth direction will be described as a tomographic image.
Fig. 3A illustrates an OCTA frontal image 301 of the Optic Nerve Head (ONH). The slider bar 302 indicates a transmittance of 0% as a default value, which is a transmittance of an OCT front image described below. In this example, as the transmittance indicated by the slider bar 302, the transmittance set last may be stored, or when the OCTA front image is switched to another OCTA front image, the transmittance may return to the default value of 0%.
It is known that vascular function of ONH is closely related to the progression of glaucoma, and it is said that quantitative analysis of vascular dysfunction is of great clinical value. However, setting the boundaries of the Neural Canal Opening (NCO) on the OCTA frontal image is somewhat difficult. Since the visibility of NCO is enhanced in OCT frontal images, it becomes easier to set the analysis region. To assess the role of ONH circulation failure in glaucoma, it becomes important to obtain reliable microcirculation information.
Fig. 3B illustrates an example case where the operator sets the slider 302 to 60%. An image 303 generated based on the set transmittance is displayed using the OCTA frontal image and an OCT frontal image serving as a second medical image different from the OCTA frontal image. In other words, a mixed image that has been subjected to mixing processing based on the variable transmittance is generated using an OCT image and an OCTA image of mutually corresponding regions in the subject that have been acquired by optical coherence tomography. Then, analysis processing is performed on the set analysis region 304. Specifically, an image processing system of the image processing apparatus 101 of the present exemplary embodiment will be described with reference to figs. 1 and 4. First, when the operator designates, as the target image, the OCTA frontal image in a designated range in the depth direction, the OCTA frontal image stored in the storage unit 101-02 and the OCT frontal image serving as the second medical image are acquired. The ranges in the depth direction of the OCTA frontal image and the OCT frontal image need not always coincide with each other; the operator can also specify different ranges in the depth direction. In the transmittance setting unit 402, the operator sets the transmittance based on the position set by the slider 302, and the transmittance α (0 ≤ α ≤ 1) of the second medical image (in this example, the OCT frontal image) is determined. The transmission processing performs weighted averaging of the two images for each pixel using typical α blending. That is, the mixing processing is performed by, for example, a weighted average of pixel values at mutually corresponding positions of the OCT image and the OCTA image:
(transparent image) = (first medical image) × (1 − α) + (second medical image) × α   (2)
The blending processing unit 403 generates a blended image (hereinafter, described as a transparent image) based on the above expression (2), and the display control unit 101-05 displays the generated blended image on the display unit 104. When checking the transparent image displayed on the display unit 104, the operator can change the transmittance until the transparent image becomes a desired transparent image. Alternatively, the operator may change the range in the depth direction of the image while checking the visibility.
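A minimal sketch of the α blending of expression (2) in Python/NumPy; the function and argument names are assumptions, and both images are assumed to be aligned and normalized to a common intensity range.

import numpy as np

def blend_images(octa_frontal, oct_frontal, alpha):
    """Blend an OCTA frontal image (first image) with an OCT frontal image
    (second image) according to expression (2).

    alpha: transmittance of the second image, 0 <= alpha <= 1;
    alpha = 0 shows only the OCTA image, alpha = 1 only the OCT image.
    """
    alpha = float(np.clip(alpha, 0.0, 1.0))
    return octa_frontal * (1.0 - alpha) + oct_frontal * alpha

With this convention, the slider setting of 60% in Fig. 3B corresponds to alpha = 0.6.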
Next, the ROI setting unit 404 sets a region of interest (ROI) to be analyzed on the transparent image. The ROI information may be set to parameters such as a center position and a size, or may be set to a general shape (e.g., a circle, an ellipse, or a rectangle). Alternatively, the ROI information may be set to a region having a spline curve formed of a plurality of control points as a free region. It is sufficient to display the ROI information superimposed on the transparent image. Further, in order to check whether the set ROI is a desired region, the operator may update the transparent image by changing the transmittance in a state where the ROI information is displayed in a superimposed manner. In this way, the visibility of the NCO boundary or the state of the microcirculation can be adjusted by appropriately changing the transmittance of the OCT frontal image.
Finally, the analyzing unit 101-46, which is an example of an execution unit that performs processing on a region of interest in an image, performs various types of image analysis. The type of analysis may be specified by the operator, or may be a preset analysis. The analysis result is displayed on the display unit 104. The operator specifies, for example, blood vessel extraction processing based on the set ROI information. The extraction unit 101-462 performs the blood vessel extraction by performing determination processing for blood vessel regions and non-blood vessel regions using the OCTA frontal image. As an example of the determination processing, it is sufficient to extract pixels satisfying a predetermined threshold as a blood vessel region using threshold processing. The threshold may be a preset fixed value, or may be set arbitrarily by the operator. Alternatively, the threshold may be set adaptively from the OCTA frontal image based on a predetermined algorithm (e.g., histogram analysis). In the blood vessel extraction processing, binary information representing blood vessel or non-blood vessel may be used, or a continuous value representing the likelihood of a blood vessel (e.g., the distance from the threshold) may be used. Specific color information may be added to the blood vessel region, or, in the case of adopting a continuous value, the color information may be added in a predetermined gradation. The color and gradation representing the blood vessel information are not limited to those based on red, and may be freely selected by the operator.
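A hedged sketch of the threshold-based blood vessel determination described above; the default heuristic of mean plus one standard deviation stands in for the histogram-based adaptive threshold mentioned in the text and is purely an assumption.

import numpy as np

def extract_vessels(octa_frontal, threshold=None):
    """Threshold-based vessel / non-vessel determination on an OCTA frontal image.

    Returns a binary mask (True = vessel) and a continuous "vesselness"
    value (signed distance from the threshold).
    """
    img = np.asarray(octa_frontal, dtype=np.float64)
    if threshold is None:
        # assumed heuristic standing in for a histogram-based adaptive threshold
        threshold = img.mean() + img.std()
    vessel_mask = img >= threshold
    vesselness = img - threshold
    return vessel_mask, vesselness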
In addition, color may be added according to the depth of the blood vessel based on the OCTA data. By adding colors to the blood vessels in this way, the image to be used for the operator to set the ROI becomes easier to understand. Of course, vessel extraction may be performed from the OCTA data. By extracting blood vessels as three-dimensional information, color information can be added based on the position and thickness of the blood vessels.
Blood vessel measurement is performed based on the blood vessel information extracted by the extraction unit 101-462, and the display control unit 101-05 displays the measurement result on the display unit 104. In the blood vessel measurement, for example, a blood vessel density or a blood vessel area may be used. The blood vessel density is obtained as the blood vessel area per unit area by calculating, for example, the ratio of the blood vessel area to the entire area of the ROI. The value to be measured in the blood vessel measurement is not limited thereto; the total amount of blood vessels or the tortuosity of the blood vessels may also be measured.
Further, the ROI may be divided into a plurality of regions, and a difference or a ratio between the measured values of the respective regions may be calculated. By calculating the difference or ratio, for example, the symmetry of the blood vessels can be evaluated. By associating the density of each predetermined region with color data, the blood vessel density can be displayed as a color map image as the analysis result. The color map image and the OCTA frontal image may be mixed and displayed with a predetermined transmittance (e.g., 50%). In addition, a mixed image of the OCTA frontal image and the OCT frontal image, and a color map image, can be mixed and displayed. The transmittance with respect to the color map image may be fixed, or may be made specifiable by the operator.
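The following sketch illustrates the vessel-density measurement (vessel area as a ratio of the ROI area) and the per-region comparison described above; the helper names and the pairwise-difference summary are assumptions.

import numpy as np

def vessel_density(vessel_mask, roi_mask):
    """Vessel density = vessel area / ROI area, using boolean masks."""
    roi_pixels = np.count_nonzero(roi_mask)
    if roi_pixels == 0:
        return 0.0
    return np.count_nonzero(vessel_mask & roi_mask) / roi_pixels

def sector_comparison(vessel_mask, sector_masks):
    """Per-sector densities plus pairwise absolute differences,
    e.g. to assess the symmetry of the blood vessels."""
    densities = [vessel_density(vessel_mask, m) for m in sector_masks]
    diffs = [abs(densities[i] - densities[j])
             for i in range(len(densities)) for j in range(i + 1, len(densities))]
    return densities, diffs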
A follow-up examination may be performed as a follow-up of the subject, and a follow-up display screen on which analysis results are arranged in chronological order may be displayed. In this case, the comparison of the analysis results may be performed by the comparison unit 101-464.
(analysis of optic nerve head)
Now, a processing procedure of the image processing apparatus 101 of the present exemplary embodiment will be described with reference to fig. 5. In step S501, the transmittance α of the OCT frontal image of the ONH with respect to the OCTA frontal image is changed based on a setting value on a Graphical User Interface (GUI). In this example, it is assumed that α is a real number in the range from 0 to 1; nevertheless, α may be expressed as a percentage on the GUI. In step S502, transmission processing of the two images is performed based on the changed transmittance, and a transparent image is displayed on the screen. In step S503, the operator determines, while checking the transparent image, a transmittance at which the ROI setting is easy to perform. In step S504, an analysis position, or an ROI as an analysis region, is set. In step S505, an instruction to execute blood vessel extraction processing based on the set ROI information is issued. Finally, in step S506, the blood vessel measurement of the ONH is performed, and the measurement result is displayed on the screen.
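A minimal sketch tying steps S501 to S506 together, reusing the helper functions sketched earlier in this section; the GUI callbacks (get_alpha_from_gui, get_roi_from_gui, show) are hypothetical placeholders for the actual user interface and are not part of the patent.

# Assumes blend_images, extract_vessels and vessel_density from the
# earlier sketches are in scope.
def analyze_onh(octa_frontal, oct_frontal, get_alpha_from_gui, get_roi_from_gui, show):
    """Sketch of the flow of Fig. 5 (S501-S506)."""
    # S501-S503: blend with the operator-chosen transmittance and display
    alpha = get_alpha_from_gui()                      # slider value, 0..1
    transparent = blend_images(octa_frontal, oct_frontal, alpha)
    show(transparent)
    # S504: set the region of interest on the displayed blended image
    roi_mask = get_roi_from_gui()
    # S505: vessel extraction based on the set ROI information
    vessel_mask, _ = extract_vessels(octa_frontal)
    # S506: blood vessel measurement and display of the result
    density = vessel_density(vessel_mask, roi_mask)
    show({"vessel_density": density})
    return density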
So far, the description has been given using an analysis example of the ONH. However, analysis of the macular region of the subject's eye or detection of the foveal avascular zone may also be performed. For example, when analyzing neovasculature in the deep layer of the macular region, ROI setting for the macular region becomes easier by performing transmission processing with an OCT frontal image of the surface layer rather than of the layer corresponding to the OCTA frontal image. That is, the layers of the OCTA frontal image and the OCT frontal image need not always be identical, and transmission processing may be performed between images of different layers.
(analysis of the foveal avascular region)
Hereinafter, detection of the Foveal Avascular Zone (FAZ) will be described. Because the FAZ is an avascular area and is low in brightness, the FAZ is extracted, for example, by determining the connectivity of brightness in the peripheral portion starting from a center point of the FAZ analysis area. Any known method may be used as the extraction method; for example, extraction using a region growing method or extraction using an active contour model such as Snake. The application of the above-described analysis processing is not limited to blood vessels; it can also be applied to the analysis of other vessels, for example, lymphatic vessels. Further, in the present exemplary embodiment, the description has been given using an example of the OCTA frontal image, but the dimensionality of the motion contrast data is not limited. For example, three-dimensional information obtained by performing weighted averaging of the OCTA data and the OCT data may be generated, and a three-dimensional ROI may be set. Of course, the motion contrast data may also be one-dimensional or two-dimensional motion contrast data.
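A simple region-growing sketch for FAZ extraction, seeded from a point near the FAZ center as described above. The connectivity rule (4-neighborhood) and the brightness tolerance parameter are assumptions, and an active contour model such as Snake would be an equally valid choice.

import numpy as np
from collections import deque

def extract_faz(octa_frontal, seed_yx, intensity_tol):
    """Region growing from a seed point: accumulate connected pixels whose
    brightness stays at or below the seed value plus intensity_tol."""
    img = np.asarray(octa_frontal, dtype=np.float64)
    h, w = img.shape
    seed_val = img[seed_yx]
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed_yx])
    while queue:
        y, x = queue.popleft()
        if mask[y, x]:
            continue
        mask[y, x] = True
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if img[ny, nx] <= seed_val + intensity_tol:
                    queue.append((ny, nx))
    return mask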
Further, the setting of the ROI may also be performed on the tomographic image. If the check button 305 illustrated in fig. 3A is turned on, a line 306 is displayed on the OCTA frontal image. The operator may move the position of the line 306 by dragging it with the mouse. In synchronization with this operation, the tomographic image 307 is updated to the corresponding tomographic image. The intersection of the set ROI 304 and the line 306 may be displayed as a line 308 extending in the vertical direction on the tomographic image 307. By moving the line 308 horizontally while dragging it with the mouse, the operator can adjust the ROI while checking it in the tomographic image. According to the adjustment made using the line 308, the shape of the ROI 304 changes so as to remain smoothly continuous. In addition, the movable range of the line 308 may be restricted to a range in which the ROI adjustment in the tomographic image does not break the shape of the ROI 304.
The OCTA tomographic image and the OCT tomographic image may also be mixed and displayed as the tomographic image 307 (not shown). In this case, the slider 302 may be shared, or a separate slider may be added. In particular, for data of a diseased eye, whether a blood vessel extracted on the OCTA tomographic image is appropriate can sometimes be checked to some extent by increasing the transmittance of the OCT tomographic image. In other words, when transmission processing is performed between the respective tomographic images of the OCTA data and the OCT data, a more detailed ROI can be set.
In addition, the second medical image used in the transmission processing may be an image obtained by an SLO or a fundus camera. Alternatively, the second medical image may be an OCTA front image of another layer. In this case, it is desirable to perform position alignment between images to be subjected to transmission processing.
Further, the number of images to be used in the transmission processing is not limited to 2. Optionally, the addition of the third medical image by weighted summation is considered. For example, the second mixed image may be acquired by performing a mixing process of the third medical image with the first mixed image at the second transmittance. The first mixed image is a mixed image obtained by performing mixing processing of the first medical image and the second medical image with the first transmittance.
[ second exemplary embodiment ]
A description will be given of the following cases: the image processing apparatus according to the present exemplary embodiment performs analysis processing while setting the analysis position and the analysis region in the OCT data with reference to the oca front image.
The thickness of the optic nerve fiber layer, the degree of concavity of the optic nerve head, and the curvature of the eyeball shape can be analyzed from the OCT data. In this way, the states of various diseases can be identified from the layer thickness information and curvature information of the eyeball. In addition, the layer thickness information and the curvature information may be displayed as an image by converting the information into a color map representing the thickness and the curvature as a color gradation, or the ROI may be divided into a plurality of regions and respective averages of the regions may be displayed.
Alternatively, it is believed that analyzing the status of the lamina cribrosa is also beneficial for the diagnosis of glaucoma. Specifically, the thickness of the sieve plate can also be measured by performing appropriate segmentation processing on the tomographic image of the OCT data.
Depending on the eye of the subject, in some cases, effective analysis can be performed by setting the ROI when compared with stricter blood vessel information. For example, in the case of severe myopia, the eyeball shape is deformed, and therefore, by performing transmission processing of the OCTA front image on the OCT front image with a specified transmittance, the ROI can be set while simultaneously checking blood vessel information. The layer thickness or curvature can be analyzed based on the set ROI.
Alternatively, the present invention can be used for a case where complicated determination is made by performing transmission processing of the OCTA front image on the OCT front image or the analysis result image, in addition to the setting of the ROI.
Specifically, by performing transmission processing of the OCTA front image on the color map image of the layer thickness, the operator can visually recognize the state of the blood vessel in, for example, an area having a low layer thickness. The same applies to color images of curvature. Alternatively, by adding the blood vessel information at a specified transmittance when the analysis result of the sieve plate is checked, the thickness of the sieve plate and the flow state of blood entering the sieve plate can be checked at the same time.
In the case of motion contrast data, visual identification of blood vessels is relatively difficult at locations where blood vessels leak or blood flow is small. Therefore, in a case where it is desired to perform the transmission processing of the blood flow information more strictly, an image obtained by fluorescence fundus angiography using fluorescein or indocyanine green can be used as the second medical image.
Heretofore, analysis of OCT data has been described in the present exemplary embodiment; however, the present invention is not limited thereto. In addition, the first medical image to be subjected to the transmission processing is not limited to the OCT frontal image, and may be an image in which an analysis result is visualized. As in the first exemplary embodiment, a tomographic image can also be used. Further, the second medical image to be subjected to the transmission processing is not limited to the OCTA frontal image, and is only required to be an image of a different type from the first medical image. At this time, it is only required that the first medical image and the second medical image be images of mutually corresponding regions in the subject.
[ third exemplary embodiment ]
A description will be given of the following cases: the image processing apparatus according to the present exemplary embodiment adaptively changes the transmittance of each pixel with respect to the transmission processing performed in the various exemplary embodiments described above. For example, in the case of performing transmission processing of the OCTA frontal image and the OCT frontal image, information on blood vessels becomes important.
In view of the above, when the transmission processing is performed, the method of the transmission processing is switched by preliminarily assigning the category of the blood vessel region or the non-blood vessel region as the attribute to each pixel. The extraction of the blood vessel region has been described in the first exemplary embodiment. The simplest way is to assign the pixels of the blood vessel region with the attribute of not performing the transmission processing, and assign the pixels of the non-blood vessel region with the attribute of performing the transmission processing. The threshold values for determining the blood vessel properties may be made specifiable by the operator on the screen. The attribute information is changed according to the changed threshold value, and the transmission processing is updated based on the changed attribute. Multiple thresholds for determining the attributes may be made specifiable. For example, attributes may be assigned by making a separation of vessel regions and non-vessel regions between specified threshold ranges.
With this configuration, it becomes possible to perform the transmission processing of the second medical image only for the pixels of the non-blood vessel region without performing the transmission processing of the blood vessel information. Alternatively, the transmission processing may be performed on the pixels having the attribute of the blood vessel region, but the transmittance of the transmission processing is suppressed. For example, in a case where the operator sets the transmittance of the second medical image to α, it is considered that the transmission processing is performed on the non-blood vessel region with the transmittance α, and the transmission processing is performed on the blood vessel region while suppressing the transmittance to α/2. The suppressing method may use a predetermined ratio, or a function based on the transmittance α may be prepared.
In addition, whether a region is a blood vessel region or a non-blood vessel region may be separately maintained as a continuous value with respect to the possibility as a blood vessel. In this case, for example, the maximum transmittance of the blood vessel property with respect to the maximum transmittance specified by the operator may be set in advance, and the transmittance may be determined based on a numerical value indicating the possibility as the blood vessel with respect to the transmittance specified by the operator. The transmission processing method is not limited to this, and various modifications may be made as long as the transmission processing can be performed based on the attribute information of each pixel.
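A sketch of attribute-based blending with a per-pixel transmittance, following the α and α/2 example in the text; the function name and the vessel_alpha_scale parameter are assumptions.

import numpy as np

def blend_with_attributes(octa_frontal, oct_frontal, alpha, vessel_mask,
                          vessel_alpha_scale=0.5):
    """Per-pixel blending controlled by a vessel/non-vessel attribute map.

    Non-vessel pixels are blended with the operator-specified transmittance
    alpha; vessel pixels use a suppressed transmittance (alpha / 2 by default).
    A continuous vesselness value could instead be mapped to a per-pixel alpha.
    """
    alpha_map = np.full(np.shape(octa_frontal), float(alpha))
    alpha_map[vessel_mask] = float(alpha) * vessel_alpha_scale
    return octa_frontal * (1.0 - alpha_map) + oct_frontal * alpha_map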
In addition, a variety of attributes may be maintained. In the case of the OCTA front image, the attributes of the blood vessels are managed while being separated into at least two ranges corresponding to, for example, a shallow portion and a deep portion in the depth direction of the OCTA data. The operator may be enabled to instantly switch the attribute to be used based on instructions on the GUI.
In the above description, the attributes are assigned to the pixels based on the blood vessel region and the non-blood vessel region, but the attribute assignment is not limited thereto, and various attributes may be applied. For example, a property in which the transmission processing is not performed may be assigned to a pixel having a specific signal value (e.g., 0) in the second medical image. Alternatively, the attribute may be assigned to a preset partial region. For example, by manually specifying the region where bleeding is identified on the GUI, attributes based on the bleeding region and the non-bleeding region may be assigned to each pixel.
Further, since the depth information of the blood vessel is obtained from the OCTA data, the attribute value can be set based on the depth information of the blood vessel. When displaying a motion contrast frontal image, in the case of vessel overlap, it is sufficient to preliminarily determine whether to use the maximum value, the minimum value, or the average value. In addition, since the OCT data includes layer thickness information, a property based on the layer thickness can be set.
The attribute information may be assigned to the first medical image and the second medical image independently, or may be assigned to only any one of these images. In addition, the method of the transmission processing is not limited to the above-described method, and a person skilled in the art may make various modifications as the processing based on the attribute information set to at least any one.
Description has been given so far using an example of medical image processing of an eye in each of the above-described exemplary embodiments. However, the present invention may also be applied to medical image data (e.g., motion contrast data of skin tissue) acquired by optical coherence tomography.
[ fourth exemplary embodiment ]
Hereinafter, a medical image processing apparatus according to a fourth exemplary embodiment will be described with reference to figs. 6, 7, and 8. The image processing apparatus 101 according to the present exemplary embodiment includes, for example, an image quality improvement unit (not shown) as an image quality improvement apparatus that improves the image quality of the motion contrast data. The image quality improvement unit applies image quality improvement processing using machine learning instead of the above-described synthesizing unit 101-42. At this time, the image quality improvement unit in the image processing apparatus 101 (or the image processing unit 101-04) includes an image quality improvement engine. In the image quality improvement method of the image quality improvement engine according to the present exemplary embodiment, processing using a machine learning algorithm is performed.
In the present exemplary embodiment, the machine learning model is trained according to a machine learning algorithm using the teaching data. The teaching data includes sets of pairs, each pair including input data as a low-quality image with a specific image capturing condition assumed as a processing target and output data as a high-quality image corresponding to the input data. Specifically, the specific image capturing conditions include a predetermined image capturing area, an image capturing method, an image capturing field angle, and an image size.
Here, the machine learning model is a model obtained by preliminarily performing training (learning) using teaching data (learning data) suitable for any machine learning algorithm. The teaching data comprises sets of one or more pairs, each pair comprising input data and output data (correct data). The format and combination of the input data and the output data of the pair of groups included in the teaching data may be a format and combination suitable for the desired configuration. For example, one of the input data and the output data may be an image, and the other may be a numerical value. One of the input data and the output data may include a plurality of images in a group, and the other may be a character string. Both the input data and the output data may be images.
Specifically, for example, the teaching data (hereinafter, first teaching data) includes sets of pairs each including an image acquired by OCT and an image capturing area label corresponding to the image. The image capture area tag is a specific numeric value or character string indicating an area. In addition, as another example of the teaching data, the teaching data (hereinafter, second teaching data) includes sets of pairs, each pair including a low-quality image containing much noise that has been acquired by normal image capturing of OCT and a high-quality image on which image quality improvement processing has been performed by performing image capturing a plurality of times with OCT.
At this time, if the input data is input into the machine learning model, output data following the design of the machine learning model is output. For example, according to trends trained using the teaching data, the machine learning model outputs output data that most likely corresponds to the input data. In addition, for example, according to a trend trained using the teaching data, the machine learning model may output a probability corresponding to the input data as a numerical value of each type of output data. Specifically, for example, an image acquired by OCT is input to a machine learning model trained using the first teaching data, the machine learning model outputs an image capturing area label of an image capturing area captured in the image, or outputs a probability of each image capturing area label. In addition, for example, a low-quality image with much noise, which has been acquired by normal image capturing of OCT, is input to a machine learning model trained using the second teaching data, and the machine learning model outputs a high-quality image equivalent to an image that has been subjected to image quality improvement processing by performing image capturing with OCT a plurality of times. From the viewpoint of quality maintenance, the machine learning model may be configured not to use output data output by the machine learning model itself as teaching data.
Machine learning algorithms include methods related to deep learning, such as Convolutional Neural Networks (CNNs). In the method related to deep learning, if the settings of parameters of a group of layers or a group of nodes included in a neural network are changed, the degree to which a trend trained using teaching data can be reproduced in output data is sometimes changed. For example, in a machine learning model of deep learning using the first teaching data, if appropriate parameters are set, the probability of outputting a correct image capturing area label becomes high in some cases. In addition, for example, in a machine learning model of deep learning using the second teaching data, if appropriate parameters are set, a higher quality image can be output in some cases.
Specifically, the parameters in a CNN may include, for example, the kernel size of the filters set in the convolutional layers, the number of filters, the value of the stride, the value of the dilation, and the number of nodes output by the affine layer. The parameter group and the number of training iterations can be set, based on the teaching data, to values suited to the intended use of the machine learning model. For example, based on the teaching data, a parameter group and a number of iterations that output a correct image capturing area label or a high-quality image with a high probability can be set.
One example of a method of determining the parameter group and the number of iterations is as follows. First, 70% of the pairs included in the teaching data are randomly assigned as pairs for training, and the remaining 30% as pairs for evaluation. Then, training of the machine learning model is performed using the pairs for training, and at the end of each training iteration, a training evaluation value is calculated using the pairs for evaluation. The training evaluation value is, for example, the average of a group of values obtained by evaluating, with a loss function, the output obtained when the input data included in each pair is input to the machine learning model being trained against the output data corresponding to that input data. Finally, the parameter group and the number of iterations at which the training evaluation value becomes smallest are determined as the parameter group and the number of iterations of the machine learning model. By determining the number of iterations after dividing the pairs included in the teaching data into pairs for training and pairs for evaluation in this way, the machine learning model can be prevented from overfitting to the pairs for training.
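The following is a minimal sketch of the hold-out procedure described above, assuming PyTorch, synthetic image pairs, and an arbitrarily small network; the model, loss function, optimizer, and iteration count are illustrative assumptions rather than the configuration of this embodiment.

```python
# Sketch of the 70/30 split and iteration-count selection (assumptions: PyTorch,
# synthetic 1x32x32 image pairs, a tiny stand-in model, MSE as the loss function).
import random
import torch
import torch.nn as nn

torch.manual_seed(0)
random.seed(0)

# Synthetic teaching data: (low-quality, high-quality) image pairs.
pairs = [(torch.rand(1, 32, 32), torch.rand(1, 32, 32)) for _ in range(100)]
random.shuffle(pairs)
split = int(0.7 * len(pairs))
train_pairs, eval_pairs = pairs[:split], pairs[split:]   # 70% training, 30% evaluation

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

best_eval, best_iter, best_state = float("inf"), 0, None
for iteration in range(1, 21):
    model.train()
    for x, y in train_pairs:
        optimizer.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        optimizer.step()
    # Training evaluation value: mean loss over the evaluation pairs at this iteration.
    model.eval()
    with torch.no_grad():
        eval_value = sum(loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).item()
                         for x, y in eval_pairs) / len(eval_pairs)
    if eval_value < best_eval:
        best_eval, best_iter = eval_value, iteration
        # Keep the parameters at the iteration that minimises the evaluation value.
        best_state = {k: v.clone() for k, v in model.state_dict().items()}

print(f"selected number of iterations: {best_iter} (evaluation value {best_eval:.4f})")
```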
An image quality improvement engine (a learning model for image quality improvement) is a module that outputs a high-quality image obtained by performing image quality improvement on an input low-quality image. Image quality improvement described in this specification refers to converting an input image into an image having image quality suitable for image diagnosis, and a high-quality image refers to an image that has been converted into an image having image quality suitable for image diagnosis. In contrast, a low-quality image refers to an image captured without any particular settings for obtaining high image quality, such as a two-dimensional image or a three-dimensional image acquired by, for example, X-ray image capturing, Computed Tomography (CT), Magnetic Resonance Imaging (MRI), OCT, positron emission tomography (PET), or Single Photon Emission Computed Tomography (SPECT), or a three-dimensional moving image continuously captured by CT. Specifically, low-quality images include, for example, an image obtained by low-dose image capturing with an X-ray image capturing apparatus or CT, by image capturing with MRI without using a contrast agent, or by short-time image capturing with OCT, and an OCTA image obtained by performing image capturing a small number of times.
In addition, what constitutes image quality suitable for image diagnosis depends on what is to be diagnosed in each type of image diagnosis. Therefore, although it cannot be defined unconditionally, image quality suitable for image diagnosis includes, for example, image quality with little noise, image quality with high contrast, image quality with colors and gradations that make the image capturing target easy to observe, image quality with a large image size, and image quality with high resolution. Image quality suitable for image diagnosis may further include image quality such that objects and gradations that do not actually exist but have been drawn during image generation are removed from the image.
In addition, if a high-quality image with less noise and high contrast is used for image analysis such as blood vessel analysis processing of an OCTA image or region segmentation processing of a CT or OCT image, analysis can be performed more accurately than in the case of using a low-quality image in many cases. Therefore, the high-quality image output by the image quality improvement engine is sometimes useful not only for image diagnosis but also for image analysis.
In the image processing method included in the image quality improvement method according to the present exemplary embodiment, processing using various machine learning algorithms such as deep learning is performed. In the image processing method, in addition to the processing using the machine learning algorithm, existing arbitrary processing may be performed. Examples of existing arbitrary processing include various types of image filtering processing, matching processing using a database of high-quality images corresponding to similar images, and knowledge base image processing.
Specifically, examples of the configuration of a CNN that improves the image quality of a two-dimensional image include the configuration illustrated in fig. 6. The CNN configuration includes a group of a plurality of convolution processing blocks 100. Each convolution processing block 100 includes a convolution layer 101, a batch normalization layer 102, and an activation layer 103 using rectified linear units (ReLU). The CNN configuration also includes a fusion layer 104 and a last convolution layer 105. The fusion layer 104 fuses the output value group of the convolution processing blocks 100 and the pixel value group constituting the image by concatenation or addition. The last convolution layer 105 outputs the pixel value group constituting the high-quality image Im120 from the value group fused by the fusion layer 104. In this configuration, the value group output after the input image Im110 passes through the group of convolution processing blocks 100 and the pixel value group constituting the input image Im110 are fused by the fusion layer 104. Thereafter, the fused value group is formed into the high-quality image Im120 by the last convolution layer 105.
For example, by setting the number of convolution processing blocks 100 to 16 and setting the kernel size of the filter to be 3 pixels in width and 3 pixels in height and the number of filters to be 64 as parameters of the group of convolution layers 101, a certain effect of image quality improvement is obtained. However, as described in the description of the machine learning model described above, a better parameter set can be actually set using teaching data suitable for the utilization form of the machine learning model. In the case of processing a three-dimensional image or a four-dimensional image, the kernel size of the filter may be extended to three or four dimensions.
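As one concrete illustration, the following is a hedged PyTorch sketch of a fig. 6 style configuration with the parameters mentioned above (16 convolution processing blocks, 3 × 3 kernels, 64 filters, batch normalization, ReLU). The exact channel wiring around the fusion layer (here, a projection back to the input channel count followed by addition) and the class names are assumptions made for illustration, not the configuration of this embodiment.

```python
# Residual-style image quality improvement CNN: 16 conv blocks, fusion by addition
# with the input image, and a final convolution that forms the high-quality image.
import torch
import torch.nn as nn

class ConvProcessingBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class QualityImprovementCNN(nn.Module):
    def __init__(self, in_channels: int = 1, features: int = 64, num_blocks: int = 16):
        super().__init__()
        self.head = nn.Conv2d(in_channels, features, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ConvProcessingBlock(features) for _ in range(num_blocks)])
        # Project back to the input channel count so that fusion by addition is possible.
        self.project = nn.Conv2d(features, in_channels, kernel_size=3, padding=1)
        self.last = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)

    def forward(self, low_quality):
        x = self.project(self.blocks(self.head(low_quality)))
        fused = x + low_quality          # fusion layer: addition with the input image
        return self.last(fused)          # last convolution layer forms the high-quality image

model = QualityImprovementCNN()
print(model(torch.rand(1, 1, 64, 64)).shape)   # torch.Size([1, 1, 64, 64])
```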
Another example of the CNN configuration in the image quality improvement unit according to the present exemplary embodiment will be described with reference to fig. 15. Fig. 15 illustrates an example of the configuration of the machine learning model in the image quality improvement unit. The configuration illustrated in fig. 15 includes a group of a plurality of layers that process an input value group and output the processed value group. As illustrated in fig. 15, the types of layers included in the configuration are a convolution layer, a downsampling layer, an upsampling layer, and a fusion layer. The convolution layer performs convolution processing on the input value group according to parameters such as the set kernel size of the filters, the number of filters, the value of the stride, and the value of the dilation. The dimensionality of the kernel size of the filters may be changed according to the dimensionality of the input image. The downsampling layer performs processing that makes the number of output values smaller than the number of input values by thinning out or fusing the input value group; specifically, such processing includes, for example, max pooling. The upsampling layer performs processing that makes the number of output values larger than the number of input values by duplicating the input value group or adding values interpolated from the input value group; specifically, such processing includes, for example, linear interpolation. The fusion layer receives value groups from a plurality of sources, such as the output value group of a certain layer or the pixel value group constituting an image, and fuses them by concatenation or addition. In this configuration, the value group output after the input image Im2410 passes through the convolution processing blocks and the pixel value group constituting the input image Im2410 are fused by a fusion layer. Thereafter, the fused value group is formed into the high-quality image Im2420 by the final convolution layer. As a modification of the CNN configuration, for example, a batch normalization layer or an activation layer using rectified linear units may be added after the convolution layers (not shown in fig. 15).
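The following is a minimal PyTorch sketch of a fig. 15 style arrangement with a max-pooling downsampling layer, a bilinear-interpolation upsampling layer, and a fusion layer that concatenates value groups. The depth of the network, the channel counts, and the class name are illustrative assumptions; a practical model would typically use more encoder and decoder stages.

```python
# A small U-Net-like sketch: convolution layers, max-pooling downsampling,
# bilinear-interpolation upsampling, and fusion (concatenation) of value groups.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallUNet(nn.Module):
    def __init__(self, in_channels: int = 1, features: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_channels, features, 3, padding=1), nn.ReLU(True))
        self.down = nn.MaxPool2d(2)                     # downsampling layer (max pooling)
        self.mid = nn.Sequential(nn.Conv2d(features, features, 3, padding=1), nn.ReLU(True))
        self.dec = nn.Sequential(nn.Conv2d(features * 2, features, 3, padding=1), nn.ReLU(True))
        self.out = nn.Conv2d(features, in_channels, 3, padding=1)

    def forward(self, x):
        skip = self.enc(x)
        mid = self.mid(self.down(skip))
        # Upsampling layer: linear (bilinear) interpolation back to the encoder resolution.
        up = F.interpolate(mid, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        merged = torch.cat([up, skip], dim=1)           # fusion layer (concatenation)
        return self.out(self.dec(merged))

print(SmallUNet()(torch.rand(1, 1, 64, 64)).shape)      # torch.Size([1, 1, 64, 64])
```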
A GPU can perform efficient computation by processing a large amount of data in parallel. Therefore, in the case of performing learning a plurality of times using a learning model, as in deep learning, it is effective to perform the processing with a GPU. In view of this, according to the present exemplary embodiment, the processing performed by the image processing unit 101-04, which serves as an example of a learning unit (not illustrated), uses a GPU in addition to the CPU. Specifically, when a learning program including the learning model is executed, learning is performed by the CPU and the GPU performing the computation cooperatively. The computation in the processing of the learning unit may also be performed by the CPU or the GPU alone. The image quality improvement unit may also use a GPU, similarly to the learning unit. The learning unit may further include an error detection unit and an update unit, which are not shown. The error detection unit obtains the error between the correct data and the output data output from the output layer of the neural network in accordance with input data input to the input layer. The error detection unit may calculate the error between the output data from the neural network and the correct data using a loss function. Based on the error obtained by the error detection unit, the update unit updates the inter-node connection weighting coefficients of the neural network so that the error becomes smaller. The update unit updates the connection weighting coefficients using, for example, back propagation. Back propagation is a method of adjusting the inter-node connection weighting coefficients of the neural network so that the error becomes small.
When using some image processing methods, such as image processing using a CNN, it is necessary to pay attention to the image size. Specifically, it should be noted that in some cases the input low-quality image and the output high-quality image require different image sizes in order to cope with the problem that the image quality of the peripheral portion of the high-quality image is not sufficiently improved.
Although, for clarity of description, it is not explicitly stated in the present exemplary embodiment, in the case of employing an image quality improvement engine that requires the size of the image input to the engine and the size of the output image to differ, the image sizes are adjusted appropriately. Specifically, the image size is adjusted by padding the input image, or by connecting image capturing regions around the input image to it, for an input image such as an image used in the teaching data for training the machine learning model or an image input to the image quality improvement engine. The region to be padded is filled with a fixed pixel value, filled with adjacent pixel values, or mirror-padded according to the characteristics of the image quality improvement method, so that the image quality improvement can be performed effectively.
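The padding options mentioned above can be illustrated with NumPy as follows; the pad widths and the tiny placeholder image are arbitrary assumptions for demonstration.

```python
# Constant-value padding, padding with adjacent (edge) pixel values, and mirror padding.
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)   # stand-in for a small input image
pad = ((2, 2), (2, 2))                             # rows/columns added on each side

constant = np.pad(image, pad, mode="constant", constant_values=0)  # fixed pixel value
edge = np.pad(image, pad, mode="edge")                             # adjacent pixel values
mirror = np.pad(image, pad, mode="reflect")                        # mirror padding

print(constant.shape, edge.shape, mirror.shape)    # each becomes (8, 8)
```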
Although the image quality improvement method is performed using only one image processing method in some cases, in other cases two or more image processing methods are used in combination. Further, a group of a plurality of image quality improvement methods may be performed in parallel to generate a group of a plurality of high-quality images, and the high-quality image with the highest image quality may finally be selected as the high-quality image. The selection of the high-quality image with the highest image quality may be performed automatically using an image quality evaluation index, or may be performed by displaying the group of high-quality images on a user interface included in an arbitrary display unit and selecting based on an instruction of the examiner (user).
In some cases, an input image that has not undergone image quality improvement is more suitable for image diagnosis. Accordingly, the input image may be included in the final image selection target. In addition, the parameters may be input to the image quality improvement engine along with the low quality image. For example, a parameter specifying the degree to which image quality improvement is performed or a parameter specifying the size of an image filter to be used in the image processing method may be input to the image quality improvement engine together with the input image.
In the present exemplary embodiment, the input data of the teaching data is a low-quality image acquired by an apparatus of the same model as the tomographic image capturing apparatus 100 using the same settings as the tomographic image capturing apparatus 100. The output data of the teaching data is a high-quality image obtained by performing image processing on images acquired with the same model and the same settings. Specifically, for example, the output data is a high-quality image (superimposed image) obtained by performing superimposition processing such as sum-averaging on a group of images (original images) acquired by performing image capturing a plurality of times. Here, the high-quality image and the low-quality image will be described using the motion contrast data of OCTA as an example. The motion contrast data is data used in, for example, OCTA, and indicates temporal variation of the image capturing target detected by repeatedly capturing images of the same point of the image capturing target. An En-Face image of OCTA (motion contrast front image) can be obtained by generating a front image using data in a desired range in the depth direction of the image capturing target out of the calculated motion contrast data (an example of three-dimensional medical image data). Hereinafter, the number of times image capturing of OCT data is repeatedly performed at the same point will be referred to as the number of repetitions (NOR).
In the present exemplary embodiment, two different methods of generating a high-quality image and a low-quality image using superimposition processing will be described with reference to fig. 12.
In the first method, described with reference to fig. 12A, the high-quality image is motion contrast data generated from OCT data obtained by repeatedly performing image capturing of the same point of the image capturing target. Fig. 12A illustrates three-dimensional motion contrast data Im2810 and two-dimensional motion contrast data Im2811 included in the three-dimensional motion contrast data. Fig. 12A also illustrates OCT tomographic images (B-scans) Im2811-1 to Im2811-3 used to generate the two-dimensional motion contrast data Im2811. In fig. 12A, NOR indicates the number of OCT tomographic images Im2811-1, Im2811-2, and Im2811-3; in the example of fig. 12A, the NOR is 3. The OCT tomographic images Im2811-1, Im2811-2, and Im2811-3 are captured at a predetermined time interval (Δt). The same point indicates one line in the front direction (X-Y) of the subject's eye and, in fig. 12A, corresponds to one position of the two-dimensional motion contrast data Im2811. The front direction is an example of a direction intersecting the depth direction. Since the motion contrast data is data indicating a detected temporal change, the NOR needs to be at least 2 in order to generate the data. For example, in the case where the NOR is 2, one piece of motion contrast data is generated. In the case where the NOR is 3, two pieces of motion contrast data are generated if only the OCT data at adjacent time intervals (the first and second acquisitions, and the second and third acquisitions) is used, and a total of three pieces are generated if the OCT data at the separated time interval (the first and third acquisitions) is also used. In other words, as the NOR is increased to three, four, and so on, the number of pieces of motion contrast data at the same point also increases. By aligning the positions of the plurality of pieces of motion contrast data acquired by repeatedly capturing the same point and performing superimposition processing such as sum-averaging, motion contrast data with high image quality can be generated. Therefore, the NOR is set to at least three, and desirably five or more. In contrast, as an example of a low-quality image corresponding to the high-quality image, motion contrast data that has not undergone superimposition processing such as sum-averaging can be used. In this case, the low-quality image is desirably the reference image used when the superimposition processing such as sum-averaging is performed. When the superimposition processing is performed, if position alignment is carried out by deforming the position or shape of each target image with respect to the reference image, almost no spatial position shift arises between the reference image and the image that has undergone the superimposition processing, so a pair of a low-quality image and a high-quality image can easily be formed. Instead of the reference image, a target image on which the image deformation processing for position alignment has been performed may be used as the low-quality image. By setting the respective images of the original image group (the reference image and the target images) as input data and setting the corresponding superimposed image as output data, a group of a plurality of pairs can be generated.
For example, in a case where one superimposed image is to be obtained from a group of 15 original images, a pair of a first original image and a superimposed image in the group of original images and a pair of a second original image and a superimposed image in the group of original images may be generated. In this way, in the case where one superimposed image is to be obtained from a group of 15 original images, groups of 15 pairs each including one image in the group of original images and the superimposed image can be generated. By repeatedly performing image capturing of the same point in the main scanning (X) direction and performing scanning while shifting the image capturing position in the sub-scanning (Y) direction, three-dimensional high image quality data can be generated.
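A compact NumPy sketch of the first method is shown below: NOR repeated B-scans of one line give several motion contrast frames, their sum-average is the high-quality image, and each individual frame is paired with it as a (low-quality, high-quality) training sample. The decorrelation formula used here is one common choice and is an assumption, not taken from this document; position alignment and artifact handling are omitted for brevity.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
nor = 5                                                         # number of repetitions (NOR)
b_scans = [rng.random((128, 256)) + 1e-6 for _ in range(nor)]   # repeated OCT B-scans of one line

def decorrelation(a, b):
    # Higher values indicate temporal change between the two acquisitions (i.e. flow).
    return 1.0 - (2.0 * a * b) / (a ** 2 + b ** 2)

# Motion contrast frames from every pairing of the repeated acquisitions.
frames = [decorrelation(x, y) for x, y in combinations(b_scans, 2)]

# Superimposition (sum-averaging) of the frames gives the high-quality output data.
high_quality = np.mean(frames, axis=0)

# Each individual frame serves as low-quality input data paired with the same output.
teaching_pairs = [(frame, high_quality) for frame in frames]
print(len(frames), len(teaching_pairs), high_quality.shape)
```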
A second method, in which a high-quality image is generated by performing superimposition processing on motion contrast data obtained by performing image capturing of the same region of the image capturing target a plurality of times, will be described with reference to fig. 12B. The same region refers to a region having a size of, for example, 3 mm × 3 mm or 10 mm × 10 mm in the front direction (X-Y) of the subject's eye, and three-dimensional motion contrast data including the depth direction of the tomographic image is acquired for it. When the superimposition processing is performed by capturing the same region a plurality of times, the NOR is desirably set to two or three in order to keep the time of a single image capture short. To generate three-dimensional motion contrast data with high image quality, at least two pieces of three-dimensional data of the same region are required. Fig. 12B illustrates an example of a plurality of pieces of three-dimensional motion contrast data. Similarly to fig. 12A, fig. 12B illustrates three-dimensional motion contrast data Im2820, Im2830, and Im2840. Using these two or more pieces of three-dimensional motion contrast data, position alignment processing in the front direction (X-Y) and the depth direction (Z) is performed, data causing artifacts is removed from each piece of data, and averaging processing is then performed. Thereby, one piece of three-dimensional motion contrast data with high image quality, from which artifacts have been removed, can be generated. A high-quality image is obtained by generating an arbitrary plane from this three-dimensional motion contrast data. In contrast, the corresponding low-quality image is desirably an arbitrary plane generated from the reference data used when the superimposition processing such as sum-averaging is performed. As described for the first method, almost no spatial position shift arises between the reference image and the images that have undergone sum-averaging, so a pair of a low-quality image and a high-quality image can easily be formed. An arbitrary plane generated from target data on which the image deformation processing for position alignment has been performed, instead of the reference data, may also be used as the low-quality image.
Since image capturing is performed only once in the first method, the burden on the subject is small; however, as the NOR increases, the time of that single image capture becomes longer, and if clouding of the eye or an artifact such as eyelashes enters during the image capture, a good image is not always obtained. Since image capturing is performed a plurality of times in the second method, the burden on the subject increases somewhat; however, the image capturing time of one capture can be short, and even if an artifact enters in one capture, a good image with few artifacts can still be obtained in the end as long as the artifact is not included in another capture. In view of these characteristics, when data is collected, either method is selected according to the state of the subject.
In the present exemplary embodiment, motion contrast data has been described as an example, but the present exemplary embodiment is not limited thereto. Since OCT data is captured in the course of generating motion contrast data, the same processing can be performed on the OCT data using the methods described above. In the present exemplary embodiment, description of tracking processing is omitted; however, since images of the same point or the same region of the subject's eye are captured, it is desirable to perform image capturing while tracking the subject's eye.
In the present exemplary embodiment, since a pair of three-dimensional high-image-quality data and low-image-quality data is formed, a pair of arbitrary two-dimensional images can be generated from it. This will be described with reference to fig. 13. For example, in the case where the target image is an OCTA En-Face image, an OCTA En-Face image is generated from the three-dimensional data in a desired depth range. The depth direction corresponds to the Z direction in fig. 12. Fig. 13A illustrates examples of the OCTA En-Face images generated at this time. Learning is performed using OCTA En-Face images generated in different depth ranges, such as the superficial layer (Im2910), the deep layer (Im2920), the outer layer (Im2930), and the choroidal vascular network (Im2940). The types of OCTA En-Face images are not limited to these; OCTA En-Face images with different depth ranges set by varying the reference layer and the offset value can be generated, and the number of types can be increased. When performing the learning, the learning may be performed independently for OCTA En-Face images of the respective depths, a plurality of images in different depth ranges may be learned in combination (for example, divided into the superficial-layer side and the deep-layer side), or the OCTA En-Face images of all depth ranges may be learned together. In the case where a luminance En-Face image is generated from OCT data, learning is performed using a plurality of En-Face images generated from arbitrary depth ranges, similarly to the OCTA En-Face images. For example, consider a case where the image quality improvement engine includes a machine learning engine obtained using learning data including a plurality of motion contrast front images corresponding to different depth ranges of the subject's eye. In this case, the acquisition unit can acquire, as the first image, a motion contrast front image corresponding to a partial depth range within a depth range that includes the different depth ranges. In other words, a motion contrast front image corresponding to a depth range different from the plurality of depth ranges corresponding to the plurality of motion contrast front images included in the learning data can be used as an input image for image quality improvement. A motion contrast front image in the same depth range as at the time of learning can also be used as an input image for image quality improvement. In addition, the partial depth range may be set according to the examiner pressing an arbitrary button on the user interface, or may be set automatically. The above is not limited to the motion contrast front image and can be applied to, for example, a luminance En-Face image.
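The generation of En-Face images from different depth ranges described above can be sketched as follows, assuming the three-dimensional motion contrast data is a NumPy array indexed as (Y, Z, X) and using a mean projection; the voxel indices standing in for the superficial, deep, outer, and choroidal ranges are purely illustrative.

```python
# Projecting a 3D motion contrast volume over several depth ranges into En-Face images.
import numpy as np

rng = np.random.default_rng(0)
motion_contrast_volume = rng.random((300, 640, 300))   # (Y, Z depth, X)

depth_ranges = {            # placeholder depth ranges in voxel indices (assumptions)
    "superficial": (100, 160),
    "deep": (160, 220),
    "outer": (220, 280),
    "choroidal": (280, 360),
}

en_face_images = {
    name: motion_contrast_volume[:, z0:z1, :].mean(axis=1)   # mean projection over depth
    for name, (z0, z1) in depth_ranges.items()
}
print({name: image.shape for name, image in en_face_images.items()})
```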
In the case where the processing target image is a tomographic image, learning is performed using OCT tomographic images as B-scans or tomographic images of motion contrast data. This will be described with reference to fig. 13B. Fig. 13B illustrates OCT tomographic images Im2951, Im2952, and Im2953. The images differ because the tomographic images Im2951 to Im2953 are tomographic images at different positions in the sub-scanning (Y) direction. For tomographic images, learning may be performed together regardless of the positional differences in the sub-scanning direction. However, in the case where images are obtained by capturing different image capturing regions (for example, the center of the macular region and the center of the optic nerve head), learning may be performed independently for each region or may be performed together regardless of the image capturing region. Since the image feature amounts of an OCT tomographic image and a tomographic image of motion contrast data differ greatly from each other, it is desirable to perform learning on them independently.
Pixels drawn in common across the original image group are emphasized in the superimposed image that has undergone the superimposition processing, which makes the superimposed image a high-quality image suitable for image diagnosis. In this case, since the commonly drawn pixels are emphasized, the generated high-quality image becomes a high-contrast image in which the difference between low-luminance regions and high-luminance regions is clear. In addition, in the superimposed image, random noise generated at each image capture can be reduced, and a region that was not correctly drawn at a certain point in time in one original image can be interpolated using the other original images.
In the case where the input data required by the machine learning model includes a plurality of images, the required number of original images can be selected from the group of original images and used as the input data. For example, in a case where one superimposed image is to be obtained from a group of 15 original images, if two images are required as the input data of the machine learning model, a group of 105 pairs (15C2 = 105) can be generated.
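As a quick check of the count mentioned above, the combinations can be enumerated directly; the snippet simply assumes the 15 original images are identified by their indices.

```python
# 15 choose 2 = 105 possible two-image inputs drawn from one group of original images.
from itertools import combinations

original_indices = range(15)
input_pairs = list(combinations(original_indices, 2))
print(len(input_pairs))   # 105
```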
Among the pairs included in the teaching data, pairs that do not contribute to image quality improvement can be excluded from the teaching data. For example, if a high-quality image used as the output data of a pair constituting the teaching data has image quality unsuitable for image diagnosis, an image output by an image quality improvement engine trained using that teaching data may also have image quality unsuitable for image diagnosis. Thus, by excluding from the teaching data pairs whose output data has image quality unsuitable for image diagnosis, the probability that the image quality improvement engine generates images with image quality unsuitable for image diagnosis can be reduced.
In addition, in the case where the average luminance or the luminance distribution differs greatly between the images of a pair, an image quality improvement engine trained using such teaching data may output an image whose luminance distribution differs greatly from that of the low-quality image and which is unsuitable for image diagnosis. Therefore, pairs of input data and output data with large differences in average luminance or luminance distribution can be excluded from the teaching data.
In addition, in the case where the structure or position of the image capturing target drawn in the images of a pair differs greatly, an image quality improvement engine trained using such teaching data may output an image unsuitable for image diagnosis in which the image capturing target is drawn with a structure or at a position greatly different from that in the low-quality image. Therefore, pairs of input data and output data in which the drawn image capturing target differs greatly in structure or position can be excluded from the teaching data. From the viewpoint of quality maintenance, the image quality improvement engine may also be configured not to use high-quality images output by itself as teaching data.
By using the image quality improvement engine that has performed machine learning in this way, the image quality improvement unit in the image processing apparatus 101 (or the image processing units 101-04) can output a high-quality image on which contrast improvement or noise reduction is performed by superimposition processing in a case where a medical image acquired by one-time image capturing is input. Therefore, the image quality improvement unit can generate a high-quality image suitable for image diagnosis based on the low-quality image as the input image.
Now, a series of image processing according to the present exemplary embodiment will be described with reference to a flowchart illustrated in fig. 7. Fig. 7 is a flowchart of a series of image processing according to the present exemplary embodiment. First, when a series of image processing according to the present exemplary embodiment starts, the process proceeds to step S510.
In step S510, the image acquisition unit 101-01 acquires, as an input image, an image captured by the tomographic image capturing apparatus 100 from the tomographic image capturing apparatus 100 connected via a circuit or a network. The image acquisition unit 101-01 may acquire the input image in response to a request from the tomographic image capturing apparatus 100. Such a request may be issued, for example, when the tomographic image capturing apparatus 100 generates an image, before or after the tomographic image capturing apparatus 100 stores the generated image in a storage device included in the tomographic image capturing apparatus 100, when the stored image is displayed on the display unit 104, or when a high-quality image is to be used in image analysis processing.
The image acquisition unit 101-01 may acquire data for generating an image from the tomographic image capturing apparatus 100 and acquire an image generated by the image processing apparatus 101 based on the data as an input image. In this case, any conventional image generation method may be adopted as the image generation method by which the image processing apparatus 101 generates various images.
In step S520, an image capturing condition acquisition unit (not illustrated) in the image processing apparatus 101 acquires an image capturing condition group of the input image. Specifically, the image capturing condition acquisition unit acquires the set of image capturing conditions stored in the data structure including the input image, according to the data format of the input image. As described above, in the case where the image capturing condition is not stored in the input image, the image capturing condition acquisition unit may acquire the image capturing information group including the image capturing condition group from the tomographic image capturing apparatus 100 or an image management system (not illustrated).
In step S530, an image quality improvement performability determination unit (not illustrated) in the image processing apparatus 101 uses the acquired image capturing condition group to determine whether the image quality of the input image can be improved by the image quality improvement engine included in the image quality improvement unit in the image processing apparatus 101 (or the image processing unit 101-04). Specifically, the image quality improvement performability determination unit determines whether the image capturing area, the image capturing method, the image capturing field angle, and the image size of the input image satisfy conditions that can be managed by the image quality improvement engine.
In the case where the image quality improvement performability determination unit determines all the image capturing conditions and determines that the image capturing conditions are manageable, the process proceeds to step S540. In contrast, in the case where the image quality improvement performability determination unit determines that the image quality improvement engine cannot manage the input image based on these image capturing conditions, the process proceeds to step S550.
Depending on the settings or implementation of the image processing apparatus 101, the image quality improvement processing in step S540 may be performed even if it is determined, based on some of the image capturing area, the image capturing method, the image capturing field angle, and the image size, that the input image cannot be processed. For example, such processing may be performed in a case where the image quality improvement engine is assumed to comprehensively manage any image capturing area of the subject and is implemented so as to manage input data even if it includes an unknown image capturing area. In addition, the image quality improvement performability determination unit may, according to the desired configuration, determine whether at least one of the image capturing area, the image capturing method, the image capturing field angle, and the image size of the input image satisfies a condition that can be managed by the image quality improvement engine.
In step S540, the image quality improvement unit improves the image quality of the input image using the image quality improvement engine, and generates a high-quality image more suitable for image diagnosis than the input image. Specifically, the image quality improvement unit inputs an input image to the image quality improvement engine, and causes the image quality improvement engine to generate a high-quality image with improved image quality. The image quality improvement engine generates a high-quality image on which superimposition processing is performed using the input image, based on a machine learning model obtained by performing machine learning using the teaching data. Thus, the image quality improvement engine can generate a high-quality image with reduced noise and enhanced contrast compared to the input image.
The image quality improvement unit may input the parameter to the image quality improvement engine together with the input image according to the image capture condition group, and adjust, for example, the degree of the image quality improvement. In addition, the image quality improvement unit may input a parameter corresponding to an input of the examiner to the image quality improvement engine together with the input image, and adjust, for example, the degree of the image quality improvement.
In step S550, if a high-quality image has been generated in step S540, the display control unit 101-05 outputs the high-quality image and displays it on the display unit 104. In contrast, in the case where it is determined in step S530 that the image quality improvement processing cannot be executed, the display control unit 101-05 outputs the input image and displays it on the display unit 104. Instead of displaying the output image on the display unit 104, the display control unit 101-05 may display it on the tomographic image capturing apparatus 100 or another apparatus, or store it therein. Depending on the settings or implementation of the image processing apparatus 101, the display control unit 101-05 may also process the output image so that it can be used by the tomographic image capturing apparatus 100 or another apparatus, or convert its data format so that it can be transmitted to, for example, an image management system.
As described above, the image processing apparatus 101 according to the present exemplary embodiment includes the image acquisition unit 101-01 and the image quality improvement unit. The image acquisition unit 101-01 acquires an input image (first image) which is an image of a predetermined region of the subject. The image quality improvement unit generates, from the input image, a high-quality image (second image) on which at least one of noise reduction and contrast emphasis has been performed as compared with the input image, using an image quality improvement engine including a machine learning engine. The image quality improvement engine includes a machine learning engine that uses an image obtained by superimposition processing as learning data.
With this configuration, the image processing apparatus 101 according to the present exemplary embodiment can output a noise-reduced or contrast-emphasized high-quality image from an input image. Therefore, compared with the related art, the image processing apparatus 101 can acquire an image suitable for image diagnosis, such as a clear image or an image in which a region desired to be observed or a lesion is emphasized, at a lower cost and without increasing the invasiveness to the subject or the labor of the photographer.
In addition, the image processing apparatus 101 further includes an image quality improvement performability determination unit that determines whether or not a high-quality image can be generated using the image quality improvement engine with respect to the input image. The image quality improvement performability determination unit performs the determination based on at least one of an image capturing area, an image capturing method, an image capturing field angle, and an image size of the input image.
With this configuration, the image processing apparatus 101 according to the present exemplary embodiment can omit an input image that cannot be processed by the image quality improvement unit from the target of the image quality improvement processing, and reduce the processing load of the image processing apparatus 101 and the occurrence of errors.
In the present exemplary embodiment, the display control unit 101-05 is configured to display the generated high-quality image on the display unit 104, but the operation of the display control unit 101-05 is not limited thereto. For example, the display control unit 101-05 may also output a high-quality image to the tomographic image capturing apparatus 100 or another apparatus connected to the image processing apparatus 101. Thus, high quality images may be displayed on the user interface of these devices, stored in any storage device, used for any image analysis or sent to an image management system.
In the present exemplary embodiment, the image quality improvement performability determination unit determines whether the input image is an input image whose image quality can be improved by the image quality improvement engine, and if the input image is an input image whose image quality can be improved, the image quality improvement unit performs image quality improvement. In contrast to this, in the case where the tomographic image capturing apparatus 100 performs image capturing using only image capturing conditions under which image quality improvement can be performed, the image quality of an image acquired from the tomographic image capturing apparatus 100 can be unconditionally improved. In this case, as illustrated in fig. 8, the processes in steps S520 and S530 may be omitted, and step S540 may be performed subsequent to step S510.
In the present exemplary embodiment, the display control unit 101-05 is configured to display a high-quality image on the display unit 104. However, the display control unit 101-05 may display the high-quality image on the display unit 104 according to an instruction from the examiner. For example, the display control unit 101-05 may display the high-quality image on the display unit 104 in response to the examiner pressing an arbitrary button on the user interface of the display unit 104. In this case, the display control unit 101-05 may display the high-quality image by switching the displayed image from the input image, or may display the high-quality image beside the input image.
Further, when displaying a high-quality image on the display unit 104, the display control unit 101-05 may display, together with the high-quality image, a display indicating that the displayed image is a high-quality image generated by processing using a machine learning algorithm. In this case, the user can easily recognize, based on the display, that the displayed high-quality image is not an image acquired by image capturing, whereby it is possible to reduce erroneous diagnosis and enhance diagnosis efficiency. The display indicating that the displayed image is a high-quality image generated by processing using a machine learning algorithm may be any display as long as the display makes the input image and the high-quality image generated by the processing distinguishable.
The display control unit 101-05 may display, on the display unit 104, a display indicating teaching data used by a machine learning algorithm for performing learning as a display indicating that the displayed image is a high-quality image generated by processing using the machine learning algorithm. The display may include any display of the teaching data, such as a description of the type of input data and output data of the teaching data, and an image capturing area included in the input data and output data.
In the image quality improvement engine according to the present exemplary embodiment, the superimposed image is used as the output data of the teaching data, but the teaching data is not limited thereto. As the output data of the teaching data, for example, a high-quality image obtained by performing at least one of the superimposition processing, the processing group described below, and the image capturing methods described below, each serving as a means of obtaining a high-quality image, may be used.
As output data of the teaching data, a high-quality image obtained by performing, for example, a maximum a posteriori probability (MAP) estimation process on the original image group can be used. In the MAP estimation process, likelihood functions are obtained from the probability densities of the respective pixel values in the plurality of low-quality images, and true signal values (pixel values) are estimated using the obtained likelihood functions.
The high-quality image obtained by the MAP estimation process becomes a high-contrast image based on pixel values close to the true signal values. Since the estimated signal value is obtained based on the probability density, noise generated randomly is reduced in a high-quality image obtained by the MAP estimation process. Thus, the image quality improvement engine can generate a high-quality image with reduced noise or high contrast suitable for image diagnosis from the input image by using the high-quality image obtained by the MAP estimation process as teaching data. The generation method of the input data and output data pair of the teaching data may be a method similar to that used in the case of using the superimposed image as the teaching data.
As output data of the teaching data, a high-quality image obtained by applying a smoothing filter process to an original image may be used. In this case, the image quality improvement engine may generate a high-quality image in which random noise is reduced from the input image. Further, as the output data of the teaching data, an image obtained by applying the gradation conversion processing to the original image may be used. In this case, the image quality improvement engine may generate a high-quality image with contrast emphasis from the input image. The generation method of the input data and output data pair of the teaching data may be a method similar to that used in the case of using the superimposed image as the teaching data.
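The two alternatives just mentioned can be sketched as follows, assuming NumPy/SciPy, a Gaussian filter as one example of a smoothing filter, and gamma correction as one example of gradation conversion; the sigma and gamma values are arbitrary assumptions.

```python
# Candidate output data: a smoothed (noise-reduced) image and a gradation-converted
# (contrast-emphasised) image derived from one original image.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
original = rng.random((256, 256))

smoothed = gaussian_filter(original, sigma=2.0)              # smoothing filter processing
gamma = 0.5
gradation_converted = np.clip(original, 0.0, 1.0) ** gamma   # gradation (gamma) conversion
print(smoothed.shape, gradation_converted.shape)
```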
The input data of the teaching data may be an image acquired from an image capturing apparatus having the same tendency of image quality as the tomographic image capturing apparatus 100. The output data of the teaching data may be a high-quality image obtained by high-cost processing such as successive approximation or may be a high-quality image obtained by performing image capturing of the object corresponding to the input data using an image capturing apparatus having higher performance than the tomographic image capturing apparatus 100. Further, the output data may be a high-quality image obtained by performing a rule-based noise reduction process. The noise reduction processing may include, for example, processing of replacing a single high-luminance pixel, which is apparently noise and appears in a low-luminance region, with an average value of adjacent low-luminance pixel values. Therefore, the image quality improvement engine may use, as the learning data, an image captured by an image capturing apparatus having higher performance than that of the image capturing apparatus for image capturing of the input image or an image acquired by an image capturing process including more man-hours than that of the image capturing process of the input image. For example, in a case where a motion contrast frontal image is set as an input image, the image quality improvement engine may use, as the learning data, an image obtained by performing OCTA image capturing by an OCT image capturing apparatus having higher performance than that of the OCT image capturing apparatus used in the OCTA image capturing of the input image or an image obtained in the OCTA image capturing process including more man-hours than that of the OCTA image capturing process of the input image.
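A minimal sketch of the rule-based noise reduction described above is shown below, under the simplifying assumptions that "high luminance" and "low luminance" can be captured by fixed thresholds and that the replacement value is the mean of the eight neighbouring pixels; the thresholds and image values are illustrative only.

```python
# Replace an isolated bright (noise-like) pixel in a dark region with the mean of its
# eight neighbours.
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
image = rng.random((128, 128)) * 0.2          # mostly low-luminance background
image[40, 60] = 1.0                            # one isolated high-luminance (noise) pixel

# Mean of the 8 neighbours: 3x3 mean times 9, minus the centre, divided by 8.
neighbour_mean = (uniform_filter(image, size=3) * 9 - image) / 8
is_noise = (image > 0.8) & (neighbour_mean < 0.3)   # bright pixel with dark surroundings
cleaned = np.where(is_noise, neighbour_mean, image)
print(int(is_noise.sum()), cleaned[40, 60] < 0.3)   # 1 True
```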
Although description is omitted in the present exemplary embodiment, a high-quality image generated from a plurality of images to be used as output data of teaching data may be generated from a plurality of images that have been subjected to positional alignment. For example, the position alignment process may be performed in the following manner. More specifically, one of the plurality of images is selected as a template, a degree of similarity with another image is obtained while changing the position and angle of the template, a positional shift amount with respect to the template is obtained, and the respective images are corrected based on the positional shift amount. Another type of arbitrary existing position alignment process may also be performed.
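The template-based position alignment described above can be illustrated in a deliberately simplified form as follows: only translation is searched (rotation is omitted), the shift is applied as a circular shift for brevity, and the similarity measure is the correlation coefficient. These simplifications are assumptions for illustration rather than the alignment actually used.

```python
# Brute-force translational alignment of one image against a template.
import numpy as np

rng = np.random.default_rng(0)
template = rng.random((64, 64))
target = np.roll(template, shift=(3, -2), axis=(0, 1))   # synthetically shifted copy

best_score, best_shift = -np.inf, (0, 0)
for dy in range(-5, 6):
    for dx in range(-5, 6):
        candidate = np.roll(target, shift=(dy, dx), axis=(0, 1))
        score = np.corrcoef(template.ravel(), candidate.ravel())[0, 1]  # similarity
        if score > best_score:
            best_score, best_shift = score, (dy, dx)

aligned = np.roll(target, shift=best_shift, axis=(0, 1))  # corrected by the found shift
print(best_shift, round(best_score, 3))                   # (-3, 2) 1.0
```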
In the case of aligning the positions of three-dimensional images, the positional alignment of the three-dimensional images may be performed by decomposing the three-dimensional images into a plurality of two-dimensional images and integrating the two-dimensional images each having independently undergone the positional alignment. In addition, the positional alignment of the two-dimensional images can be performed by decomposing the two-dimensional images into one-dimensional images and integrating the one-dimensional images each having independently undergone positional alignment. This positional alignment may not be performed on the image but on the data used to generate the image.
In the present exemplary embodiment, in a case where the image quality improvement performability determination unit determines that the input image can be managed by the image quality improvement unit, the process proceeds to step S540, and the image quality improvement process is started by the image quality improvement unit. In contrast to this, the display control unit 101-05 may display the determination result obtained by the image quality improvement performability determination unit on the display unit 104, and the image quality improvement unit may start the image quality improvement processing according to an instruction from the examiner. At this time, the display control unit 101-05 may display the input image and the image capturing condition such as the image capturing area acquired for the input image on the display unit 104 together with the determination result. In this case, since the image quality improvement processing is performed after the examiner determines whether the determination result is correct, the image quality improvement processing based on the erroneous determination can be reduced.
The display control unit 101-05 may also display the input image and the image capturing condition such as the image capturing area acquired for the input image on the display unit 104, and the image quality improvement unit may start the image quality improvement processing according to an instruction from the examiner without performing the determination by using the image quality improvement performability determination unit.
[Fifth Exemplary Embodiment]
Now, an image processing apparatus according to a fifth exemplary embodiment will be described with reference to fig. 14. In the present exemplary embodiment, an example will be described in which the display control unit 101-05 displays the processing result of the image quality improvement unit in the image processing apparatus 101 (or the image processing unit 101-04) on the display unit 104. The description is given with reference to fig. 14, but the display screen is not limited thereto. The image quality improvement processing can be similarly applied to a display screen in which a plurality of images obtained at different dates and times are displayed side by side, as in follow-up observation. The image quality improvement processing can also be similarly applied to a display screen, such as an image capture confirmation screen, on which the examiner confirms the success or failure of image capture immediately after image capture.
Unless otherwise specified, the configuration and processing of the image processing apparatus according to the present exemplary embodiment are similar to those of the image processing apparatus 101 according to the first exemplary embodiment. Therefore, hereinafter, the image processing apparatus according to the present exemplary embodiment will be described mainly based on the difference from the image processing apparatus according to the first exemplary embodiment.
The display control unit 101-05 can display the plurality of high-quality images generated by the image quality improvement unit and the low-quality images that have not undergone the image quality improvement on the display unit 104. Thus, a low-quality image and a high-quality image can be output according to the instruction of the examiner.
Fig. 14 illustrates an example of the interface 3400. Fig. 14 shows the overall screen 3400, a patient tab 3401, an image capture tab 3402, a report tab 3403, and a settings tab 3404. The slash shading of the report tab 3403 indicates the activated state of the report screen. In the present exemplary embodiment, an example in which a report screen is displayed will be described. In Im3406, an OCTA En-Face image Im3407 is displayed superimposed on the SLO image Im3405. The SLO image is a front image of the fundus acquired by a Scanning Laser Ophthalmoscope (SLO) optical system (not shown). The report screen includes OCTA En-Face images Im3407 and Im3408, a luminance En-Face image Im3409, and tomographic images Im3411 and Im3412. Boundary lines 3413 and 3414, which indicate the vertical ranges of the OCTA En-Face images Im3407 and Im3408, respectively, are displayed superimposed on the tomographic images. The button 3420 is a button for issuing an execution instruction of the image quality improvement processing. As described below, the button 3420 may instead be a button for issuing a display instruction of a high-quality image.
In the present exemplary embodiment, the image quality improvement processing is executed by designating the button 3420, or whether the image quality improvement processing is to be executed is determined based on information stored in the database. First, an example of switching between the display of a high-quality image and the display of a low-quality image by designating the button 3420 according to an instruction from the examiner will be described. The description assumes that the target image of the image quality improvement processing is an OCTA En-Face image. When the examiner designates the report tab 3403 and the screen transitions to the report screen, the low-image-quality OCTA En-Face images Im3407 and Im3408 are displayed. Thereafter, when the examiner designates the button 3420, the image quality improvement unit performs the image quality improvement processing on the images Im3407 and Im3408 displayed on the screen. After the image quality improvement processing is completed, the display control unit 101-05 displays the high-quality images generated by the image quality improvement unit on the report screen. Since Im3406 displays the image Im3407 superimposed on the SLO image Im3405, Im3406 also displays the image that has undergone the image quality improvement processing. The display of the button 3420 then changes to the activated state so that it can be seen that the image quality improvement processing has been performed. The execution timing of the processing in the image quality improvement unit is not limited to the timing at which the examiner designates the button 3420. Since the types of OCTA En-Face images Im3407 and Im3408 to be displayed when the report screen is opened are known in advance, the image quality improvement processing may be performed when the screen transitions to the report screen. The display control unit 101-05 may then display the high-quality images on the report screen at the timing when the button 3420 is pressed. Further, the number of types of images on which the image quality improvement processing is performed according to an instruction from the examiner or at the transition to the report screen need not be two. Processing may be performed on images that are likely to be displayed, for example on a plurality of OCTA En-Face images such as the superficial layer (Im2910), the deep layer (Im2920), the outer layer (Im2930), and the choroidal vascular network (Im2940) illustrated in fig. 13. In this case, the images obtained by the image quality improvement processing may be temporarily stored in a memory or may be stored in the database.
Now, a description will be given of the case where the image quality improvement processing is executed based on the information stored in the database. In the case where the execution state of the image quality improvement processing is stored in the database, when the screen is changed to the report screen, the image quality improvement processing is executed and the obtained high-quality image is displayed by default. By displaying the button 3420 in the activated state by default, the examiner can recognize that the image quality improvement processing has been performed and that the obtained high-quality image is displayed. In the case where the examiner wishes to display a low-quality image that has not been subjected to the image quality improvement processing, the low-quality image can be displayed by designating the button 3420 and canceling the activated state. In a case where the examiner wishes to restore the displayed image to the high-quality image, the examiner designates the button 3420. Whether to perform the image quality improvement processing (to be stored in the database) is specified hierarchically, for example, by specifying it collectively for all data stored in the database or by specifying it for each piece of image capturing data (each examination). For example, in the case where a state in which the image quality improvement processing is performed is stored for the entire database and the examiner stores a state in which the image quality improvement processing is not performed for an individual piece of image capturing data (an individual examination), when that image capturing data is displayed next time, the display is performed in a state in which the image quality improvement processing is not performed. In order to store the execution state of the image quality improvement processing for each piece of image capturing data (each examination), a user interface (not shown) (for example, a save button) may be used. In addition, when the displayed data is changed to another piece of image capturing data (another examination) or to another patient data (for example, when the screen is changed to a display screen other than the report screen according to an instruction from the examiner), the state in which the image quality improvement processing is performed may be stored based on the display state (for example, the state of the button 3420). With this configuration, in the case where whether or not the image quality improvement processing is to be performed is not specified for each piece of image capturing data (for each examination), the processing can be performed based on the information specified for the entire database, and in the case where it is specified for each piece of image capturing data (for each examination), the processing can be performed independently based on that information.
In the present exemplary embodiment, images Im3407 and Im3408 are displayed as En-Face images of the OCTA, but the En-Face images of the OCTA to be displayed may be changed by designation of an inspector. Therefore, change of the image when execution of the image quality improvement processing (the button 3420 in the activated state) is designated will be described.
The changing of the image is performed using a user interface (not shown), such as a combo box. For example, when the examiner changes the image type from the superficial layer to the choroidal vascular network, the image quality improvement unit performs the image quality improvement processing on the choroidal vascular network image, and the display control unit 101-05 displays the high-quality image generated by the image quality improvement unit on the report screen. In other words, the display control unit 101-05 can change, according to an instruction from the examiner, the display of a high-quality image in a first depth range to the display of a high-quality image in a second depth range different from the first depth range. At this time, by the first depth range being changed to the second depth range according to an instruction from the examiner, the display control unit 101-05 changes the display of the high-quality image in the first depth range to the display of the high-quality image in the second depth range. As described above, in the case where a high-quality image has already been generated for an image that is highly likely to be displayed when the screen is changed to the report screen, it is only necessary for the display control unit 101-05 to display the generated high-quality image. The method of changing the image type is not limited to the above method, and an En-Face image of OCTA of a different depth range may be generated by changing the reference layer and the offset value. In this case, when the reference layer or the offset value is changed, the image quality improvement unit performs the image quality improvement processing on an arbitrary En-Face image of OCTA, and the display control unit 101-05 displays the high-quality image on the report screen. The changing of the reference layer or the offset value may be performed using a user interface (not shown), such as a combo box or a text box. In addition, by dragging either one of the boundary lines 3413 and 3414 displayed superimposed on the tomographic images Im3411 and Im3412 (moving the layer boundaries), the generation range of the En-Face image of OCTA can be changed. When a boundary line is changed by dragging, the execution command of the image quality improvement processing is issued continuously. Thus, the image quality improvement unit may constantly perform the processing in response to the execution commands, or may perform the processing after the layer boundary has been changed by the drag. Alternatively, although the execution command of the image quality improvement processing is issued continuously, the previous command may be cancelled and the latest command may be executed when the next command is issued. In some cases, the image quality improvement processing takes a long time. Therefore, even if the command is executed at any of the above-described timings, it sometimes takes a long time before the high-quality image is displayed. In view of the above, during a period from when the depth range for generating the En-Face image of OCTA is set according to an instruction from the examiner until the high-quality image is displayed, the En-Face image of OCTA (low-quality image) corresponding to the set depth range may be displayed.
In other words, when the above-described depth range is set, the En-Face image (low-quality image) of OCTA corresponding to the set depth range may be displayed, and when the image quality improvement processing ends, the display of the En-Face image (low-quality image) of OCTA may be changed to the display of the high-quality image. Alternatively, information indicating that the image quality improvement processing is being performed may be displayed during the period from when the above-described depth range is set until the high-quality image is displayed. These can be applied not only to the case on the premise of the state in which execution of the image quality improvement processing has already been designated (the button 3420 is in the activated state), but also, for example, to the period until the high-quality image is displayed after an execution instruction of the image quality improvement processing is issued according to an instruction from the examiner.
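As a rough illustration of how the depth range set via the reference layer, the offset value, or the dragged boundary lines could be turned into an En-Face image of OCTA, the following sketch projects a motion contrast volume between two layer boundaries. All names (motion_contrast_volume, upper_boundary, lower_boundary, and the offsets) are hypothetical and are not taken from the embodiment; a mean intensity projection is assumed, although other projection methods are equally possible.

```python
import numpy as np

def generate_octa_enface(motion_contrast_volume, upper_boundary, lower_boundary,
                         upper_offset=0, lower_offset=0):
    """Project a motion contrast volume between two layer boundaries.

    motion_contrast_volume: (Z, Y, X) array of motion contrast values.
    upper_boundary, lower_boundary: (Y, X) arrays of boundary depths in pixels.
    upper_offset, lower_offset: offsets (in pixels) applied to the boundaries,
    corresponding to the reference layer / offset values set on the screen.
    """
    depth, height, width = motion_contrast_volume.shape
    enface = np.zeros((height, width), dtype=np.float32)
    for y in range(height):
        for x in range(width):
            z0 = int(np.clip(upper_boundary[y, x] + upper_offset, 0, depth - 1))
            z1 = int(np.clip(lower_boundary[y, x] + lower_offset, z0 + 1, depth))
            # Mean intensity projection within the set depth range.
            enface[y, x] = motion_contrast_volume[z0:z1, y, x].mean()
    return enface
```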
In the present exemplary embodiment, a description has been given of an example in which the images Im3407 and Im3408 of different layers are displayed as En-Face images of OCTA and a low-quality image and a high-quality image are displayed in a switched manner, but the configuration is not limited thereto. For example, the images Im3407 and Im3408 may be adjacently displayed as a low-quality En-Face image of OCTA and a high-quality En-Face image of OCTA, respectively. In the case of switching the display of the images, the images are switched at the same position, whereby changed portions can be easily compared. In the case where the images are displayed adjacently, the images can be displayed simultaneously, and therefore the entire images can be easily compared.
The analysis units 101-46 may perform image analysis on the high-quality images generated by the image quality improvement processing. In the image analysis of the En-Face image of OCTA whose image quality has been improved, a position corresponding to a blood vessel (blood vessel region) can be detected from the image by applying arbitrary binarization processing. By obtaining the ratio of the positions detected in the image that correspond to blood vessels, the area density can be analyzed. In addition, by thinning the positions corresponding to blood vessels that have been subjected to the binarization processing, an image having a line width of one pixel can be obtained, and a ratio of blood vessels that does not depend on the thickness (also referred to as skeleton density) can also be obtained. Using these images, the area or shape (e.g., circularity) of the avascular region (FAZ) may be analyzed. As an analysis method, the above numerical values may be calculated from the entire image, or may be calculated for a specified region of interest (ROI) based on an instruction of the examiner (user) using a user interface (not shown). The setting of the ROI is not always specified by the examiner, and an automatically determined region may be specified. The various parameters described above are examples of analysis results related to blood vessels, and any parameter may be used as long as the parameter is related to blood vessels. The analysis units 101-46 may perform various types of image analysis processing. In other words, a description has been given of an example in which the analysis units 101-46 analyze the En-Face image of OCTA, but the analysis is not limited thereto. The analysis units 101-46 can simultaneously perform retinal layer segmentation, layer thickness measurement, optic nerve head three-dimensional shape analysis, and lamina cribrosa analysis on the images acquired by OCT. In connection with this, the analysis units 101-46 can execute part or all of the various types of image analysis processing according to instructions issued by the examiner via an arbitrary input device.
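The following sketch illustrates, under stated assumptions, how such blood-vessel-related values could be computed from a grayscale OCTA En-Face image with common image processing libraries. The function names are hypothetical, Otsu thresholding is assumed as one possible choice of "arbitrary binarization processing", and the FAZ mask is assumed to be given (for example, from the ROI set by the examiner or from an automatically determined region).

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops
from skimage.morphology import skeletonize

def vessel_metrics(octa_enface):
    """Compute blood-vessel-related values from a grayscale OCTA En-Face image."""
    # Binarization: pixels above the threshold are treated as blood vessel positions.
    vessel_mask = octa_enface > threshold_otsu(octa_enface)
    area_density = vessel_mask.mean()      # ratio of detected vessel pixels in the image
    skeleton = skeletonize(vessel_mask)    # thinning to a line width of one pixel
    skeleton_density = skeleton.mean()     # vessel ratio independent of vessel thickness
    return area_density, skeleton_density

def faz_circularity(faz_mask):
    """Circularity (4*pi*area / perimeter^2) of a given avascular region (FAZ) mask."""
    props = regionprops(label(faz_mask.astype(np.uint8)))[0]
    return 4.0 * np.pi * props.area / (props.perimeter ** 2)
```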
At this time, the display control unit 101-05 displays the high-quality image generated by the image quality improvement unit and the analysis result obtained by the analysis unit 101-46 on the display unit 104. The display control unit 101-05 may output the high quality image and the analysis result to a separate display unit or device. The display control unit 101-05 may also display only the analysis result on the display unit 104. Further, in the case where the analysis units 101 to 46 output a plurality of analysis results, the display control unit 101 to 05 may output part or all of the plurality of analysis results to the display unit 104 or another device. For example, the display control unit 101-05 may display the analysis result regarding the blood vessel in the En-Face image of the OCTA as a two-dimensional map on the display unit 104. The display control unit 101-05 may also display a value indicating the analysis result regarding the blood vessel in the En-Face image of the OCTA superimposed on the En-Face image of the OCTA on the display unit 104. In this way, image analysis is performed using a high-quality image in the image processing apparatus 101 according to the present exemplary embodiment, whereby the accuracy of analysis can be improved.
Now, execution of the image quality improvement processing at the time of the screen transition will be described with reference to fig. 14A and 14B. Fig. 14A illustrates a screen example in which the OCTA image illustrated in fig. 14B is displayed in an enlarged manner. In fig. 14A, a button 3420 is also displayed similarly to fig. 14B. The screen illustrated in fig. 14B is changed to the screen in fig. 14A by, for example, double-clicking the OCTA image. By pressing the off button 3430, the screen illustrated in fig. 14A transitions to the screen in fig. 14B. The screen transition method is not limited to the method described herein, and a user interface (not shown) may be used.
In a case where execution of the image quality improvement processing has been designated at the time of screen transition (the button 3420 is in the activated state), that state is maintained at the time of the screen transition. More specifically, in the case where the screen is changed to the screen illustrated in fig. 14A in a state where the high-quality image is displayed on the screen in fig. 14B, the high-quality image is also displayed on the screen illustrated in fig. 14A. Then, the button 3420 is set to the activated state. The same applies to the case where the screen in fig. 14A is changed to the screen in fig. 14B. On the screen in fig. 14A, the display can also be switched to the low-quality image by designating the button 3420.
The screen transition is not limited to the screens illustrated in fig. 14A and 14B. The screen transition while maintaining the display state of the high-quality image is sufficient as long as the screen transition will be a screen displaying the same image capture data, such as a display screen for follow-up or a panoramic display screen. In other words, an image corresponding to the state of the button 3420 on the display screen before transition is displayed on the display screen after transition. For example, if the button 3420 is in the activated state on the display screen before the transition, a high-quality image is displayed on the display screen after the transition. For example, if the activation state of the button 3420 is canceled on the display screen before the transition, a low-quality image is displayed on the display screen after the transition. If the button 3420 is brought into the activated state on the display screen for follow-up, a plurality of images obtained at different dates and times (different inspection days) displayed adjacently on the display screen for follow-up can be switched to a high-quality image. In other words, if the button 3420 enters the activated state on the display screen for follow-up, the activated state can be completely reflected in a plurality of images obtained at different dates and times.
Fig. 11 illustrates an example of a display screen for follow-up. If the tab 3801 is selected according to an instruction from the examiner, the display screen for follow-up is displayed as shown in fig. 11. At this time, the depth range can be changed by the examiner selecting the depth range of the measurement target region from the predetermined depth range groups (3802 and 3803) displayed in the list boxes. For example, the retinal surface layer is selected in the list box 3802, and the retinal deep layer is selected in the list box 3803. The analysis result of the motion contrast image of the retinal surface layer is displayed in the upper display region, and the analysis result of the motion contrast image of the retinal deep layer is displayed in the lower display region. In other words, when a depth range is selected, the display of the plurality of images obtained at different dates and times is collectively changed to a parallel display of the analysis results of a plurality of motion contrast images in the selected depth range.
At this time, if the display of the analysis result is set to the unselected state, the display may be collectively changed to a parallel display of the plurality of motion contrast images obtained at different dates and times. If the button 3420 is designated according to an instruction from the examiner, the display of the plurality of motion contrast images is collectively changed to the display of a plurality of high-quality images.
In the case where the display of the analysis result is in the selected state, when the button 3420 is designated according to an instruction from the examiner, the display of the analysis results of the plurality of motion contrast images is collectively changed to the display of the analysis results of a plurality of high-quality images. The display of the analysis result may be a superimposed display of the analysis result on the image with arbitrary transparency. At this time, the change in the display of the analysis result may be, for example, a change to a state in which the analysis result is superimposed with arbitrary transparency on the displayed image. In addition, the change in the display of the analysis result may be, for example, a change to the display of an image (e.g., a two-dimensional map) obtained by performing blending processing of the analysis result and the image with arbitrary transparency.
The type of layer boundary and offset position to be used to specify the depth range may be changed together from a user interface such as user interfaces 3805 and 3806. By displaying the tomographic images together and moving the layer boundary data superimposed on the tomographic images in accordance with an instruction from the examiner, the depth ranges of the plurality of motion contrast images obtained at different dates and times can all be changed. At this time, if a plurality of tomographic images obtained at different dates and times are adjacently displayed and the above-described movement is performed on one tomographic image, the layer boundary data can also be similarly moved on the other tomographic images. For example, the presence or absence of the image projection method and the projection artifact suppression processing can be changed by selection from a user interface such as a context menu. A selection screen may also be displayed by selecting the selection button 3807, and an image selected from an image list displayed on the selection screen may be displayed. An arrow 3804 displayed in an upper part of fig. 11 is a mark indicating a currently selected examination, and a reference examination (baseline) is an examination selected in follow-up image capturing (leftmost image of fig. 11). A mark indicating a reference check may also be displayed on the display unit 104.
In a case where the "show difference" check box 3808 is designated, the distribution of the measurement values (a map or a sector map) for the reference image is displayed on the reference image. In the regions corresponding to the other examination days, difference measurement value maps are displayed; each difference measurement value map is based on the difference between the measurement value distribution calculated for the reference image and the measurement value distribution calculated for the image displayed in that region. As a measurement result, a trend graph (a graph of the measurement values of the images of the respective examination days, obtained by measuring the change over time) may be displayed on the report screen. In other words, time-series data (e.g., a time-series graph) of a plurality of analysis results corresponding to a plurality of images obtained at different dates and times can be displayed. At this time, analysis results relating to dates and times other than those corresponding to the plurality of displayed images can also be displayed as time-series data in a state distinguishable from the plurality of analysis results corresponding to the plurality of displayed images (for example, the color of each point on the time-series graph differs depending on whether the corresponding image is displayed). The regression line (curve) of the trend graph and the corresponding formula may also be displayed on the report screen.
In the present exemplary embodiment, the motion contrast image has been described, but the present exemplary embodiment is not limited to the motion contrast image. The image related to the processing (such as display, image quality improvement, and image analysis) according to the present exemplary embodiment may be a tomographic image. Further, the image is not limited to the tomographic image, and may be a different image such as an SLO image, a fundus photograph, or a fluorescein fundus photograph. In this case, the user interface for performing the image quality improvement processing may include a user interface for specifying the performance of the image quality improvement processing for a plurality of images of different types and a user interface for selecting an arbitrary image from the plurality of images of different types and specifying the performance of the image quality improvement processing.
With this configuration, the display control unit 101-05 can display an image processed by the image quality improvement unit (not illustrated) according to the present exemplary embodiment on the display unit 104. At this time, as described above, in the case where at least one of the display of the high-quality image, the display of the analysis result, or the plurality of conditions relating to the depth range of the front image to be displayed is in the selected state, even if the display screen transitions, the selected state can be maintained.
In addition, in the case where at least one of the plurality of conditions is in the selected state, even if the state is changed to a state in which another condition is selected, the selected state of the at least one condition can be maintained, as described above. For example, in the case where the display of the analysis result is in the selected state, the display control unit 101-05 may change the display of the analysis result of the low-quality image to the display of the analysis result of the high-quality image according to an instruction from the examiner (for example, if the button 3420 is designated). In the case where the display of the analysis result is in the selected state, the display control unit 101-05 may change the display of the analysis result of the high-quality image to the display of the analysis result of the low-quality image according to an instruction from the examiner (for example, if the designation of the button 3420 is cancelled).
In the case where the display of the high-quality image is in the unselected state, the display control unit 101-05 may change the display of the analysis result of the low-quality image to the display of the low-quality image according to an instruction from the inspector (for example, if the designation of the display of the analysis result is cancelled). In the case where the display of the high-quality image is in the unselected state, the display control unit 101-05 may change the display of the low-quality image to the display of the analysis result of the low-quality image according to an instruction from the inspector (for example, if the display of the analysis result is specified). In the case where the display of the high-quality image is in the selected state, the display control unit 101-05 may change the display of the analysis result of the high-quality image to the display of the high-quality image according to an instruction from the inspector (for example, if the designation of the display of the analysis result is cancelled). In the case where the display of the high-quality image is in the selected state, the display control unit 101-05 may change the display of the high-quality image to the display of the analysis result of the high-quality image according to an instruction from the inspector (for example, if the display of the analysis result is specified).
A case will be considered where the display of the high-quality image is in the unselected state and the display of the analysis result of the first type is in the selected state. In this case, the display control unit 101-05 may change the display of the first type of analysis result of the low-quality image to the display of the second type of analysis result of the low-quality image according to an instruction from the examiner (for example, if the display of the second type of analysis result is specified). A case will be considered where the display of the high-quality image is in the selected state and the display of the first type of analysis result is in the selected state. In this case, the display control unit 101-05 may change the display of the first type of analysis result of the high-quality image to the display of the second type of analysis result of the high-quality image according to an instruction from the examiner (for example, if the display of the second type of analysis result is specified).
In the display screen for follow-up, these display changes can be completely reflected in the plurality of images obtained at different dates and times, as described above. The display of the analysis result may be a superimposed display of the analysis result on the image with arbitrary transparency. At this time, the change in the display of the analysis result may be, for example, a change to a state in which the analysis result is superimposed with arbitrary transparency on the displayed image. The change in the display of the analysis result may be, for example, a change to the display of an image (e.g., a two-dimensional map) obtained by performing blending processing of the analysis result and the image with arbitrary transparency.
(modification 1)
In the above-described exemplary embodiment, the display control unit 101-05 may display an image selected from among the high-quality image and the input image generated by the image quality improvement unit according to an instruction from the examiner on the display unit 104. The display control unit 101-05 can also switch the display on the display unit 104 from the captured image (input image) to a high-quality image according to an instruction from the examiner. In other words, the display control unit 101-05 can change the display of the low-quality image to the display of the high-quality image according to an instruction from the examiner. In addition, the display control unit 101-05 can change the display of the high-quality image to the display of the low-quality image according to an instruction from the examiner.
Further, the image quality improvement unit in the image processing apparatus 101 (or the image processing unit 101-04) may perform, in accordance with an instruction from the examiner, the start of the image quality improvement processing performed by the image quality improvement engine (learned model for image quality improvement) (input of an image to the image quality improvement engine), and the display control unit 101-05 may display the high-quality image generated by the image quality improvement unit on the display unit 104. In contrast to this, if an input image is captured by an image capturing apparatus (tomographic image capturing apparatus 100), the image quality improvement engine may automatically generate a high-quality image based on the input image, and the display control unit 101-05 may display the high-quality image on the display unit 104 according to an instruction from the examiner. The image quality improvement engine includes a learned model that performs the above-described image quality enhancement processing (image quality improvement processing).
These types of processing can also be similarly performed for the output of the analysis result. In other words, the display control unit 101-05 can change the display of the analysis result of the low-quality image to the display of the analysis result of the high-quality image according to an instruction from the examiner. In addition, the display control unit 101-05 may change the display of the analysis result of the high-quality image to the display of the analysis result of the low-quality image according to an instruction from the examiner. The display control unit 101-05 may also change the display of the analysis result of the low-quality image to the display of the low-quality image according to an instruction from the examiner. The display control unit 101-05 may also change the display of the low-quality image to the display of the analysis result of the low-quality image according to an instruction from the examiner. The display control unit 101-05 can also change the display of the analysis result of the high-quality image to the display of the high-quality image according to an instruction from the examiner. The display control unit 101-05 can also change the display of the high-quality image to the display of the analysis result of the high-quality image according to an instruction from the examiner.
The display control unit 101-05 may also change the display of the analysis result of the low-quality image to another type of display of the analysis result of the low-quality image according to an instruction from the examiner. The display control unit 101-05 may also change the display of the analysis result of the high-quality image to another type of display of the analysis result of the high-quality image according to an instruction from the examiner.
The display of the analysis result of the high-quality image may be a superimposed display of the analysis result of the high-quality image on the high-quality image with an arbitrary transparency. The display of the analysis result of the low-quality image may also be a superimposed display of the analysis result of the low-quality image on the low-quality image with an arbitrary transparency. At this time, the change in the display of the analysis result may be, for example, a change in a state in which the analysis result is superimposed on the displayed image in an arbitrary transparency. The change in the display of the analysis result may also be, for example, a change in the display of an image (e.g., a two-dimensional map) obtained by performing a blending process of the analysis result and the image with arbitrary transparency.
In the various exemplary embodiments described above, the processing to be performed on the set region of interest is not limited to analysis processing, and may be, for example, image processing. The image processing may be any image processing such as contrast processing, gradation conversion processing, super-resolution processing, or smoothing processing. Even after the display screen is transitioned to another display screen, a blended image obtained by performing blending processing with the transmittance set before the transition can be displayed. For example, after the display screen is shifted to the display screen for follow-up, a plurality of mixed images obtained by performing the mixing process at the transmittance set before the shift may be adjacently displayed as a plurality of images obtained at different dates and times. Further, when a similar slide bar is displayed on the display screen for follow-up and the transmittance is set (changed) according to an instruction from the examiner, the set transmittances may be reflected together on a plurality of images obtained at different dates and times. In other words, if the transmittance is set (changed), a plurality of mixed images obtained by performing the mixing process at the set transmittance can be displayed. The screens on which the mixing process can be performed are not limited to these display screens. It is only necessary that the blending process can be performed on at least one of the image capture confirmation screen, the report screen, and the preview screen (display screen on which various live moving images are displayed) for various types of adjustment before image capture.
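As a minimal sketch of the blending (mixing) processing with a variable transmittance described above, and of how a transmittance set once could be reflected collectively on a plurality of images obtained at different dates and times on the display screen for follow-up, the following hypothetical functions assume that the OCT image and the OCTA image are already normalized to the same value range and aligned to mutually corresponding regions.

```python
import numpy as np

def blend_images(oct_image, octa_image, transmittance):
    """Blend an OCT image and an OCTA image of mutually corresponding regions.

    transmittance: value in [0, 1]; 0 shows only the OCT image, 1 only the OCTA image.
    """
    alpha = float(np.clip(transmittance, 0.0, 1.0))
    return (1.0 - alpha) * oct_image + alpha * octa_image

def blend_follow_up(oct_images, octa_images, transmittance):
    """Apply the transmittance set once to all follow-up image pairs collectively."""
    return [blend_images(o, a, transmittance) for o, a in zip(oct_images, octa_images)]
```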
(modification 2)
In the various exemplary embodiments and modifications described above, the transmittance (transmission coefficient) to be used in the mixing process need not always be set according to an instruction from the examiner, and may be set automatically or semi-automatically. For example, a learned model obtained by performing machine learning using learning data may be used; in the learning data, a medical image such as at least one of an OCT image and an OCTA image of mutually corresponding regions is set as input data, and a transmittance set according to an instruction from the examiner is set as correct data (teaching data). In other words, the transmittance setting unit may be configured to generate a new transmittance from a medical image such as at least one of an OCT image and an OCTA image of mutually corresponding regions using the above-described learned model. At this time, the above-described learned model may be, for example, a learned model obtained by additionally performing learning using learning data in which the transmittance determined (changed) according to an instruction from the examiner is set as correct data. The above-described learned model may also be, for example, a learned model obtained by additionally performing learning using learning data in which a transmittance changed, according to an instruction from the examiner, from a new transmittance (a transmittance obtained using the learned model) is set as correct data. With this configuration, for example, a new transmittance can be set in consideration of the tendency of the transmittance desired by the examiner for the medical image. In other words, a transmittance setting unit customized for the examiner can be accurately formed. This can enhance the efficiency of diagnosis by the examiner. The OCT image and the OCTA image of mutually corresponding regions may be, for example, images obtained using at least part of a common interference signal.
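One possible form of such a learned model is sketched below: a small convolutional network that regresses a transmittance in [0, 1] from a two-channel input in which the OCT image and the OCTA image of mutually corresponding regions are stacked, trained against the transmittance set according to an instruction from the examiner as correct data. The architecture and all names are hypothetical and only indicate one way the transmittance setting unit could be realized; they are not taken from the embodiment.

```python
import torch
import torch.nn as nn

class TransmittancePredictor(nn.Module):
    """Hypothetical CNN that regresses a blending transmittance in [0, 1]
    from a two-channel input (OCT image and OCTA image of corresponding regions)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.features(x))

# Training pair: input = stacked OCT/OCTA images, correct data = transmittance
# that was set according to an instruction from the examiner.
model = TransmittancePredictor()
loss_fn = nn.MSELoss()
```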
The above-described learned model may be obtained by machine learning using learning data. Machine learning includes, for example, deep learning including a multi-layer neural network. At least in part of a multi-layer neural network, for example, a Convolutional Neural Network (CNN) may be used as the machine learning model. Furthermore, techniques related to autoencoders may be used, at least in part of a multi-layer neural network. In addition, learning may be performed using techniques related to back propagation. However, the machine learning is not limited to the deep learning, but may be any learning as long as the learning uses a model that can extract (represent) a learned feature amount such as an image by performing the learning with respect to the model itself. In addition, the learned model is a model preliminarily trained (learned) using appropriate learning data for a machine learning model based on an arbitrary machine learning algorithm. However, additional learning may also be performed without prohibiting further learning of the learned model. In addition, the learning data includes pairs of input data and output data (correct data). Although the learning data is referred to as teaching data in some cases, the correct data is referred to as teaching data in other cases. In addition, the learned model can be updated by additional learning to be customized to a model suitable for, for example, an operator. The learned model in this modification is not limited to the learned model obtained by additionally performing learning, but may be any learned model as long as the learned model is obtained by performing learning using learning data including the medical image and the information on the transmittance.
The above-described learned model may be a learned model obtained by performing learning using learning data including input data having a set of a plurality of medical images of different types of a predetermined region of the subject. At this time, examples of the input data included in the learning data include input data having a set of a motion contrast front image and a luminance front image (or a luminance tomographic image) of the fundus and input data having a set of a tomographic image (B-scan image) and a color fundus image (or a fluorescein fundus image) of the fundus. The plurality of medical images of different types may be any medical images as long as the medical images are acquired by different modalities, different optical systems, or different principles. The above-described learned model may be a learned model obtained by performing learning using learning data including input data having a set of a plurality of medical images of different regions of the subject. At this time, examples of the input data included in the learning data include input data having a set of a tomographic image of the fundus (B-scan image) and a tomographic image of the anterior segment (B-scan image), and input data having a set of a three-dimensional OCT image of the fundus macula and a circular scan (or raster scan) tomographic image of the fundus optic nerve head. The input data included in the learning data may be a plurality of medical images of different regions and different types of the subject. At this time, examples of the input data included in the learning data include input data having a set of a tomographic image of an anterior ocular segment and a color fundus image. The above-described learned model may be a learned model obtained by performing learning using learning data including input data having a set of a plurality of medical images of different image capturing field angles of a predetermined region of the subject. The input data included in the learning data may be an image obtained by combining a plurality of medical images obtained by time-dividing a predetermined region into a plurality of regions, such as a panoramic image. The input data included in the learning data may be input data of a set of a plurality of medical images having predetermined regions of the subject obtained at different dates and times.
From the default setting of the new transmittance obtained using the above-described learned model, the transmittance may be changeable according to an instruction from the examiner. Further, whether to use the changed transmittance as learning data for additional learning may be selectable according to an instruction from the examiner. In addition, when an ROI is set on the mixed image, it is possible to select, at the same time as the setting, to use the transmittance that was set (changed) at the time the ROI was set as learning data for additional learning.
(modification 3)
The display control units 101-05 in the various exemplary embodiments and modifications described above can display analysis results such as the layer thickness of a desired layer and various blood vessel densities on the report screen of the display screen. The display control unit 101-05 may also display, as an analysis result, a value (distribution) of a parameter related to a target region including at least one of: an optic nerve head portion, a macular portion, a vascular region, a nerve fiber bundle, a vitreous body region, a macular region, a choroidal region, a scleral region, a lamina cribrosa region, a retinal layer boundary edge, a photoreceptor cell, a blood vessel wall boundary, a blood vessel outer boundary, a ganglion cell, a corneal region, an anterior chamber angle region, and Schlemm's canal (canal of Schlemm). At this time, an accurate analysis result can be displayed by analyzing, for example, a medical image to which reduction processing of various artifacts has been applied. The artifact may be, for example, an artifact region resulting from light absorption by a blood vessel region, a projection artifact, or a band-like artifact generated in a front image in the main scanning direction of the measurement light depending on the state (e.g., motion, blink) of the subject's eye. The artifact may be any artifact as long as it is, for example, an image capture failure region randomly generated on a medical image of a predetermined region of the subject each time image capture is performed. In addition, the value (distribution) of a parameter related to a region including at least one of the above-described various artifacts (image capture failure regions) may be displayed as an analysis result. In addition, the value (distribution) of a parameter related to a region including at least one of abnormal regions such as drusen, neovascularization, white spots (hard exudates), or pseudodrusen may be displayed as an analysis result.
The analysis result may be displayed using an analysis map, or sectors indicating statistical values corresponding to the respective divided regions. The analysis result may be an analysis result generated using a learned model (an analysis result generation engine, a learned model for analysis result generation) obtained by performing learning using analysis results of medical images as learning data. At this time, the learned model may be a learned model obtained by learning using learning data including a medical image and an analysis result of the medical image, or learning data including a medical image and an analysis result of a medical image of a different type from the medical image. In addition, the learned model may be a learned model obtained by learning using learning data including input data having a set of a plurality of medical images of different types of a predetermined region, such as a luminance front image and a motion contrast front image. The luminance front image corresponds to the En-Face image of the tomographic data, and the motion contrast front image corresponds to the En-Face image of OCTA. In addition, analysis results obtained using high-quality images generated by the learned model for image quality improvement may be displayed. The learned model for image quality improvement may be a model obtained by performing learning using learning data in which a first image is set as input data and a second image having higher image quality than the first image is set as correct data. At this time, the second image may be a high-quality image on which contrast improvement or noise reduction has been performed by, for example, superimposition processing of a plurality of first images (for example, averaging processing of a plurality of first images obtained by performing position alignment).
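The following sketch shows one hypothetical way the learning data for the learned model for image quality improvement could be assembled, assuming that several position-aligned first images (e.g., repeated acquisitions of the same region) are available and that their average serves as the noise-reduced second image (correct data); the function name and variable names are illustrative only.

```python
import numpy as np

def make_image_quality_training_pairs(aligned_first_images):
    """Build (input data, correct data) pairs for the learned model for
    image quality improvement.

    aligned_first_images: list of 2-D arrays of the same region that have
    already been position-aligned; their average acts as the second image.
    """
    stack = np.stack(aligned_first_images, axis=0)
    second_image = stack.mean(axis=0)   # superimposition (averaging) processing
    # Each individual first image is paired with the averaged second image.
    return [(first_image, second_image) for first_image in aligned_first_images]
```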
The input data included in the learning data may be a high-quality image generated by the learned model for image quality improvement, or may be a set of a low-quality image and a high-quality image. The learning data may be data obtained by labeling (annotating) the input data, as correct data (for supervised learning), with information including, for example, at least one of: analysis values (e.g., an average value or a median value) obtained by analyzing an analysis region, a table including the analysis values, an analysis map, and the position of an analysis region such as a sector in the image. The analysis result obtained by the learned model for analysis result generation may be displayed according to an instruction from the examiner. For example, the image processing unit 101-04 may generate, from at least one medical image of a plurality of medical images to be subjected to the blending process, an image analysis result related to the at least one medical image, using a learned model for analysis result generation (different from the learned model for image quality improvement). In addition, for example, the display control unit 101-05 may display, on the display unit 104, the image analysis result obtained from the above-described at least one medical image using the learned model for analysis result generation.
The display control units 101-05 in the various exemplary embodiments and modifications described above may display various diagnosis results, such as those for glaucoma or age-related macular degeneration, on the report screen of the display screen. At this time, an accurate diagnosis result can be displayed by analyzing, for example, a medical image to which the above-described reduction processing of various artifacts has been applied. As a diagnosis result, the position of an identified abnormal region may be displayed on the image, or the state of the abnormal region may be displayed by characters. In addition, a classification result (e.g., the Curtin classification) of the abnormal region may be displayed as a diagnosis result. As a classification result, information indicating, for example, the likelihood of each abnormal region (for example, a numerical value indicating a percentage) may be displayed. Alternatively, information necessary for the doctor to confirm the diagnosis may be displayed as the diagnosis result. As the above-described necessary information, for example, a recommendation such as additional image capturing is conceivable. For example, in the case where an abnormal region is detected in a blood vessel region in an OCTA image, it may be indicated that fluorescein image capturing, which uses a contrast agent and can observe blood vessels in more detail than OCTA, should additionally be performed.
The diagnosis result may be generated using a learned model (diagnosis result generation engine, learned model for diagnosis result generation) obtained by learning using the diagnosis result of the medical image as learning data. The learned model may be a learned model obtained by learning using learning data including a medical image and a diagnosis result of the medical image or learning data including a medical image and a diagnosis result of a medical image of a different type from the medical image. In addition, it is possible to display a diagnosis result obtained by using a high-quality image generated by a learned model for image quality improvement. For example, the image processing unit 101-04 may generate a diagnosis result related to at least one medical image of a plurality of medical images to be subjected to the blending process from the at least one medical image using a learned model for diagnosis result generation (different from a learned model for image quality improvement). Further, for example, the display control unit 101-05 may display the diagnosis result obtained from the above-described at least one medical image using the learned model for generating the diagnosis result on the display unit 104.
The input data included in the learning data may be a high-quality image generated by the learned model for image quality improvement, or may be a set of a low-quality image and a high-quality image. In addition, the learning data may be data obtained by labeling (annotating) the input data, as correct data (for supervised learning), with information including, for example, at least one of: a diagnosis name, the type or state (degree) of a lesion (abnormal region), the position of the lesion in the image, the position of the lesion with respect to a target region, a finding (interpretation finding), the basis of the diagnosis name (for example, positive medical support information), and a basis for denying the diagnosis name (for example, negative medical support information). The diagnosis result obtained by the learned model for diagnosis result generation may be displayed according to an instruction from the examiner.
The display control unit 101-05 in the above-described various exemplary embodiments and modifications may display the object recognition result (object detection result) or the division result of the above-described target region, artifact, or abnormal region on the report screen of the display screen. At this time, for example, a rectangular frame may be displayed in the vicinity of the object on the image in a superimposed manner. Alternatively, for example, colors may be displayed at an object in the image in an overlapping manner. The object recognition result or the segmentation result may be a result generated using a learned model (an object recognition engine, a learned model for object recognition, a segmentation engine, a learned model for segmentation) obtained by performing learning using learning data obtained by labeling (annotating) a medical image with information indicating object recognition or segmentation as correct data. The analysis result generation or the diagnosis result generation may be obtained by using the object recognition result or the segmentation result. For example, the process of analysis result generation or diagnosis result generation may be performed on a target region obtained by the process of object recognition or segmentation.
In the case where an abnormal region is detected, the image processing unit 101-04 may use a generative adversarial network (GAN) or a variational auto-encoder (VAE). For example, a deep convolutional GAN (DCGAN), including a generator obtained by learning the generation of tomographic images and a discriminator obtained by learning the discrimination between a new tomographic image generated by the generator and a real fundus front image, may be used as the machine learning model.
For example, in the case of using DCGAN, a hidden variable is obtained by encoding an input tomographic image by a discriminator, and a generator generates a new tomographic image based on the hidden variable. Thereafter, a difference between the input tomographic image and the generated new tomographic image can be extracted as an abnormal region. For example, in the case of using VAE, an input tomographic image is encoded by an encoder to obtain a hidden variable, and the hidden variable is decoded by a decoder to generate a new tomographic image. Thereafter, a difference between the input tomographic image and the generated new tomographic image can be extracted as an abnormal region. The tomographic image has been described as an example of the input data, but a front image of the anterior eye or a fundus image may also be used.
Further, the image processing unit 101-04 may detect an abnormal region using a convolutional auto-encoder (CAE). In the case of a CAE, the same image is learned as input data and output data at the time of learning. With this configuration, if an image having an abnormal region is input at the time of estimation, an image having no abnormal region is output according to the learned tendency. Thereafter, a difference between the image input to the CAE and the image output from the CAE can be extracted as an abnormal region. In this case as well, not only a tomographic image but also a front image of the anterior eye or a fundus image can be used as input data.
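A minimal sketch of such CAE-based detection is given below, assuming a small convolutional auto-encoder trained only on images without abnormal regions; the architecture, names, and threshold are hypothetical, and the difference between the input image and its reconstruction is taken as the candidate abnormal region.

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """Minimal convolutional auto-encoder; at learning time the same image is
    used as input data and output data, so reconstruction errors at estimation
    time highlight regions the model has not learned (candidate abnormal regions)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def abnormal_region_map(model, tomographic_image, threshold=0.1):
    """Difference between the input image and its reconstruction as an anomaly map."""
    with torch.no_grad():
        reconstruction = model(tomographic_image)
    difference = (tomographic_image - reconstruction).abs()
    return difference > threshold
```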
In these cases, the image processing unit 101-04 may generate, as information on the abnormal region, information on a difference between the medical image obtained using the generative adversarial network or the auto-encoder and the medical image input to the generative adversarial network or the auto-encoder, for each different region identified by the segmentation processing. With this configuration, the image processing unit 101-04 can be expected to detect an abnormal region quickly and accurately. The auto-encoder includes, for example, a VAE and a CAE. For example, the image processing unit 101-04 may generate, as information on the abnormal region, information on a difference between a medical image obtained, using a generative adversarial network or an auto-encoder, from at least one medical image of a plurality of medical images to be subjected to the blending process and the at least one medical image. In addition, for example, the display control unit 101-05 may display, on the display unit 104, information on the difference between a medical image obtained from the above-described at least one medical image using the generative adversarial network or the auto-encoder and the at least one medical image as information on the abnormal region.
In the case of a diseased eye, the image characteristics vary depending on the type of disease. Therefore, the learned models used in the various exemplary embodiments and modifications described above may be generated or prepared for each type of disease or for each abnormal region. In this case, the image processing apparatus 101 may select the learned model to be used for the processing in accordance with, for example, an input (instruction) from the operator regarding the disease type or abnormal region of the subject's eye. The learned models prepared for each type of disease or for each abnormal region are not limited to learned models to be used for detecting the retinal layers or generating a region label image, and may be, for example, learned models to be used by an engine for image evaluation or an engine for analysis. At this time, the image processing apparatus 101 can identify the disease type or abnormal region of the subject's eye from the image using a separately prepared learned model. In this case, the image processing apparatus 101 may automatically select the learned model to be used for the above-described processing based on the disease type or abnormal region identified using the separately prepared learned model. The learned model for identifying the disease type or abnormal region of the subject's eye may perform learning using pairs of learning data in which a tomographic image or a fundus image is set as input data and the disease type or abnormal region in those images is set as output data. As the input data of the learning data, a tomographic image or a fundus image may be set alone as the input data, or a combination of these may be set as the input data.
In particular, the learned model for diagnostic result generation may be a learned model obtained by learning with learning data including input data having a set of a plurality of medical images of different types of a predetermined region of the subject. At this time, for example, input data including a set of a motion contrast frontal image and a luminance frontal image (or a luminance tomographic image) of the fundus can be regarded as input data included in the learning data. Further, for example, input data including a set of a tomographic image (B-scan image) and a color fundus image (or a fluorescein fundus image) of the fundus can also be regarded as input data included in the learning data. The plurality of medical images of different types may be any medical images as long as the medical images are acquired by different modalities, different optical systems, or different principles.
In particular, the learned model for diagnostic result generation may be a learned model obtained by learning with learning data including input data having a set of a plurality of medical images of different regions of the subject. At this time, for example, input data including a set of a tomographic image of the fundus (B-scan image) and a tomographic image of the anterior segment (B-scan image) may be regarded as input data included in the learning data. Further, input data including a set of a three-dimensional OCT image (three-dimensional tomographic image) of the fundus macula lutea and a circular scan (or raster scan) tomographic image of the fundus optic nerve head may also be regarded as input data included in the learning data.
The input data included in the learning data may be a plurality of medical images of different regions and different types of the subject. At this time, for example, input data including a set of a tomographic image of the anterior ocular segment and a color fundus image may be regarded as input data included in the learning data. The above-described learned model may be a learned model obtained by learning with learning data including input data having a set of a plurality of medical images of different image capturing field angles of a predetermined region of the subject. The input data included in the learning data may be an image obtained by combining a plurality of medical images obtained by time-dividing a predetermined region into a plurality of regions, such as a panoramic image. At this time, by using a wide-field-angle image such as a panoramic image as learning data, the feature amount of the image may be acquired accurately because the amount of information is larger than that of a narrow-field-angle image. Therefore, the result of each process can be improved. For example, in the case where abnormal regions are detected at a plurality of positions of the wide-field-angle image at the time of estimation (at the time of prediction), enlarged images of the respective abnormal regions can be sequentially displayed. With this configuration, abnormal regions at a plurality of positions can be efficiently examined, so, for example, the convenience of the examiner can be enhanced. At this time, for example, each position on the wide-field-angle image at which an abnormal region is detected may be selectable by the examiner, and an enlarged image of the abnormal region at the selected position may be displayed. The input data included in the learning data may be input data having a set of a plurality of medical images of predetermined regions of the subject obtained at different dates and times.
The display screen on which at least one of the above analysis result, diagnosis result, object recognition result, and segmentation result is displayed is not limited to the report screen. Such a display screen may be displayed on at least one of, for example, an image capture confirmation screen, a display screen for follow-up, and a preview screen (a display screen on which various live moving images are displayed) for various types of adjustment before image capture. For example, by displaying the at least one result obtained using the above-described learned model on the image capture confirmation screen, the inspector can check an accurate result even immediately after image capture. For example, the above-described display change between the low-quality image and the high-quality image may be a display change between the analysis result of the low-quality image and the analysis result of the high-quality image.
The various learned models described above can be obtained by machine learning using learning data. Types of machine learning include, for example, deep learning consisting of a multi-layer neural network. For at least part of the multi-layer neural network, for example, a convolutional neural network (CNN) may be used as the machine learning model. Techniques related to autoencoders may be used for at least part of the multi-layer neural network, and learning may be performed using techniques related to backpropagation. However, the machine learning is not limited to deep learning and may be any learning as long as the model used for the learning can, by performing the learning itself, extract (represent) the feature amount of the learning data such as images. The machine learning model refers to a learning model based on a machine learning algorithm such as deep learning. The learned model is a machine learning model, based on an arbitrary machine learning algorithm, that has been trained (learned) in advance using appropriate learning data. This, however, does not mean that no further learning is performed; additional learning can also be performed on the learned model. The learning data is composed of pairs of input data and output data (correct data). In some cases the learning data is referred to as teaching data, and in other cases the correct data is referred to as teaching data.
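As a point of reference, the following is a minimal sketch of a CNN-based machine learning model of the kind described above. The framework (PyTorch) and the layer and channel sizes are assumptions chosen for illustration and are not specified in this disclosure.

```python
# Minimal sketch of a CNN machine learning model (assumed framework: PyTorch).
# Layer counts and channel sizes are illustrative, not taken from this disclosure.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, in_channels: int = 1, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a single-channel 256x256 tomographic image (B-scan) as input.
model = SimpleCNN()
dummy = torch.randn(1, 1, 256, 256)
print(model(dummy).shape)  # torch.Size([1, 2])
```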
GPUs can perform efficient computations by processing large amounts of data in parallel. Therefore, in the case where learning is performed a number of times using a learning model such as deep learning, it is effective to perform the processing with a GPU. In view of the above, in this modification, a GPU is used in addition to the CPU for the processing executed by the image processing unit 101-04 serving as an example of the learning unit (not illustrated). Specifically, in the case of executing a learning program including a learning model, the calculation for learning is performed by the CPU and the GPU in a coordinated manner. In the processing of the learning unit, the calculation may instead be performed only by the CPU or only by the GPU. The processing unit (estimation unit) that performs processing using the above-described various learned models may also use a GPU, similarly to the learning unit. The learning unit may include an error detection unit (not illustrated) and an update unit (not illustrated). The error detection unit obtains the error between the correct data and the output data that is output from the output layer of the neural network in response to the input data input to the input layer. The error detection unit may use a loss function to calculate the error between the correct data and the output data from the neural network. Based on the error obtained by the error detection unit, the update unit updates the inter-node connection weighting coefficients of the neural network so that the error becomes smaller. The update unit updates the connection weighting coefficients using, for example, backpropagation. Backpropagation is a method of adjusting the inter-node connection weighting coefficients of each neural network so as to reduce the above-described error.
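The roles of the error detection unit and the update unit described above can be illustrated with the following sketch of a single learning step. It assumes PyTorch; the loss function (mean squared error), the optimizer (SGD), and the simple linear network are illustrative placeholders rather than the configuration used in this disclosure.

```python
# Sketch of the learning unit's error detection and update steps (assumed: PyTorch).
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # use GPU when available
model = nn.Linear(16, 1).to(device)          # stands in for any neural network
criterion = nn.MSELoss()                     # error detection: loss between output and correct data
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(input_data: torch.Tensor, correct_data: torch.Tensor) -> float:
    input_data, correct_data = input_data.to(device), correct_data.to(device)
    optimizer.zero_grad()
    output_data = model(input_data)          # forward pass through the network
    error = criterion(output_data, correct_data)
    error.backward()                         # backpropagation of the error
    optimizer.step()                         # update: adjust connection weighting coefficients
    return error.item()
```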
As a machine learning model for image quality improvement or segmentation, a U-Net type machine learning model may be applied. The U-Net type machine learning model has the function of an encoder composed of a plurality of layers including a plurality of down-sampling layers, and the function of a decoder composed of a plurality of layers including a plurality of up-sampling layers. The U-Net type machine learning model is configured so that position information (spatial information) that has become ambiguous in the plurality of layers forming the encoder can be used in layers of the same dimension (mutually corresponding layers) among the plurality of layers forming the decoder (for example, by using skip connections).
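A minimal sketch of a U-Net-type model with a single down-sampling/up-sampling level follows; the skip connection concatenates the encoder output with the up-sampled decoder input so that the encoder's spatial information can be reused. The framework (PyTorch) and the channel sizes are assumptions for illustration.

```python
# Minimal U-Net-type model sketch with one down-/up-sampling level and a skip connection.
# Assumes the input height and width are even so the concatenated sizes match.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch: int = 1, out_ch: int = 1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.MaxPool2d(2)                        # down-sampling layer (encoder)
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # up-sampling layer (decoder)
        # The decoder layer receives 16 (up-sampled) + 16 (skip) channels.
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(inplace=True),
                                 nn.Conv2d(16, out_ch, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e = self.enc(x)                                    # spatial information before down-sampling
        b = self.bottleneck(self.down(e))
        u = self.up(b)
        # Skip connection: spatial information from the encoder is reused in the
        # decoder layer of the same size.
        return self.dec(torch.cat([u, e], dim=1))
```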
A machine learning model for, e.g., image quality improvement or segmentation may use, for example, a fully convolutional network (FCN) or SegNet. A machine learning model that performs object recognition on the respective regions may also be used according to a desired configuration. The machine learning model that performs object recognition may use, for example, a Region CNN (R-CNN), Fast R-CNN, or Faster R-CNN. A machine learning model that performs object recognition on each region may also use You Only Look Once (YOLO) or a Single Shot Detector (Single Shot MultiBox Detector, SSD).
The machine learning model may be, for example, a capsule network (CapsNet). In a typical neural network, individual units (individual neurons) are configured to output scalar values, thereby reducing spatial information about, for example, spatial positional relationships (relative positions) between features in an image. With this configuration, learning can be performed to reduce the influence of, for example, local distortion or parallel shift of an image. In contrast, in the capsule network, for example, the spatial information is configured to be held as a vector by respective units (respective capsules) configured to output the spatial information. With this configuration, learning can be performed in consideration of, for example, a spatial positional relationship between features in an image.
The image quality improvement engine (learned model for image quality improvement) may be a learned model obtained by additional learning using learning data including at least one high-quality image generated by the image quality improvement engine. At this time, whether to use a high-quality image as learning data for additional learning can be made selectable by an instruction from the examiner. The application of these configurations is not limited to the learned model for image quality improvement; these configurations can also be applied to the various learned models described above. In generating correct data for learning of the above-described various learned models, a learned model for correct data generation that generates correct data such as labels (annotations) may be used. At this time, the learned model for correct data generation may be a learned model obtained by additionally (sequentially) learning correct data obtained by the examiner performing labeling (annotation). In other words, the learned model for correct data generation may be a learned model obtained by performing additional learning using learning data in which unlabeled data is set as input data and labeled data is set as output data. In a plurality of continuous frames such as a moving image, the object recognition or segmentation result of a frame determined to have low accuracy can be corrected in consideration of the results of the preceding and following frames. At this time, the corrected result may be additionally learned as correct data according to an instruction from the examiner.
In the various exemplary embodiments and modifications described above, in the case where regions of the subject eye are detected using the learned model for object recognition or the learned model for segmentation, predetermined image processing may also be applied to the respective detected regions. For example, consider a case where at least two regions among the vitreous region, the retinal region, and the choroidal region are detected. In this case, when image processing such as contrast adjustment is performed on the detected at least two regions, adjustment suitable for each region can be performed by using different image processing parameters. By displaying an image on which adjustment suitable for each region has been performed, the operator can appropriately diagnose a disease in each region. The configuration using image processing parameters that differ for the respective detected regions can be similarly applied to regions of the subject eye detected without using, for example, a learned model.
(modification 4)
On the preview screen in the various exemplary embodiments and modifications described above, the above-described learned model may be used for every at least one frame of the live moving image. At this time, in the case where a plurality of live moving images of different regions or different types are displayed on the preview screen, a learned model corresponding to each live moving image may be used. With this configuration, the processing time can be shortened even for live moving images, for example, so that the examiner can obtain highly accurate information before starting image capturing. For example, failures in image capturing that would require recapturing can be reduced, and thus the accuracy and efficiency of diagnosis can be enhanced.
The plurality of live moving images may be, for example, a moving image of an anterior ocular segment for alignment in XYZ directions or a front moving image of the fundus for focus adjustment of a fundus observation optical system or OCT focus adjustment. The plurality of live moving images may also be tomographic images of the fundus for, for example, coherence gate adjustment (adjustment of an optical path length difference between a measurement optical path length and a reference optical path length) of OCT. At this time, the above-described various types of adjustment may be performed in such a manner that the region detected using the above-described model for learning of object recognition or the model for learning of segmentation satisfies a predetermined condition. For example, various types of adjustment such as OCT focus adjustment may be performed such that a value (e.g., a contrast value or an intensity value) related to a predetermined retinal layer such as a vitreous region or a Retinal Pigment Epithelium (RPE) detected using a learned model for object recognition or a learned model for segmentation exceeds a threshold value (or reaches a peak). For example, coherence gate adjustment of OCT may be performed such that a predetermined retina layer such as a vitreous body region or RPE detected using a learned model for object recognition or a learned model for segmentation is located at a predetermined position in the depth direction.
In these cases, an image quality improvement unit (not illustrated) in the image processing apparatus 101 (or the image processing unit 101-04) can generate a high-quality moving image by performing image quality improvement processing on the moving image using the learned model. In a state where the high-quality moving image is displayed, the imaging control unit 101-03 may perform drive control of an optical member, such as the reference mirror 221, that changes the image capturing range so that any of the different regions identified by the segmentation processing is located at a predetermined position in the display region. In this case, the imaging control unit 101-03 may automatically perform the alignment processing based on the highly accurate information so that the desired region is located at the predetermined position of the display region. The optical member that changes the image capturing range may be, for example, an optical member that adjusts the coherence gate position; in particular, it may be the reference mirror 221. The coherence gate position can be adjusted with an optical member that changes the optical path length difference between the measurement optical path length and the reference optical path length. Such an optical member may be, for example, a mirror (not illustrated) for changing the optical path length of the measurement light. The optical member that changes the image capturing range may also be, for example, the stage unit 100-2.
The moving image to which the above-described learned model can be applied is not limited to the live moving image. The moving image may be, for example, a moving image stored in the storage unit 101-02. At this time, for example, a moving image obtained by performing position alignment for every at least one frame of the tomographic images of the fundus stored in the storage unit 101-02 may be displayed on the display screen. For example, where it is desired to preferentially view a vitreous region, a reference frame may be selected based on conditions, such as the vitreous region being present on the frame as much as possible. At this time, each frame is a tomographic image (B-scan image) in the XZ direction. Then, a moving image obtained by performing positional alignment of another frame with respect to the selected reference frame in the XZ direction can be displayed on the display screen. At this time, for example, high-quality images (high-quality frames) sequentially generated every at least one frame of the moving image by the learned model for image quality improvement may be continuously displayed.
As the above-described position alignment method between frames, the same method may be applied to the position alignment in the X direction and the position alignment in the Z direction (depth direction), or different methods may be applied. Position alignment in the same direction may also be performed a plurality of times using different methods; for example, precise position alignment may be performed after coarse position alignment. The position alignment methods include, for example, (rough) position alignment in the Z direction using retina layer boundaries obtained by performing segmentation processing on the tomographic images (B-scan images), (precise) position alignment in the X direction or the Z direction using correlation information (similarity) between a reference image and a plurality of regions obtained by dividing each tomographic image, position alignment in the X direction using one-dimensional projection images generated for the respective tomographic images (B-scan images), and position alignment in the X direction using two-dimensional frontal images. In addition, after position alignment is roughly performed in units of pixels, precise position alignment may be performed in units of sub-pixels.
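As one example of the rough position alignment in the X direction using one-dimensional projection images mentioned above, the following sketch estimates an integer pixel shift by cross-correlation. NumPy is assumed; sub-pixel refinement and Z-direction alignment are omitted.

```python
# Sketch of rough X-direction alignment between B-scan frames using one-dimensional
# projection images (assumed: NumPy).
import numpy as np

def x_shift_between_frames(reference: np.ndarray, frame: np.ndarray) -> int:
    """Estimate the integer X shift of `frame` relative to `reference` (both Z x X B-scans)."""
    ref_proj = reference.mean(axis=0)        # one-dimensional projection along Z
    frm_proj = frame.mean(axis=0)
    ref_proj = ref_proj - ref_proj.mean()
    frm_proj = frm_proj - frm_proj.mean()
    corr = np.correlate(ref_proj, frm_proj, mode="full")
    return int(np.argmax(corr) - (len(frm_proj) - 1))  # lag with maximum correlation

def align_frame(reference: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Apply the estimated shift so that `frame` is roughly aligned with `reference`."""
    return np.roll(frame, x_shift_between_frames(reference, frame), axis=1)
```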
During the various types of adjustment, there is a possibility that an image of an image capturing target such as the retina of the subject eye has not been successfully captured. Thus, there is a possibility that a high-quality image cannot be accurately obtained because the difference between the medical image input to the learned model and the medical image used as the learning data is large. In view of the above, if the evaluation value of the image quality evaluation of the tomographic image (B-scan) exceeds the threshold value, the display of the high-quality moving image (continuous display of high-quality frames) can be automatically started. If the evaluation value of the image quality evaluation of the tomographic image (B-scan) exceeds the threshold, the state may become a state (activated state) in which the examiner can specify the image quality improvement button. The image quality improvement button is a button for specifying execution of the image quality improvement processing. The image quality improvement button may be a button for specifying display of a high-quality image.
A different learned model for image quality improvement may be prepared for each image capturing mode having a different scanning pattern, and the learned model for image quality improvement corresponding to the selected image capturing mode may be selected. Alternatively, a learned model for image quality improvement obtained by learning with learning data including various medical images obtained in different image capturing modes may be used.
(modification 5)
In the various exemplary embodiments and modifications described above, while a learned model is undergoing additional learning, it may be difficult to perform output (estimation/prediction) using that learned model. Therefore, it is desirable to prohibit the input of medical images to a learned model that is currently undergoing additional learning. Another learned model that is the same as the learned model currently undergoing additional learning may be prepared as an auxiliary learned model. At this time, it is desirable that input of medical images to the auxiliary learned model be executable during the additional learning. After the additional learning is completed, the learned model that has undergone the additional learning is evaluated, and if there is no problem, the auxiliary learned model may be replaced with the learned model that has undergone the additional learning. If there is any problem, the auxiliary learned model may continue to be used. For the evaluation of the learned model, for example, a learned model for classification for separating high-quality images obtained by the learned model for image quality improvement from other types of images may be used. The learned model for classification may be, for example, a learned model obtained by learning using learning data in which a plurality of images including high-quality images and low-quality images obtained by the learned model for image quality improvement are set as input data and data labeled (annotated) with the types of these images is set as correct data. At this time, the image type of the input data at the time of estimation (at the time of prediction) may be displayed together with information (for example, a numerical value indicating a percentage) indicating the likelihood of each image type included in the correct data at the time of learning. In addition to the above-described images, the input data of the learned model for classification may include high-quality images on which contrast improvement or noise reduction has been performed by superimposition processing of a plurality of low-quality images (for example, averaging processing of a plurality of low-quality images on which position alignment has been performed).
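The handling of an auxiliary learned model during additional learning can be sketched as follows; the class and method names (`ModelManager`, `predict`, `evaluate`) are hypothetical placeholders, not part of this disclosure.

```python
# Sketch of using an auxiliary learned model while additional learning is in progress.
class ModelManager:
    def __init__(self, learned_model, auxiliary_model):
        self.active = learned_model       # model used for inference when no additional learning runs
        self.auxiliary = auxiliary_model  # identical copy prepared in advance
        self.additional_learning = False

    def infer(self, medical_image):
        # During additional learning, medical images are input only to the auxiliary model.
        model = self.auxiliary if self.additional_learning else self.active
        return model.predict(medical_image)

    def finish_additional_learning(self, retrained_model, evaluate) -> None:
        self.additional_learning = False
        # Adopt the retrained model only if it passes evaluation; otherwise keep using
        # the auxiliary learned model.
        if evaluate(retrained_model):
            self.active = retrained_model
            self.auxiliary = retrained_model
        else:
            self.active = self.auxiliary
```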
A learned model obtained by performing learning for each image capturing region may be selectively used. Specifically, a selection unit (not illustrated) may be included that selects any one of a plurality of learned models including a first learned model obtained using learning data including a first image capturing region (for example, the lung or the subject eye) and a second learned model obtained using learning data including a second image capturing region different from the first image capturing region. At this time, the image processing unit 101-04 may include a control unit (not illustrated) that performs additional learning on the selected learned model. The control unit may, according to an instruction from the operator, search for data including a pair of the image capturing region corresponding to the selected learned model and a captured image of that region, and perform, as additional learning, learning for the selected learned model using the data obtained by the search as learning data. The image capturing region corresponding to the selected learned model may be acquired from information in the header of the data or may be manually input by the examiner. In addition, the data search may be performed on a server of an external facility such as a hospital or a research institute via a network. With this configuration, additional learning can be performed efficiently for each image capturing region using captured images of the image capturing region corresponding to the learned model.
The selection unit and the control unit may include software modules executed by a processor such as a micro processing unit (MPU) or a CPU of the image processing unit 101-04. The selection unit and the control unit may also be formed of a circuit, such as an ASIC, that serves a specific function, or of a separate device.
When learning data for additional learning is acquired from a server of an external facility such as a hospital or a research institute via a network, it is desirable to reduce the decrease in reliability caused by tampering or by system trouble during the additional learning. In view of the above, the validity of the learning data for additional learning may be detected by a consistency check using a digital signature or a hash, so that the learning data for additional learning can be protected. At this time, in a case where the validity of the learning data for additional learning cannot be confirmed as a result of the consistency check by the digital signature or the hash, a warning to that effect is displayed, and additional learning using that learning data is not performed. The installation location of the server is not limited, and the server may be, for example, any of a cloud server, a fog server, and an edge server.
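A consistency check of this kind can be sketched as follows; the assumption here is that the server provides an expected SHA-256 digest for the downloaded learning data (a full digital-signature verification with a public key would follow the same pattern).

```python
# Sketch of a consistency check on downloaded learning data using a hash digest
# (assumed: the server publishes the expected SHA-256 value).
import hashlib
import hmac

def learning_data_is_valid(data_bytes: bytes, expected_sha256_hex: str) -> bool:
    digest = hashlib.sha256(data_bytes).hexdigest()
    # Constant-time comparison of the computed and expected digests.
    return hmac.compare_digest(digest, expected_sha256_hex)

# Additional learning is performed only when the check succeeds; otherwise a warning
# is shown and the data is not used.
```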
(modification 6)
In the various exemplary embodiments and modifications described above, the instruction from the examiner may be an instruction issued by voice, in addition to an instruction issued manually (for example, an instruction issued using a user interface). In this case, a machine learning engine including, for example, a speech recognition engine (speech recognition model, learned model for speech recognition) obtained by machine learning may be used. In addition, the manually issued instruction may be an instruction issued by inputting a character using, for example, a keyboard or a touch panel. At this time, a machine learning engine including, for example, a character recognition engine (character recognition model, model for learning of character recognition) obtained by machine learning may be used. The instruction from the examiner may be an instruction issued by a gesture. In this case, a machine learning engine including a gesture recognition engine (gesture recognition model, model for learning of gesture recognition) obtained by machine learning may be used.
The instruction from the examiner may be a result of line-of-sight detection of the examiner on the display screen on the display unit 104. For example, the line of sight detection result may be a pupil detection result using a moving image of the examiner obtained by performing image capturing from the periphery of the display screen on the display unit 104. In this case, the above-described object recognition engine may be used for pupil detection from a moving image. The instruction from the examiner may be an instruction issued by a weak electric signal such as brain waves or a flow in the body.
In this case, the learning data may be learning data in which character data or voice data (waveform data) indicating an instruction to display a result obtained by the processing of the above-described various learned models is set as input data and an execution command for displaying the result obtained by the processing of the various learned models on the display unit 104 is set as correct data. The learning data may be, for example, learning data in which character data or voice data indicating an instruction to display a high-quality image obtained by a learned model for image quality improvement is set as input data and an execution command to display the high-quality image and an execution command to change an image quality improvement button to an activated state are set as correct data. The learning data may be any learning data as long as, for example, the content of the instruction indicated by the character data or the voice data and the content of the execution command correspond to each other. The voice data may be converted into character data using an acoustic model or a language model. The processing of reducing the noise data superimposed on the voice data may be performed using waveform data obtained by a plurality of microphones. The instruction issued by characters or voice and the instruction issued using a mouse or a touch panel may be made selectable according to an instruction from the examiner. In addition, enabling/disabling of the instruction issued by characters or voice may be made selectable according to an instruction from the examiner.
Machine learning includes deep learning as described above, and, for example, a recurrent neural network (RNN) may be used for at least part of a multi-layer neural network. As an example of the machine learning model according to this modification, an RNN, which is a neural network that processes time-series information, will be described with reference to fig. 9A and 9B. In addition, a long short-term memory network (hereinafter, LSTM), which is one type of RNN, will be described with reference to fig. 10A and 10B.
Fig. 9A illustrates the structure of an RNN as a machine learning model. The RNN 3520 has a loop structure in the network; at time t, it receives input data x_t (3510) and outputs data h_t (3530). Because the RNN 3520 has a loop structure in the network, the state at the current time can be carried over to the next state, so that time-series information can be processed. Fig. 9B illustrates an example of the input and output of parameter vectors at time t. The data x_t (3510) includes N pieces of data (Params1 to ParamsN). The data h_t (3530) output from the RNN 3520 includes N pieces of data (Params1 to ParamsN) corresponding to the input data.
However, since the RNN cannot handle long-term information at the time of backpropagation, the LSTM is sometimes used. The LSTM can learn long-term information by including a forget gate, an input gate, and an output gate. Fig. 10A illustrates the structure of the LSTM. In the LSTM 3540, the information that the network passes to the next time t includes the internal state c_(t-1) of the network, which is called a cell, and the output data h_(t-1). The lower-case characters (c, h, x) illustrated in fig. 10A indicate vectors.
Fig. 10B illustrates details of the LSTM 3540. In fig. 10B, the LSTM 3540 includes a forget gate network FG, an input gate network IG, and an output gate network OG, each of which is a sigmoid layer and therefore outputs a vector in which each element takes a value ranging from 0 to 1. The forget gate network FG determines how much past information is retained, and the input gate network IG determines which values are to be updated. The cell update candidate network CU is a tanh activation layer, which creates a vector of new candidate values to be added to the cell. The output gate network OG selects the elements of the cell candidates and determines how much information is to be transmitted at the next time.
The LSTM model described above is a basic form, and the network is not limited to the one illustrated here. For example, the connections between the networks may be changed. A quasi-recurrent neural network (QRNN) may be used instead of the LSTM. Further, the machine learning model is not limited to neural networks, and boosting or a support vector machine may be used. In the case where the instruction from the examiner is input by characters or voice, a technique related to natural language processing (e.g., Sequence to Sequence) may be applied. A dialogue engine (dialogue model, learned model for dialogue) that responds to the examiner with character or voice output may also be applied.
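For reference, a short sketch of feeding a time series of parameter vectors (Params1 to ParamsN per time step) to an LSTM follows; the framework (PyTorch), N, and the hidden size are assumptions chosen for illustration.

```python
# Sketch of processing a time series of N-dimensional parameter vectors with an LSTM
# (assumed framework: PyTorch; N and the hidden size are illustrative).
import torch
import torch.nn as nn

N = 8                                   # Params1 ... ParamsN per time step
lstm = nn.LSTM(input_size=N, hidden_size=16, batch_first=True)
x = torch.randn(1, 20, N)               # 20 time steps of input data x_t
output, (h_n, c_n) = lstm(x)            # final hidden state h_n and cell state c_n
print(output.shape)                     # torch.Size([1, 20, 16])
```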
(modification 7)
In the various exemplary embodiments and modifications described above, a high-quality image may be stored in the storage unit 101-02 according to an instruction from the examiner. In this case, when a file name is registered after the instruction from the examiner to store the high-quality image, a file name that includes, at any part (e.g., the beginning or the end) of the file name, information (e.g., characters) indicating that the image is an image generated by processing using the learned model for image quality improvement (image quality improvement processing) may be displayed as a recommended file name in an editable state according to an instruction from the examiner.
On various display screens such as a report screen, when a high-quality image is displayed on the display unit 104, a display indicating that the displayed image is a high-quality image generated by processing using a learned model for image quality improvement may be displayed together with the high-quality image. In this case, the examiner can easily recognize, based on the display, that the displayed high-quality image is not an image acquired by image capturing, and thus it is possible to reduce erroneous diagnosis or enhance diagnosis efficiency. The display indicating that the displayed image is a high-quality image generated by processing using the learned model for image quality improvement may take any form as long as the display makes the input image and the high-quality image generated by the processing distinguishable. For the processing using the above-described various learned models and the processing using the learned models for image quality improvement, a display indicating that the displayed result is a result generated by processing using the type of learned models may be displayed together with the result.
At this time, a display screen such as a report screen may also be stored as image data in the storage unit 101-02 according to an instruction from an examiner. For example, the report screen may be stored in the storage unit 101-02 as one image in which images with improved image quality are arranged and a display indicating that these images are high-quality images generated by processing using a learned model for image quality improvement.
For display indicating that the displayed image is a high-quality image generated by processing using the learned model for image quality improvement, display indicating the type of learning data used for learning by the learned model for image quality improvement may be displayed on the display unit 104. The display may include, for example, a description of correct data of the learning data and the type of input data, and any display regarding correct data such as the input data and an image capturing area included in the correct data. For processing using the above-described various learned models and processing using a learned model for image quality improvement, a display indicating the type of learning data used for learning by the type of learned model may be displayed on the display unit 104.
Information (e.g., characters) indicating that the displayed image is an image generated by processing using the learned model for image quality improvement may be displayed or stored in a manner superimposed on, for example, the high-quality image. In this case, the information may be superimposed at any position on the image (for example, at an edge of the image) as long as that position does not overlap with the region in which the target region serving as the image capturing target is displayed. A non-overlapping region may be determined, and the information may be superimposed on the determined region.
In the case where the activation state of the image quality improvement button (image quality improvement processing start) is set by default on the default display screen of the report screen, a report image corresponding to the report screen including a high-quality image may be transmitted to a server such as the external storage unit 102 in accordance with an instruction from the examiner. In the case where the activation state of the image quality improvement button is set by default, at the end of the inspection (for example, when the display screen is changed from the image capture confirmation screen or the preview screen to the report screen according to an instruction from the inspector), a report image corresponding to a report screen including a high-quality image may be (automatically) transmitted to the server. At this time, a report image generated based on various settings among the default settings may be transmitted to the server. The various settings relate to at least one of generating a depth range of the En-Face image on a default display screen of the report screen, analyzing the presence or absence of superimposition of the map, whether the image is a high-quality image, and whether the screen is a display screen for follow-up.
(modification 8)
In the various exemplary embodiments and modifications described above, an image obtained by a first type of learned model (for example, a high-quality image, an image indicating an analysis result such as an analysis chart, an image indicating an object recognition result, and an image indicating a segmentation result) among the various learned models described above may be input to a second type of learned model different from the first type. At this time, results (e.g., analysis results, diagnosis results, object recognition results, and segmentation results) obtained by the processing of the second type of learned model may be generated.
An image to be input to a second type of learned model different from the first type may be generated from an image input to the first type of learned model by using results (e.g., analysis results, diagnosis results, object recognition results, and segmentation results) obtained by processing of the first type of learned model among the various types of learned models described above. At this time, the generated image is most likely an image suitable as an image to be processed by the second type of learned model. Therefore, the accuracy of images (e.g., a high-quality image, an image indicating an analysis result such as an analysis chart, an image indicating an object recognition result, and an image indicating a segmentation result) obtained by inputting the generated image to the second type of learned model can be improved.
In addition, the various types of learned models described above may be learned models obtained by learning using learning data including a two-dimensional medical image of the subject, or may be learned models obtained by learning using learning data including a three-dimensional medical image of the subject.
In addition, a similar case image search using an external database stored in a server may be performed using, as a search key, an analysis result or a diagnosis result obtained by the processing of the above-described learned models. In the case where the plurality of images stored in the database are managed with their respective feature amounts, obtained by machine learning, attached as accompanying information, a similar case image search engine (similar case image search model, learned model for similar case image search) that uses the images themselves as search keys may be used. For example, the image processing unit 101-04 may use the learned model for similar case image search (which is different from the learned model for image quality improvement) to search, for at least one medical image of the plurality of medical images to be subjected to the blending process, for a similar case image related to that medical image. For example, the display control unit 101-05 may display, on the display unit 104, the similar case image obtained from the above-described at least one medical image using the learned model for similar case image search.
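The search over feature amounts attached to the stored images as accompanying information can be sketched as a nearest-neighbour lookup; NumPy is assumed, and the feature extraction itself (for example, by a learned model) is left as a placeholder.

```python
# Sketch of a similar case image search over a database in which each stored image has
# a feature vector attached as accompanying information (assumed: NumPy).
import numpy as np

def find_similar_cases(query_feature: np.ndarray,
                       db_features: np.ndarray,
                       top_k: int = 3) -> np.ndarray:
    """Return the indices of the top_k database entries closest to the query feature."""
    distances = np.linalg.norm(db_features - query_feature, axis=1)
    return np.argsort(distances)[:top_k]
```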
(modification 9)
The generation processing of the motion contrast data in the various exemplary embodiments and modifications described above is not limited to the configuration performed based on the luminance values of the tomographic images. The various types of processing described above can be applied to the interference signal acquired by the tomographic image capturing apparatus 100, a signal obtained by performing a Fourier transform on the interference signal, a signal obtained by performing arbitrary processing on that signal, and tomographic data including a tomographic image based on these signals. Also in these cases, effects similar to those of the above-described configuration can be obtained. For example, an optical fiber system using an optical coupler as a splitting unit is used, but a spatial optical system using a collimator and a beam splitter may be used instead. The configuration of the tomographic image capturing apparatus 100 is not limited to the above-described configuration, and a part of the configuration included in the tomographic image capturing apparatus 100 may be provided as a configuration separate from the tomographic image capturing apparatus 100. In the above configuration, a Michelson interferometer is used as the interference optical system of the tomographic image capturing apparatus 100, but the configuration of the interference optical system is not limited thereto. For example, the interference optical system of the tomographic image capturing apparatus 100 may include a Mach-Zehnder interferometer. A spectral domain OCT (SD-OCT) apparatus using an SLD as a light source has been described as the OCT apparatus, but the configuration of the OCT apparatus is not limited thereto. For example, the present invention can also be applied to any other type of OCT apparatus, such as a swept source OCT (SS-OCT) apparatus that uses a wavelength scanning light source and can scan the wavelength of the emitted light. The present invention can also be applied to a Line-OCT apparatus (or an SS Line-OCT apparatus) using line light. The present invention can also be applied to a full-field OCT apparatus (or an SS full-field OCT apparatus) using area light. The image processing unit 101-04 acquires the interference signal acquired by the tomographic image capturing apparatus 100 and the three-dimensional tomographic image generated by the image processing unit 101-04, but the configuration in which the image processing unit 101-04 acquires such signals and images is not limited thereto. For example, the image processing unit 101-04 may acquire these signals from a server or an image capturing apparatus connected via, for example, a Local Area Network (LAN), a Wide Area Network (WAN), or the Internet.
The learned model may be set in the image processing unit 101-04. The learned model may be formed by a software module executed by a processor such as a CPU. The learned model may be set in another server connected to the image processing unit 101-04. In this case, the image processing unit 101-04 can perform the image quality improvement processing using the learned model by connecting to a server including the learned model via any network such as the internet.
During the process of generating motion contrast data, an image quality improvement engine may be applied as appropriate. For example, the image quality of the tomographic images can be improved in advance, before the decorrelation values are obtained, by using an image quality improvement engine prepared for tomographic images. In the case where the NOR is 3 or more, at least two pieces of motion contrast data may be generated, and the image quality may also be improved by averaging the plurality of pieces of motion contrast data. In this case, the image quality of each piece of motion contrast data can be improved in advance by the image quality improvement engine before the averaging processing. Alternatively, the image quality improvement engine may be applied to the motion contrast data that has undergone averaging. When volume data (three-dimensional motion contrast data) is used as the motion contrast data, the image quality of the volume data can be improved by an image quality improvement engine for three-dimensional data formed in advance using a known 3D U-Net. Further, in the case where the NOR is 3 or more, at least two pieces of three-dimensional motion contrast data may be generated, and the final volume data may be obtained by averaging them. In this case, the image quality improvement engine may be applied to at least one of the volume data that has not been subjected to averaging and the volume data that has been subjected to the averaging processing. Further, after OCTA front images are respectively generated from the plurality of pieces of volume data, the averaging processing may be performed on those OCTA front images. Similarly, the image quality improvement engine may be applied to at least one of the OCTA front images that have not been subjected to averaging and the OCTA front images that have been subjected to the averaging processing. In this way, various modifications can be made when generating an OCTA front image from motion contrast data, particularly in the case where the NOR is 3 or more, and the image quality improvement engine can be applied to any data regardless of whether the data is two-dimensional or three-dimensional.
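A sketch of the decorrelation computation and the subsequent averaging when the NOR is 3 or more is shown below; NumPy is assumed, and the particular decorrelation formula is one common choice rather than the one necessarily used in this disclosure.

```python
# Sketch of generating motion contrast data as decorrelation values between repeated
# tomographic images of the same position, averaged over adjacent pairs (assumed: NumPy).
import numpy as np

def decorrelation(frame_a: np.ndarray, frame_b: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Pixel-wise decorrelation of two intensity B-scans of the same cross section."""
    return 1.0 - (2.0 * frame_a * frame_b) / (frame_a ** 2 + frame_b ** 2 + eps)

def motion_contrast(frames: np.ndarray) -> np.ndarray:
    """frames: (NOR, Z, X) repeated B-scans; averages the decorrelation of adjacent pairs."""
    pairs = [decorrelation(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    return np.mean(pairs, axis=0)
```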
(modification 10)
The images to be processed by the image processing apparatus 101 or the image processing method according to the various exemplary embodiments and modifications described above include medical images acquired using an arbitrary modality (image capturing apparatus, image capturing method). The medical image to be processed may include a medical image acquired by an arbitrary image capturing apparatus and an image created by the image processing apparatus 101 or the image processing method according to the above-described exemplary embodiment and modification.
The medical image to be processed is an image of a predetermined region of the subject, and the image of the predetermined region includes at least part of the predetermined region of the subject. In addition, the medical image may include another region of the subject. The medical image may be a still image or a moving image, and may be a monochrome image or a color image. Further, the medical image may be an image representing the structure (configuration) of a predetermined region, or may be an image representing the function thereof. The images representing functions include images representing the state of blood flow movement (e.g., blood flow and blood flow velocity), such as an OCTA image, a doppler OCT image, a functional magnetic resonance imaging (fMRI) image, and an ultrasound doppler image. The predetermined region of the subject may be determined according to an image capturing target, and includes organs such as a human eye (subject eye), a brain, a lung, an intestine, a heart, a pancreas, a kidney, and a liver, and arbitrary regions such as a head, a chest, legs, and arms.
The medical image may be a tomographic image of the subject, or may be a frontal image. The front image includes, for example, a fundus front image, a front image of the anterior segment, a fundus image obtained by fluorescein image capturing, and an En-Face image generated using data in at least a partial range in the depth direction of an image capturing target of data (three-dimensional OCT data) obtained by OCT. The En-Face image may be an En-Face image (motion contrast front image) of the OCTA generated using data in at least a partial range in the depth direction of the image capturing target of the three-dimensional OCTA data (three-dimensional motion contrast data). Three-dimensional OCT data and three-dimensional motion contrast data are examples of three-dimensional medical image data.
The motion contrast data is data indicating a change between pieces of volume data obtained by controlling the measurement light so as to scan the same region (the same position) of the subject eye a plurality of times. At this time, the volume data includes a plurality of tomographic images obtained at different positions. Motion contrast data as volume data can be obtained by obtaining, at each of the different positions, data indicating the change between a plurality of tomographic images obtained at substantially the same position. The motion contrast front image is also referred to as an OCTA front image (En-Face image of the OCTA) related to OCT angiography (OCTA) for measuring blood flow motion, and the motion contrast data is also referred to as OCTA data. The motion contrast data may be obtained as, for example, a decorrelation value, a variance value, or a value obtained by dividing a maximum value by a minimum value (maximum value/minimum value) of two tomographic images or of the interference signals corresponding to the two tomographic images, and may be obtained by any known method. At this time, the two tomographic images can be obtained by, for example, controlling the measurement light so as to scan the same region (the same position) of the subject eye a plurality of times.
The En-Face image is a front image generated by projecting data of a range between two layer boundaries in the XY direction, for example. At this time, a frontal image is generated by projecting or integrating data corresponding to a depth range defined based on two reference surfaces onto a two-dimensional plane, the depth range being at least part of volume data (three-dimensional tomographic image) obtained using optical interference. The En-Face image is a frontal image generated by projecting data of volume data corresponding to a depth range determined based on the detected retina layers onto a two-dimensional plane. As a method of projecting data corresponding to a depth range defined based on two reference surfaces onto a two-dimensional plane, for example, a method of setting a representative value of data within the depth range as a pixel value on the two-dimensional plane may be used. The representative value may include a value such as an average value, a median value, or a maximum value of pixel values within a range in the depth direction of a region surrounded by two reference surfaces. For example, the depth range related to the En-Face image may be a range including a predetermined number of pixels in a deeper direction or a shallower direction based on one of two layer boundaries related to the detected retina layers. For example, the depth range related to the En-Face image may be a range that is changed (offset) with respect to a range between two layer boundaries related to the detected retina layers according to an instruction of the operator.
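The projection of volume data within a depth range defined by two layer boundaries can be sketched as follows; NumPy is assumed, and the choice of representative value (mean, maximum, or median) corresponds to the options described above.

```python
# Sketch of generating an En-Face front image by projecting volume data within a depth
# range defined by two layer boundaries onto a two-dimensional plane (assumed: NumPy).
import numpy as np

def en_face(volume: np.ndarray,
            upper: np.ndarray,
            lower: np.ndarray,
            method: str = "mean") -> np.ndarray:
    """volume: (Z, Y, X) data; upper/lower: (Y, X) boundary depths in pixels."""
    volume = volume.astype(float)
    z = np.arange(volume.shape[0])[:, None, None]
    mask = (z >= upper[None]) & (z < lower[None])         # depth range per A-scan
    masked = np.where(mask, volume, np.nan)
    if method == "mean":                                  # representative value choices
        return np.nanmean(masked, axis=0)
    if method == "max":
        return np.nanmax(masked, axis=0)
    return np.nanmedian(masked, axis=0)
```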
The image capturing apparatus is an apparatus for capturing an image to be used for diagnosis. The image capturing apparatus includes, for example, an apparatus that obtains an image of a predetermined region by emitting light, radiation such as X-rays, electromagnetic waves, or ultrasonic waves to the predetermined region of the subject and an apparatus that obtains an image of the predetermined region by detecting the radiation emitted from the subject. More specifically, the image capturing apparatus according to the various exemplary embodiments and modifications described above includes at least an X-ray image capturing apparatus, a CT apparatus, an MRI apparatus, a PET apparatus, a SPECT apparatus, an SLO apparatus, an OCT apparatus, an oca apparatus, a fundus camera, and an endoscope.
The OCT apparatus may include a time domain OCT (TD-OCT) apparatus and a fourier domain OCT (FD-OCT) apparatus. The fourier domain OCT devices may include spectral domain OCT (SD-OCT) devices and swept source OCT (SS-OCT) devices. As the SLO apparatus and the OCT apparatus, an adaptive optics SLO (AO-SLO) apparatus and an adaptive optics OCT (AO-OCT) apparatus using an adaptive optics optical system may be included. As the SLO apparatus and the OCT apparatus, a polarization-sensitive SLO (PS-SLO) apparatus and a polarization-sensitive OCT (PS-OCT) apparatus for visualizing a polarization phase difference and information on depolarization may be included.
[ other exemplary embodiments ]
The present invention can also be realized by executing the following processing: software (a program) that realizes one or more functions of the various exemplary embodiments and modifications described above is supplied to a system or an apparatus via a network or various storage media, and a computer (or a CPU, an MPU, or the like) of the system or the apparatus reads out and executes the program.
The present invention can also be realized by the following processing: software (program) for realizing one or more functions of the various exemplary embodiments and modifications described above is supplied to a system or an apparatus via a network or various storage media, and a computer of the system or the apparatus reads and executes the program. The computer includes one or more processors or circuits and includes a plurality of individual computers or a network of individual processors or circuits for reading and executing computer-executable commands.
At this time, the processor or circuit may include a Central Processing Unit (CPU), a Micro Processing Unit (MPU), a Graphic Processing Unit (GPU), an Application Specific Integrated Circuit (ASIC), or a Field Programmable Gate Array (FPGA). The processor or circuit may include a Digital Signal Processor (DSP), a Data Flow Processor (DFP), or a Neural Processing Unit (NPU).
The present invention is not limited to the above-described exemplary embodiments, and various changes and modifications may be made without departing from the spirit and scope of the invention. Accordingly, the appended claims are intended to illustrate the scope of the invention.
The present application claims the benefit of Japanese Patent Application Nos. 2019-044263 and 2019-044265 filed on March 11, 2019, Japanese Patent Application No. 2019-068895 filed on March 29, 2019, and Japanese Patent Application No. 2019-183351 filed on October 3, 2019, which are hereby incorporated by reference herein in their entirety.

Claims (26)

1. An image processing apparatus for performing processing on at least either one of an OCT image and an OCT angiography OCTA image of a mutually corresponding region in a subject acquired by optical coherence tomography OCT, the image processing apparatus comprising:
a mixing processing section for generating a mixed image obtained by performing mixing processing at a predetermined transmittance using the OCT image and the OCTA image;
display control means for displaying the generated blended image on display means;
a transmittance setting member for setting a predetermined transmittance;
a specifying section that specifies at least either one of the OCT image and the OCTA image as a target on which processing is to be performed;
a selection section for selecting whether analysis or processing is to be performed on the specified target image;
setting means for setting a region of interest in the displayed mixed image; and
execution means for executing the selected processing on the region of interest set in the specified target image.
2. The image processing apparatus according to claim 1,
wherein at least any one of the OCT image and the OCTA image has an attribute indicating two or more classifications for each pixel, and
wherein the mixing processing section generates a mixed image based on the attribute and the predetermined transmittance.
3. The image processing apparatus according to claim 2, wherein the image having the attribute has an attribute classified based on a pixel value of the image.
4. The image processing apparatus according to claim 3, wherein the attribute is set based on whether or not the pixel value exceeds a threshold value.
5. The image processing apparatus according to any one of claims 2 to 4, wherein the image having the attribute has an attribute set according to a preset partial region of the image.
6. The image processing apparatus according to claim 5, wherein the setting means sets the partial region.
7. The image processing apparatus according to any one of claims 2 to 6, wherein, in a case where the image having the attribute is an OCTA image, the attribute is an attribute based on a likelihood that a pixel is a blood vessel.
8. The image processing apparatus according to any one of claims 2 to 7, wherein in a case where the attribute is an attribute indicating a predetermined classification, the mixing processing means fixes the transmittance corresponding to the pixel of the image having the attribute to 0 or 1.
9. An image processing apparatus comprising:
a display control section for displaying, on a display section, a mixed image obtained by performing mixing processing, with transmittance that is variable according to an instruction, using an OCT image and an OCTA image of a mutually corresponding region in a subject acquired by OCT;
setting means for setting a region of interest in the displayed mixed image; and
an execution unit configured to execute processing on a region of interest set in at least one of the OCT image and the OCTA image.
10. The image processing apparatus according to any one of claims 1 to 9, wherein the display control means displays a result of the analysis processing of the set region of interest on the display means.
11. The image processing apparatus according to any one of claims 1 to 10, wherein a new transmittance is set from at least one of the OCT image and the occa image using a learned model obtained by performing learning using learning data in which a medical image is set as input data and a transmittance to be used in the mixing process is set as correct data.
12. The image processing apparatus according to claim 11, wherein the learned model is a learned model obtained by additionally performing learning using learning data in which a transmittance set according to an instruction from an examiner is set as correct data.
13. The image processing apparatus according to claim 11 or 12, wherein the learned model is a learned model obtained by additionally performing learning using learning data in which a transmittance changed from a new transmittance according to an instruction from an inspector is set as correct data.
14. The image processing apparatus according to any one of claims 1 to 13, wherein the mixing processing is performed by performing weighted average processing on pixel values of mutually corresponding positions of the OCT image and the OCTA image.
15. An image processing apparatus comprising:
a display control unit configured to display, on a display unit, a mixed image obtained by performing mixing processing with variable transmittance using a first medical image and a second medical image of a different type from the first medical image of mutually corresponding regions in a subject according to an instruction of an operator;
setting means for setting a region of interest in the displayed mixed image; and
an execution unit configured to execute processing on a region of interest set in at least one of the first medical image and the second medical image.
16. The image processing apparatus according to any one of claims 1 to 15, wherein the display control means displays, on the display means, a medical image having higher image quality than at least one medical image of the plurality of medical images to be subjected to the blending processing, which is obtained from the at least one medical image, using a model for learning of image quality improvement obtained by learning the medical image of the subject.
17. The image processing apparatus according to claim 16, wherein the display control means displays, on the display means, an image analysis result obtained from the at least one medical image using a learned model different from a learned model for image quality improvement.
18. The image processing apparatus according to claim 16 or 17, wherein the display control means displays a diagnosis result obtained from the at least one medical image using a learned model different from the learned model for image quality improvement on the display means.
19. The image processing apparatus according to any one of claims 16 to 18, wherein the display control means displays information on a difference between a medical image obtained from the at least one medical image using a generative countermeasure network or an automatic encoder and the at least one medical image on the display means as information on an abnormal region.
20. The image processing apparatus according to any one of claims 16 to 19, wherein the display control means displays, on the display means, a similar case image obtained from the at least one medical image using a learned model different from the learned model for image quality improvement.
21. The image processing apparatus according to any one of claims 16 to 20, wherein the display control means displays, on the display means, an object detection result or a segmentation result obtained from the at least one medical image using a learned model different from the learned model for image quality improvement.
22. The image processing apparatus according to any one of claims 16 to 21, wherein the display control means displays, on the display means, an image, information, or a result obtained by inputting the plurality of medical images to the learned model.
23. The image processing apparatus according to any one of claims 1 to 22, wherein the instruction of the operator for changing the transmittance is information obtained using at least one of a learned model for character recognition, a learned model for voice recognition, and a learned model for gesture recognition.
24. An image processing method comprising:
displaying, on a display section, a mixed image obtained by performing mixing processing, with a transmittance that is variable according to an instruction of an operator, using an OCT image and an OCTA image of mutually corresponding regions in a subject acquired by OCT;
setting a region of interest in the displayed blended image; and
executing processing on a region of interest set in at least one of the OCT image and the OCTA image (an end-to-end sketch of this display/ROI/processing flow appears after the claims).
25. An image processing method comprising:
displaying, on a display section, a mixed image obtained by performing mixing processing, with a transmittance that is variable in accordance with an instruction of an operator, using a first medical image and a second medical image of a type different from the first medical image, the first and second medical images being of mutually corresponding regions in a subject;
setting a region of interest in the displayed blended image; and
executing processing on a region of interest set in at least one of the first medical image and the second medical image.
26. A program for causing a computer to execute an image processing method according to claim 24 or 25.
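The claims above repeatedly refer to mixing processing with a variable transmittance, which claim 14 characterizes as a weighted average of pixel values at mutually corresponding positions. The following is a minimal sketch of that idea, not the patented implementation; it assumes the OCT and OCTA images are registered 2-D NumPy arrays normalized to a common intensity range, and the function name blend_oct_octa is hypothetical.

```python
import numpy as np

def blend_oct_octa(oct_img: np.ndarray, octa_img: np.ndarray,
                   transmittance: float) -> np.ndarray:
    """Weighted average of pixel values at mutually corresponding positions.

    transmittance = 0.0 shows only the OCT image, 1.0 shows only the OCTA image.
    """
    if oct_img.shape != octa_img.shape:
        raise ValueError("images must cover mutually corresponding regions")
    alpha = float(np.clip(transmittance, 0.0, 1.0))
    return (1.0 - alpha) * oct_img.astype(np.float32) + alpha * octa_img.astype(np.float32)
```

As the operator changes the transmittance, only the weight alpha changes, so the blended display can be updated interactively without recomputing either source image.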
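Claims 15, 24 and 25 describe setting a region of interest on the displayed mixed image and then executing processing on the underlying images rather than on the blended pixels. Below is a minimal sketch of that flow under the same NumPy assumptions; the ROI format (y0, y1, x0, x1), the mean-based vessel threshold, and the name analyze_roi_on_blend are illustrative choices, not details taken from the patent.

```python
import numpy as np

def analyze_roi_on_blend(oct_img: np.ndarray, octa_img: np.ndarray,
                         transmittance: float, roi: tuple):
    """Display-side blend plus analysis on the original OCT/OCTA data."""
    alpha = float(np.clip(transmittance, 0.0, 1.0))
    blended = (1.0 - alpha) * oct_img + alpha * octa_img  # image shown to the operator
    y0, y1, x0, x1 = roi  # region of interest selected on the blended image
    oct_roi = oct_img[y0:y1, x0:x1]
    octa_roi = octa_img[y0:y1, x0:x1]
    # Illustrative analyses only: a crude vessel-area ratio on the OCTA ROI and
    # a mean reflectance on the OCT ROI.
    results = {
        "vessel_area_ratio": float((octa_roi > octa_roi.mean()).mean()),
        "oct_roi_mean_intensity": float(oct_roi.mean()),
    }
    return blended, results
```

The point of the flow is that the blended image is only a viewing aid: the coordinates chosen on it index directly into the OCT and OCTA images, so the analysis operates on unmodified measurement data.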
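Claims 11 to 13 refer to a learned model that outputs a new transmittance when given a medical image, trained on pairs of images (input data) and transmittances selected by an examiner (correct data). The PyTorch sketch below shows one plausible shape for such a regressor; the architecture, the class name TransmittanceRegressor, and the 304x304 input size are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TransmittanceRegressor(nn.Module):
    """Maps a single-channel medical image to a transmittance in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))  # shape (N, 1), values in [0, 1]

model = TransmittanceRegressor().eval()
with torch.no_grad():
    octa_enface = torch.rand(1, 1, 304, 304)       # placeholder OCTA en-face image
    new_transmittance = model(octa_enface).item()  # value proposed to the operator
```

Claims 12 and 13 then amount to continuing training with the transmittances the examiner actually sets, or with the examiner's corrections to the proposed value, used as additional correct data.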
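Claim 16 mentions a learned model for image quality improvement obtained by learning medical images of a subject. A residual denoising CNN trained on pairs of low-quality and high-quality (for example, averaged) images of the same region is one common way to realize such a model; the sketch below is only that kind of generic example, with a hypothetical class name and layer sizes.

```python
import torch
import torch.nn as nn

class DenoisingCNN(nn.Module):
    """Outputs an image intended to have higher image quality than its input."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # residual learning: predict a correction to the noisy input

model = DenoisingCNN().eval()
with torch.no_grad():
    noisy = torch.rand(1, 1, 256, 256)   # one of the images to be mixed, normalized
    improved = model(noisy)              # higher-quality image to display or to mix
```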
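Claim 19 refers to displaying, as information on an abnormal region, the difference between a medical image and a counterpart generated from it by a generative adversarial network or an auto-encoder. The sketch below illustrates only the auto-encoder variant, under the usual assumption that the model has been trained on normal images so that abnormal structures reconstruct poorly; the architecture and the 256x256 size are illustrative.

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """Reconstructs a 'normal-looking' version of the input image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = ConvAutoEncoder().eval()
with torch.no_grad():
    image = torch.rand(1, 1, 256, 256)            # input medical image, normalized to [0, 1]
    reconstruction = model(image)                 # model's idea of a normal image
    anomaly_map = (image - reconstruction).abs()  # large values flag candidate abnormal regions
```

The anomaly map (or a thresholded version of it) is what would be overlaid or reported as information on an abnormal region.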
CN201980093820.0A 2019-03-11 2019-11-12 Image processing apparatus, image processing method, and program Pending CN113543695A (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
JP2019044265 2019-03-11
JP2019-044263 2019-03-11
JP2019044263 2019-03-11
JP2019-044265 2019-03-11
JP2019-068895 2019-03-29
JP2019068895 2019-03-29
JP2019183351A JP7362403B2 (en) 2019-03-11 2019-10-03 Image processing device and image processing method
JP2019-183351 2019-10-03
PCT/JP2019/044244 WO2020183791A1 (en) 2019-03-11 2019-11-12 Image processing device and image processing method

Publications (1)

Publication Number Publication Date
CN113543695A (en) 2021-10-22

Family

ID=72715427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980093820.0A Pending CN113543695A (en) 2019-03-11 2019-11-12 Image processing apparatus, image processing method, and program

Country Status (2)

Country Link
JP (1) JP7362403B2 (en)
CN (1) CN113543695A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749644B * 2020-12-30 2024-02-27 Dalian Maritime University Faster RCNN fire smoke detection method based on improved deformable convolution
WO2023021697A1 * 2021-08-20 2023-02-23 Nippon Telegraph and Telephone Corp Optical circuit measurement system and measurement method
CN114209278B * 2021-12-14 2023-08-25 Fudan University Deep learning skin disease diagnosis system based on optical coherence tomography

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012045298A (en) * 2010-08-30 2012-03-08 Canon Inc Apparatus, method and system of image processing, and program
CN102802505A (en) * 2010-03-02 2012-11-28 佳能株式会社 Image processing apparatus, control method, and optical coherence tomography system
WO2017143300A1 (en) * 2016-02-19 2017-08-24 Optovue, Inc. Methods and apparatus for reducing artifacts in oct angiography using machine learning techniques
WO2017155015A1 * 2016-03-11 2017-09-14 Canon Inc Information processing device
JP2017158687A * 2016-03-08 2017-09-14 Canon Inc Processing method for optical interference tomographic data, program for executing the method, and processing equipment
JP2018005841A * 2016-07-08 2018-01-11 Topcon Corp Medical image processing method and medical image processing device
US20180018757A1 (en) * 2016-07-13 2018-01-18 Kenji Suzuki Transforming projection data in tomography by means of machine learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6572615B2 * 2015-04-30 2019-09-11 Nidek Co Ltd Fundus image processing apparatus and fundus image processing program
JP6624945B2 * 2016-01-21 2019-12-25 Canon Inc Image forming method and apparatus
US20180012359A1 (en) * 2016-07-06 2018-01-11 Marinko Venci Sarunic Systems and Methods for Automated Image Classification and Segmentation

Also Published As

Publication number Publication date
JP7362403B2 (en) 2023-10-17
JP2020163100A (en) 2020-10-08

Similar Documents

Publication Publication Date Title
KR102543875B1 (en) Medical image processing apparatus, medical image processing method, computer readable medium, and trained model
JP7250653B2 (en) Image processing device, image processing method and program
US20210104313A1 (en) Medical image processing apparatus, medical image processing method and computer-readable medium
JP7269413B2 (en) MEDICAL IMAGE PROCESSING APPARATUS, MEDICAL IMAGE PROCESSING SYSTEM, MEDICAL IMAGE PROCESSING METHOD AND PROGRAM
US20210304363A1 (en) Image processing apparatus, image processing method and computer-readable medium
US11887288B2 (en) Image processing apparatus, image processing method, and storage medium
US11922601B2 (en) Medical image processing apparatus, medical image processing method and computer-readable medium
CN113557714A (en) Medical image processing apparatus, medical image processing method, and program
US20220151483A1 (en) Ophthalmic apparatus, method for controlling ophthalmic apparatus, and computer-readable medium
CN112638234A (en) Image processing apparatus, image processing method, and program
JP2021037239A (en) Area classification method
JP7362403B2 (en) Image processing device and image processing method
JP2022155690A (en) Image processing device, image processing method, and program
WO2020138128A1 (en) Image processing device, image processing method, and program
WO2021100694A1 (en) Image processing device, image processing method, and program
WO2020075719A1 (en) Image processing device, image processing method, and program
JP7344847B2 (en) Image processing device, image processing method, and program
JP2021069667A (en) Image processing device, image processing method and program
JP2023010308A (en) Image processing device and image processing method
JP2022121202A (en) Image processing device and image processing method
WO2020049828A1 (en) Image processing apparatus, image processing method, and program
JP2021058285A (en) Image processing device, image processing method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination