WO2021100694A1 - Image processing device, image processing method, and program - Google Patents
- Publication number
- WO2021100694A1 (PCT/JP2020/042764)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- evaluation
- depth range
- image processing
- data
- Prior art date
Classifications
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B3/00—Apparatus for testing the eyes; Instruments for examining the eyes
- A61B3/10—Objective types, i.e. instruments for examining the eyes independent of the patients' perceptions or reactions
Definitions
- the present invention relates to an image processing apparatus, an image processing method, and a program.
- Patent Document 1 relates to a general OCT (Optical Coherence Tomography) and OCTA imaging device and its optical system, a method for generating motion contrast data, and a technique for projecting motion contrast data on a two-dimensional plane within a predetermined depth range.
- A three-dimensional OCT image or a three-dimensional OCTA image may be projected within a predetermined depth range to generate a two-dimensional frontal image in order to extract structural features (including blood vessels) of the eye, such as the retina, vitreous body, and choroid.
- one of the objects of the embodiment of the present invention is to make it possible to easily confirm the target area.
- The image processing apparatus includes an evaluation unit that acquires, for a plurality of front images corresponding to different depth ranges of three-dimensional volume data of an eye to be inspected, a plurality of pieces of information indicating an evaluation of the presence of a target region in each front image, and a determination unit that determines at least one of the plurality of front images as an output image using the plurality of pieces of information.
- A schematic configuration example of the OCT apparatus according to the first embodiment.
- A schematic functional configuration example of the image processing apparatus according to the first embodiment.
- A diagram for explaining the image generation unit according to Example 1.
- An example of the GUI according to the first embodiment.
- An example of the GUI according to the first embodiment.
- An example of the GUI according to the first embodiment.
- A flowchart of a series of processes according to the first embodiment.
- A flowchart of the front image generation processing according to Example 1.
- A diagram for explaining the image generation unit according to Example 2.
- A diagram for explaining the neural network according to Example 2.
- A flowchart of the front image generation processing according to Example 2.
- An example of the GUI according to the second embodiment.
- An example of the GUI according to the second embodiment.
- A diagram for explaining the neural network according to Modification 2 of Example 2.
- An example of the learning data according to Modification 2 of the second embodiment.
- A diagram for explaining the image generation unit according to Example 3.
- A flowchart of the front image generation processing according to Example 3.
- A flowchart of the front image generation processing according to Modification 1 of Example 3.
- A diagram for explaining the relationship between a plurality of OCTA front images and a plurality of evaluation values according to Modification 1 of Example 3.
- A diagram for explaining the OCTA front image to be displayed and the depth range according to Modification 1 of Example 3.
- A diagram for explaining the relationship between a plurality of OCTA front images and a plurality of evaluation values according to Modification 1 of Example 3.
- A diagram for explaining the OCTA front image to be displayed and the depth range according to Modification 1 of Example 3.
- A flowchart of the front image generation processing according to Modification 2 of Example 3.
- A diagram for explaining the method of setting the depth range according to Example 4.
- An example of the configuration of the neural network used as the machine learning model according to Modification 3.
- An example of the configuration of the neural network used as the machine learning model according to Modification 3.
- An example of the configuration of the neural network used as the machine learning model according to Modification 3.
- An example of the configuration of the neural network used as the machine learning model according to Modification 3.
- the machine learning model refers to a learning model based on a machine learning algorithm.
- Specific algorithms for machine learning include the nearest neighbor method, the naive Bayes method, a decision tree, and a support vector machine.
- Deep learning, in which the features and connection weighting coefficients used for learning are generated automatically using a neural network, can also be mentioned.
- Any of the above algorithms that can be used may be applied to the following examples and modifications.
- the teacher data refers to learning data and is composed of a pair of input data and output data.
- the correct answer data refers to the output data of the learning data (teacher data).
- The trained model is a machine learning model that follows an arbitrary machine learning algorithm, such as deep learning, and that has been trained (has learned) in advance using appropriate teacher data (learning data).
- Although the trained model is obtained in advance using appropriate training data, this does not mean that no further training is performed; additional training can also be carried out, even after the device has been installed at the site of use.
- In Examples 1 to 3, an example of generating a frontal image for confirming choroidal neovascularization (CNV) derived from exudative age-related macular degeneration (AMD) will be described.
- On the other hand, the present invention can also be applied to generating a frontal image for confirming, for example, the lamina cribrosa of the optic nerve head described in Example 4, the choroidal layers (Sattler layer, Haller layer) described in Example 5, or a capillary aneurysm of a retinal blood vessel.
- In Example 1, the image processing device and the image processing method of an ophthalmic apparatus according to the first embodiment of the present invention, particularly an optical coherence tomography apparatus (OCT apparatus) used in an ophthalmic clinic or the like, will be described.
- a method for displaying a new blood vessel (CNV) using OCTA according to this example will be described.
- FIG. 1 shows a schematic configuration example of the OCT apparatus according to this embodiment.
- the OCT device according to this embodiment is provided with an optical interference unit 100, a scanning optical system 200, an image processing device 300, a display unit 310, a pointing device 320, and a keyboard 321.
- The optical interference unit 100 is provided with a low coherence light source 101 that emits near-infrared light, an optical branching unit 103, a collimating optical system 111, a dispersion compensation optical system 112, and a reference mirror 113.
- the optical interference unit 100 is provided with a collimating optical system 122, a diffraction grating 123, an imaging lens 124, and a line sensor 125.
- the light emitted from the light source 101 propagates through the optical fiber 102a and is divided into measurement light and reference light by the optical branching portion 103.
- the measurement light divided by the optical branching portion 103 is incident on the optical fiber 102b and guided to the scanning optical system 200.
- the reference light divided by the optical branching portion 103 is incident on the optical fiber 102c and guided to the reference mirror 113.
- the optical branching portion 103 may be configured by using, for example, an optical fiber coupler or the like.
- The reference light incident on the optical fiber 102c is emitted from the fiber end, enters the dispersion compensation optical system 112 via the collimating optical system 111, and is guided to the reference mirror 113.
- the reference light reflected by the reference mirror 113 follows the optical path in the opposite direction and is incident on the optical fiber 102c again.
- The dispersion compensation optical system 112 compensates for the dispersion of the optical systems of the scanning optical system 200 and of the eye E to be inspected as the object to be measured.
- The reference mirror 113 is configured to be drivable in the optical axis direction by a drive unit (not shown) including a motor or the like, and can change the optical path length of the reference light relative to the optical path length of the measurement light.
- the measurement light incident on the optical fiber 102b is emitted from the fiber end and incident on the scanning optical system 200.
- the scanning optical system 200 is an optical system configured to be movable relative to the eye E to be inspected.
- The scanning optical system 200 is provided with a collimating optical system 202, a scanning unit 203, and a lens 204.
- the scanning optical system 200 is configured to be able to be driven in the front-back, up-down, left-right directions with respect to the eye axis of the eye E to be inspected by a driving unit (not shown) controlled by the image processing device 300.
- the image processing device 300 can align the scanning optical system 200 with respect to the eye E to be inspected by controlling a drive unit (not shown).
- the measurement light emitted from the fiber end of the optical fiber 102b is substantially parallelized by the collimating optical system 202 and incident on the scanning unit 203.
- The scanning unit 203 has two galvanometer mirrors whose mirror surfaces can be rotated, one of which deflects light in the horizontal direction and the other in the vertical direction, and deflects the incident light under the control of the image processing device 300.
- The scanning unit 203 can thereby scan the measurement light on the fundus Er of the eye E to be inspected in two directions: the main scanning direction within the plane of the drawing and the sub-scanning direction perpendicular to the plane of the drawing.
- The main scanning direction and the sub-scanning direction are not limited to these, and may be any directions that are orthogonal to the depth direction of the eye E to be inspected and intersect each other.
- The scanning unit 203 may be configured using any deflecting means, and may be configured using, for example, a MEMS mirror capable of deflecting light in two axial directions with a single mirror.
- the measurement light scanned by the scanning unit 203 forms an illumination spot on the fundus Er of the eye E to be inspected via the lens 204.
- each illumination spot moves (scans) on the fundus Er of the eye E to be inspected.
- the reflected light at the illumination spot position follows the optical path in the opposite direction, enters the optical fiber 102b, and returns to the optical branch portion 103.
- the reference light reflected by the reference mirror 113 and the measurement light reflected by the fundus Er of the eye E to be inspected are returned to the optical branching portion 103 as return light and interfere with each other to generate interference light.
- the interference light that has passed through the optical fiber 102d and is emitted to the collimating optical system 122 is substantially parallelized and enters the diffraction grating 123.
- the diffraction grating 123 has a periodic structure and disperses the input interference light.
- the dispersed interference light is imaged on the line sensor 125 by the imaging lens 124 whose focusing state can be changed.
- the line sensor 125 is connected to the image processing device 300, and outputs a signal corresponding to the intensity of the light applied to each sensor unit to the image processing device 300.
- the OCT apparatus may be provided with a fundus camera (not shown) for capturing a frontal image of the fundus of the eye E to be inspected, an optical system of a scanning laser Ophthalmoscope (SLO), or the like.
- a part of the SLO optical system may have an optical path common to a part of the scanning optical system 200.
- FIG. 2 shows a schematic functional configuration example of the image processing device 300.
- The image processing device 300 is provided with a reconstruction unit 301, a motion contrast image generation unit 302, a layer recognition unit 303, an image generation unit 304, a storage unit 305, and a display control unit 306.
- the image processing device 300 according to the present embodiment is connected to the optical interference unit 100 using the spectrum domain (SD) method, and can acquire the output data of the line sensor 125 of the optical interference unit 100.
- the image processing device 300 may be connected to an external device (not shown) to acquire an interference signal of the eye to be inspected, a tomographic image, or the like from the external device.
- the reconstruction unit 301 generates the tomographic data of the eye E to be inspected by converting the acquired output data (interference signal) of the line sensor 125 into a wave number and Fourier transforming it.
- the tomographic data refers to data including information on the tomography of the subject, and includes a signal obtained by subjecting an interference signal by OCT to Fourier transform, a signal obtained by subjecting the signal to an arbitrary process, and the like.
- the reconstruction unit 301 can also generate a tomographic image as tomographic data based on the interference signal.
- the reconstructing unit 301 may generate tomographic data based on the interference signal of the eye to be inspected acquired by the image processing device 300 from the external device.
- Although the OCT apparatus according to this embodiment includes the SD type optical interference unit 100, it may instead include a time domain (TD) type or swept source (SS, wavelength sweep) type optical interference unit.
- the motion contrast image generation unit 302 generates motion contrast data from a plurality of tomographic data. The method of generating the motion contrast data will be described later.
- the motion contrast image generation unit 302 can generate three-dimensional motion contrast data from a plurality of three-dimensional tomographic data. In the following, three-dimensional tomographic data and three-dimensional motion contrast data are collectively referred to as three-dimensional volume data.
- the layer recognition unit 303 analyzes the generated tomographic data of the eye E to be inspected and performs segmentation to identify an arbitrary layer structure in the retinal layer.
- the segmented result serves as a reference for the projection range when generating the OCTA front image as described later.
- The layer boundaries detected by the layer recognition unit 303 include ten types: ILM, NFL/GCL, GCL/IPL, IPL/INL, INL/OPL, OPL/ONL, IS/OS, OS/RPE, RPE/Choroid, and BM.
- the object detected by the layer recognition unit 303 is not limited to this, and may be any structure included in the eye E to be inspected.
- Any known method may be used as the segmentation method.
- the image generation unit 304 generates an image for display from the generated tomographic data and motion contrast data.
- The image generation unit 304 can generate, for example, a brightness En-Face image obtained by projecting or integrating three-dimensional tomographic data onto a two-dimensional plane, and an OCTA front image obtained by projecting or integrating three-dimensional motion contrast data onto a two-dimensional plane.
- The display control unit 306 outputs the generated display image to the display unit 310.
- The storage unit 305 can store the tomographic data and motion contrast data generated by the reconstruction unit 301, the images for display generated by the image generation unit 304, definitions of a plurality of depth ranges, the definition applied by default, and the like.
- The image generation unit 304 can generate an OCTA front image and a brightness En-Face image according to the depth range acquired from the storage unit 305. The method of generating the OCTA front image and the like will be described later. Further, the storage unit 305 may store software or the like for realizing each unit. The image generation unit 304 can also generate a fundus frontal image based on a signal acquired from a fundus camera (not shown) or an SLO optical system.
- the display unit 310 is connected to the image processing device 300.
- the display unit 310 can be configured by using any monitor.
- the pointing device 320 is a mouse provided with a rotary wheel and buttons, and can specify an arbitrary position on the display unit 310.
- the mouse is used as the pointing device in this embodiment, any pointing device such as a joystick, a touch pad, a trackball, a touch panel, or a stylus pen may be used.
- the OCT device is configured by using the optical interference unit 100, the scanning optical system 200, the image processing device 300, the display unit 310, the pointing device 320, and the keyboard 321.
- In this embodiment, the optical interference unit 100, the scanning optical system 200, the image processing device 300, the display unit 310, the pointing device 320, and the keyboard 321 are configured separately, but all or some of them may be configured integrally.
- the display unit 310 and the pointing device 320 may be integrally configured as a touch panel display.
- a fundus camera and an SLO optical system (not shown) may be configured as separate devices.
- the image processing device 300 may be configured using, for example, a general-purpose computer.
- the image processing device 300 may be configured by using a dedicated computer of the OCT device.
- The image processing device 300 includes a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) (not shown), and a storage medium such as an optical disk or a memory such as a ROM (Read Only Memory).
- Each component other than the storage unit 305 of the image processing device 300 may be composed of a software module executed by a processor such as a CPU or MPU.
- each component may be composed of a circuit that performs a specific function such as an ASIC, an independent device, or the like.
- the storage unit 305 may be configured by any storage medium such as an optical disk or a memory.
- The image processing device 300 may include one or more processors such as CPUs and one or more storage media such as ROMs. Accordingly, each component of the image processing device 300 may be configured to function when it is connected to at least one processor and at least one storage medium and the at least one processor executes a program stored in the at least one storage medium.
- the processor is not limited to the CPU and MPU, and may be a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), or the like. Further, each component of the image processing device 300 may be realized by a separate device.
- the examiner seats the patient who is the subject in front of the scanning optical system 200, inputs alignment, patient information, and the like, and then starts OCT imaging.
- the light emitted from the light source 101 passes through the optical fiber 102a and is divided into a measurement light toward the eye E to be inspected and a reference light toward the reference mirror 113 at the optical branching portion 103.
- the measurement light directed to the eye E to be inspected passes through the optical fiber 102b, is emitted from the fiber end, is substantially parallelized by the collimating optical system 202, and is incident on the scanning unit 203.
- the scanning unit 203 has a galvano mirror, and the measurement light deflected by the mirror irradiates the eye E to be inspected via the lens 204. Then, the reflected light reflected by the eye E to be inspected follows the path in the reverse direction and is returned to the optical branching portion 103.
- the reference light directed to the reference mirror 113 passes through the optical fiber 102c, is emitted from the fiber end, and reaches the reference mirror 113 through the collimating optical system 111 and the dispersion compensation optical system 112.
- the reference light reflected by the reference mirror 113 is returned to the optical branching portion 103 by following the path in the reverse direction.
- the interference light input to the diffraction grating 123 is imaged on the line sensor 125 by the imaging lens 124.
- the line sensor 125 can be used to obtain an interference signal at one point on the eye E to be inspected.
- the interference signal acquired by the line sensor 125 is output to the image processing device 300.
- the interference signal output from the line sensor 125 is 12-bit integer format data.
- The reconstruction unit 301 performs wavenumber conversion, fast Fourier transform (FFT), and absolute value conversion (acquisition of amplitude) on the 12-bit integer format data to generate tomographic data in the depth direction at one point of the eye E to be inspected.
- the data format of the interference signal and the like may be arbitrarily set according to the desired configuration.
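- As a rough illustration of the reconstruction described above (wavenumber conversion, FFT, and amplitude extraction), a minimal numpy sketch follows; the wavelength range, sensor size, and function names are hypothetical and not taken from the patent.

```python
import numpy as np

def reconstruct_ascan(spectrum, wavelengths):
    """Sketch of one A-scan reconstruction: resample the interference
    spectrum to be uniform in wavenumber, apply an FFT, and take the amplitude."""
    k = 2.0 * np.pi / wavelengths                   # wavenumber of each sensor pixel
    k_uniform = np.linspace(k.min(), k.max(), k.size)
    order = np.argsort(k)                           # np.interp needs increasing x
    spectrum_k = np.interp(k_uniform, k[order], spectrum[order])
    spectrum_k -= spectrum_k.mean()                 # suppress the DC term
    depth_profile = np.abs(np.fft.fft(spectrum_k))  # amplitude in the depth direction
    return depth_profile[: spectrum_k.size // 2]    # keep the positive-depth half

# Example with a hypothetical 2048-pixel line sensor covering roughly 800-880 nm
wl = np.linspace(880e-9, 800e-9, 2048)
signal = np.random.rand(2048)                       # stand-in for one interference spectrum
ascan = reconstruct_ascan(signal, wl)
```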
- the scanning unit 203 drives the galvano mirror and scans the measurement light at one adjacent point on the eye E to be inspected.
- the line sensor 125 detects the interference light based on the measurement light and acquires the interference signal.
- the reconstruction unit 301 generates tomographic data in the depth direction at the adjacent point on the eye E to be inspected based on the interference signal of the adjacent point. By repeating this series of control, tomographic data (two-dimensional tomographic data) relating to one tomographic image in one transverse direction (main scanning direction) of the eye E to be inspected can be generated.
- The scanning unit 203 drives the galvanometer mirrors to scan the same location (the same scanning line) of the eye E to be inspected a plurality of times, thereby acquiring a plurality of tomographic data (two-dimensional tomographic data) at the same location of the eye E to be inspected. Further, the scanning unit 203 drives the galvanometer mirrors to move the measurement light slightly in the sub-scanning direction orthogonal to the main scanning direction, and acquires a plurality of tomographic data (two-dimensional tomographic data) at another location (an adjacent scanning line) of the eye E to be inspected. By repeating this control, it is possible to acquire tomographic data (three-dimensional tomographic data) relating to a plurality of tomographic images over a predetermined range of the eye E to be inspected.
- one tomographic data at one point of the eye E to be inspected is acquired by performing FFT processing on a set of interference signals obtained from the line sensor 125.
- the complex number format tomographic data generated by the reconstruction unit 301 is output to the motion contrast image generation unit 302.
- the motion contrast image generation unit 302 corrects the positional deviation of a plurality of tomographic data (two-dimensional tomographic data) at the same location of the eye E to be inspected.
- the method for correcting the misalignment any known method may be used.
- For example, reference tomographic data may be selected as a template, and the amount of positional deviation of each piece of tomographic data with respect to the template may be acquired.
- the motion contrast image generation unit 302 obtains a decorrelation value between the two two-dimensional tomographic data in which the positional deviation has been corrected by the following equation (1).
- Axz indicates the amplitude of the tomographic data A at the position (x, z)
- Bxz indicates the amplitude of the tomographic data B at the same position (x, z).
- the resulting decorrelation value Mxz takes a value from 0 to 1, and the larger the difference between the two amplitude values, the closer to 1.
- The motion contrast image generation unit 302 obtains a plurality of decorrelation values by repeating the above decorrelation calculation according to the number of acquired tomographic data, and acquires the final motion contrast data by averaging the plurality of decorrelation values.
- the motion contrast image generation unit 302 can generate a motion contrast image by arranging the acquired motion contrast data at the corresponding pixel positions.
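- Since equation (1) is not reproduced in this text, the sketch below assumes a commonly used decorrelation formulation that matches the stated properties (a value from 0 to 1 that approaches 1 as the two amplitudes differ), and averages it over the repeated B-scans; treat it as an illustration under those assumptions, not the patent's exact equation.

```python
import numpy as np

def decorrelation(a, b, eps=1e-12):
    """Decorrelation between two amplitude B-scans: 0 when identical,
    approaching 1 as the amplitudes differ (assumed formulation)."""
    return 1.0 - (2.0 * a * b) / (a * a + b * b + eps)

def motion_contrast(bscans):
    """Average the decorrelation over consecutive pairs of
    position-corrected B-scans acquired at the same scanning line."""
    bscans = np.asarray(bscans, dtype=float)
    decs = [decorrelation(bscans[i], bscans[i + 1]) for i in range(len(bscans) - 1)]
    return np.mean(decs, axis=0)

# e.g. four repeated B-scans of shape (depth, width)
repeats = np.random.rand(4, 512, 300)
mc_bscan = motion_contrast(repeats)    # one motion contrast B-scan
```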
- the motion contrast data is obtained based on the amplitude of the complex number data after the FFT, but the method of obtaining the motion contrast data is not limited to the above method.
- the motion contrast data may be obtained based on the phase information of the complex number data, or the motion contrast data may be obtained based on both the amplitude and phase information. It is also possible to obtain motion contrast data based on the real part and the imaginary part of the complex number data. Further, the motion contrast image generation unit 302 may perform the same processing on each pixel value of the two-dimensional tomographic image to obtain the motion contrast data.
- In this embodiment, the motion contrast data is acquired by calculating the decorrelation value of two values, but the motion contrast data may instead be obtained based on the difference between the two values or on the ratio of the two values.
- the motion contrast data may be obtained based on the variance value of the tomographic data.
- In the above description, the final motion contrast data is obtained by averaging the acquired plurality of decorrelation values, but the final motion contrast data may instead be based on, for example, the difference, the maximum value, or the median value of the plurality of decorrelation values.
- the two tomographic data used when acquiring the motion contrast data may be the data acquired at a predetermined time interval.
- the OCTA front image is a front image obtained by projecting or integrating a three-dimensional motion contrast image (three-dimensional motion contrast data) onto a two-dimensional plane in an arbitrary depth range.
- the depth range can be set arbitrarily.
- SCP: Superficial Capillary (superficial layer of the retina)
- Deep Capillary: deep layer of the retina
- Outer Retina: outer layer of the retina
- RPC: radial peripapillary capillaries
- The superficial retinal layer is defined as ILM+0 µm to GCL/IPL+50 µm.
- GCL/IPL means the boundary between the GCL layer and the IPL layer.
- An offset amount such as +50 µm or -100 µm means that a positive value shifts toward the choroid side and a negative value shifts toward the pupil side.
- The outer layer of the retina and the choroidal capillary plate are often used as the depth range.
- The outer layer of the retina is often defined as OPL/ONL+0 µm to RPE/Choroid+0 µm, but as will be described later, the depth range can be adjusted depending on the size of the CNV, the location of occurrence (depth position), and the like.
- the representative value can include a value such as an average value, a median value, or a maximum value of pixel values within the depth range.
- the brightness En-Face image is a front image obtained by projecting or integrating a three-dimensional tomographic image on a two-dimensional plane in an arbitrary depth range.
- the brightness En-Face image may be generated in the same manner as the OCTA front image by using a three-dimensional tomographic image instead of the three-dimensional motion contrast image. Further, the brightness En-Face image may be generated by using the three-dimensional tomographic data.
- The depth range of the OCTA front image and the En-Face image can be determined based on the retinal layers detected by segmentation processing of the two-dimensional tomographic data (or the two-dimensional tomographic images) constituting the three-dimensional volume data. Further, the depth range may be a range including a predetermined number of pixels in a deeper or shallower direction with reference to one of the two layer boundaries relating to the retinal layers detected by the segmentation processing.
- the depth range may be configured so that it can be changed according to a desired configuration.
- the depth range can be a range modified (offset) according to the operator's instructions from the range between the two layer boundaries with respect to the detected retinal layer.
- the operator can change the depth range by, for example, moving an index indicating the upper limit or the lower limit of the depth range superimposed on the tomographic image.
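- A minimal sketch of an en-face projection as described above, assuming numpy, a (z, y, x) volume, per-A-scan boundary z indices from segmentation, and an axial sampling pitch used to convert µm offsets into pixels; the array names and the roughly 5 µm pitch are hypothetical and only for illustration.

```python
import numpy as np

def enface_projection(volume, upper, lower, mode="mean"):
    """Project a 3-D volume (z, y, x) onto a 2-D front image.

    upper, lower : (y, x) arrays of z indices bounding the projection range,
                   e.g. a segmented boundary plus an offset in pixels.
    mode         : representative value per A-scan ("mean" or "max").
    """
    z = volume.shape[0]
    depth = np.arange(z)[:, None, None]
    mask = (depth >= upper[None]) & (depth < lower[None])
    masked = np.where(mask, volume, np.nan)
    if mode == "max":
        return np.nanmax(masked, axis=0)
    return np.nanmean(masked, axis=0)

# Hypothetical example: segmented boundaries plus a +50 um / +10 um offset,
# assuming an axial sampling of about 5 um per pixel (so 50 um -> 10 pixels).
vol = np.random.rand(400, 300, 300)        # motion contrast volume (z, y, x)
opl_onl = np.full((300, 300), 180)         # segmented OPL/ONL boundary (z index)
bm = np.full((300, 300), 260)              # segmented BM boundary (z index)
octa_front = enface_projection(vol, opl_onl + 10, bm + 2, mode="mean")
```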
- FIG. 3 is a diagram for explaining the image generation unit 304.
- the image generation unit 304 includes a projection range control unit 341 and a front image generation unit 342.
- The projection range control unit 341 specifies the three-dimensional motion contrast data used for generating the front image, based on the motion contrast image generated by the motion contrast image generation unit 302, the layer recognition result from the layer recognition unit 303, and the depth range stored in the storage unit 305.
- the front image generation unit 342 projects or integrates the motion contrast data specified by the projection range control unit 341 on a two-dimensional plane to generate an OCTA front image.
- Similarly, the projection range control unit 341 can specify the three-dimensional tomographic data used to generate the brightness En-Face image, based on the three-dimensional tomographic image (three-dimensional tomographic data), the layer recognition result, and the depth range.
- the front image generation unit 342 can project or integrate the tomographic data specified by the projection range control unit 341 on a two-dimensional plane to generate an En-Face image of brightness.
- FIG. 4 shows an example of the GUI 400 for displaying an image including an OCTA front image generated by the image generation unit 304.
- the GUI 400 shows a tab 401 for screen selection, and in the example shown in FIG. 4, the report screen (Report tab) is selected.
- the GUI 400 may include a patient screen (Patient tab) for selecting a patient, an imaging screen (OCT Capture tab) for performing imaging, and the like.
- the inspection selector 408 is provided on the left hand side of the report screen, and the display area is provided on the right hand side.
- The examination selector 408 displays a list of the examinations performed so far for the currently selected patient, and when one of them is selected, the display control unit 306 displays the examination result in the display area on the right side of the report screen.
- In the display area, an SLO image 406 generated using an SLO optical system (not shown) is shown, and an OCTA front image is superimposed on the SLO image 406. Further, the display area shows the first OCTA front image 402, the first tomographic image 403, the brightness En-Face image 407, the second OCTA front image 404, and the second tomographic image 405. A pull-down is provided above the En-Face image 407, and EnfaceImage1 is selected; this means that the depth range of the En-Face image is the same as the depth range of OCTA Image 1 (the first OCTA front image 402). By operating the pull-down, the operator can set the depth range of the En-Face image to be the same as the depth range of OCTA Image 2 (the second OCTA front image 404) or the like.
- the depth range when the first OCTA front image 402 is generated is shown by a broken line.
- the depth range of the first OCTA front image 402 is the superficial layer of the retina (SCP).
- the second OCTA front image 404 is an image generated using data in a depth range different from that of the first OCTA front image 402.
- the depth range when the second OCTA front image 404 is generated is shown by a broken line.
- The depth range of the second OCTA front image 404 is set for CNV and, in this example, is OPL/ONL+50 µm to BM+10 µm.
- the depth range of the first OCTA front image 402 and the second OCTA front image 404 can be set according to the pull-down operation provided on the upper part of these images. Further, these depth ranges may be set in advance or may be set according to the operation of the operator.
- the image generation unit 304 can function as a target designation unit for designating an extraction target (target area) to be the target of the following extraction processing based on the setting.
- The pull-down for setting the depth range of the OCTA front image may include not only depth ranges for individual layers, such as the superficial layer of the retina described above, but also ranges corresponding to an abnormal site, such as CNV, that is to be extracted by the processing described below.
- The display control unit 306 switches the screen displayed on the display unit 310 from the GUI 400 shown in FIG. 4 to the GUI 500 shown in FIG. 5.
- The GUI 500 displays, for four different depth range settings, the OCTA front images 501, 505, 509, and 513, the corresponding tomographic images 503, 507, 511, and 515, and the depth ranges 502, 506, 510, and 514.
- the image generation unit 304 designates the CNV corresponding to the depth range of the second OCTA front image 404 as the extraction target (target area).
- The image generation unit 304 generates the corresponding OCTA front images 501, 505, 509, and 513 based on the four depth ranges 502, 506, 510, and 514 stored in advance in the storage unit 305 for the CNV designated as the extraction target.
- The display control unit 306 causes the display unit 310 to display the generated OCTA front images 501, 505, 509, and 513, the corresponding tomographic images 503, 507, 511, and 515, and the depth ranges 502, 506, 510, and 514.
- For the leftmost image (OCTA front image 501), the depth range 502 assumes a Type 1 CNV and is BM+0 µm to BM+20 µm.
- Here, a Type 1 CNV means a CNV located below the RPE/Choroid boundary.
- The depth range 506 for the second image from the left (OCTA front image 505) is a depth range assuming a very small CNV slightly above the BM, and is BM-20 µm to BM+0 µm.
- The depth range 510 for the third image from the left (OCTA front image 509) is a depth range assuming a large CNV generated above the BM, and is BM-100 µm to BM+0 µm.
- The depth range 514 for the fourth image from the left (OCTA front image 513) is a depth range that covers the entire outer layer of the retina, assuming a considerably large CNV (OPL+50 µm to BM+10 µm).
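- As an illustration only, the four preset depth ranges listed above could be held in the storage unit along the following lines; the dictionary layout, key names, and tuple format are hypothetical and not part of the patent.

```python
# Hypothetical representation of the preset CNV depth ranges listed above, each as
# (upper boundary, upper offset in um, lower boundary, lower offset in um).
CNV_DEPTH_RANGE_PRESETS = {
    "type1_cnv":          ("BM", 0,    "BM", +20),   # Type 1 CNV below the RPE/Choroid
    "small_cnv":          ("BM", -20,  "BM", 0),     # very small CNV slightly above the BM
    "large_cnv":          ("BM", -100, "BM", 0),     # large CNV generated above the BM
    "whole_outer_retina": ("OPL", +50, "BM", +10),   # covers the entire outer retina
}
```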
- Selection buttons 504, 508, 512, and 516 are displayed at the bottom of the GUI 500, and by operating a selection button the operator can select, from the displayed OCTA front images, an OCTA front image (the front image to be displayed) preferable for use in diagnosis or the like.
- The display control unit 306 switches the screen displayed on the display unit 310 to the GUI 600 of the report screen shown in FIG. 6.
- The GUI 600 is similar to the GUI 400, except that the second OCTA front image 404 and the second tomographic image 405 have been switched, based on the conditions selected by the operator, to the OCTA front image 509 and to the tomographic image 511 showing the depth range for the OCTA front image 509.
- In this way, OCTA front images relating to a plurality of depth ranges corresponding to the extraction target are provided to the operator, and by having the operator select a preferable image, the OCTA front image for the optimum depth range can be displayed. This can reduce the risk of overlooking lesions and reduce additional work such as image quality adjustment by doctors and laboratory technicians.
- FIG. 7 is a flowchart of a series of processes according to this embodiment.
- FIG. 8 is a flowchart of a front image generation process according to this embodiment.
- The image processing device 300 acquires a three-dimensional interference signal relating to the eye E to be inspected from the line sensor 125 of the optical interference unit 100, and the reconstruction unit 301 generates and acquires three-dimensional tomographic data.
- the reconstruction unit 301 can also generate a three-dimensional tomographic image based on the three-dimensional tomographic data.
- the image processing device 300 may acquire a three-dimensional interference signal, three-dimensional interference data, a three-dimensional tomographic image, or the like related to the eye E to be inspected from a connected external device (not shown).
- When the three-dimensional tomographic data is acquired by the reconstruction unit 301, the motion contrast image generation unit 302 generates and acquires three-dimensional motion contrast data (a three-dimensional motion contrast image) based on the three-dimensional tomographic data.
- In step S702, the image generation unit 304 specifies the extraction target (target region) according to a preset setting or an instruction from the operator.
- the layer recognition unit 303 can segment the three-dimensional tomographic data and acquire the layer recognition result.
- The image generation unit 304 may generate the first OCTA front image 402 and the like based on the three-dimensional volume data, the layer recognition result, the preset depth range settings, and the like, and the display control unit 306 may display the GUI 400 on the display unit 310.
- the operator can input an instruction regarding the extraction target by operating a pull-down or the like regarding the OCTA front image.
- the process proceeds to step S703.
- In step S703, the image generation unit 304 starts the front image generation processing according to this embodiment.
- the image generation unit 304 specifies a plurality of depth ranges stored in the storage unit 305 corresponding to the designated extraction target.
- The projection range control unit 341 of the image generation unit 304 specifies the three-dimensional motion contrast data used for generating the OCTA front image based on the specified plurality of depth ranges, the three-dimensional motion contrast data, and the layer recognition result.
- the front image generation unit 342 generates a plurality of OCTA front images corresponding to a plurality of depth ranges based on the specified three-dimensional motion contrast data.
- In step S802, the display control unit 306 causes the display unit 310 to display the generated plurality of OCTA front images.
- the display control unit 306 can display the information regarding the corresponding depth range on the display unit 310 together with the generated plurality of OCTA front images.
- the information regarding the corresponding depth range may be numerical information indicating the depth range, a broken line indicating the depth range on the tomographic image, or both.
- In step S803, the operator specifies a preferable OCTA front image for diagnosis or the like from among the plurality of OCTA front images displayed on the display unit 310.
- the image processing device 300 selects the OCTA front image to be displayed according to the instruction of the operator.
- the instruction by the operator may be given, for example, by selecting a selection button in the GUI 500 shown in FIG.
- the display control unit 306 causes the display unit 310 to display the selected OCTA front image.
- the OCTA front image is generated and displayed, but the generated / displayed image may be an En-Face image having brightness.
- the same process as the above process may be performed by using the tomographic data instead of the motion contrast data.
- the image processing device 300 includes an image generation unit 304 and a display control unit 306.
- the image generation unit 304 also functions as a target designation unit for designating an extraction target from the three-dimensional volume data of the eye E to be inspected.
- the display control unit 306 causes the display unit 310 to display a plurality of front images corresponding to different depth ranges of the three-dimensional volume data side by side using the information of the designated target area.
- the projection range control unit 341 of the image generation unit 304 determines the depth range for generating a plurality of front images using the designated information of the extraction target.
- Each depth range for generating the plurality of front images is, for example, a depth range within 0 to 50 µm from the outer layer of the retina or the Bruch's membrane toward the choroid side.
- By providing the operator with front images relating to a plurality of depth ranges corresponding to the extraction target, a front image in which the target region, such as a target structure, can easily be confirmed can be displayed. This can reduce the risk of overlooking lesions and reduce additional work such as image quality adjustment by doctors and laboratory technicians.
- the projection range control unit 341 of the image generation unit 304 determines the depth range for generating a plurality of front images based on the designation of the extraction target.
- The projection range control unit 341 of the image generation unit 304 can serve as an example of a determination unit that determines, for example, at least one of the type of three-dimensional volume data, the layer or depth range to be extracted, the number of front images to be generated, the depth ranges for generating the front images, and the interval between the depth ranges for generating the front images.
- the determination unit may be configured as a component separate from the image generation unit 304.
- an image corresponding to a plurality of depth ranges is displayed separately from the GUI 400 shown in FIG. 4, as in the GUI 500 shown in FIG. 5, but the present invention is not limited to this.
- images in a plurality of depth ranges may be displayed side by side on the GUI 400.
- the images in a plurality of depth ranges may be displayed while being temporally switched to the display area of the second OCTA front image 404 of the GUI 400, or may be switched and displayed according to the instruction of the operator.
- an image in a preferable depth range may be selected from the images displayed by switching by an operation on the GUI 400.
- the OCTA front image corresponding to a plurality of depth ranges is displayed as in the GUI 500 shown in FIG. 5, but the present invention is not limited to this.
- the images whose projection method has been changed may be displayed side by side together with the depth range, and the operator may select the images.
- the projection method may be any known method such as maximum value projection or average value projection. Even within the same depth range, the appearance of the front image changes depending on the projection method. Therefore, in such a case, the operator can be made to select the front image corresponding to the preferable projection method.
- the generated image is displayed on the display unit 310, but for example, it may be output to an external device such as an external server.
- the different depth ranges corresponding to the plurality of front images may be partially overlapping depth ranges. It should be noted that these contents can be similarly applied to the following various examples and modifications.
- Modification 1: In the first embodiment, an example is shown in which the operator double-clicks a front image to display front images in a plurality of depth ranges, but the processing for displaying front images in a plurality of depth ranges is not limited to this. For example, at the stage when the report screen of the GUI 400 is displayed, front images corresponding to a plurality of depth ranges set for the target disease may be displayed and the operator may select one of them.
- Front images for a plurality of depth ranges may be displayed, and the operator may be allowed to select the most suitable image.
- Also, when an OCTA examination is performed on a patient having a disease, such as a patient with exudative age-related macular degeneration having CNV, front images corresponding to a plurality of depth ranges set for that disease may be displayed, and the operator may be allowed to select the most suitable image.
- the determination as to whether or not there is an abnormality may be performed by any known method.
- Example 2: In Example 1, a plurality of OCTA front images corresponding to a plurality of depth ranges are displayed, and the operator selects a preferred image among them, whereby an OCTA front image projected over a preferable depth range is provided.
- The image processing apparatus according to the second embodiment differs in that the image generation unit 304 is further provided with an image evaluation unit 343, and information indicating an evaluation of the presence of the extraction target in the image can be provided to the operator together with the OCTA front image.
- The configuration of the image processing apparatus according to this embodiment is the same as that of the image processing apparatus according to the first embodiment except that the image evaluation unit 343 is added to the image generation unit 304; therefore, the same reference numerals are used and the description is omitted.
- the image processing apparatus according to the present embodiment will be described focusing on the difference from the image processing apparatus 300 according to the first embodiment.
- FIG. 9 is a diagram for explaining the image generation unit 304 according to this embodiment. As shown in the figure, the image generation unit 304 according to this embodiment is provided with an image evaluation unit 343 in addition to the projection range control unit 341 and the front image generation unit 342.
- the image evaluation unit 343 evaluates the OCTA front image corresponding to a plurality of depth ranges generated by the front image generation unit 342, and acquires information indicating an evaluation indicating the presence of new blood vessels (CNV) in each OCTA front image.
- the information indicating the evaluation may be an evaluation value, or may be information indicating the presence or absence of existence and the possibility thereof.
- the information indicating the evaluation may be information that the eye to be inspected has or does not have an extraction target such as CNV, or there is a suspicion that an extraction target such as CNV exists.
- the image evaluation unit 343 acquires an evaluation value from the OCTA front image using a trained model trained using a neural network as a machine learning model.
- FIG. 10A shows an example of a neural network used as a machine learning model
- FIG. 10B shows an example of learning data according to this embodiment.
- the feature points of the input data are extracted, and the output data is estimated from the feature points according to the weights between the nodes determined according to the learning.
- the OCTA front image is used as the input data of the learning data, and the evaluation value evaluated for the presence of CNV in the OCTA front image is used as the output data of the learning data.
- the evaluation value is a value of 0 to 1, and indicates whether or not CNV is included in the OCTA front image.
- the maximum value of the evaluation value is 1, and the larger the value, the higher the probability that the OCTA front image contains CNV.
- In FIG. 10B, six types of OCTA front images are shown as input data and three levels of values are shown as output data, but in practice more OCTA front images may be used as input data and the number of labeling stages for the output data may be increased.
- The number of OCTA front images used as input data may also be increased by performing so-called augmentation, such as rotating the image, flipping it vertically or horizontally, and changing the cropping range of the image.
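- A minimal sketch of the augmentation operations mentioned above (rotation, vertical/horizontal flips, and cropping), assuming numpy; the particular variants and crop sizes are arbitrary illustrations, not the patent's specific augmentation scheme.

```python
import numpy as np

def augment(image, rng):
    """Generate a few augmented variants of one OCTA front image:
    90-degree rotations, vertical/horizontal flips, and a random crop."""
    variants = [np.rot90(image, k) for k in range(1, 4)]
    variants += [np.flipud(image), np.fliplr(image)]
    h, w = image.shape
    top = rng.integers(0, h // 8)
    left = rng.integers(0, w // 8)
    variants.append(image[top: top + 7 * h // 8, left: left + 7 * w // 8])
    return variants

rng = np.random.default_rng(0)
extra_samples = augment(np.random.rand(304, 304), rng)   # hypothetical 304x304 front image
```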
- A projection artifact is a phenomenon in which blood vessels in, for example, the superficial layer of the retina appear projected into layers below the superficial layer. Any known method may be used as the algorithm for removing projection artifacts.
- the input data of the learning data not only the OCTA front image of various examples including CNV but also the image of a healthy eye can be used together. Further, the OCTA front image of another diseased eye may be included in the input data of the learning data to be trained.
- an evaluation value in which a doctor or the like evaluates the presence of CNV in the OCTA front image which is the input data of the learning data is used.
- In this embodiment, evaluation values of three stages, 0, 0.5, and 1, are used as the output data of the training data, but evaluation values with a larger number of stages may be used, as described above.
- the evaluation standard may be arbitrary, and for example, the evaluation value may be determined according to the clarity of CNV, or the evaluation value may be set to 1 when CNV appears even a little.
- the data according to the design of the machine learning model is output. For example, output data that is likely to correspond to the input data is output according to the tendency of learning using the learning data.
- the evaluation value for evaluating the presence of CNV in the input OCTA front image is output according to the learning tendency.
- The image evaluation unit 343 may calculate the final evaluation value from the ratios of the evaluation values of the respective stages output from the trained model. For example, the image evaluation unit 343 may calculate the final evaluation value by multiplying each evaluation value by the corresponding ratio, adding the results, and dividing the sum by the total of the ratios. In this case, for example, when the ratio for the evaluation value 0 is 0.2, the ratio for the evaluation value 0.5 is 0.8, and the ratio for the evaluation value 1 is 0, the image evaluation unit 343 can calculate 0.4 as the final evaluation value.
- The method of calculating the final evaluation value is not limited to this, and any method may be used; for example, the evaluation value whose ratio is higher than the other ratios may be adopted as the final evaluation value.
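- A minimal sketch of the weighted-average calculation described above; the function name is hypothetical.

```python
def final_evaluation(stage_values, ratios):
    """Weighted average of the stage evaluation values (e.g. 0, 0.5, 1)
    by the ratios output from the trained model for each stage."""
    assert len(stage_values) == len(ratios)
    weighted = sum(v * r for v, r in zip(stage_values, ratios))
    return weighted / sum(ratios)

# The example from the text: ratios 0.2, 0.8, 0.0 for stages 0, 0.5, 1 -> 0.4
print(final_evaluation([0.0, 0.5, 1.0], [0.2, 0.8, 0.0]))   # 0.4
```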
- FIG. 11 is a flowchart of the front image generation process according to the present embodiment. Since the flow of a series of processes other than the front image generation process is the same as the series of processes according to the first embodiment, the description thereof will be omitted. Further, since step S1101 is the same step as step S801 according to the first embodiment, the description thereof will be omitted. When a plurality of OCTA front images are generated in step S1101, the process proceeds to step S1102.
- In step S1102, the image evaluation unit 343 evaluates each of the generated plurality of OCTA front images using the trained model, and acquires an evaluation value evaluating the presence of CNV in each OCTA front image.
- Thereafter, the process proceeds to step S1103.
- In step S1103, the display control unit 306 displays the plurality of OCTA front images corresponding to the plurality of depth ranges side by side, together with their respective evaluation values and depth ranges.
- FIG. 12 shows an example of a GUI that the display control unit 306 displays on the display unit 310.
- The GUI 1200 shown in FIG. 12 is similar to the GUI 500 shown in FIG. 5, except that the evaluation values 1217, 1218, 1219, and 1220 of the respective OCTA front images are shown above the respective OCTA front images 501, 505, 509, and 513.
- In step S1104, the operator specifies a preferable OCTA front image for diagnosis or the like from among the plurality of OCTA front images displayed on the display unit 310, based on the OCTA front images and their evaluation values.
- the image processing device 300 selects the OCTA front image to be displayed according to the instruction of the operator. At this time, the operator can specify the OCTA front image to be displayed by, for example, operating the selection buttons 504, 508, 512, 516. Since the subsequent processing is the same as the processing according to the first embodiment, the description thereof will be omitted.
- the evaluation value (information indicating the evaluation) that evaluates the existence of the extraction target in the front image is displayed together with the plurality of front images.
- the operator can more accurately select the optimum image. Therefore, for example, even when the image quality difference between the plurality of front images is small, it is possible to reduce the individual difference by the operator when selecting the image to be displayed.
- Information indicating the evaluation of the front image (the evaluation value 1310) may be displayed together with the selected front image, as in the GUI 1300 shown in FIG. 13. In this case, the information indicating the evaluation of the front image can also be confirmed on the report screen.
- the image processing device 300 includes an image generation unit 304 and an image evaluation unit 343.
- the image generation unit 304 generates a plurality of front images corresponding to different depth ranges of the three-dimensional volume data of the eye E to be inspected.
- The image evaluation unit 343 acquires a plurality of pieces of information corresponding to the plurality of front images, each being information indicating an evaluation of the presence of the target region made using the corresponding front image.
- the image generation unit 304 can function as an example of a determination unit that determines a front image (output image) to be displayed using the plurality of information.
- the image generation unit 304 uses the plurality of information to determine at least one of the plurality of front images as a front image to be displayed.
- the image processing device 300 includes a display control unit 306 that controls the display of the display unit 310.
- the display control unit 306 causes the display unit 310 to display the acquired information indicating the plurality of evaluations side by side with the plurality of front images.
- the determination unit may be configured as a component separate from the image generation unit 304.
- As a result, it is possible to display a front image in which the target region, such as a target structure, can easily be confirmed. This can reduce the risk of overlooking lesions and reduce additional work such as image quality adjustment by doctors and laboratory technicians. Further, by displaying the plurality of front images side by side together with the plurality of pieces of information indicating the evaluations, the operator can easily identify an appropriate front image for diagnosis or the like from among the plurality of front images.
- the brightness En-Face image may be generated and displayed as the front image as in the first embodiment.
- the image processing device 300 may be configured so that the depth range of the generated front image can be manually adjusted.
- the image generation unit 304 determines the image to be finally displayed, but the determined image may be output to an external device or the like. Therefore, the image generation unit 304 may be able to determine, for example, an output image to be output to the display unit 310 or an external device.
- In this embodiment, the image evaluation unit 343 acquires the evaluation value using the trained model, but the present invention is not limited to this, and the evaluation value may be acquired using so-called rule-based image processing. For example, when calculating an evaluation value for evaluating the presence of CNV, the image evaluation unit 343 may remove granular noise from the image, then enhance tubular regions with a Hessian filter, and calculate the evaluation value by integrating the enhanced image. Alternatively, the image evaluation unit 343 may binarize the enhanced image and acquire the evaluation value according to the presence of pixels exceeding a threshold value.
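- A sketch of one possible rule-based evaluation along the lines described above, assuming scipy and scikit-image; the choice of a median filter for denoising, the Frangi vesselness filter as the Hessian-based tubular enhancement, and the threshold value are assumptions for illustration, not the patent's specific method.

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.filters import frangi

def rule_based_cnv_score(octa_front, threshold=0.05):
    """One possible rule-based evaluation: denoise, enhance tubular
    structures with a Hessian-based (Frangi) vesselness filter, then
    integrate the enhanced image (and also count pixels above a
    threshold after binarization)."""
    denoised = median_filter(octa_front, size=3)          # suppress granular noise
    enhanced = frangi(denoised)                           # emphasize vessel-like regions
    integral_score = float(enhanced.sum())                # evaluation by integration
    binary_score = float((enhanced > threshold).sum())    # evaluation after binarization
    return integral_score, binary_score

scores = rule_based_cnv_score(np.random.rand(304, 304))   # stand-in OCTA front image
```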
- The image evaluation unit 343 can calculate the evaluation value for each type of CNV. Further, as shown in FIG. 13, the display control unit 306 can display, on the display unit 310, the CNV type having the higher evaluation value together with that evaluation value. In such a case, the operator can confirm, in addition to the evaluation value of the front image, the type of CNV due to age-related macular degeneration contained in the front image.
- the CNV type and the evaluation value are displayed, but the present invention is not limited to this, and the evaluation value for each type may be displayed. If it is estimated that there is no CNV in the front image, it may be displayed that there is no CNV instead of the CNV type display.
- the image evaluation unit 343 may determine the CNV type from the depth range of the front image.
- an example of generating and displaying an OCTA front image as a front image has been described, but as in the second embodiment, a brightness En-Face image may be generated and displayed as a front image.
- the image processing device 300 may be configured so that the depth range of the generated front image can be manually adjusted.
- the trained model shown in FIG. 14 is composed of a plurality of layers responsible for processing and outputting an input value group.
- the types of layers included in the configuration 1401 of the trained model include a convolution layer, a Downsampling layer, an Upsampling layer, and a Merger layer.
- the convolution layer is a layer that performs convolution processing on the input value group according to parameters such as the set filter kernel size, the number of filters, the stride value, and the dilation value.
- the number of dimensions of the kernel size of the filter may be changed according to the number of dimensions of the input image.
- The downsampling layer is a layer that performs processing to reduce the number of output value groups to less than the number of input value groups by thinning out or combining the input value group. Specifically, such processing is, for example, max pooling.
- the upsampling layer is a layer that performs processing to increase the number of output value groups to be larger than the number of input value groups by duplicating the input value group or adding the interpolated value from the input value group. Specifically, as such a process, for example, there is a linear interpolation process.
- The composite (merging) layer is a layer that receives value groups, such as the output value group of a certain layer or the pixel value group constituting an image, from a plurality of sources and combines them by concatenation or addition.
- For example, when the kernel size of the filter is set to 3 pixels in width and 3 pixels in height and the number of filters is set to 64, processing with a certain degree of accuracy is possible.
- If the parameter settings of the layers and nodes constituting the neural network differ, the degree to which the tendency trained from the teacher data can be reproduced in the output data may also differ. That is, in many cases the appropriate parameters differ depending on the embodiment, and therefore the values may be changed to preferable values as needed.
- In addition to changing the parameters as described above, better characteristics may be obtained by changing the configuration of the CNN itself.
- Better characteristics include, for example, higher processing accuracy, shorter processing time, and shorter training time for machine learning models.
- The CNN configuration 1401 used in this modification is a U-Net type machine learning model having the function of an encoder composed of a plurality of layers including a plurality of downsampling layers and the function of a decoder composed of a plurality of layers including a plurality of upsampling layers. The U-Net type machine learning model is configured so that position information (spatial information) that becomes ambiguous in the plurality of layers constituting the encoder can be used in the layers of the same dimension in the decoder.
- A batch normalization layer or an activation layer using a rectified linear unit may be incorporated after the convolution layer.
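- As one hedged sketch (an assumption, not the actual configuration 1401), the PyTorch snippet below shows a small U-Net type CNN with 3x3 convolutions, 64 filters, max-pool downsampling, an upsampling layer, a merging (concatenation) layer, and BatchNorm plus ReLU after each convolution.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """3x3 convolution followed by batch normalization and a rectified linear unit."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch: int = 1, out_ch: int = 1, base: int = 64):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)                                   # downsampling layer
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(base * 2 + base, base)                 # after the merging layer
        self.head = nn.Conv2d(base, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.enc1(x)                  # encoder branch keeps spatial information
        e2 = self.enc2(self.pool(e1))      # downsample, then convolve
        d1 = self.up(e2)                   # upsampling layer
        d1 = torch.cat([d1, e1], dim=1)    # merging layer (skip connection)
        return torch.sigmoid(self.head(self.dec1(d1)))  # e.g. a CNV probability/label map
```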
- When data is input to the trained model, data according to the design of the machine learning model is output; for example, output data that is highly likely to correspond to the input data is output according to the tendency trained using the training data.
- In the present modification, the training data is composed of image pairs in which the input data is an OCTA front image of the layer in which CNV occurs when age-related macular degeneration develops, and the output data is a binary image that is white only in the region where CNV is present and black elsewhere with respect to that OCTA front image.
- An example of the learning data in this case is shown in FIG.
- Note that images of healthy eyes are also learned; the binary images for healthy eyes are entirely black.
- the trained model learned in this way can output a binary image as if only the region where the CNV exists was segmented.
- the image evaluation unit 343 can acquire a binary image showing the region where the CNV exists by inputting the OCTA front image into the trained model.
- the image evaluation unit 343 can calculate an evaluation value indicating the possibility that CNV is present based on the white area in the acquired binary image.
- For example, when any white region exists, the evaluation value may be set to 1, or the evaluation value may be set to 1 when the total area (number of pixels) of the white region is equal to or larger than a threshold value. Further, threshold values may be set stepwise, and the evaluation value may be determined according to the threshold value that the total area of the white region exceeds.
- the image evaluation unit 343 may calculate the area of the white region in the binary image acquired from the trained model as the size of the CNV.
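- A minimal sketch of the scoring described above is shown below: it turns the trained model's binary output into an evaluation value via stepwise area thresholds and reports the white-pixel area as the CNV size. The threshold steps and the pixel pitch are illustrative assumptions.

```python
import numpy as np

def score_from_binary_mask(mask: np.ndarray,
                           area_steps=(10, 50, 200, 1000),   # pixel-count thresholds
                           scores=(0.2, 0.4, 0.7, 1.0),
                           pixel_area_um2: float = 100.0):
    """mask: 2-D array that is nonzero where CNV is predicted."""
    n_white = int(np.count_nonzero(mask))
    evaluation = 0.0
    for step, value in zip(area_steps, scores):
        if n_white >= step:                  # evaluation follows the largest step exceeded
            evaluation = value
    cnv_size_um2 = n_white * pixel_area_um2  # white-region area reported as the CNV size
    return evaluation, cnv_size_um2
```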
- the display control unit 306 can display the size of the CNV included in the OCTA front image as well as the OCTA front image to be displayed.
- the value of the image used as the output data is not limited to the binary value.
- the image whose value is changed according to the evaluation value of CNV may be used as the output data of the learning data.
- the evaluation value of CNV in this case may be an evaluation value labeled by a doctor or the like as in Example 2. Further, as described in the first modification of the second embodiment, the CNV type may be distinguished and learned.
- a binary image was acquired using a neural network, but it is not limited to this and can be realized by so-called rule-based image processing.
- the tubular area may be emphasized with a Hessian filter, and the emphasized image may be binarized.
- As the method for calculating the evaluation value in this case, the same method as described above may be used.
- a brightness En-Face image may be generated and displayed as a front image.
- the image processing device 300 may be configured so that the depth range of the generated front image can be manually adjusted.
- The binary image has been described as an image expressed by the two values of white and black, but any two labels may be used.
- Example 3 In Example 2, the configuration in which the image evaluation unit 343 provided in the image generation unit 304 acquires and displays the evaluation values for a plurality of OCTA front images has been described. In the present embodiment, an OCTA front image having an optimum depth range is automatically output without the intervention of an operator.
- The configuration of the image processing apparatus according to the present embodiment is the same as that of the image processing apparatus according to the second embodiment except that the front image determination unit 344 is added to the image generation unit 304; therefore, the same reference numerals are used and the description thereof is omitted.
- the image processing apparatus according to the present embodiment will be described focusing on the difference from the image processing apparatus 300 according to the second embodiment.
- FIG. 16 is a diagram for explaining the image generation unit 304 according to this embodiment.
- the image generation unit 304 according to this embodiment is provided with a front image determination unit 344 in addition to the projection range control unit 341, the front image generation unit 342, and the image evaluation unit 343.
- The front image determination unit 344 determines and selects, as the OCTA front image to be displayed, the OCTA front image corresponding to the maximum evaluation value (an evaluation value higher than the other evaluation values) among the evaluation values calculated for the plurality of OCTA front images.
- FIG. 17 is a flowchart of the front image generation process according to the present embodiment. Since the flow of a series of processes other than the front image generation process is the same as the series of processes according to the second embodiment, the description thereof will be omitted. Further, since steps S1701 and S1702 are the same steps as steps S1101 and S1102 according to the second embodiment, the description thereof will be omitted. When a plurality of evaluation values are acquired for the plurality of OCTA front images in step S1702, the process proceeds to step S1703.
- In step S1703, the front image determination unit 344 determines and selects, from among the generated plurality of OCTA front images, the OCTA front image having the maximum evaluation value (an evaluation value higher than the other evaluation values) as the OCTA front image to be displayed. Since the subsequent processing is the same as the processing according to the second embodiment, the description thereof will be omitted.
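- The flow of steps S1701 to S1703 can be pictured with the hedged sketch below: candidate OCTA front images are generated for several depth ranges, each is scored, and the highest-scoring one is selected. generate_octa_front() and evaluate_front() stand in for the front image generation unit 342 and the image evaluation unit 343; they are assumptions, not actual APIs.

```python
def select_best_front(volume, depth_ranges, generate_octa_front, evaluate_front):
    # S1701: generate one OCTA front image per candidate depth range.
    candidates = [(dr, generate_octa_front(volume, dr)) for dr in depth_ranges]
    # S1702: acquire an evaluation value for each candidate.
    scored = [(evaluate_front(img), dr, img) for dr, img in candidates]
    # S1703: determine/select the image with the highest evaluation value.
    best_score, best_range, best_img = max(scored, key=lambda t: t[0])
    return best_img, best_range, best_score
```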
- As described above, the front image determination unit 344 of the image generation unit 304 functions as an example of a determination unit that determines, as the front image to be displayed, the front image corresponding to an evaluation value higher than the other evaluation values among the acquired plurality of evaluation values. As a result, an OCTA front image in the optimum depth range can be displayed without the intervention of an operator, and the processing efficiency can be improved.
- the front image determination unit 344 may be configured as a component separate from the image generation unit 304.
- When the maximum evaluation value is less than a threshold value (for example, 0.2), it may be estimated that no CNV exists, and a projection image in a predetermined depth range may be displayed instead.
- The threshold value in this case may be arbitrarily set according to the desired configuration. Further, in this case, the fact that no CNV exists may be displayed together with, or instead of, the evaluation value.
- the brightness En-Face image may be generated and displayed as the front image as in the second embodiment.
- the image processing device 300 may be configured so that the depth range of the generated front image can be manually adjusted. It should be noted that these processes can also be applied to the following modified examples of this embodiment.
- the front image determination unit 344 selects the OCTA front image having the highest evaluation value, but the front image determination unit 344 may select the OCTA front image whose evaluation value exceeds the threshold value. In this case, the front image determination unit 344 may select a plurality of OCTA front images corresponding to the plurality of evaluation values when the plurality of evaluation values exceed the threshold value. In this case, the display control unit 306 may display the plurality of evaluation values and a plurality of OCTA front images corresponding thereto while switching between them. Further, the front image determination unit 344 may select the OCTA front image to be displayed independently from the plurality of OCTA front images according to the instruction of the operator.
- the front image in the optimum depth range is automatically selected as the front image (output image) to be displayed without the intervention of the operator.
- the method of determining the depth range corresponding to the front image to be displayed is not limited to this.
- In the third embodiment, the front image determination unit 344 selects the front image corresponding to the depth range in which the evaluation value is maximum; the present modification differs in that the depth ranges in which the evaluation value of the front image is equal to or greater than a threshold value are connected to form the depth range of the front image to be displayed.
- FIG. 18 is a flowchart of the front image generation process according to this modification. Since the flow of a series of processes other than the front image generation process is the same as the series of processes according to the third embodiment, the description thereof will be omitted. Further, since steps S1801 and S1802 are the same steps as steps S1701 and S1702 according to the third embodiment, the description thereof will be omitted. When a plurality of evaluation values are acquired for the plurality of OCTA front images in step S1802, the process proceeds to step S1803.
- the front image determination unit 344 connects the depth ranges whose evaluation values are equal to or greater than the threshold value, and determines the depth range of the OCTA front image to be displayed.
- FIG. 19 shows an example in which a plurality of evaluation values are calculated for a plurality of OCTA front images. Denoting Bruch's membrane as BM, in the example shown in FIG. 19, a plurality of OCTA front images are generated while shifting the depth range toward the vitreous side in 20 µm steps, from the depth range (a) “BM+0 µm to BM+20 µm” to the depth range (h) “BM-140 µm to BM-120 µm”. In the example shown in FIG. 19, the image evaluation unit 343 acquires a plurality of evaluation values (the numerical values on the right side of the figure) corresponding to the plurality of OCTA front images generated in this way.
- the front image determination unit 344 connects the depth ranges from the depth range (b) to the depth range (f) and determines the depth range of the OCTA front image to be displayed.
- The front image determination unit 344 determines the depth range of the OCTA front image to be displayed as extending from BM+0 µm, which is the lower limit of the depth range (b), to BM-100 µm, which is the upper limit of the depth range (f).
- In step S1804, the projection range control unit 341 and the front image generation unit 342 generate an OCTA front image to be displayed based on the depth range determined by the front image determination unit 344. Since the subsequent processing is the same as that of the third embodiment, the description thereof will be omitted.
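- A hedged sketch of the connection step (S1803) follows: depth ranges whose evaluation value is at or above the threshold are connected into one contiguous display range (only the longest contiguous run is kept here; ranges separated by a below-threshold gap are handled in the next modification). Depth ranges are expressed as signed offsets from Bruch's membrane in micrometres, ordered from the deep side toward the vitreous side as in FIG. 19; this representation is an assumption.

```python
def connect_depth_ranges(ranges, scores, threshold=0.3):
    """ranges: list of (deep_um, shallow_um) offsets from BM; scores: evaluation values."""
    best = None   # (deep, shallow) bounds of the longest contiguous above-threshold run
    run = None
    for (deep, shallow), score in zip(ranges, scores):
        if score >= threshold:
            run = (deep, shallow) if run is None else (run[0], shallow)  # extend the run
        else:
            run = None                                                   # gap: restart
        if run and (best is None or abs(run[1] - run[0]) > abs(best[1] - best[0])):
            best = run
    return best   # e.g. (0, -100): BM+0 um to BM-100 um for depth ranges (b) to (f)
```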
- The OCTA front image generated by performing the above processing, its depth range, and the corresponding tomographic image are shown in FIG. In this OCTA front image, the CNV appears in a manner that can be easily confirmed.
- FIG. 21 is an example in which the processing of this modification is executed for a very small CNV.
- The depth range is set from the depth range (a) “BM+0 µm to BM+20 µm” to the depth range (e) “BM-80 µm to BM-60 µm” for simplification of the description.
- the depth range in which the evaluation value is the threshold value of 0.3 or more is only the depth range (b), so that the front image determination unit 344 determines the final depth range to be the same range as the depth range (b).
- The OCTA front image corresponding to the determined depth range, the depth range thereof, and the tomographic image are shown in FIG. In this OCTA front image, the very small CNV can be confirmed.
- In this case, since the determined depth range is the same as the depth range (b), the image generation unit 304 does not need to generate the OCTA front image of that depth range again.
- As described above, the front image determination unit 344 of the image generation unit 304 functions as an example of a determination unit that determines a second depth range based on the calculated plurality of evaluation values and determines an image generated using the determined second depth range as the front image to be displayed.
- the image generation unit 304 determines the second depth range by connecting the depth ranges corresponding to the evaluation values that are equal to or greater than the threshold value among the plurality of calculated evaluation values.
- the determination unit may be configured as a component separate from the image generation unit 304.
- the image processing apparatus 300 generates a front image for a very thin depth range as compared with the third embodiment, and estimates the presence or absence of an extraction target while continuously changing the depth range. Therefore, the depth range of the front image to be displayed can be determined more finely. Therefore, it is possible to more appropriately generate an image in which the target area can be easily confirmed. In particular, by performing evaluation for each very thin depth range, even if there is a very small extraction target (for example, CNV), it can be detected without overlooking.
- In the above description, the depth of the exploratorily shifted depth range (the difference between the upper limit and the lower limit of the depth range) is fixed at 20 µm, but the depth of the depth range is not limited to this and may be arbitrarily set according to the desired configuration.
- However, the number of candidate images increases as the depth of the exploratorily shifted depth range becomes narrower, and the amount of calculation for the evaluation values increases accordingly. Further, if the depth of the depth range becomes too narrow, noise becomes stronger in the OCTA front image. On the other hand, as the depth of the exploratorily shifted depth range becomes wider, the number of candidate images decreases and the amount of calculation decreases. Therefore, the depth of the depth range may be determined in a balanced manner in consideration of such circumstances; an effective range is, for example, a width of approximately 10 µm to 50 µm.
- The range to be searched may cover up to the outer layer of the retina; as shown in this modification, a predetermined range may be searched toward the vitreous side with Bruch's membrane as a reference, or the search may be performed up to the OPL/ONL boundary, which is the upper limit of the outer layer of the retina.
- In the above description, the boundary line used for searching is determined based on the shape of Bruch's membrane (BM), but the shape of the boundary line used for searching is not limited to this. For example, the boundary line for searching may be determined based on the shape of the retinal pigment epithelium (RPE) or another layer instead of Bruch's membrane. Further, the boundary line for searching may be determined based on a straight line approximating the shape of Bruch's membrane in the macula.
- the threshold value of the evaluation value when the front image determination unit 344 determines the depth range to be displayed may be changed according to the instruction of the operator. In this case, the degree of drawing of the CNV can be adjusted according to the preference of the operator.
- In the first modification of the third embodiment, the depth range of the front image to be displayed is determined by connecting the depth ranges whose evaluation values are equal to or higher than a certain value. However, the evaluation values of the successive depth ranges may fall below the threshold value once and then exceed the threshold value again; in other words, the evaluation values over the successive depth ranges may have two peaks. Therefore, in the present modification, the upper limit and the lower limit of the ranges in which the evaluation value is equal to or more than a certain value are selected, and the range between the upper limit and the lower limit is determined and selected as the depth range.
- FIG. 23 is a flowchart showing a front image generation process according to this modification. Since the flow of a series of processes other than the front image generation process is the same as the series of processes according to the third embodiment, the description thereof will be omitted. Further, since step S2301, step S2302, and step S2304 are the same steps as steps S1801, step S1802, and step S1804 according to the first modification of the third embodiment, the description thereof will be omitted. When a plurality of evaluation values are acquired for the plurality of OCTA front images in step S2302, the process proceeds to step S2303.
- the front image determination unit 344 integrates the depth ranges whose evaluation values are equal to or greater than the threshold value, and determines the depth range of the OCTA front image to be displayed.
- More specifically, even when a depth range in which the evaluation value is less than the threshold value lies between depth ranges in which the evaluation value is equal to or more than the threshold value, the front image determination unit 344 determines the depth range of the OCTA front image to be displayed based on the upper limit and the lower limit of the plurality of depth ranges whose evaluation values are equal to or higher than the threshold value.
- For example, in the example shown in FIG. 19, when the evaluation value of the depth range (h) is 0.3, the evaluation value of the depth range (h), in addition to the evaluation values of the depth ranges (b) to (f), is equal to or higher than the threshold value, while the evaluation value of the depth range (g) is lower than the threshold value. In this case, the front image determination unit 344 according to the present modification determines, as the depth range of the OCTA front image to be displayed, the range from BM+0 µm, which is the lower limit of the depth range (b) whose evaluation value is equal to or higher than the threshold value, to BM-140 µm, which is the upper limit of the depth range (h) whose evaluation value is equal to or higher than the threshold value. Since the subsequent processing is the same as that of the first modification of the third embodiment, the description thereof will be omitted.
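- The hedged sketch below illustrates this modification (step S2303): even when a below-threshold range lies between above-threshold ranges, the display range spans from the deepest to the shallowest above-threshold boundary. The (deep, shallow) offset representation relative to BM is the same assumption as in the previous sketch.

```python
def span_depth_ranges(ranges, scores, threshold=0.3):
    """ranges: list of (deep_um, shallow_um) offsets from BM; scores: evaluation values."""
    kept = [r for r, s in zip(ranges, scores) if s >= threshold]
    if not kept:
        return None                              # e.g. fall back to a default depth range
    deep_limit = max(r[0] for r in kept)         # lower limit: deepest boundary kept
    shallow_limit = min(r[1] for r in kept)      # upper limit: shallowest boundary kept
    return deep_limit, shallow_limit             # e.g. (0, -140) for ranges (b) and (h)
```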
- As described above, the front image determination unit 344 of the image generation unit 304 determines the second depth range with, among the depth ranges corresponding to evaluation values equal to or more than the threshold value among the acquired plurality of evaluation values, the depth position shallower than the other depth positions as the upper limit and the depth position deeper than the other depth positions as the lower limit.
- the threshold value of the evaluation value when the front image determination unit 344 determines the depth range to be displayed may be changed according to the instruction of the operator. In this case, the degree of drawing of the CNV can be adjusted according to the preference of the operator.
- In the present modification, the front image determination unit 344 finely adjusts the upper limit and the lower limit of the depth range, centered on the depth range in which the evaluation value is the maximum (higher than the other evaluation values), and thereby determines the depth range of the OCTA front image to be displayed.
- FIG. 24 is a flowchart of the front image generation process according to this modification. Since the flow of a series of processes other than the front image generation process is the same as the series of processes according to the third embodiment, the description thereof will be omitted. Further, since steps S2401 and S2402 are the same steps as steps S1701 and S1702 according to the third embodiment, the description thereof will be omitted. When a plurality of evaluation values are acquired for the plurality of OCTA front images in step S2402, the process proceeds to step S2403.
- The front image determination unit 344 selects, from the depth ranges corresponding to the plurality of OCTA front images, the depth range corresponding to the OCTA front image having the maximum evaluation value as the central depth range. For example, in the example of FIG. 19, the depth range (d) “BM-60 µm to BM-40 µm” corresponding to the maximum evaluation value 0.7 is selected.
- In step S2404, the front image determination unit 344 sets a plurality of depth ranges in which the upper and lower limits of the depth range are finely adjusted, centered on the depth range in which the evaluation value is the maximum. In the fine adjustment, a plurality of depth ranges are set in which at least one of the upper limit and the lower limit of the selected depth range is moved by a depth narrower than the depth (the difference between the upper limit and the lower limit) of the depth range that was shifted exploratorily when the evaluation values were first obtained.
- In the example described above, the depth of the exploratorily shifted depth range when the evaluation values were first obtained is 20 µm. Therefore, the front image determination unit 344 sets, for example, a depth range in which the upper limit BM-60 µm of the depth range (d) is moved to the shallower side or the deeper side by, for example, 10 µm or 5 µm. Similarly, the front image determination unit 344 sets, for example, a depth range in which the lower limit BM-40 µm of the depth range (d) is moved to the shallower side or the deeper side by, for example, 10 µm or 5 µm. Further, the front image determination unit 344 may set a depth range in which both the upper limit and the lower limit of the depth range (d) are moved to the shallower side or the deeper side by, for example, 10 µm or 5 µm.
- the numerical values in the example are examples, and may be arbitrarily set according to a desired configuration. Further, the number of depth ranges to be set may be arbitrarily set according to a desired configuration.
- In step S2405, the projection range control unit 341 and the front image generation unit 342 generate a plurality of OCTA front images based on the plurality of depth ranges set by the front image determination unit 344.
- the image evaluation unit 343 calculates a plurality of evaluation values for the generated plurality of OCTA front images.
- In step S2406, the front image determination unit 344 selects and determines, as the image to be displayed, the OCTA front image corresponding to the maximum evaluation value (an evaluation value higher than the other evaluation values) among the plurality of evaluation values calculated in step S2405. Since the subsequent processing is the same as that of the third embodiment, the description thereof will be omitted.
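- A hedged sketch of steps S2403 to S2406 is shown below: the coarse depth range with the highest evaluation value is taken as the centre, candidates with nudged upper/lower limits are regenerated and re-scored, and the best-scoring range is kept. generate_octa_front() and evaluate_front() are the same stand-ins as before, and the 5/10 µm steps are illustrative.

```python
def refine_depth_range(volume, coarse_ranges, coarse_scores,
                       generate_octa_front, evaluate_front, steps=(-10, -5, 5, 10)):
    # S2403: centre on the coarse depth range with the highest evaluation value.
    best_idx = max(range(len(coarse_scores)), key=coarse_scores.__getitem__)
    deep, shallow = coarse_ranges[best_idx]
    # S2404: move the upper limit, the lower limit, or both by less than the coarse step.
    candidates = {(deep, shallow)}
    for d in steps:
        candidates.add((deep, shallow + d))      # move the upper (shallow) limit
        candidates.add((deep + d, shallow))      # move the lower (deep) limit
        candidates.add((deep + d, shallow + d))  # move both limits
    # S2405-S2406: regenerate each candidate, re-evaluate, and keep the best range.
    scored = [(evaluate_front(generate_octa_front(volume, dr)), dr) for dr in candidates]
    best_score, best_range = max(scored)
    return best_range, best_score
```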
- As described above, the front image determination unit 344 of the image generation unit 304 according to this modification functions as an example of a determination unit that determines the second depth range centered on the depth range corresponding to an evaluation value higher than the other evaluation values among the acquired plurality of evaluation values. More specifically, the image generation unit 304 according to this modification sets a plurality of depth ranges in which the depth range corresponding to an evaluation value higher than the other evaluation values is enlarged or reduced, and generates a plurality of front images corresponding to the plurality of depth ranges.
- the image evaluation unit 343 acquires a plurality of evaluation values of the plurality of front images corresponding to the plurality of depth ranges by using the plurality of front images corresponding to the plurality of depth ranges.
- Then, the front image determination unit 344 of the image generation unit 304 determines, as the front image to be displayed (output image), the front image corresponding to an evaluation value higher than the other evaluation values among the plurality of evaluation values of the plurality of front images corresponding to the plurality of depth ranges.
- the front image determination unit 344 may be configured as a component separate from the image generation unit 304.
- By adjusting the depth range in this way and determining the front image of the depth range having an evaluation value higher than the other evaluation values as the front image to be displayed, a front image in which the extraction target (target area) can be observed more easily can be generated and displayed.
- the threshold value of the evaluation value when the front image determination unit 344 determines the depth range to be displayed may be changed according to the instruction of the operator. In this case, the degree of drawing of the CNV can be adjusted according to the preference of the operator. Further, the depth (depth width) when finely adjusting the depth range may be changed according to the instruction of the operator. In this case, the depth range for calculating the evaluation value can be changed according to the instruction of the operator, and the depth range corresponding to each eye to be inspected can be appropriately set.
- the front image determination unit 344 selects the OCTA front image having the highest evaluation value, but the front image determination unit 344 may select the OCTA front image whose evaluation value exceeds the threshold value. In this case, the front image determination unit 344 may select a plurality of OCTA front images corresponding to the plurality of evaluation values when the plurality of evaluation values exceed the threshold value. In this case, the display control unit 306 may display the plurality of evaluation values and a plurality of OCTA front images corresponding thereto while switching between them. Further, the front image determination unit 344 may select the OCTA front image to be displayed independently from the plurality of OCTA front images according to the instruction of the operator.
- Example 4 In Examples 1 to 3, OCTA frontal images were used to provide images in the optimum depth range for cases in which neovascularization (CNV) due to age-related macular degeneration is occurring.
- The techniques described in Examples 1 to 3 can also be applied to confirm a structure called the lamina cribrosa (sieve plate) in the lower part of the optic nerve head.
- Example 4 a process in which the lamina cribrosa below the optic nerve head is used as an extraction target (target region) will be described.
- the lamina cribrosa is a mesh-like structure that supports the optic nerve at the bottom of the optic nerve head. It is known that the morphology of the lamina cribrosa correlates with the progression of glaucoma, and it is known that being able to display the morphology (particularly the thickness) is very meaningful in diagnosing glaucoma.
- Regarding the sieve plate, it is difficult to recognize it by image processing of the tomographic image in the way the layered structure of the retina is recognized, because a clear change in the layer structure and the accompanying change in brightness do not appear on the tomographic image.
- the target area to be observed is set as the region of the sieve plate, and an En-Face image having a brightness that makes it easy to observe the sieve plate is generated. Since the configuration of the image processing apparatus according to the present embodiment is the same as the configuration of the image processing apparatus according to the first to third embodiments, the same reference numerals will be used and the description thereof will be omitted. Hereinafter, the image processing apparatus according to the present embodiment will be described focusing on the difference from the image processing apparatus 300 according to the first to third embodiments.
- In the image processing apparatus according to the present embodiment, when the sieve plate is selected in the pull-down menu at the upper part of the brightness En-Face image 407, the image generation unit 304 designates the sieve plate as the target of the extraction process.
- The image generation unit 304 generates a plurality of corresponding brightness En-Face images based on a plurality of depth ranges stored in advance in the storage unit 305 for the sieve plate designated as the extraction target.
- the display control unit 306 causes the display unit 310 to display the generated En-Face image of the brightness, the corresponding tomographic image, and the depth range according to the instruction of the operator or the like.
- The brightness En-Face image generation process according to this embodiment may be the same as in Examples 1 to 3, except that tomographic data is used instead of motion contrast data and that the depth range is the depth range set for the sieve plate.
- FIGS. 25 (a) to 25 (c) are examples of an En-Face image having a brightness showing the morphology of the sieve plate and a tomographic image showing a corresponding depth range.
- The brightness En-Face images shown in FIGS. 25(a) to 25(c) are En-Face images of the sieve plate portion, and are images obtained by projecting the three-dimensional tomographic data (three-dimensional tomographic image) over different depth ranges.
- the depth range of the corresponding En-Face image is displayed as a white straight line (solid line).
- FIGS. 25(a) to 25(c) show En-Face images having depth ranges of +50 µm to +100 µm, +150 µm to +200 µm, and +300 µm to +350 µm, respectively, from the depth position of the bottom of the retinal-vitreous interface of the optic nerve head.
- the display mode indicating each depth range may be arbitrary.
- Descriptions such as Line+50 µm and Line+100 µm are shown on the tomographic image. "Line" in these descriptions indicates that the upper limit and the lower limit for determining the depth range are set by a straight line (Line).
- the numbers on the display represent the distance from the position indicated by the dotted line in the figure (the depth position at the bottom of the retinal-vitreous interface of the optic nerve head).
- Each depth range for generating the plurality of En-Face images may be a depth range within a range of 0 to 500 µm from the boundary (interface) between the retina and the vitreous body of the papilla toward the tissue side.
- Each depth range may be arbitrarily set within a depth range on the choroidal side of approximately 100 µm to 500 µm from the boundary between the optic nerve head and the vitreous body.
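- The brightness En-Face generation used here can be pictured with the hedged sketch below: the three-dimensional intensity volume is projected (here, averaged) over a depth range defined as an offset from a per-A-scan reference depth such as the bottom of the retinal-vitreous interface of the optic nerve head. The array layout, the micron-per-pixel scale, and the choice of a mean projection are assumptions.

```python
import numpy as np

def brightness_enface(volume: np.ndarray, ref_depth_px: np.ndarray,
                      offset_um: tuple, um_per_px: float = 4.0) -> np.ndarray:
    """volume: (Z, Y, X) intensities; ref_depth_px: (Y, X) reference depth in pixels;
    offset_um: (start_um, end_um) measured from the reference toward the deep side."""
    z0 = ref_depth_px + offset_um[0] / um_per_px
    z1 = ref_depth_px + offset_um[1] / um_per_px
    z = np.arange(volume.shape[0])[:, None, None]
    inside = (z >= z0[None]) & (z < z1[None])      # voxels inside the depth range
    counts = np.maximum(inside.sum(axis=0), 1)
    return (volume * inside).sum(axis=0) / counts  # mean-intensity projection

# e.g. the three panels of FIG. 25 would correspond to offsets of
# (50, 100), (150, 200) and (300, 350) micrometres from the reference surface.
```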
- FIGS. 25(a) to 25(c) show brightness En-Face images in which the depth range is changed stepwise from the vitreous side toward the deeper direction.
- In the brightness En-Face image of FIG. 25(b), a mesh-like structure can be seen at a position corresponding to the optic nerve head, but in the brightness En-Face images shown in FIGS. 25(a) and 25(c), such a structure cannot be clearly observed.
- Therefore, in the present embodiment, brightness En-Face images of a plurality of depth ranges are displayed on the GUI, so that an image in which the sieve plate is easy to observe can be provided to the operator.
- More specifically, the display control unit 306 can cause the display unit 310 to display a GUI including brightness En-Face images corresponding to a plurality of depth ranges related to the sieve plate.
- An OCTA front image may be generated instead of the brightness En-Face image and displayed side by side in the same manner.
- the front image may be generated by using the motion contrast data instead of the tomographic data.
- Further, the image evaluation unit 343 may calculate, using a trained model, evaluation values for evaluating the presence of the sieve plate in the plurality of brightness En-Face images. In this case, for the training data of the trained model, the brightness En-Face image can be used as the input data, and an evaluation value for evaluating the existence of the mesh-like structure (sieve plate) in the brightness En-Face image can be used as the output data.
- an evaluation value obtained by a doctor or the like evaluating the existence of a mesh-like structure (sieving plate) in an En-Face image of brightness may be used.
- The evaluation may be performed based on criteria such as giving a higher value to an image in which the holes in the sieve plate can be seen.
- the display control unit 306 can display the evaluation value of the brightness En-Face image on the display unit 310 together with the brightness En-Face image.
- the image evaluation unit 343 may calculate an evaluation value for evaluating the presence of the sieve plate in the brightness En-Face image by rule-based processing as in the second embodiment. Further, instead of the brightness En-Face image, an OCTA front image may be generated, evaluated, and displayed side by side.
- A brightness En-Face image corresponding to the maximum evaluation value (an evaluation value higher than the other evaluation values), or a brightness En-Face image whose evaluation value is equal to or higher than a threshold value, may be automatically selected and displayed.
- the sieve plate may be the extraction target (target region).
- In this case, it is possible to estimate in which depth range the network structure corresponding to the sieve plate is distributed. As a result, not only can the optimum depth range of the front image to be displayed be determined, but the thickness of the sieve plate can also be obtained from the depth range in which the network structure is distributed, and the sieve plate can be segmented from the tomographic data or the like.
- When observing the morphology of the sieve plate, a front image in which the sieve plate can be easily observed can be generated by using an SS-OCT apparatus that uses a swept-source (SS) light source in the 1 µm band. Further, it is known that the morphology of the sieve plate can be easily observed by using the brightness En-Face image.
- However, the OCT apparatus used for photographing the eye to be inspected and the front image to be generated are not limited to these; for example, as described above, an SD-OCT apparatus using an SD type optical interference unit may be used, or an OCTA front image may be used.
- In the above description, the depth range was defined by lines horizontal to the tomographic image with the depth corresponding to the bottom of the retinal-vitreous interface of the optic nerve head as a reference, and front images with the depth range changed were generated and their evaluation values were calculated. However, the criteria for defining the depth range are not limited to this.
- the reference may be defined by a line connecting the ends of the Bruch membrane.
- FIG. 26 is a diagram for explaining a reference line that defines a depth range by a line connecting the ends of Bruch's membrane (Bruch's membrane edges P1 and P2).
- FIG. 26 shows a tomographic image of the optic disc.
- the tomographic image shown in FIG. 26 shows the vitreous-inner limiting membrane (ILM) boundary L1, the GCL / IPL boundary L2, the retinal pigment epithelium (RPE) L3, and the Bruch's membrane L4.
- the Bruch membrane L4 is generally continuously present in the retina but not in the optic disc.
- the end of the Bruch's membrane that terminates around the optic nerve head is called the Bruch's membrane end, and appears as the Bruch's membrane ends P1 and P2 in the tomographic image.
- Therefore, the straight line Z connecting the Bruch's membrane ends P1 and P2 can be set as the reference line of the depth range.
- the depth range may be changed by moving the straight line Z up and down (shallower or deeper) by an operation such as dragging.
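- As a hedged sketch of using the straight line Z as the reference, the snippet below interpolates the reference depth between the Bruch's membrane edges P1 and P2 along a B-scan and converts an offset pair into per-A-scan depth bounds. The coordinates, axis order, and micron-per-pixel scale are assumptions.

```python
import numpy as np

def reference_line_depth_range(p1, p2, n_ascans, offset_um, um_per_px=4.0):
    """p1, p2: (x, z) pixel positions of the Bruch's membrane edges (p1[0] < p2[0]);
    offset_um: (start, end) offsets from the line Z toward the deep side."""
    x = np.arange(n_ascans)
    # Reference depth along x: linear between P1 and P2 (clamped outside that span).
    z_ref = np.interp(x, [p1[0], p2[0]], [p1[1], p2[1]])
    z_start = z_ref + offset_um[0] / um_per_px
    z_end = z_ref + offset_um[1] / um_per_px
    return z_start, z_end   # per-A-scan bounds used when projecting the En-Face image
```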
- In the above-described modification, as in Example 2, a configuration using brightness En-Face images as the input data of the training data related to the trained model and as the input data at the time of operation has been described.
- Alternatively, the optic disc in the brightness En-Face image may be extracted, and an image in which the region other than the optic disc in the brightness En-Face image is masked may be used. In this case, by not letting the neural network learn information other than the optic disc, unnecessary information is not input, so it can be expected that learning will be faster and the inference result (evaluation value) will be more accurate.
- Example 5 In Examples 1 to 3, the OCTA frontal image was used to describe a configuration for providing a frontal image in an optimum depth range for a case in which neovascularization (CNV) due to age-related macular degeneration is occurring. Further, in Example 4, a configuration for providing a frontal image in an optimum depth range for the morphology of the optic disc is described.
- the same technique can be applied to choroid segmentation (separation of Sattler layer and Haller layer).
- the boundary between the Sattler layer and the Haller layer is also difficult to distinguish because the layer boundary is unclear on the tomographic image, as in the case of the sieve plate.
- by projecting onto a frontal image it is possible to confirm the structural difference in blood vessels regarding the boundary between the Sattler layer and the Haller layer.
- the Sattler layer and the Haller layer can be separated by using a boundary where the structural difference of blood vessels is clearly changed. Further, in such a case, it is possible to generate an En-Face image and an OCTA front image of the brightness in each layer.
- In the present embodiment, the structure of the blood vessels related to the boundary between the Sattler layer and the Haller layer is designated as the extraction target (target area), and En-Face images are generated for a plurality of depth ranges corresponding to the extraction target.
- the plurality of depth ranges may be arbitrarily set within the range from the Bruch's membrane to the choroid or sclera, for example.
- The operator can easily confirm and identify the boundary between the Sattler layer and the Haller layer when a front image having an optimum depth range for confirming that boundary is generated and displayed. It is also possible to obtain a front image in which the structure of the blood vessels at the boundary between the Sattler layer and the Haller layer can be easily confirmed by performing the same processing as that described in Examples 2 and 3 and their modifications.
- A frontal image that makes it easy to observe the choroidal layers can be generated by using an SS-OCT apparatus that uses a swept-source (SS) light source in the 1 µm band. Further, it is known that the layers of the choroid can be easily observed by using the brightness En-Face image.
- However, the OCT apparatus used for photographing the eye to be inspected and the front image to be generated are not limited to these; for example, as described above, an SD-OCT apparatus using an SD type optical interference unit may be used, or an OCTA front image may be used.
- The boundary line that serves as a reference for the depth range may be a curved line that follows the layered structure of the choroid. Further, the layer shape of Bruch's membrane or the RPE on the tomographic image, or the boundary line thereof, may be used to determine the shape of the boundary line for defining the depth range. In this case, the depth range may be moved by moving the reference boundary line by an operation such as dragging.
- a capillary aneurysm of a retinal blood vessel may be designated as an extraction target, and a plurality of En-Face images for a plurality of depth ranges corresponding to the extraction target may be generated.
- Capillary aneurysms generally exist in the superficial retinal layer (ILM to GCL/IPL+50 µm) and the deep retinal layer (GCL/IPL+50 µm to INL/OPL+70 µm).
- each depth range for generating a plurality of En-Face images may be arbitrarily set in the depth range in the surface layer of the retina or the deep layer of the retina.
- By generating and displaying a frontal image having an optimum depth range for confirming a capillary aneurysm of a retinal blood vessel, the operator can clearly and easily confirm the capillary aneurysm. It is also possible to obtain a frontal image in which a capillary aneurysm of a retinal blood vessel can be easily confirmed by performing the same processing as that described in Examples 2 and 3 and their modifications.
- the display control unit 306 in the various examples and modifications described above may display analysis results such as a layer thickness of a desired layer and various blood vessel densities on the report screen of the display screen.
- the value (distribution) of the parameter relating to the site of interest including at least one such as the vascular wall, the vascular inner wall boundary, the vascular lateral boundary, the ganglion cell, the corneal region, the corner region, and Schlemm's canal may be displayed as the analysis result.
- The artifact may be, for example, a false image region generated by light absorption by a blood vessel region or the like, a projection artifact, or a band-shaped artifact appearing in a front image in the main scanning direction of the measurement light depending on the state (movement, blinking, etc.) of the eye to be inspected. Further, the artifact may be any imaging failure region, as long as it occurs randomly on each medical image taken of a predetermined portion of the subject, for example.
- The display control unit 306 may display on the display unit 310, as an analysis result, the value (distribution) of a parameter relating to a region including at least one of the various artifacts (imaging failure regions) described above. Further, the value (distribution) of a parameter relating to a region including at least one abnormal site such as drusen, a new blood vessel, an exudate (hard exudate), or pseudo-drusen may be displayed as the analysis result.
- the image analysis process may be performed by the image evaluation unit 343, or may be performed by an analysis unit different from the image evaluation unit 343 in the image processing device 300.
- the analysis result may be displayed in an analysis map, a sector showing statistical values corresponding to each divided area, or the like.
- The analysis result may be generated by the image evaluation unit 343 or another analysis unit using a trained model (an analysis result generation engine, a trained model for analysis result generation) obtained by learning the analysis results of medical images as training data. The trained model may be obtained by training using training data including a medical image and an analysis result of that medical image, training data including a medical image and an analysis result of a medical image of a type different from that medical image, and the like.
- the learning data may include the area label image generated by the segmentation process and the analysis result of the medical image using them.
- The image evaluation unit 343 can function as an example of an analysis result generation unit that generates an analysis result of a tomographic image or a front image from the result (for example, the detection result of the retinal layer) obtained by executing the segmentation process, using, for example, the trained model for analysis result generation. In other words, the image evaluation unit 343 can generate an image analysis result for each of the different regions specified by the segmentation process by using a trained model for analysis result generation that is different from the trained model for acquiring the evaluation result.
- the segmentation process may be the result of layer recognition performed by the layer recognition unit 303, or may be performed separately from the process of the layer recognition unit 303.
- Further, the trained model may be obtained by training using training data including, as input data, a set of a plurality of medical images of different types of a predetermined part, such as a luminance front image and a motion contrast front image.
- Here, the luminance front image corresponds to the luminance En-Face image, and the motion contrast front image corresponds to the OCTA En-Face image.
- The training data may be, for example, data in which information including at least one of an analysis value (for example, an average value or a median value) obtained by analyzing an analysis area, a table including such analysis values, an analysis map, and the position of the analysis area such as a sector in the image is labeled (annotated) to the input data as correct answer data (for supervised learning).
- the analysis result obtained by using the trained model for generating the analysis result may be displayed according to the instruction from the operator.
- the display control unit 306 in the above-described examples and modifications may display various diagnostic results such as glaucoma and age-related macular degeneration on the report screen of the display screen.
- As the diagnosis result, the position of the specified abnormal site or the like may be displayed on the image, or the state or the like of the abnormal site may be displayed by characters or the like.
- the classification result of the abnormal part or the like (for example, Curtin classification) may be displayed as the diagnosis result.
- Further, information indicating the certainty of each abnormal site (for example, a numerical value indicating the ratio) may be displayed as the diagnosis result.
- information necessary for the doctor to confirm the diagnosis may be displayed as a diagnosis result.
- As such information, for example, advice recommending additional imaging can be considered, such as performing fluorescence imaging using a contrast medium, which allows the blood vessels to be observed in more detail than OCTA.
- The diagnosis result may be generated by the image evaluation unit 343 using a trained model (a diagnosis result generation engine, a trained model for diagnosis result generation) obtained by learning the diagnosis results of medical images as training data. The trained model may be obtained by training using training data including a medical image and a diagnosis result of that medical image, training data including a medical image and a diagnosis result of a medical image of a type different from that medical image, and the like.
- the learning data may include the area label image generated by the segmentation process and the diagnosis result of the medical image using them.
- The image evaluation unit 343 can function as an example of a diagnosis result generation unit that generates a diagnosis result of a front image or a tomographic image from the result (for example, the detection result of the retinal layer) obtained by executing the segmentation process, using, for example, the trained model for diagnosis result generation.
- In other words, the image evaluation unit 343 can generate a diagnosis result for each of the different regions specified by the segmentation process by using a trained model for diagnosis result generation that is different from the trained model for acquiring the evaluation result.
- the diagnosis result using the trained model may be generated by a diagnosis unit other than the image evaluation unit 343 in the image processing apparatus 300.
- The training data may be, for example, data in which information including at least one of a diagnosis name, the type and state (degree) of a lesion (abnormal site), the position of the lesion in the image, the position of the lesion with respect to a region of interest, findings (interpretation findings, etc.), grounds for affirming the diagnosis name (positive medical support information, etc.), and grounds for denying the diagnosis name (negative medical support information, etc.) is labeled (annotated) to the input data as correct answer data (for supervised learning).
- the diagnosis result obtained by using the trained model for generating the diagnosis result may be displayed.
- In the above Examples 2 to 5, the image evaluation unit 343 may acquire the analysis result of an image, acquired by using the trained model for analysis result generation, as the evaluation result (information indicating the evaluation) for evaluating the existence of the extraction target (target area) such as CNV in the front image. Similarly, the image evaluation unit 343 may acquire the diagnosis result acquired by using the trained model for diagnosis result generation as the evaluation result for evaluating the existence of the extraction target. For example, the image evaluation unit 343 can use, as the evaluation result for the front image, the analysis result or the diagnosis result indicating that CNV exists, which is obtained by applying these trained models to the front image. Further, the image evaluation unit 343 can also use the analysis result or the diagnosis result regarding an artifact or a predetermined layer as the evaluation result for the front image.
- the image evaluation unit 343 can calculate the evaluation value as 1 when the analysis result or the diagnosis result indicating that the region of interest or the region of interest exists in the image is acquired.
- the image evaluation unit 343 may calculate an evaluation value according to the numerical value or area of the analysis result or the diagnosis result regarding the region of interest or the region of interest.
- the image evaluation unit 343 may set a threshold value step by step and calculate an evaluation value according to a threshold value that exceeds the area of the region of interest or the region analyzed / diagnosed as the region of interest.
- the information indicating the evaluation acquired by the image evaluation unit 343 is not limited to the evaluation value, and may be information indicating the existence or nonexistence of the extraction target and its possibility.
- the above-mentioned various sites of interest, regions of interest, and artifacts can be used as examples of extraction targets (target regions).
- The display control unit 306 according to the various examples and modifications described above may display, on the report screen of the display screen, object recognition results (object detection results) of the above-mentioned site of interest, region of interest, artifact, abnormal site, and the like.
- the segmentation result may be displayed.
- a rectangular frame or the like may be superimposed and displayed around the object on the image.
- colors and the like may be superimposed and displayed on the object in the image.
- The object recognition result and the segmentation result may be generated by the layer recognition unit 303 and the image evaluation unit 343 using a trained model (an object recognition engine, a trained model for object recognition, a segmentation engine, a trained model for segmentation) obtained by learning training data in which a medical image is labeled (annotated) with information indicating object recognition or segmentation as correct answer data.
- The image evaluation unit 343 may acquire the result of the object recognition process or the segmentation process using the trained model for object recognition or the trained model for segmentation as the evaluation (information indicating the evaluation) for evaluating the existence of the extraction target such as CNV in the front image. For example, the image evaluation unit 343 can use the label value or the like indicating CNV obtained by applying these trained models to the front image as the evaluation result for the front image. Further, the image evaluation unit 343 can calculate the evaluation value as 1 when, for example, an abnormal site is detected. Further, the image evaluation unit 343 may calculate the evaluation value according to the area of the region detected as the abnormal site.
- the image evaluation unit 343 may set a threshold value step by step and calculate the evaluation value according to the threshold value when the area of the region detected as the abnormal portion exceeds the threshold value.
- the information indicating the evaluation acquired by the image evaluation unit 343 is not limited to the evaluation value, and may be information indicating the existence or nonexistence of the extraction target and its possibility.
- the above-mentioned various sites of interest, regions of interest, and artifacts can be used as examples of extraction targets (target regions).
- analysis result generation and diagnosis result generation may be obtained by using the above-mentioned object recognition result and segmentation result.
- analysis result generation or diagnosis result generation processing may be performed on a region of interest obtained by object recognition or segmentation processing.
- the object recognition processing and the segmentation processing using the trained model for object recognition and the trained model for segmentation may be performed by a segmentation unit or the layer recognition unit 303, which is separate from the image evaluation unit 343, in the image processing device 300.
- the image evaluation unit 343 may use a generative adversarial network (GAN: Generative Adversarial Networks) or a variational auto-encoder (VAE: Variational Auto-Encoder) when detecting an abnormal portion. For example, a DCGAN (Deep Convolutional GAN) composed of a generator obtained by learning the generation of front images and a discriminator obtained by learning the discrimination between new front images generated by the generator and real front images can be used as the machine learning model.
- when a DCGAN is used, for example, the discriminator encodes the input front image into a latent variable, and the generator generates a new front image based on that latent variable. The difference between the input front image and the generated new front image can then be extracted as the abnormal portion.
- when a VAE is used, the input front image is encoded by an encoder into a latent variable, and the latent variable is decoded by a decoder to generate a new front image. The difference between the input front image and the generated new front image can then be extracted as the abnormal portion.
- the image evaluation unit 343 may detect an abnormal part by using a convolutional autoencoder (CAE).
- the image evaluation unit 343 can generate, as information about the abnormal portion, information on the difference between the image obtained by applying the generative adversarial network or an auto-encoder (AE) to the front image and the front image that was input to the generative adversarial network or the auto-encoder. As a result, the image evaluation unit 343 can be expected to detect the abnormal portion quickly and with high accuracy. Here, the auto-encoder includes a VAE, a CAE, and the like.
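- The following is a minimal PyTorch sketch, under stated assumptions, of extracting candidate abnormal portions as the difference between a front image and its reconstruction by a convolutional auto-encoder; the network sizes and the assumption that the model was trained on normal front images are illustrative, not the publication's implementation.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 1/2 resolution
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 1/4 resolution
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_map(model, front_image):
    """front_image: tensor of shape (1, 1, H, W) with values in [0, 1]."""
    model.eval()
    with torch.no_grad():
        reconstruction = model(front_image)
    # Regions the auto-encoder cannot reproduce are candidate abnormal portions.
    return (front_image - reconstruction).abs()

model = ConvAutoencoder()           # assumed to be trained on normal front images
image = torch.rand(1, 1, 64, 64)    # dummy OCTA front image
diff = anomaly_map(model, image)
print(diff.shape, float(diff.mean()))
```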
- the image evaluation unit 343 can, for example, set the evaluation value to 1 when an abnormal portion is detected by such processing, or may calculate the evaluation value according to the area of the region detected as the abnormal portion, for example according to which stepwise threshold that area exceeds.
- the image evaluation unit 343 can also use, for example, an FCN (Fully Convolutional Network) or SegNet as a machine learning model for detecting an abnormal portion. Further, a machine learning model that performs object recognition on a region-by-region basis may be used according to the desired configuration. As a machine learning model for performing object recognition, for example, an RCNN (Region-based CNN), fast RCNN, or faster RCNN can be used. Furthermore, YOLO (You Only Look Once) or SSD (Single Shot Detector, also called Single Shot MultiBox Detector) can be used as a machine learning model that recognizes objects on a region-by-region basis.
- the image evaluation unit 343 may use, as the evaluation value (information indicating the evaluation), information regarding a difference, such as a correlation value, between the image acquired by using the GAN or AE and the image input to the GAN or AE. Even in this case, the image evaluation unit 343 can acquire information indicating an evaluation of the existence of the target region (such as a lesion site) in the front image.
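- As one possible realization, a correlation value between the input front image and the reconstructed image could be computed as follows; this is a hedged sketch, and the function name is hypothetical.

```python
import numpy as np

def correlation_evaluation(input_image: np.ndarray, reconstructed: np.ndarray) -> float:
    """Return the Pearson correlation coefficient between the two images.
    A low correlation suggests that a target region (e.g. a lesion) is present."""
    a = input_image.ravel().astype(np.float64)
    b = reconstructed.ravel().astype(np.float64)
    return float(np.corrcoef(a, b)[0, 1])

# Example with dummy data: identical images give a correlation of 1.0.
img = np.random.rand(64, 64)
print(correlation_evaluation(img, img))
```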
- the trained models used in the various examples and modifications described above may be generated and prepared for each type of disease or for each abnormal site.
- the image processing device 300 can select a trained model to be used for processing according to an input (instruction) of the type of disease of the eye to be inspected, an abnormal part, or the like from the operator.
- the trained models prepared for each type of disease and for each abnormal site are not limited to trained models for object recognition or segmentation, and may be, for example, trained models used in an engine for image evaluation or in an engine for analysis.
- the image processing device 300 may identify the type of disease or abnormal site of the eye to be inspected from the image by using a separately prepared trained model.
- the image processing device 300 may automatically select the trained model to be used for the above processing based on the type of disease or the abnormal site identified by using the separately prepared trained model.
- the trained model for identifying the disease type or abnormal site of the eye to be inspected may be trained using training data in which tomographic images, fundus images, frontal images, or the like are used as input data and the disease type or abnormal site in those images is used as output data.
- a tomographic image, a fundus image, a frontal image, or the like may be used alone as input data, or a combination thereof may be used as input data.
- the trained model for generating the diagnosis result may be a trained model obtained by training with training data whose input data includes a set of a plurality of medical images of different types of a predetermined part of the subject.
- as the input data included in the training data, for example, input data in which a motion contrast front image and a luminance front image (or a luminance tomographic image) of the fundus are set as a pair can be considered.
- as the input data included in the training data, for example, input data in which a tomographic image (B-scan image) of the fundus and a color fundus image (or a fluorescence fundus image) are set as a pair can also be considered.
- the plurality of medical images of different types may be any images as long as they are acquired by different modalities, different optical systems, different principles, or the like.
- the trained model for generating the diagnosis result may be a trained model obtained by learning from the training data including the input data including a plurality of medical images of different parts of the subject.
- as the input data included in the training data, for example, input data in which a tomographic image (B-scan image) of the fundus and a tomographic image (B-scan image) of the anterior segment of the eye are set as a pair can be considered.
- as the input data included in the training data, input data in which a three-dimensional OCT image (three-dimensional tomographic image) of the macula of the fundus and a circle-scan (or raster-scan) tomographic image of the optic nerve head of the fundus are set as a pair is also possible.
- the input data included in the learning data may be a plurality of medical images of different parts of the subject and of different types.
- the input data included in the training data may be, for example, input data in which a tomographic image of the anterior segment of the eye and a color fundus image are set.
- the trained model described above may be a trained model obtained by learning from training data including input data including a set of a plurality of medical images having different shooting angles of view of a predetermined portion of the subject.
- the input data included in the learning data may be a combination of a plurality of medical images obtained by time-dividing a predetermined portion into a plurality of regions, such as a panoramic image.
- in a wide angle-of-view image such as a panoramic image, the amount of information is larger than in a narrow angle-of-view image, so the feature quantities of the image can be acquired accurately and the result of the processing can be improved. For example, when abnormal portions are detected at a plurality of positions in a wide angle-of-view image at the time of estimation (prediction), an enlarged image of each abnormal portion can be displayed in sequence. As a result, the abnormal portions at the plurality of positions can be confirmed efficiently, which can improve convenience for the examiner.
- alternatively, a configuration may be adopted in which the examiner selects each position at which an abnormal portion is detected on the wide angle-of-view image, and an enlarged image of the abnormal portion at the selected position is displayed.
- the input data included in the learning data may be input data in which a plurality of medical images of different dates and times of a predetermined part of the subject are set.
- the display screen on which at least one of the above-mentioned analysis result, diagnosis result, object recognition result, and segmentation result is displayed is not limited to the report screen.
- these results may be displayed, for example, on at least one display screen such as a shooting confirmation screen, a display screen for follow-up observation, or a preview screen for various adjustments before shooting (a display screen on which various live moving images are displayed). For example, by displaying at least one result obtained by using the above-mentioned trained models on the shooting confirmation screen, the operator can confirm an accurate result immediately after shooting.
- Machine learning includes, for example, deep learning consisting of a multi-layer neural network. Further, for at least a part of the multi-layer neural network, for example, a convolutional neural network (CNN) can be used as a machine learning model. Further, a technique related to an autoencoder (self-encoder) may be used for at least a part of a multi-layer neural network. Further, a technique related to backpropagation (backpropagation method) may be used for learning.
- the machine learning is not limited to deep learning, and any learning using a model capable of extracting (expressing) the features of learning data such as images by learning may be used.
- the machine learning model refers to a learning model based on a machine learning algorithm such as deep learning.
- the trained model is a model in which a machine learning model based on an arbitrary machine learning algorithm has been trained (learned) in advance using appropriate learning data. However, the trained model is not precluded from further learning; additional learning can also be performed.
- the learning data is composed of a pair of input data and output data (correct answer data).
- the learning data may be referred to as teacher data, or the correct answer data may be referred to as teacher data.
- a GPU can perform efficient calculations by processing large amounts of data in parallel. Therefore, when learning is performed multiple times using a learning model such as deep learning, it is effective to perform the processing on a GPU. In this modification, therefore, a GPU is used in addition to the CPU for the processing by the image processing device 300, which is an example of the learning unit (not shown). Specifically, when a learning program including a learning model is executed, learning is performed by the CPU and the GPU cooperating in the calculations. The processing of the learning unit may also be performed by the CPU or the GPU alone. Further, a processing unit (estimation unit) that executes processing using the various trained models described above may use a GPU in the same manner as the learning unit. The learning unit may also include an error detection unit and an update unit (not shown).
- the error detection unit obtains an error between the output data output from the output layer of the neural network and the correct answer data according to the input data input to the input layer.
- the error detection unit may use the loss function to calculate the error between the output data from the neural network and the correct answer data.
- the update unit updates the coupling weighting coefficient between the nodes of the neural network based on the error obtained by the error detection unit so that the error becomes small.
- This updating unit updates the coupling weighting coefficient and the like by using, for example, the backpropagation method.
- the error backpropagation method is a method of adjusting the coupling weighting coefficient and the like between the nodes of each neural network so that the above error becomes small.
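- The error detection unit and update unit described above could, for example, look like the following minimal PyTorch training loop; the model, loss function, and data here are placeholders, not the publication's configuration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()                                     # error detection unit
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # update unit

inputs = torch.rand(8, 10)     # dummy input data
targets = torch.rand(8, 1)     # dummy correct answer data

for _ in range(100):
    optimizer.zero_grad()
    error = loss_fn(model(inputs), targets)  # error between output and correct data
    error.backward()                         # backpropagation of the error
    optimizer.step()                         # update coupling weights to reduce error
print(float(error))
```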
- a U-Net-type machine learning model, which has the function of an encoder composed of a plurality of layers including a plurality of downsampling layers and the function of a decoder composed of a plurality of layers including a plurality of upsampling layers, is applicable to the machine learning models described above.
- in the U-Net-type model, position information (spatial information) that is made ambiguous in the plurality of layers configured as the encoder is made usable in the layers of the same dimension (mutually corresponding layers) among the plurality of layers configured as the decoder (for example, by using skip connections); a minimal sketch follows below.
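- A minimal PyTorch sketch of such a U-Net-type model with a skip connection is shown below; the layer sizes and the two-class output head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)                         # downsampling layer
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)   # upsampling layer
        self.dec1 = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, 2, 1)                     # 2 classes, e.g. target / background

    def forward(self, x):
        e1 = self.enc1(x)                  # keeps spatial (position) information
        e2 = self.enc2(self.down(e1))
        d1 = self.up(e2)
        d1 = torch.cat([d1, e1], dim=1)    # skip connection to the same-dimension layer
        return self.head(self.dec1(d1))

print(TinyUNet()(torch.rand(1, 1, 64, 64)).shape)  # -> torch.Size([1, 2, 64, 64])
```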
- as a machine learning model used for evaluation, segmentation, and the like, for example, an FCN or SegNet can be used.
- a machine learning model that recognizes an object in a region unit according to a desired configuration may be used.
- as a machine learning model for performing object recognition, for example, an RCNN, fast RCNN, or faster RCNN can be used.
- YOLO or SSD can be used as a machine learning model for recognizing an object in a region unit.
- the machine learning model may be, for example, a capsule network (CapsNet: Capsule Network).
- in a general neural network, each unit (neuron) is configured to output a scalar value, so that, for example, spatial information regarding the spatial positional relationship (relative position) between features in an image is reduced. Thereby, for example, learning can be performed so as to reduce the influence of local distortion and translation of the image.
- in a capsule network, by contrast, each unit (capsule) is configured to output spatial information as a vector, so that, for example, the spatial information is retained. Thereby, for example, learning can be performed in which the spatial positional relationship between features in the image is taken into consideration.
- the trained model for evaluation may be a trained model obtained by additionally learning training data including at least one evaluation value generated by the trained model. At this time, whether or not to use the evaluation value as learning data for additional learning may be configured to be selectable according to an instruction from the examiner. It should be noted that these configurations can be applied not only to the trained model for evaluation but also to the various trained models described above. Further, in the generation of the correct answer data used for learning the various trained models described above, the trained model for generating the correct answer data for generating the correct answer data such as labeling (annotation) may be used. At this time, the trained model for generating correct answer data may be obtained by (sequentially) additionally learning the correct answer data obtained by labeling (annotation) by the examiner.
- the trained model for generating correct answer data may be obtained by additional training with training data in which the data before labeling is used as input data and the data after labeling is used as output data. Further, for a plurality of consecutive frames such as a moving image, the result of a frame judged to have low accuracy may be corrected in consideration of the object recognition and segmentation results of the preceding and following frames. At this time, the corrected result may be additionally learned as correct answer data in accordance with an instruction from the examiner.
- predetermined image processing can be performed for each detected area. For example, consider the case of detecting at least two regions of the vitreous region, the retinal region, and the choroid region. In this case, when performing image processing such as contrast adjustment on at least two detected regions, adjustments suitable for each region can be performed by using different image processing parameters. By displaying the image adjusted suitable for each area, the operator can more appropriately diagnose the disease or the like in each area. Note that the configuration using different image processing parameters for each detected region may be similarly applied to the region of the eye to be inspected detected without using the trained model, for example.
- trained models obtained by learning for each imaging site may be used selectively. Specifically, a plurality of trained models can be prepared, including a first trained model obtained using learning data including a first imaging site (lung, eye to be examined, etc.) and a second trained model obtained using learning data including a second imaging site different from the first imaging site. The image processing device 300 may then have a selection means for selecting one of the plurality of trained models. At this time, the image processing device 300 may have a control means for executing additional learning on the selected trained model. In response to an instruction from the examiner, the control means can search for data in which an imaged part corresponding to the selected trained model and a photographed image of that imaged part form a pair, and execute additional learning on the selected trained model using the data obtained by the search as learning data.
- the imaging site corresponding to the selected trained model may be acquired from the information in the header of the data or manually input by the examiner. Further, the data search may be performed from a server of an external facility such as a hospital or a research institute via a network, for example. As a result, additional learning can be efficiently performed for each imaged part by using the photographed image of the imaged part corresponding to the trained model.
- the selection means and the control means may be composed of a software module executed by a processor such as a CPU or an MPU of the image processing device 300. Further, the selection means and the control means may be composed of a circuit that performs a specific function such as an ASIC, an independent device, or the like.
- the validity of the learning data for additional learning may be verified by confirming its consistency by digital signature or hashing. In this way, the learning data for additional learning can be protected. If, as a result of confirming the consistency by digital signature or hashing, the validity of the training data for additional learning cannot be confirmed, a warning to that effect is given and additional learning is not performed using that training data.
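- A minimal sketch of such a consistency check by hashing is shown below; the use of SHA-256 and the warning text are assumptions for illustration.

```python
import hashlib

def verify_training_data(data_bytes: bytes, expected_sha256: str) -> bool:
    """Return True if the hash of the received data matches the expected value."""
    return hashlib.sha256(data_bytes).hexdigest() == expected_sha256

data = b"front image + label pair serialized for additional learning"
expected = hashlib.sha256(data).hexdigest()  # would be supplied with the data

if verify_training_data(data, expected):
    pass  # proceed with additional learning using this data
else:
    print("Warning: training data failed the consistency check; "
          "additional learning will not be performed.")
```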
- the server may be in any form, for example, a cloud server, a fog server, an edge server, or the like, regardless of its installation location.
- the instruction from the examiner may be an instruction by voice or the like in addition to a manual instruction (for example, an instruction using a user interface or the like). In this case, for example, a machine learning model including a speech recognition engine (speech recognition model, trained model for speech recognition) may be used.
- the manual instruction may be an instruction by character input or the like using a keyboard, a touch panel, or the like. In this case, for example, a machine learning model including a character recognition engine (character recognition model, trained model for character recognition) may be used.
- the instruction from the examiner may be an instruction by a gesture or the like. In this case, for example, a machine learning model including a gesture recognition engine (gesture recognition model, trained model for gesture recognition) may be used.
- the instruction from the examiner may be the result of the examiner's line-of-sight detection on the display screen of the display unit 310 or the like.
- the line-of-sight detection result may be, for example, a pupil detection result using a moving image of the examiner obtained by photographing from the periphery of the display screen on the display unit 310.
- the object recognition engine as described above may be used for the pupil detection from the moving image.
- the instruction from the examiner may be an instruction by an electroencephalogram, a weak electric signal flowing through the body, or the like.
- as the training data, for example, training data may be used in which character data or voice data (waveform data) indicating an instruction to display the result of processing by the various trained models described above is used as input data, and an execution command for actually displaying the result of that processing on the display unit is used as correct answer data.
- it may also be learning data in which character data or voice data indicating an instruction for designating the extraction target (target region) is used as input data, and an execution command for designating the extraction target or an execution command for selecting the selection button shown in FIG. 5 is used as correct answer data.
- the learning data may be any data as long as the instruction content and the execution instruction content indicated by the character data, the voice data, or the like correspond to each other.
- the voice data may be converted into character data by using an acoustic model, a language model, or the like.
- the waveform data obtained by the plurality of microphones may be used to perform a process of reducing the noise data superimposed on the voice data.
- the instruction by characters or voice and the instruction by a mouse or a touch panel may be configured to be selectable according to the instruction from the examiner. Further, the on / off of the instruction by characters or voice may be selectably configured according to the instruction from the examiner.
- the extraction target (target region) can be designated by the image generation unit 304 (target designation unit) using information obtained with at least one of a trained model for generating character recognition results, a trained model for generating voice recognition results, and a trained model for generating gesture recognition results. This makes it possible to improve the operability of the image processing device 300 for the examiner.
- machine learning includes deep learning as described above, and for at least a part of a multi-layer neural network, for example, a recurrent neural network (RNN) can be used.
- as an example of the machine learning model according to this modification, an RNN, which is a neural network that handles time-series information, will be described with reference to FIGS. 27A and 27B. In addition, long short-term memory (LSTM), which is a kind of RNN, will be described with reference to FIGS. 28A and 28B.
- FIG. 27A shows the structure of the RNN, which is a machine learning model.
- the RNN 2720 has a loop structure in the network; data x_t 2710 is input at time t, and data h_t 2730 is output. Since the RNN 2720 has a loop function in the network, the current state can be carried over to the next state, so that time-series information can be handled.
- FIG. 27B shows an example of input / output of the parameter vector at time t.
- the data x_t 2710 contains N parameters (Params1 to ParamsN). Further, the data h_t 2730 output from the RNN 2720 contains N parameters (Params1 to ParamsN) corresponding to the input data.
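- A minimal PyTorch sketch of an RNN handling such parameter vectors over time is shown below; the number of parameters N and the sequence length are arbitrary examples.

```python
import torch
import torch.nn as nn

N = 8                                   # number of parameters per time step
rnn = nn.RNN(input_size=N, hidden_size=N, batch_first=True)

x = torch.rand(1, 5, N)                 # 5 time steps of data x_t
h_t, h_last = rnn(x)                    # h_t: output data at every time step
print(h_t.shape)                        # -> torch.Size([1, 5, 8])
```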
- FIG. 28A shows the structure of the LSTM.
- the information that the network takes over at the next time t is the internal state c_{t-1} of the network, called the cell, and the output data h_{t-1}.
- the lowercase letters (c, h, x) in the figure represent vectors.
- FIG. 28B shows the details of the LSTM 2840.
- the forget gate network FG, the input gate network IG, and the output gate network OG are shown, each of which is a sigmoid layer. Therefore, each outputs a vector in which every element has a value between 0 and 1.
- the forget gate network FG determines how much past information is retained, and the input gate network IG determines which values are updated.
- the cell update candidate network CU is shown, and the cell update candidate network CU is the activation function tanh layer. This creates a vector of new candidate values to be added to the cell.
- the output gate network OG selects the elements of the cell candidates and determines how much information to convey at the next time. A minimal sketch of one LSTM step follows below.
- since the LSTM model described here is a basic model, it is not limited to the network shown here; the coupling between the networks may be changed. A QRNN (Quasi-Recurrent Neural Network) may be used instead of the LSTM. Further, the machine learning model is not limited to a neural network; boosting, a support vector machine, or the like may be used. Further, when the instruction from the examiner is input by characters, voice, or the like, a technique related to natural language processing (for example, Sequence to Sequence) may be applied. Further, a dialogue engine (a dialogue model, a trained model for dialogue) that responds to the examiner with text or voice output may be applied.
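- A minimal NumPy sketch of one LSTM step with the forget, input, and output gates and the tanh cell-update candidate described above is shown below; the dimensions and random parameters are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W, U, b each hold parameters for the gates f, i, o and the candidate g."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # how much past info to keep
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # which values to update
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # how much info to output
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # new candidate values
    c_t = f * c_prev + i * g          # updated cell (internal state)
    h_t = o * np.tanh(c_t)            # output data passed to the next time
    return h_t, c_t

n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_hid, n_in)) for k in "fiog"}
U = {k: rng.standard_normal((n_hid, n_hid)) for k in "fiog"}
b = {k: np.zeros(n_hid) for k in "fiog"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
print(h.shape, c.shape)
```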
- the front image, the area label image generated by the segmentation process, and the like may be stored in the storage unit in response to an instruction from the operator.
- when the area label image is saved in response to an instruction from the operator and a file name is registered, a file name containing, in any part of the file name (for example, the first or last part), information (for example, characters) indicating that the image was generated by processing using the trained model for segmentation may be displayed as a recommended file name that can be edited according to an instruction from the operator.
- similarly, a file name including information indicating that an image was generated by processing using a trained model may be displayed for the other trained models described above.
- when the displayed image is an image generated by processing using the trained model for segmentation, an indication to that effect may be displayed together with the image. In this case, the operator can easily identify from the display that the displayed image is not the image itself acquired by shooting, so that misdiagnosis can be reduced and diagnosis efficiency can be improved.
- the display indicating that an image was generated by processing using the trained model for segmentation may take any form as long as the input image and the image generated by the processing can be distinguished from each other. Further, not only for processing using the trained model for segmentation but also for processing using the various trained models described above, a display indicating that the result was generated by processing using that type of trained model may be shown.
- when an analysis result is displayed, for example, a display indicating that the analysis result is based on a result obtained by using the trained model for segmentation may be displayed together with the analysis result.
- the display screen such as the report screen may be saved in the storage unit as image data according to the instruction from the operator.
- the report screen may be saved in the storage unit as one image in which the area label image and the like and the display indicating that these images are images generated by the processing using the trained model are arranged side by side.
- together with the display indicating that the image was generated by processing using the trained model for segmentation, a display showing what kind of training data the trained model for segmentation was trained with may be shown on the display unit.
- this display may include an explanation of the types of input data and correct answer data in the learning data, and any display regarding the input data and correct answer data, such as the imaged part they contain. Also in the case of processing using the various trained models described above, a display indicating what kind of training data that type of trained model was trained with may be shown on the display unit 310.
- information (for example, characters) indicating that the image was generated by processing using a trained model may be superimposed and displayed on the image. In this case, the portion where the information is superimposed on the image may be any region (for example, the edge of the image) that does not overlap the region where the site of interest to be photographed is displayed.
- a non-overlapping region may be determined, and the information may be superimposed on the determined region; a minimal sketch follows below.
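- A minimal Pillow sketch of superimposing such information at the edge of an image is shown below; the label text, position, and color are assumptions for illustration.

```python
from PIL import Image, ImageDraw

def annotate_generated_image(image: Image.Image,
                             text: str = "Generated by trained model") -> Image.Image:
    annotated = image.convert("RGB").copy()
    draw = ImageDraw.Draw(annotated)
    # Draw along the bottom edge, assumed not to overlap the region of interest.
    draw.text((5, annotated.height - 15), text, fill=(255, 255, 0))
    return annotated

front_image = Image.new("L", (256, 256), color=0)   # dummy front image
annotate_generated_image(front_image).save("front_image_labeled.png")
```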
- the image obtained by the processing using the other various trained models described above may be processed in the same manner.
- a report image corresponding to a report screen including the image of the extraction target may be configured to be transmitted to a server in response to an instruction from the examiner.
- similarly, when the default setting is such that a predetermined extraction target is selected, a report image corresponding to the report screen including the image of the extraction target may be configured to be (automatically) transmitted to the server when the examination is completed (for example, when the shooting confirmation screen or the preview screen is changed to the report screen in response to an instruction from the examiner).
- similarly, a report image generated based on various default settings (for example, the depth range for generating the En-Face image on the initial display screen of the report screen, whether or not an analysis map is superimposed, whether or not the image of the extraction target is displayed, and whether or not the display is for follow-up observation) may be configured to be transmitted to the server.
- an image to be input to a second type of trained model different from the first type may be generated from the image input to the first type of trained model by using the result of processing by the first type of trained model (for example, an analysis result, a diagnosis result, an object recognition result, or a segmentation result).
- at this time, the generated image is likely to be suitable as an image to be processed by the second type of trained model. Therefore, the accuracy of the image obtained by inputting the generated image into the second type of trained model (for example, an image showing an analysis result such as an analysis map, an image showing an object recognition result, or an image showing a segmentation result) can be improved.
- a similar case image search using an external database stored on a server or the like may be performed using the analysis result, diagnosis result, or the like obtained by processing with the trained models described above as a search key. If a plurality of images stored in the database are already managed, by machine learning or the like, with their respective feature quantities attached as incidental information, a similar case image search engine (a similar case image search model, a trained model for similar case image search) that uses the image itself as a search key may be used.
- for example, the image processing device can search for similar case images for each of the different regions specified by segmentation processing or the like, using a trained model for similar case image search that is different from the trained model for acquiring evaluation results.
- the three-dimensional volume data and the front image relating to the fundus portion of the eye to be inspected have been described, but the image processing may be performed on the image relating to the anterior segment of the eye to be inspected.
- the regions of the image to be subjected to different image processing include regions such as the crystalline lens, cornea, iris, and anterior chamber of eye.
- the region may include another region of the anterior segment of the eye.
- the region for the image relating to the fundus portion is not limited to the vitreous portion, the retina portion, and the choroid portion, and may include other regions relating to the fundus portion.
- in the above examples and modifications, the eye to be examined has been described as an example of the subject, but the subject is not limited to this.
- the subject may be skin, other organs, or the like.
- the OCT device according to the above embodiment and the modified example can be applied to a medical device such as an endoscope in addition to the ophthalmic device.
- the image processed by the image processing apparatus or the image processing method according to the various examples and modifications described above includes a medical image acquired by using an arbitrary modality (imaging apparatus, imaging method).
- the medical image to be processed may include a medical image acquired by an arbitrary imaging device or the like, or an image created by an image processing device or an image processing method according to the above-described embodiment and modification.
- the medical image to be processed is an image of a predetermined part of the subject (subject), and the image of the predetermined part includes at least a part of the predetermined part of the subject.
- the medical image may include other parts of the subject.
- the medical image may be a still image or a moving image, and may be a black-and-white image or a color image.
- the medical image may be an image showing the structure (morphology) of a predetermined part or an image showing the function thereof.
- the image showing the function includes, for example, an OCTA image, a Doppler OCT image, an fMRI image, and an image showing blood flow dynamics (blood flow volume, blood flow velocity, etc.) such as an ultrasonic Doppler image.
- the predetermined part of the subject may be determined according to the subject to be imaged, and includes any part such as the human eye (eye to be examined), brain, lung, intestine, heart, pancreas, kidney, liver and other organs, as well as the head, chest, legs, and arms.
- the medical image may be a tomographic image of the subject or a frontal image.
- the frontal image includes, for example, a frontal image of the fundus, a frontal image of the anterior segment of the eye, a fluorescence-photographed fundus image, and an En-Face image generated, from data acquired by OCT (three-dimensional OCT data), using data in at least a part of the depth-direction range of the object to be imaged.
- the En-Face image may be an OCTA En-Face image (motion contrast front image) generated, from three-dimensional OCTA data (three-dimensional motion contrast data), using data in at least a part of the depth-direction range of the object to be imaged.
- three-dimensional OCT data and three-dimensional motion contrast data are examples of three-dimensional medical image data.
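- A minimal NumPy sketch of generating an En-Face (front) image by projecting three-dimensional data over part of the depth range is shown below; the projection modes and array sizes are illustrative assumptions.

```python
import numpy as np

def generate_enface(volume: np.ndarray, z_start: int, z_end: int,
                    mode: str = "mean") -> np.ndarray:
    """volume: 3-D data of shape (depth, height, width); returns a 2-D front image."""
    depth_slab = volume[z_start:z_end]            # data in the chosen depth range
    if mode == "max":
        return depth_slab.max(axis=0)             # maximum intensity projection
    return depth_slab.mean(axis=0)                # average intensity projection

octa_volume = np.random.rand(128, 300, 300)       # dummy 3-D motion contrast data
front = generate_enface(octa_volume, 40, 60, mode="max")
print(front.shape)                                # -> (300, 300)
```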
- the photographing device is a device for photographing an image used for diagnosis.
- the photographing device includes, for example, a device that obtains an image of a predetermined part by irradiating the predetermined part of the subject with light, radiation such as X-rays, electromagnetic waves, ultrasonic waves, or the like, and a device that obtains an image by detecting radiation emitted from the subject.
- the imaging devices according to the various examples and modifications described above include at least an X-ray imaging device, a CT device, an MRI device, a PET device, a SPECT device, an SLO device, an OCT device, an OCTA device, a fundus camera, an endoscope, and the like.
- for example, the image evaluation unit 343 may be configured to evaluate the existence of the target region (a site of interest or a region of interest) in a slice image obtained by such an imaging device.
- the image generation unit 304 can determine the output image by using the evaluation result (information indicating the evaluation) by the image evaluation unit 343.
- image processing is not limited to the field of ophthalmology, and can be applied to medical images acquired for a target site by any of the above-mentioned imaging devices.
- in such processing, a plurality of images corresponding to different locations are evaluated and the output image is determined using the evaluation results, so that an image in which the target region is easy to confirm can be acquired.
- the predetermined part of the subject described above can be an example of the extraction target (target area).
- the medical image acquired by using the imaging device has different image features depending on the type of the region of interest. Therefore, the trained models used in the various examples and modifications described above may be generated and prepared for each type of the region of interest.
- the image processing apparatus 300 can select a trained model to be used for processing by the image evaluation unit 343 or the like according to the designated target area (part of interest).
- the display mode of the GUI or the like described in the above-described embodiment and the modified example is not limited to the above-mentioned one, and may be arbitrarily changed according to a desired configuration.
- the motion contrast data may be displayed on the tomographic image. In this case, it is possible to confirm at which depth the motion contrast value is distributed. Further, colors may be used for displaying an image or the like.
- the generated image is displayed on the display unit 310, but for example, it may be output to an external device such as an external server.
- the different depth ranges corresponding to the plurality of front images may be partially overlapping depth ranges.
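- A minimal sketch combining the ideas above is shown below: front images are generated for several partially overlapping depth ranges, an evaluation value is obtained for each, and the front image with the best evaluation is determined as the output image; the evaluation function here is a simple placeholder for the image evaluation unit 343, not its actual implementation.

```python
import numpy as np

def evaluate_front_image(front: np.ndarray) -> float:
    # Placeholder for the image evaluation unit 343 (e.g. a trained model);
    # here the mean motion contrast is simply used as a stand-in.
    return float(front.mean())

volume = np.random.rand(128, 300, 300)                 # dummy 3-D volume data
depth_ranges = [(30, 60), (45, 75), (60, 90)]          # partially overlapping ranges
fronts = [volume[s:e].mean(axis=0) for s, e in depth_ranges]
scores = [evaluate_front_image(f) for f in fronts]
best = int(np.argmax(scores))
print("output image corresponds to depth range", depth_ranges[best])
```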
- it is considered that the magnitude of the brightness values of the front image, and the order, gradient, position, distribution, continuity, and the like of its bright and dark parts, are extracted as part of the feature quantities and used for the estimation processing.
- the spectrum domain OCT (SD-OCT) device using the SLD as the light source has been described as the OCT device, but the configuration of the OCT device according to the present invention is not limited to this.
- the present invention can be applied to any other type of OCT apparatus such as a wavelength sweep type OCT (SS-OCT) apparatus using a wavelength sweep light source capable of sweeping the wavelength of emitted light.
- the present invention can also be applied to a Line-OCT device (or SS-Line-OCT device) using line light.
- the present invention can also be applied to a Full Field-OCT device (or SS-Full Field-OCT device) using area light.
- the present invention can also be applied to an adaptive optics OCT (AO-OCT) device using an adaptive optics system, or to a polarization-sensitive OCT (PS-OCT) device for visualizing information on polarization phase differences and depolarization.
- an optical fiber optical system using a coupler is used as the dividing means, but a spatial optical system using a collimator and a beam splitter may be used.
- the configurations of the optical interference unit 100 and the scanning optical system 200 are not limited to the above configurations, and a part of the configurations included in the optical interference unit 100 and the scanning optical system 200 may be different from these configurations.
- although a Michelson interferometer is used as the interference system, a Mach-Zehnder interferometer may be used.
- the image processing apparatus 300 has acquired the interference signal acquired by the optical interference unit 100, the tomographic data generated by the reconstruction unit 301, and the like.
- the configuration in which the image processing device 300 acquires these signals and images is not limited to this.
- the image processing device 300 may acquire these signals and data from a server or a photographing device connected to the image processing device 300 via a LAN, WAN, the Internet, or the like.
- the trained model according to the above embodiment and the modified example can be provided in the image processing device 300.
- the trained model may be composed of, for example, a CPU, a software module executed by a processor such as an MPU, GPU, or FPGA, or a circuit or the like that performs a specific function such as an ASIC.
- these trained models may be provided in a device of another server connected to the image processing device 300 or the like.
- the image processing device 300 can use the trained model by connecting to a server or the like provided with the trained model via an arbitrary network such as the Internet.
- the server provided with the trained model may be, for example, a cloud server, a fog server, an edge server, or the like.
- the training data of the trained models is not limited to data obtained using the ophthalmic apparatus that actually performs the imaging, and may be, according to the desired configuration, data obtained using an ophthalmic apparatus of the same model, data obtained using an ophthalmic apparatus of the same type, or the like.
- the image evaluation unit 343 may be provided outside the image processing device 300.
- in this case, the image evaluation unit 343 is configured in an external device such as an external server connected to the image processing device 300, and the image processing device 300 sends the acquired three-dimensional volume data, the generated front images, information on the extraction target (target region), and the like to the external device.
- the image processing device 300 may determine or generate a front image to be output by using the evaluation result acquired from the external device.
- an image processing system provided with the image processing device 300 and the external device (evaluation device) can be configured.
- the determination unit that determines the image to be output using the information indicating the evaluation may be provided in the same device as the image evaluation unit 343.
- according to the various examples and modifications described above, the target region can be easily confirmed.
- the present invention can also be realized by a process in which a program that realizes one or more functions of the above-described examples and modifications is supplied to a system or an apparatus via a network or a storage medium, and a computer of the system or apparatus reads and executes the program.
- a computer may have one or more processors or circuits and may include multiple separate computers or a network of separate processors or circuits to read and execute computer executable instructions.
- the processor or circuit may include a central processing unit (CPU), a micro processing unit (MPU), a graphics processing unit (GPU), an application specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The processor or circuit may also include a digital signal processor (DSP), a data flow processor (DFP), or a neural processing unit (NPU).
- 300: Image processing device, 304: Image generation unit (determination unit), 343: Image evaluation unit (evaluation unit)
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Ophthalmology & Optometry (AREA)
- Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Physics & Mathematics (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Eye Examination Apparatus (AREA)
Abstract
Provided is an image processing device equipped with: an evaluation unit which, using multiple frontal images corresponding to different depth ranges of three-dimensional volume data of a subject's eye, acquires multiple pieces of information corresponding to said multiple frontal images, the multiple pieces of information being indicative of evaluation conducted as to the presence of an object domain; and a determination unit which, using said multiple pieces of information, determines at least one of the multiple frontal images as an image to be outputted.
Description
The present invention relates to an image processing apparatus, an image processing method, and a program.
It is known that the morphology of blood vessels in the retina can be observed by optical coherence tomography angiography (OCTA). Patent Document 1 describes a general OCT (Optical Coherence Tomography) and OCTA imaging apparatus and its optical system, a method for generating motion contrast data, and a technique for projecting motion contrast data onto a two-dimensional plane within a predetermined depth range.
A three-dimensional OCT image or a three-dimensional OCTA image (motion contrast image) may be projected within a predetermined depth range to generate a two-dimensional frontal image in order to extract structural features (including blood vessels) of the eye, including the retina, the vitreous body, and the choroid. In this case, it is common to define, for the target structure, a depth range in which the structure can easily be extracted with reference to a specific layer of the retina, and to project within that depth range.
However, depending on individual differences, the morphology of the structure, its state (the degree of lesion progression), and the like, the features of the target region, such as a structure that should be confirmed, sometimes cannot be confirmed on the image or cannot be seen clearly. Therefore, one of the objects of an embodiment of the present invention is to make it possible to easily confirm the target region.
An image processing apparatus according to one embodiment of the present invention includes: an evaluation unit that acquires, using a plurality of front images corresponding to different depth ranges of three-dimensional volume data of an eye to be inspected, a plurality of pieces of information indicating evaluations of the existence of a target region, the plurality of pieces of information corresponding to the plurality of front images; and a determination unit that determines, using the plurality of pieces of information, at least one of the plurality of front images as an output image.
Further features of the present invention will become apparent from the following description of exemplary examples with reference to the accompanying drawings.
Hereinafter, examples of the present invention will be specifically described with reference to the accompanying drawings.
However, the dimensions, materials, shapes, relative positions of the components, and the like described in the following examples are arbitrary and can be changed according to the configuration of the apparatus to which the present invention is applied or according to various conditions. In the drawings, the same reference numerals are used across the drawings to indicate elements that are identical or functionally similar.
In the following, a machine learning model refers to a learning model based on a machine learning algorithm. Specific machine learning algorithms include the nearest-neighbor method, the naive Bayes method, decision trees, and support vector machines. Deep learning, in which a neural network generates by itself the feature quantities and coupling weighting coefficients used for learning, can also be used. Any of the above algorithms that is available can be applied, as appropriate, to the following examples and modifications. Teacher data refers to learning data and is composed of pairs of input data and output data. Correct answer data refers to the output data of the learning data (teacher data).
A trained model is a model obtained by training (learning) in advance, using appropriate teacher data (learning data), a machine learning model based on an arbitrary machine learning algorithm such as deep learning. Although the trained model is obtained in advance using appropriate learning data, it is not a model that performs no further learning; additional learning can also be performed. Additional learning can be performed even after the apparatus has been installed at its site of use.
In Examples 1 to 3, an example of generating a frontal image for confirming choroidal neovascularization (CNV) derived from exudative age-related macular degeneration (AMD) will be described. The present invention is also applicable, for example, to generating a frontal image for confirming the lamina cribrosa of the optic nerve head described in Example 4, or the choroidal layers (Sattler's layer and Haller's layer) or the capillary aneurysms of retinal blood vessels described in Example 5.
(実施例1)
以下、図1乃至図8を参照して、本発明の実施例1に係る眼科装置、特に眼科医院等で使用される光干渉断層撮影装置(OCT装置)の画像処理装置及び画像処理方法について説明する。以下、本実施例に係るOCTAを用いた新生血管(CNV)の表示方法に関して説明する。 (Example 1)
Hereinafter, with reference to FIGS. 1 to 8, the image processing device and the image processing method of the ophthalmic device according to the first embodiment of the present invention, particularly the optical coherence tomography device (OCT device) used in an ophthalmic clinic or the like will be described. To do. Hereinafter, a method for displaying a new blood vessel (CNV) using OCTA according to this example will be described.
以下、図1乃至図8を参照して、本発明の実施例1に係る眼科装置、特に眼科医院等で使用される光干渉断層撮影装置(OCT装置)の画像処理装置及び画像処理方法について説明する。以下、本実施例に係るOCTAを用いた新生血管(CNV)の表示方法に関して説明する。 (Example 1)
Hereinafter, with reference to FIGS. 1 to 8, the image processing device and the image processing method of the ophthalmic device according to the first embodiment of the present invention, particularly the optical coherence tomography device (OCT device) used in an ophthalmic clinic or the like will be described. To do. Hereinafter, a method for displaying a new blood vessel (CNV) using OCTA according to this example will be described.
(OCT光学系)
図1は本実施例に係るOCT装置の概略的な構成例を示す。本実施例に係るOCT装置には、光干渉部100、走査光学系200、画像処理装置300、表示部310、ポインティングデバイス320、及びキーボード321が設けられている。光干渉部100には、近赤外光を発光する低コヒーレンス光源101、光分岐部103、コリメート光学系111、分散補償光学系112、及び参照ミラー113が設けられている。さらに、光干渉部100には、コリメート光学系122、回折格子123、結像レンズ124、及びラインセンサ125が設けられている。光源101から発光した光は、光ファイバ102aを伝搬し、光分岐部103で測定光と参照光に分割される。光分岐部103により分割された測定光は、光ファイバ102bに入射され、走査光学系200に導かれる。一方、光分岐部103により分割された参照光は、光ファイバ102cに入射され、参照ミラー113へ導かれる。なお、光分岐部103は、例えば、光ファイバカプラ等を用いて構成されてよい。 (OCT optical system)
FIG. 1 shows a schematic configuration example of the OCT apparatus according to this embodiment. The OCT device according to this embodiment is provided with anoptical interference unit 100, a scanning optical system 200, an image processing device 300, a display unit 310, a pointing device 320, and a keyboard 321. The optical interference unit 100 is provided with a low coherence light source 101 that emits near-infrared light, an optical branching unit 103, a collimating optical system 111, an adaptive optics system 112, and a reference mirror 113. Further, the optical interference unit 100 is provided with a collimating optical system 122, a diffraction grating 123, an imaging lens 124, and a line sensor 125. The light emitted from the light source 101 propagates through the optical fiber 102a and is divided into measurement light and reference light by the optical branching portion 103. The measurement light divided by the optical branching portion 103 is incident on the optical fiber 102b and guided to the scanning optical system 200. On the other hand, the reference light divided by the optical branching portion 103 is incident on the optical fiber 102c and guided to the reference mirror 113. The optical branching portion 103 may be configured by using, for example, an optical fiber coupler or the like.
図1は本実施例に係るOCT装置の概略的な構成例を示す。本実施例に係るOCT装置には、光干渉部100、走査光学系200、画像処理装置300、表示部310、ポインティングデバイス320、及びキーボード321が設けられている。光干渉部100には、近赤外光を発光する低コヒーレンス光源101、光分岐部103、コリメート光学系111、分散補償光学系112、及び参照ミラー113が設けられている。さらに、光干渉部100には、コリメート光学系122、回折格子123、結像レンズ124、及びラインセンサ125が設けられている。光源101から発光した光は、光ファイバ102aを伝搬し、光分岐部103で測定光と参照光に分割される。光分岐部103により分割された測定光は、光ファイバ102bに入射され、走査光学系200に導かれる。一方、光分岐部103により分割された参照光は、光ファイバ102cに入射され、参照ミラー113へ導かれる。なお、光分岐部103は、例えば、光ファイバカプラ等を用いて構成されてよい。 (OCT optical system)
FIG. 1 shows a schematic configuration example of the OCT apparatus according to this embodiment. The OCT device according to this embodiment is provided with an
光ファイバ102cに入射した参照光はファイバ端から射出され、コリメート光学系111を介して、分散補償光学系112に入射し、参照ミラー113へと導かれる。参照ミラー113で反射した参照光は、光路を逆にたどり再び光ファイバ102cに入射する。分散補償光学系112は、走査光学系200及び被測定物体である被検眼Eにおける光学系の分散を補正するものである。参照ミラー113は、不図示のモータ等を含む駆動部によって光軸方向に駆動可能なように構成されており、参照光の光路長を、測定光の光路長に対して相対的に変化させることができる。一方、光ファイバ102bに入射した測定光はファイバ端より射出され、走査光学系200に入射される。これらの光源101、及び不図示の駆動部は画像処理装置300の制御下で制御される。
The reference light incident on the optical fiber 102c is emitted from the fiber end, enters the dispersion adaptive optics system 112 via the collimating optical system 111, and is guided to the reference mirror 113. The reference light reflected by the reference mirror 113 follows the optical path in the opposite direction and is incident on the optical fiber 102c again. The dispersion-compensated optical system 112 corrects the dispersion of the optical system in the scanning optical system 200 and the eye E to be inspected as the object to be measured. The reference mirror 113 is configured to be driveable in the optical axis direction by a drive unit including a motor or the like (not shown), and changes the optical path length of the reference light relative to the optical path length of the measurement light. Can be done. On the other hand, the measurement light incident on the optical fiber 102b is emitted from the fiber end and incident on the scanning optical system 200. These light sources 101 and a driving unit (not shown) are controlled under the control of the image processing device 300.
次に走査光学系200について説明する。走査光学系200は被検眼Eに対して相対的に移動可能なように構成された光学系である。走査光学系200、コリメート光学系202、走査部203、及びレンズ204が設けられている。走査光学系200は、画像処理装置300によって制御される不図示の駆動部により、被検眼Eの眼軸に対して前後上下左右方向に駆動可能なように構成される。画像処理装置300は、不図示の駆動部を制御することで、被検眼Eに対して走査光学系200をアライメントすることができる。
Next, the scanning optical system 200 will be described. The scanning optical system 200 is an optical system configured to be movable relative to the eye E to be inspected. A scanning optical system 200, a collimating optical system 202, a scanning unit 203, and a lens 204 are provided. The scanning optical system 200 is configured to be able to be driven in the front-back, up-down, left-right directions with respect to the eye axis of the eye E to be inspected by a driving unit (not shown) controlled by the image processing device 300. The image processing device 300 can align the scanning optical system 200 with respect to the eye E to be inspected by controlling a drive unit (not shown).
光ファイバ102bのファイバ端より射出した測定光は、コリメート光学系202により略平行化され、走査部203へ入射する。走査部203は、ミラー面を回転可能なガルバノミラーを2つ有し、一方は水平方向に光を偏向し、他方は垂直方向に光を偏向し、画像処理装置300の制御下で入射した光を偏向する。これにより、走査部203は、紙面内の主走査方向と紙面垂直方向の副走査方向の2方向に、被検眼Eの眼底Er上で測定光を走査することができる。なお、主走査方向及び副走査方向はこれに限られず、被検眼Eの深度方向と直交し、互いに交差する方向であればよい。また、走査部203は、任意の変更手段を用いて構成されてよく、例えば、1枚で2軸方向に光を偏向することができるMEMSミラー等を用いて構成されてもよい。
The measurement light emitted from the fiber end of the optical fiber 102b is substantially parallelized by the collimating optical system 202 and incident on the scanning unit 203. The scanning unit 203 has two galvano mirrors whose mirror surfaces can be rotated, one of which deflects light in the horizontal direction and the other of which deflects light in the vertical direction, and the light incident under the control of the image processing apparatus 300. Bias. As a result, the scanning unit 203 can scan the measurement light on the fundus Er of the eye E to be inspected in two directions, the main scanning direction in the paper surface and the sub-scanning direction in the direction perpendicular to the paper surface. The main scanning direction and the sub-scanning direction are not limited to this, and may be any direction that is orthogonal to the depth direction of the eye E to be inspected and intersects with each other. Further, the scanning unit 203 may be configured by using any changing means, and may be configured by using, for example, a MEMS mirror or the like capable of deflecting light in two axial directions with one sheet.
The measurement light scanned by the scanning unit 203 forms an illumination spot on the fundus Er of the eye E to be inspected via the lens 204. When deflected in-plane by the scanning unit 203, each illumination spot moves (scans) over the fundus Er of the eye E to be inspected. The light reflected at the illumination spot position follows the optical path in the opposite direction, enters the optical fiber 102b, and returns to the optical branching portion 103.
As described above, the reference light reflected by the reference mirror 113 and the measurement light reflected by the fundus Er of the eye E to be inspected are returned to the optical branching portion 103 as return light and interfere with each other to generate interference light. The interference light passes through the optical fiber 102d, is emitted toward the collimating optical system 122, is substantially collimated, and enters the diffraction grating 123. The diffraction grating 123 has a periodic structure and disperses the input interference light. The dispersed interference light is imaged on the line sensor 125 by the imaging lens 124, whose focusing state can be changed. The line sensor 125 is connected to the image processing device 300 and outputs to the image processing device 300 a signal corresponding to the intensity of the light applied to each sensor element.
Further, the OCT apparatus may be provided with a fundus camera (not shown) for capturing a frontal image of the fundus of the eye E to be inspected, an optical system of a scanning laser ophthalmoscope (SLO), or the like. In this case, a part of the SLO optical system may share an optical path with a part of the scanning optical system 200.
(Image processing device)
FIG. 2 shows a schematic functional configuration example of the image processing device 300. As shown in FIG. 2, the image processing device 300 is provided with a reconstruction unit 301, a motion contrast image generation unit 302, a layer recognition unit 303, an image generation unit 304, a storage unit 305, and a display control unit 306. The image processing device 300 according to this embodiment is connected to the optical interference unit 100, which uses the spectral domain (SD) method, and can acquire the output data of the line sensor 125 of the optical interference unit 100. The image processing device 300 may also be connected to an external device (not shown) and acquire an interference signal, a tomographic image, or the like of the eye to be inspected from the external device.
The reconstruction unit 301 generates tomographic data of the eye E to be inspected by performing wavenumber conversion and a Fourier transform on the acquired output data (interference signal) of the line sensor 125. Here, tomographic data refers to data including information on a tomographic section of the subject, and includes a signal obtained by applying a Fourier transform to an OCT interference signal, a signal obtained by applying arbitrary processing to that signal, and the like. The reconstruction unit 301 can also generate a tomographic image as tomographic data based on the interference signal. The reconstruction unit 301 may also generate tomographic data based on an interference signal of the eye to be inspected that the image processing device 300 acquires from an external device. Although the OCT apparatus according to this embodiment includes the SD-type optical interference unit 100, it may instead include a time domain (TD) type or swept source (SS) type optical interference unit.
The motion contrast image generation unit 302 generates motion contrast data from a plurality of pieces of tomographic data. The method of generating the motion contrast data will be described later. The motion contrast image generation unit 302 can generate three-dimensional motion contrast data from a plurality of pieces of three-dimensional tomographic data. In the following, three-dimensional tomographic data and three-dimensional motion contrast data are collectively referred to as three-dimensional volume data.
The layer recognition unit 303 analyzes the generated tomographic data of the eye E to be inspected and performs segmentation to identify an arbitrary layer structure in the retinal layers. The segmentation result serves as a reference for the projection range when generating an OCTA front image, as described later. For example, the layer boundary shapes detected by the layer recognition unit 303 are of ten types: ILM, NFL/GCL, GCL/IPL, IPL/INL, INL/OPL, OPL/ONL, IS/OS, OS/RPE, RPE/Choroid, and BM. The objects detected by the layer recognition unit 303 are not limited to these and may be any structure included in the eye E to be inspected. Any known method may be used for the segmentation.
The image generation unit 304 generates images for display from the generated tomographic data and motion contrast data. For example, the image generation unit 304 can generate an intensity En-Face image obtained by projecting or integrating three-dimensional tomographic data onto a two-dimensional plane, and an OCTA front image obtained by projecting or integrating three-dimensional motion contrast data onto a two-dimensional plane. The display control unit 306 outputs the generated display images to the display unit 310. The storage unit 305 can store the tomographic data and motion contrast data generated by the reconstruction unit 301, the display images generated by the image generation unit 304, definitions of a plurality of depth ranges, definitions applied by default, and the like. The image generation unit 304 can generate an OCTA front image or an intensity En-Face image according to a depth range acquired from the storage unit 305. The method of generating the OCTA front image and the like will be described later. The storage unit 305 may also include software or the like for realizing each unit. The image generation unit 304 can also generate a fundus front image based on a signal acquired from a fundus camera (not shown) or an SLO optical system.
A display unit 310, a pointing device 320, and a keyboard 321 are connected to the image processing device 300. The display unit 310 can be configured using any monitor.
The pointing device 320 is a mouse provided with a rotary wheel and buttons, and can be used to specify an arbitrary position on the display unit 310. Although a mouse is used as the pointing device in this embodiment, any pointing device such as a joystick, touch pad, trackball, touch panel, or stylus pen may be used.
As described above, the OCT apparatus according to this embodiment is configured using the optical interference unit 100, the scanning optical system 200, the image processing device 300, the display unit 310, the pointing device 320, and the keyboard 321. In this embodiment, the optical interference unit 100, the scanning optical system 200, the image processing device 300, the display unit 310, the pointing device 320, and the keyboard 321 are configured as separate components, but all or some of them may be configured integrally. For example, the display unit 310 and the pointing device 320 may be configured integrally as a touch panel display. Similarly, the fundus camera and the SLO optical system (not shown) may be configured as separate devices.
The image processing device 300 may be configured using, for example, a general-purpose computer, or may be configured using a computer dedicated to the OCT apparatus. The image processing device 300 includes a CPU (Central Processing Unit) or MPU (Micro Processing Unit) (not shown) and a storage medium including memory such as an optical disk or ROM (Read Only Memory). Each component of the image processing device 300 other than the storage unit 305 may be configured as a software module executed by a processor such as a CPU or MPU. Each component may also be configured by a circuit that performs a specific function, such as an ASIC, by an independent device, or the like. The storage unit 305 may be configured by any storage medium such as an optical disk or memory.
The image processing device 300 may include one or more processors such as CPUs and one or more storage media such as ROMs. Accordingly, each component of the image processing device 300 may be configured to function when at least one processor is connected to at least one storage medium and the at least one processor executes a program stored in the at least one storage medium. The processor is not limited to a CPU or MPU and may be a GPU (Graphics Processing Unit), an FPGA (Field-Programmable Gate Array), or the like. Each component of the image processing device 300 may also be realized by a separate device.
(Control method for tomographic imaging)
Next, a control method for capturing a tomographic image of the eye E to be inspected using the OCT apparatus according to this embodiment will be described.
First, the examiner seats the patient, who is the subject, in front of the scanning optical system 200, performs alignment, inputs patient information and the like, and then starts OCT imaging. The light emitted from the light source 101 passes through the optical fiber 102a and is split at the optical branching portion 103 into measurement light directed toward the eye E to be inspected and reference light directed toward the reference mirror 113.
The measurement light directed toward the eye E to be inspected passes through the optical fiber 102b, is emitted from the fiber end, is substantially collimated by the collimating optical system 202, and enters the scanning unit 203. The scanning unit 203 has galvanometer mirrors, and the measurement light deflected by these mirrors irradiates the eye E to be inspected via the lens 204. The light reflected by the eye E to be inspected follows the path in the reverse direction and returns to the optical branching portion 103.
Meanwhile, the reference light directed toward the reference mirror 113 passes through the optical fiber 102c, is emitted from the fiber end, and reaches the reference mirror 113 through the collimating optical system 111 and the dispersion compensation optical system 112. The reference light reflected by the reference mirror 113 follows the path in the reverse direction and returns to the optical branching portion 103.
The measurement light and the reference light that have returned to the optical branching portion 103 interfere with each other to become interference light, which enters the optical fiber 102d, is substantially collimated by the collimating optical system 122, and enters the diffraction grating 123. The interference light input to the diffraction grating 123 is imaged on the line sensor 125 by the imaging lens 124. As a result, an interference signal at one point on the eye E to be inspected can be obtained using the line sensor 125.
The interference signal acquired by the line sensor 125 is output to the image processing device 300. The interference signal output from the line sensor 125 is 12-bit integer format data. The reconstruction unit 301 performs wavenumber conversion, a fast Fourier transform (FFT), and absolute value conversion (acquisition of the amplitude) on this 12-bit integer format data to generate tomographic data in the depth direction at one point on the eye E to be inspected. The data format of the interference signal and the like may be set arbitrarily according to the desired configuration.
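As a concrete illustration of this reconstruction step, the following is a minimal sketch in Python/NumPy, assuming the spectrum has already been read out as a 12-bit integer array and that linear interpolation is an acceptable stand-in for the wavenumber resampling; the function and variable names are illustrative, not part of the embodiment.

```python
import numpy as np

def reconstruct_ascan(spectrum_12bit, wavelengths_nm):
    """Sketch of one A-scan reconstruction: wavenumber resampling, FFT, amplitude."""
    spectrum = spectrum_12bit.astype(np.float64)
    # Remove the DC component (background spectrum) before the transform.
    spectrum -= spectrum.mean()
    # Resample from a grid linear in wavelength to one linear in wavenumber k = 2*pi/lambda.
    k = 2.0 * np.pi / wavelengths_nm
    k_linear = np.linspace(k.min(), k.max(), k.size)
    # np.interp expects increasing x-coordinates, so sort by k.
    order = np.argsort(k)
    spectrum_k = np.interp(k_linear, k[order], spectrum[order])
    # FFT along the spectral axis; the amplitude is the depth-direction tomographic data.
    ascan = np.abs(np.fft.fft(spectrum_k))
    # Keep only the positive-depth half of the symmetric FFT output.
    return ascan[: ascan.size // 2]
```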
After the interference signal at one point on the eye E to be inspected is acquired, the scanning unit 203 drives the galvanometer mirrors to scan the measurement light to an adjacent point on the eye E to be inspected. The line sensor 125 detects the interference light based on that measurement light and acquires an interference signal. The reconstruction unit 301 generates tomographic data in the depth direction at that adjacent point on the eye E to be inspected based on the interference signal of the adjacent point. By repeating this series of controls, tomographic data (two-dimensional tomographic data) corresponding to one tomographic image in one transverse direction (main scanning direction) of the eye E to be inspected can be generated.
Further, the scanning unit 203 drives the galvanometer mirrors to scan the same location (same scanning line) of the eye E to be inspected a plurality of times, thereby acquiring a plurality of pieces of tomographic data (two-dimensional tomographic data) at the same location of the eye E to be inspected. The scanning unit 203 also drives the galvanometer mirrors to move the measurement light slightly in the sub-scanning direction orthogonal to the main scanning direction, and acquires a plurality of pieces of tomographic data (two-dimensional tomographic data) at another location (adjacent scanning line) of the eye E to be inspected. By repeating this control, tomographic data (three-dimensional tomographic data) for a plurality of three-dimensional tomographic images in a predetermined range of the eye E to be inspected can be acquired.
In the above, one piece of tomographic data at one point on the eye E to be inspected is acquired by applying FFT processing to one set of interference signals obtained from the line sensor 125. However, it is also possible to divide the interference signal into a plurality of sets, perform FFT processing on each of the divided interference signals, and thereby acquire a plurality of pieces of tomographic data from one interference signal. According to this method, more tomographic data can be acquired than the number of times the same location of the eye E to be inspected is actually scanned.
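As one way to realize this, the following minimal sketch, building on the hypothetical reconstruct_ascan above, splits a single spectrum into overlapping sub-bands and reconstructs each separately; the band count and overlap are illustrative assumptions, not values given in the embodiment.

```python
def reconstruct_split_ascans(spectrum_12bit, wavelengths_nm, n_bands=4, overlap=0.5):
    """Divide one interference signal into sub-spectra and reconstruct each one."""
    n = spectrum_12bit.size
    band_len = int(n / (1 + (n_bands - 1) * (1 - overlap)))
    step = int(band_len * (1 - overlap))
    ascans = []
    for b in range(n_bands):
        start = b * step
        stop = min(start + band_len, n)
        ascans.append(
            reconstruct_ascan(spectrum_12bit[start:stop], wavelengths_nm[start:stop])
        )
    return ascans  # several depth profiles obtained from a single readout
```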
(Motion contrast data generation)
Next, a method of generating motion contrast data from tomographic data in the image processing device 300 will be described.
The complex-valued tomographic data generated by the reconstruction unit 301 is output to the motion contrast image generation unit 302. First, the motion contrast image generation unit 302 corrects positional deviations between the plurality of pieces of tomographic data (two-dimensional tomographic data) acquired at the same location of the eye E to be inspected. Any known method may be used to correct the positional deviation; for example, reference tomographic data may be selected as a template, and the amount of deviation from the template may be obtained as the amount of positional deviation for each piece of tomographic data.
The motion contrast image generation unit 302 obtains a decorrelation value between the two pieces of two-dimensional tomographic data whose positional deviation has been corrected, using equation (1). Here, Axz denotes the amplitude of tomographic data A at position (x, z), and Bxz denotes the amplitude of tomographic data B at the same position (x, z). The resulting decorrelation value Mxz takes a value from 0 to 1, and the larger the difference between the two amplitude values, the closer the value is to 1.
The motion contrast image generation unit 302 obtains a plurality of decorrelation values by repeating the above decorrelation calculation for the number of pieces of acquired tomographic data, and obtains the final motion contrast data by taking the average of the plurality of decorrelation values. The motion contrast image generation unit 302 can generate a motion contrast image by arranging the acquired motion contrast data at the corresponding pixel positions. Here, the motion contrast data is obtained based on the amplitude of the complex-valued data after the FFT, but the method of obtaining the motion contrast data is not limited to this. The motion contrast data may be obtained based on the phase information of the complex-valued data, or based on both the amplitude and phase information. The motion contrast data can also be obtained based on the real part or the imaginary part of the complex-valued data. Further, the motion contrast image generation unit 302 may perform similar processing on each pixel value of the two-dimensional tomographic images to obtain the motion contrast data.
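To make the flow concrete, here is a minimal sketch, under the assumption that the repeated B-scans have already been registered to one another and that the widely used amplitude-decorrelation form M = 1 - 2AB/(A^2 + B^2) is an acceptable stand-in for equation (1), which is not reproduced in this text; the names are illustrative only.

```python
import numpy as np

def decorrelation(a, b, eps=1e-12):
    """Decorrelation between two registered amplitude B-scans (0: identical, ~1: very different)."""
    return 1.0 - (2.0 * a * b) / (a * a + b * b + eps)

def motion_contrast(registered_bscans):
    """Average the decorrelation over all consecutive pairs of repeated B-scans."""
    pairs = zip(registered_bscans[:-1], registered_bscans[1:])
    decorrelations = [decorrelation(a, b) for a, b in pairs]
    return np.mean(decorrelations, axis=0)

# Usage sketch: four repeated B-scans of shape (depth, width) acquired at the same scanning line.
# bscans = [np.abs(fft_result_0), np.abs(fft_result_1), np.abs(fft_result_2), np.abs(fft_result_3)]
# mc_image = motion_contrast(bscans)
```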
In the above method, the motion contrast data is acquired by calculating a decorrelation value between two values, but the motion contrast data may also be obtained based on the difference between two values or on the ratio of two values. The motion contrast data may also be obtained based on the variance of the tomographic data. Furthermore, although the final motion contrast data is obtained above by averaging the plurality of acquired decorrelation values, the maximum value or median of the plurality of decorrelation values, differences, or ratios may instead be used as the final motion contrast data. The two pieces of tomographic data used when acquiring the motion contrast data may be data acquired at a predetermined time interval.
(Generation of OCTA front image)
Next, a procedure for defining the depth range for generating an OCTA front image in the image processing device 300 will be described.
The OCTA front image is a front image obtained by projecting or integrating a three-dimensional motion contrast image (three-dimensional motion contrast data) onto a two-dimensional plane over an arbitrary depth range. The depth range can be set arbitrarily. In general, from the retina toward the choroidal side, depth ranges such as the superficial retinal layer (SCP: Superficial Capillary Plexus), the deep retinal layer (Deep Capillary), the outer retina (Outer Retina), the radial peripapillary capillaries (RPC: Radial Peripapillary Capillaries), the choriocapillaris (Choriocapillaris), and the lamina cribrosa (Lamina Cribrosa) are defined.
Each of these is defined with respect to retinal layer boundaries; for example, the superficial retinal layer is defined as ILM + 0 μm to GCL/IPL + 50 μm. Here, GCL/IPL means the boundary between the GCL and the IPL. In the following, for offset amounts such as +50 μm or -100 μm, a positive value means a shift toward the choroid side and a negative value means a shift toward the pupil side.
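To illustrate how such boundary-plus-offset definitions might be held, here is a minimal sketch of a configuration structure; the class and the listed values simply restate examples given in this section and are not an exhaustive or authoritative set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DepthRange:
    """A projection depth range defined relative to two layer boundaries.

    Positive offsets (in micrometres) shift toward the choroid side,
    negative offsets toward the pupil side.
    """
    upper_boundary: str
    upper_offset_um: float
    lower_boundary: str
    lower_offset_um: float

# Examples restated from this section (superficial retinal layer and one outer-retina definition).
SCP = DepthRange("ILM", 0.0, "GCL/IPL", +50.0)
OUTER_RETINA = DepthRange("OPL/ONL", 0.0, "RPE/Choroid", 0.0)
```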
When confirming neovascularization in exudative age-related macular degeneration, the outer retina or the choriocapillaris is often used as the depth range. The outer retina is often defined as OPL/ONL + 0 μm to RPE/Choroid + 0 μm, but as described later, this depth range can be adjusted according to the size of the CNV, the location (depth position) at which it occurs, and so on.
As a method of projecting the data corresponding to a depth range onto a two-dimensional plane, for example, a method of using a representative value of the data within that depth range as the pixel value on the two-dimensional plane can be used. Here, the representative value can be a value such as the average, median, or maximum of the pixel values within the depth range.
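The following minimal sketch shows one way such a projection could be computed, assuming the volume is indexed as (x, y, z) with z the depth axis and that per-column boundary depths in pixels are already available from the segmentation; the names and the reducer choices are illustrative.

```python
import numpy as np

def project_front_image(volume, z_upper, z_lower, reducer="mean"):
    """Project a 3D volume (x, y, z) onto a 2D front image within a per-column depth range.

    volume:  3D array of motion contrast or intensity data.
    z_upper: 2D array (x, y) of upper boundary depths in pixels (already offset-adjusted).
    z_lower: 2D array (x, y) of lower boundary depths in pixels.
    """
    reduce_fn = {"mean": np.mean, "median": np.median, "max": np.max}[reducer]
    front = np.zeros(volume.shape[:2], dtype=np.float64)
    for x in range(volume.shape[0]):
        for y in range(volume.shape[1]):
            z0, z1 = int(z_upper[x, y]), int(z_lower[x, y])
            if z1 > z0:
                front[x, y] = reduce_fn(volume[x, y, z0:z1])
    return front
```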
The intensity En-Face image is a front image obtained by projecting or integrating a three-dimensional tomographic image onto a two-dimensional plane over an arbitrary depth range. The intensity En-Face image may be generated in the same manner as the OCTA front image by using a three-dimensional tomographic image instead of the three-dimensional motion contrast image. The intensity En-Face image may also be generated using three-dimensional tomographic data.
The depth range for the OCTA front image or the En-Face image can be determined based on the retinal layers detected by segmentation processing of the two-dimensional tomographic data (or two-dimensional tomographic images) in the three-dimensional volume data. The depth range may also be a range that includes a predetermined number of pixels in the deeper or shallower direction with respect to one of the two layer boundaries relating to the retinal layers detected by the segmentation processing.
The depth range may also be configured so that it can be changed according to the desired configuration. For example, the depth range can be a range changed (offset) from the range between the two layer boundaries relating to the detected retinal layers in accordance with an instruction from the operator. At this time, the operator can change the depth range by, for example, moving an index that is superimposed on the tomographic image and indicates the upper or lower limit of the depth range.
(Image generation unit)
FIG. 3 is a diagram for explaining the image generation unit 304. The image generation unit 304 includes a projection range control unit 341 and a front image generation unit 342. The projection range control unit 341 specifies the three-dimensional motion contrast data to be used for generating a front image, based on the motion contrast image generated by the motion contrast image generation unit 302, the layer recognition result from the layer recognition unit 303, and the depth range stored in the storage unit 305. The front image generation unit 342 projects or integrates the motion contrast data specified by the projection range control unit 341 onto a two-dimensional plane to generate an OCTA front image.
Similarly, the projection range control unit 341 can specify the three-dimensional tomographic data to be used for generating an intensity En-Face image, based on the three-dimensional tomographic image (three-dimensional tomographic data), the layer recognition result, and the depth range. In this case, the front image generation unit 342 can project or integrate the tomographic data specified by the projection range control unit 341 onto a two-dimensional plane to generate an intensity En-Face image.
(Report screen)
FIG. 4 shows an example of a GUI 400 for displaying images including the OCTA front image generated by the image generation unit 304. The GUI 400 has tabs 401 for screen selection, and in the example shown in FIG. 4 the report screen (Report tab) is selected. In addition to the Report tab, the GUI 400 may include a patient screen (Patient tab) for selecting a patient, an imaging screen (OCT Capture tab) for performing imaging, and the like.
An examination selector 408 is provided on the left-hand side of the report screen, and a display area is provided on the right-hand side. The examination selector 408 displays a list of examinations performed so far for the currently selected patient, and when one of them is selected, the display control unit 306 displays the examination result in the display area on the right-hand side of the report screen.
In the display area, an SLO image 406 generated using the SLO optical system (not shown) is shown, and an OCTA front image is superimposed and displayed on the SLO image 406. The display area also shows a first OCTA front image 402, a first tomographic image 403, an intensity En-Face image 407, a second OCTA front image 404, and a second tomographic image 405. A pull-down menu is provided above the En-Face image 407, and EnfaceImage1 is selected. This means that the depth range of the En-Face image is the same as the depth range of OCTAImage1 (the first OCTA front image 402). By operating this pull-down menu, the operator can make the depth range of the En-Face image the same as, for example, the depth range of OCTAImage2 (the second OCTA front image 404).
On the first tomographic image 403, the depth range used when generating the first OCTA front image 402 is shown by broken lines. In the example of the GUI 400, the depth range of the first OCTA front image 402 is the superficial retinal layer (SCP). The second OCTA front image 404 is an image generated using data in a depth range different from that of the first OCTA front image 402. On the second tomographic image 405, the depth range used when generating the second OCTA front image 404 is shown by broken lines. Here, the depth range of the second OCTA front image 404 is CNV, which in this example is the range from OPL/ONL + 50 μm to BM + 10 μm.
The depth ranges of the first OCTA front image 402 and the second OCTA front image 404 can be set according to operations on the pull-down menus provided above these images. These depth ranges may be set in advance or may be set according to an operation by the operator. Here, based on this setting, the image generation unit 304 can function as a target designation unit that designates the extraction target (target region) to be subjected to the extraction processing described below. The pull-down menu for setting the depth range of the OCTA front image may include not only the per-layer depth ranges such as the superficial retinal layer described above but also ranges corresponding to abnormal sites, such as CNV, whose extraction is desired in the processing described below.
In this embodiment, when the operator double-clicks the second OCTA front image 404, the display control unit 306 switches the screen displayed on the display unit 310 from the GUI 400 shown in FIG. 4 to the GUI 500 shown in FIG. 5. The GUI 500 displays, for four different depth range settings, OCTA front images 501, 505, 509, and 513, the corresponding tomographic images 503, 507, 511, and 515, and the depth ranges 502, 506, 510, and 514.
In connection with this, the image generation unit 304 designates the CNV corresponding to the depth range of the second OCTA front image 404 as the extraction target (target region). The image generation unit 304 generates the corresponding OCTA front images 501, 505, 509, and 513 based on the four depth ranges 502, 506, 510, and 514 stored in advance in the storage unit 305 for the CNV designated as the extraction target. The display control unit 306 causes the display unit 310 to display the generated OCTA front images 501, 505, 509, and 513, the corresponding tomographic images 503, 507, 511, and 515, and the depth ranges 502, 506, 510, and 514.
In this example, the depth range 502 for the leftmost image (OCTA front image 501) is a depth range assuming a type 1 CNV and is BM + 0 μm to BM + 20 μm. Here, a type 1 CNV refers to a CNV located below the RPE/Choroid boundary.
The depth range 506 for the second image from the left (OCTA front image 505) assumes a very small CNV located slightly above the BM, and is BM - 20 μm to BM + 0 μm. The depth range 510 for the third image from the left (OCTA front image 509) assumes a large CNV occurring above the BM, and is BM - 100 μm to BM + 0 μm. The depth range 514 for the fourth image from the left (OCTA front image 513) covers the entire outer retina and assumes a considerably large CNV (OPL + 50 μm to BM + 10 μm).
Selection buttons 504, 508, 512, and 516 are displayed at the bottom of the GUI 500, and by selecting one of these buttons the operator can choose, from the displayed OCTA front images, the OCTA front image that is preferable for use in diagnosis or the like (the front image to be displayed).
In this embodiment, when the operator presses the selection button 512, the display control unit 306 switches the screen displayed on the display unit 310 to the GUI 600 of the report screen shown in FIG. 6. The GUI 600 is similar to the GUI 400, except that the second OCTA front image 404 and the second tomographic image 405 have been replaced with the OCTA front image 509 based on the conditions selected by the operator and the tomographic image 511 showing the depth range of the OCTA front image 509.
As described above, in this embodiment, OCTA front images for a plurality of depth ranges corresponding to the extraction target are provided to the operator, and by having the operator select a preferable image, the OCTA front image for the optimum depth range can be displayed. This makes it possible to reduce the risk of overlooking lesions and to reduce additional work such as image quality adjustment by doctors and laboratory technicians.
Next, a series of processes according to this embodiment will be described with reference to FIGS. 7 and 8. FIG. 7 is a flowchart of the series of processes according to this embodiment, and FIG. 8 is a flowchart of the front image generation process according to this embodiment. When the series of processes is started, first, in step S701, the image processing device 300 acquires a three-dimensional interference signal relating to the eye E to be inspected from the line sensor 125 of the optical interference unit 100, and the reconstruction unit 301 generates and acquires three-dimensional tomographic data. At this time, the reconstruction unit 301 can also generate a three-dimensional tomographic image based on the three-dimensional tomographic data. The image processing device 300 may also acquire a three-dimensional interference signal, three-dimensional interference data, a three-dimensional tomographic image, or the like relating to the eye E to be inspected from a connected external device (not shown).
When the three-dimensional tomographic data is acquired by the reconstruction unit 301, the motion contrast image generation unit 302 generates and acquires three-dimensional motion contrast data (a three-dimensional motion contrast image) based on the three-dimensional tomographic data.
Next, in step S702, the image generation unit 304 designates the extraction target (target region) according to a preset setting or an instruction from the operator. At this time, the layer recognition unit 303 can perform segmentation on the three-dimensional tomographic data and acquire the layer recognition result. The image generation unit 304 may also generate the first OCTA front image 402 and the like based on the three-dimensional volume data, the layer recognition result, the setting of a predetermined depth range, and so on, and the display control unit 306 may cause the display unit 310 to display the GUI 400. In this case, the operator can input an instruction regarding the extraction target by operating, for example, the pull-down menu relating to the OCTA front image. When the extraction target is designated, the process proceeds to step S703.
In step S703, the image generation unit 304 starts the front image generation process according to this embodiment. In the front image generation process according to this embodiment, first, in step S801, the image generation unit 304 specifies the plurality of depth ranges stored in the storage unit 305 that correspond to the designated extraction target. The projection range control unit 341 of the image generation unit 304 then specifies the three-dimensional motion contrast data to be used for generating OCTA front images, based on the specified plurality of depth ranges, the three-dimensional motion contrast data, and the layer recognition result. The front image generation unit 342 generates a plurality of OCTA front images corresponding to the plurality of depth ranges based on the specified three-dimensional motion contrast data.
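A minimal sketch of this step, reusing the hypothetical DepthRange and project_front_image helpers introduced above and assuming a lookup table of preset ranges per extraction target as well as a function that converts boundary names and offsets into per-column pixel depths; everything named here is illustrative rather than the embodiment's actual implementation.

```python
# Hypothetical presets: several candidate depth ranges per extraction target (here, CNV).
PRESET_DEPTH_RANGES = {
    "CNV": [
        DepthRange("BM", 0.0, "BM", +20.0),
        DepthRange("BM", -20.0, "BM", 0.0),
        DepthRange("BM", -100.0, "BM", 0.0),
        DepthRange("OPL", +50.0, "BM", +10.0),
    ],
}

def generate_candidate_front_images(volume, layer_boundaries_px, target, boundary_to_pixels):
    """Generate one OCTA front image per preset depth range for the designated target.

    boundary_to_pixels(layer_boundaries_px, name, offset_um) is assumed to return a
    2D array of per-column depths in pixels for the named boundary plus offset.
    """
    images = []
    for depth_range in PRESET_DEPTH_RANGES[target]:
        z_upper = boundary_to_pixels(layer_boundaries_px, depth_range.upper_boundary,
                                     depth_range.upper_offset_um)
        z_lower = boundary_to_pixels(layer_boundaries_px, depth_range.lower_boundary,
                                     depth_range.lower_offset_um)
        images.append((depth_range, project_front_image(volume, z_upper, z_lower)))
    return images
```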
In step S802, the display control unit 306 causes the display unit 310 to display the plurality of generated OCTA front images. At this time, the display control unit 306 can cause the display unit 310 to display information on the corresponding depth ranges together with the plurality of generated OCTA front images. Here, the information on a corresponding depth range may be numerical information indicating the depth range, a broken line or the like indicating the depth range on the tomographic image, or both.
In step S803, the operator designates, from the plurality of OCTA front images displayed on the display unit 310, an OCTA front image that is preferable for diagnosis or the like. The image processing device 300 selects the OCTA front image to be displayed according to the operator's instruction. The operator's instruction may be given, for example, by selecting a selection button in the GUI 500 shown in FIG. 5.
When the image processing device 300 selects the OCTA front image to be displayed, in step S704 the display control unit 306 causes the display unit 310 to display the selected OCTA front image. This makes it possible to display an OCTA front image in which the target structure can be easily confirmed. In this embodiment, an OCTA front image is generated and displayed, but the generated and displayed image may instead be an intensity En-Face image. In that case, the same processing as described above may be performed using tomographic data instead of the motion contrast data.
As described above, the image processing device 300 according to this embodiment includes the image generation unit 304 and the display control unit 306. The image generation unit 304 also functions as a target designation unit that designates the extraction target from the three-dimensional volume data of the eye E to be inspected. The display control unit 306 causes the display unit 310 to display, side by side, a plurality of front images corresponding to different depth ranges of the three-dimensional volume data, using the information on the designated target region. The projection range control unit 341 of the image generation unit 304 determines the depth ranges for generating the plurality of front images using the information on the designated extraction target. In particular, in the image processing device 300 according to this embodiment, the extraction target is neovascularization (CNV), and the three-dimensional volume data is three-dimensional motion contrast data. Each of the depth ranges for generating the plurality of front images is, for example, a depth range within 0 to 50 μm from the outer retina or the Bruch's membrane toward the choroid side.
According to such a configuration, by providing the operator with front images for a plurality of depth ranges corresponding to the extraction target, it is possible to display a front image in which the target region, such as the target structure, can be easily confirmed. This makes it possible to reduce the risk of overlooking lesions and to reduce additional work such as image quality adjustment by doctors and laboratory technicians.
In this embodiment, the projection range control unit 341 of the image generation unit 304 determines the depth ranges for generating the plurality of front images based on the designation of the extraction target. Here, the projection range control unit 341 of the image generation unit 304 can function as an example of a determination unit that determines at least one of, for example, the type of the three-dimensional volume data, the layer or depth range to be extracted, the number of front images to be generated, the depth ranges for generating the front images, and the interval between the depth ranges for generating the front images. The determination unit may also be configured as a component separate from the image generation unit 304.
In this embodiment, as in the GUI 500 shown in FIG. 5, the images corresponding to the plurality of depth ranges are displayed separately from the GUI 400 shown in FIG. 4, but the present invention is not limited to this. For example, the images for the plurality of depth ranges may be displayed side by side on the GUI 400. The images for the plurality of depth ranges may also be displayed in the display area of the second OCTA front image 404 of the GUI 400 while being switched over time, or may be switched and displayed according to instructions from the operator. In this case, an image of the preferable depth range may be made selectable, by an operation on the GUI 400, from among the images displayed by switching.
In this embodiment, OCTA front images corresponding to a plurality of depth ranges are displayed as in the GUI 500 shown in FIG. 5, but the present invention is not limited to this. For example, images generated with different projection methods may be displayed side by side together with the depth ranges, and the operator may be allowed to make a selection. Here, the projection method may be any known method such as maximum intensity projection or average intensity projection. Even for the same depth range, the appearance of the front image changes depending on the projection method. Therefore, in such a case, the operator can be allowed to select the front image corresponding to the preferable projection method.
In this embodiment, the generated images are displayed on the display unit 310, but, for example, they may instead be output to an external device such as an external server. Furthermore, the mutually different depth ranges corresponding to the plurality of front images may be depth ranges that partially overlap. These points can be similarly applied to the various examples and modifications described below.
(Modification of Example 1)
In Example 1, an example was shown in which the operator double-clicks a front image to display front images for a plurality of depth ranges, but the processing for displaying front images for a plurality of depth ranges is not limited to this. For example, when the report screen of the GUI 400 is displayed, front images corresponding to a plurality of depth ranges set for the target disease may be displayed and the operator may be allowed to select one.
As another example, when an examination is performed, it may be determined whether the eye E to be inspected has an abnormality such as CNV, and when it is determined that there is an abnormality, front images for a plurality of depth ranges may be displayed so that the operator can select the optimum image. Also, when an OCTA examination is performed on a patient having a disease, such as a patient with exudative age-related macular degeneration accompanied by CNV, front images corresponding to a plurality of depth ranges set for that disease may be displayed so that the operator can select the optimum image. The determination of whether or not there is an abnormality may be made by any known method.
(Example 2)
In Example 1, a plurality of OCTA front images corresponding to a plurality of depth ranges were displayed, and the operator selected a preferable image among them, thereby providing an OCTA front image projected over a preferable depth range. The image processing apparatus according to Example 2 differs in that the image generation unit 304 is further provided with an image evaluation unit 343 and can provide the operator, together with each OCTA front image, with information indicating an evaluation of the presence of the extraction target in that image.
Hereinafter, the image processing apparatus according to this embodiment will be described with reference to FIGS. 9 to 13. The configuration of the image processing apparatus according to this embodiment is the same as that of the image processing apparatus according to Example 1, except that the image evaluation unit 343 is added to the image generation unit 304; therefore, the same reference numerals are used and the description is omitted. In the following, the image processing apparatus according to this embodiment is described focusing on the differences from the image processing apparatus 300 according to Example 1.
FIG. 9 is a diagram for explaining the image generation unit 304 according to this embodiment. As shown in the figure, the image generation unit 304 according to this embodiment is provided with an image evaluation unit 343 in addition to the projection range control unit 341 and the front image generation unit 342.
The image evaluation unit 343 evaluates the OCTA front images corresponding to the plurality of depth ranges generated by the front image generation unit 342, and acquires, for each OCTA front image, information indicating an evaluation of the presence of neovascularization (CNV) in that image. The information indicating the evaluation may be an evaluation value, or may be information indicating the presence or absence of the target or the possibility of its presence. For example, the information indicating the evaluation may be information indicating that the eye to be inspected has an extraction target such as CNV, does not have one, or is suspected of having one. In this embodiment, the image evaluation unit 343 acquires the evaluation value from an OCTA front image using a trained model obtained by training a neural network as a machine learning model.
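As one possible realization of such an evaluator, the following is a minimal PyTorch sketch of a small convolutional network that maps an OCTA front image to a score between 0 and 1; the architecture, input size, and file names are assumptions for illustration and are not the model described in the embodiment.

```python
import torch
import torch.nn as nn

class CnvEvaluator(nn.Module):
    """Small CNN that outputs an evaluation value in [0, 1] for one OCTA front image."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, x):  # x: (batch, 1, H, W) OCTA front images
        return self.head(self.features(x))

# Usage sketch: score each candidate front image generated earlier.
# model = CnvEvaluator(); model.load_state_dict(torch.load("cnv_evaluator.pt")); model.eval()
# with torch.no_grad():
#     score = model(front_image_tensor.unsqueeze(0).unsqueeze(0)).item()
```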
FIG. 10A shows an example of a neural network used as the machine learning model, and FIG. 10B shows an example of the learning data according to this embodiment. The neural network extracts feature points from the input data and estimates the output data from the feature points according to the weights between nodes determined by training.
In this embodiment, OCTA front images are used as the input data of the learning data, and evaluation values that evaluate the presence of CNV in each OCTA front image are used as the output data of the learning data. The evaluation value ranges from 0 to 1 and indicates whether CNV is included in the OCTA front image. The maximum evaluation value is 1, and a larger value indicates a higher probability that the OCTA front image contains CNV. The example shown in FIG. 10B shows six OCTA front images as input data and three levels of values as output data, but in practice more OCTA front images may be used as input data and the number of labeling levels for the output data may be increased. In training the neural network, the number of OCTA front images used as input data may be increased by so-called augmentation, such as rotating the images, flipping them vertically or horizontally, or changing the cropping range.
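The kind of augmentation described above could, for instance, be sketched as follows; the function name, crop size, and image size are hypothetical values chosen for the sketch, not details of the embodiment.

```python
import numpy as np

def augment_octa_image(image, crop_size, rng=None):
    """Return one augmented copy of an OCTA en-face image (2-D array).

    Applies a random 90-degree rotation, random vertical/horizontal flips,
    and a random crop of size (crop_size, crop_size), as described above.
    """
    rng = rng or np.random.default_rng()
    out = np.rot90(image, k=int(rng.integers(0, 4)))  # random 0/90/180/270 rotation
    if rng.random() < 0.5:
        out = np.flipud(out)                          # random vertical flip
    if rng.random() < 0.5:
        out = np.fliplr(out)                          # random horizontal flip
    h, w = out.shape
    top = int(rng.integers(0, h - crop_size + 1))     # random crop position
    left = int(rng.integers(0, w - crop_size + 1))
    return out[top:top + crop_size, left:left + crop_size]

# Example: build several augmented samples from one labeled image.
image = np.random.rand(304, 304)                      # stand-in for an OCTA front image
samples = [augment_octa_image(image, crop_size=256) for _ in range(8)]
```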
As the OCTA front images used as the input data of the learning data and as the input data at the time of actual operation, images from which projection artifacts have been removed can be used. Here, a projection artifact is a phenomenon in which blood vessels in, for example, the surface layer of the retina are reflected into layers below the surface layer. Any known algorithm may be used to remove the projection artifacts.
Further, as the input data of the learning data, not only OCTA front images of various cases including CNV but also images of healthy eyes can be used. Furthermore, OCTA front images of eyes with other diseases may also be included in the input data of the learning data for training.
As the output data of the learning data, evaluation values in which a doctor or the like has evaluated the presence of CNV in the OCTA front images serving as the input data are used. In the example shown in FIG. 10B, three evaluation levels of 0, 0.5, and 1 are used as the output data of the learning data, but a larger number of levels may be used as described above. The evaluation criterion may also be arbitrary; for example, the evaluation value may be determined according to the clarity of the CNV, or the evaluation value may be set to 1 whenever any CNV appears at all.
When data is input to the trained model of such a machine learning model, data according to the design of the machine learning model is output; for example, output data that is likely to correspond to the input data is output according to the tendency learned from the learning data. In the trained model according to this embodiment, when an OCTA front image is input, an evaluation value evaluating the presence of CNV in the input OCTA front image is output according to the learned tendency.
Depending on the configuration of the machine learning model, a trained model that has undergone such learning outputs, for the input data, a ratio (probability) for each of the evaluation levels used in the output data of the learning data. In this case, the image evaluation unit 343 may calculate the final evaluation value from the ratios for the respective levels output by the trained model. For example, the image evaluation unit 343 may calculate the final evaluation value by multiplying each evaluation value by the corresponding ratio, adding the products, and dividing the sum by the total of the ratios. In this case, for example, when the ratio for the evaluation value 0 is 0.2, the ratio for the evaluation value 0.5 is 0.8, and the ratio for the evaluation value 1 is 0, the image evaluation unit 343 can calculate 0.4 as the final evaluation value. The method of calculating the final evaluation value is not limited to this, and any method may be used, such as taking the evaluation value whose ratio is higher than the other ratios as the final evaluation value.
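The weighted-average calculation just described could be written, as a minimal sketch, like this; the function names are illustrative only.

```python
def final_evaluation_value(level_values, level_ratios):
    """Weighted average of the evaluation levels by their output ratios.

    level_values: evaluation levels used in training, e.g. [0.0, 0.5, 1.0]
    level_ratios: ratios output by the trained model, one per level
    """
    weighted = sum(v * r for v, r in zip(level_values, level_ratios))
    return weighted / sum(level_ratios)

# The worked example from the text: ratios 0.2, 0.8, 0.0 give 0.4.
print(final_evaluation_value([0.0, 0.5, 1.0], [0.2, 0.8, 0.0]))  # -> 0.4

# Alternative mentioned in the text: take the level with the highest ratio.
def argmax_evaluation_value(level_values, level_ratios):
    return max(zip(level_ratios, level_values))[1]
```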
Next, the front image generation process according to this embodiment will be described with reference to FIG. 11. FIG. 11 is a flowchart of the front image generation process according to this embodiment. Since the flow of the series of processes other than the front image generation process is the same as in Example 1, its description is omitted. Step S1101 is the same as step S801 according to Example 1, so its description is also omitted. When a plurality of OCTA front images are generated in step S1101, the process proceeds to step S1102.
In step S1102, the image evaluation unit 343 evaluates each of the generated plurality of OCTA front images using the trained model, and acquires an evaluation value that evaluates the presence of CNV in each OCTA front image. When the plurality of evaluation values corresponding to the plurality of OCTA front images have been acquired, the process proceeds to step S1103.
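Schematically, steps S1101 and S1102 amount to the loop below; generate_front_image and trained_model are hypothetical placeholders for the projection and inference steps described in the text, not functions defined by the embodiment.

```python
def evaluate_candidate_images(volume, depth_ranges, generate_front_image, trained_model):
    """Sketch of steps S1101-S1102: project, then evaluate, each depth range.

    generate_front_image(volume, depth_range) is assumed to project the 3-D OCTA
    volume over one depth range; trained_model(image) is assumed to return an
    evaluation value for the resulting front image.
    """
    candidates = []
    for depth_range in depth_ranges:
        front_image = generate_front_image(volume, depth_range)   # step S1101
        evaluation = trained_model(front_image)                   # step S1102
        candidates.append((depth_range, front_image, evaluation))
    return candidates
```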
In step S1103, the display control unit 306 displays the plurality of OCTA front images corresponding to the plurality of depth ranges side by side, together with their respective evaluation values and depth ranges. FIG. 12 shows an example of a GUI that the display control unit 306 causes the display unit 310 to display. The GUI 1200 shown in FIG. 12 is similar to the GUI 500 shown in FIG. 5, but the evaluation values 1217, 1218, 1219, and 1220 of the respective OCTA front images are shown above the OCTA front images 501, 505, 509, and 520.
In step S1104, the operator specifies, from the plurality of OCTA front images displayed on the display unit 310, an OCTA front image preferable for diagnosis or the like, based on the OCTA front images and their evaluation values. The image processing apparatus 300 selects the OCTA front image to be displayed according to the operator's instruction. At this time, the operator can specify the OCTA front image to be displayed by, for example, operating the selection buttons 504, 508, 512, and 516. The subsequent processing is the same as in Example 1, so its description is omitted.
In this embodiment, evaluation values (information indicating the evaluation) that evaluate the presence of the extraction target in the front images are displayed together with the plurality of front images. By referring to these evaluation values, the operator can select the optimum image more accurately. Therefore, even when the difference in image quality between the plurality of front images is small, individual differences between operators in selecting the image to be displayed can be reduced.
When the selected front image is displayed on the display unit 310, information indicating the evaluation of the front image (evaluation value 1310) may be displayed together with the selected front image, as in the GUI 1300 shown in FIG. 13. In this case, the information indicating the evaluation of the front image can also be confirmed on the report screen.
As described above, the image processing apparatus 300 according to this embodiment includes the image generation unit 304 and the image evaluation unit 343. The image generation unit 304 generates a plurality of front images corresponding to different depth ranges of the three-dimensional volume data of the eye E to be inspected. The image evaluation unit 343 acquires a plurality of pieces of information corresponding to the plurality of front images, each being information indicating an evaluation of the presence of the target region obtained using the front images. The image generation unit 304 can also function as an example of a determination unit that determines the front image (output image) to be displayed using the plurality of pieces of information. In particular, the image generation unit 304 according to this embodiment uses the plurality of pieces of information to determine at least one of the plurality of front images as the front image to be displayed. The image processing apparatus 300 further includes the display control unit 306 that controls the display of the display unit 310. The display control unit 306 according to this embodiment causes the display unit 310 to display the acquired pieces of information indicating the plurality of evaluations side by side with the plurality of front images. The determination unit may be configured as a component separate from the image generation unit 304.
According to such a configuration, by providing the operator with front images for a plurality of depth ranges corresponding to the extraction target together with information indicating the evaluation of the presence of the extraction target, a front image in which a target region such as the target structure can be easily confirmed can be displayed. This can reduce the risk of overlooking a lesion and reduce additional work such as image quality adjustment by doctors and laboratory technicians. Further, by displaying the plurality of front images side by side together with the plurality of pieces of information indicating their evaluations, the operator can more easily specify, from among the plurality of front images, a front image appropriate for diagnosis or the like.
Although an example of generating and displaying an OCTA front image as the front image has been described in this embodiment, a luminance En-Face image may be generated and displayed as the front image, as in Example 1. Also in this embodiment, the image processing apparatus 300 may be configured so that the depth range of the generated front image can be adjusted manually if the operator considers it undesirable. Further, as described above, in this embodiment the image generation unit 304 determines the image to be finally displayed, but the determined image may instead be output to an external device or the like. Therefore, it is sufficient that the image generation unit 304 can determine, for example, an output image to be output to the display unit 310 or to an external device.
Further, although the image evaluation unit 343 according to this embodiment acquires the evaluation value using the trained model, the acquisition is not limited to this, and the evaluation value may be acquired using so-called rule-based image processing. For example, when calculating an evaluation value for evaluating the presence of CNV, the image evaluation unit 343 may remove granular noise from the tomographic image, then enhance tubular regions with a Hessian filter, and integrate the enhanced image to calculate the evaluation value. Alternatively, the image evaluation unit 343 may binarize the enhanced image and acquire the evaluation value according to the presence of pixels exceeding a threshold value.
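The rule-based variant just described could be sketched roughly as follows. The choice of scipy's median filter and scikit-image's frangi filter (one Hessian-based vesselness filter), as well as the 0.05 threshold, are assumptions made for the sketch rather than the embodiment's actual choices.

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage.filters import frangi   # one example of a Hessian-based vesselness filter

def rule_based_cnv_scores(front_image, threshold=0.05):
    """Two rule-based scores for an OCTA en-face image, as outlined above.

    Granular noise is suppressed with a median filter, tubular regions are
    enhanced with a Hessian-based filter, and the enhanced image is then
    either integrated or binarized against a threshold.
    """
    denoised = median_filter(np.asarray(front_image, dtype=float), size=3)
    vesselness = frangi(denoised)                 # emphasize tubular (vessel-like) regions
    integral_score = float(vesselness.sum())      # "integrate the enhanced image"
    fraction_above = float((vesselness > threshold).mean())  # pixels exceeding the threshold
    return integral_score, fraction_above
```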
(Modification 1 of Example 2)
In Example 2, an example in which the evaluation value is displayed together with the front image has been described, but what is displayed together with the front image is not limited to the evaluation value. For example, it is known that there are two types of CNV derived from age-related macular degeneration: type 1, in which new blood vessels exist below the RPE/Choroid, and type 2, in which new blood vessels extend from below to above the RPE/Choroid. When the types differ, the appearance of the CNV in the front image also differs. Therefore, the machine learning model used by the image evaluation unit 343 may be trained so as to distinguish front images containing these different types of CNV. In this case, a plurality of trained models, each trained for one type of CNV, may be prepared.
By performing such training, the image evaluation unit 343 can calculate an evaluation value for each type of CNV. Further, as shown in FIG. 13, the display control unit 306 can cause the display unit 310 to display the CNV type having the higher evaluation value together with that evaluation value. In such a case, the operator can confirm, in addition to the evaluation value of the front image, the type of age-related macular degeneration CNV contained in the front image.
In the example shown in FIG. 13, the CNV type and its evaluation value are displayed, but the display is not limited to this, and an evaluation value may be displayed for each type. Further, when it is estimated that there is no CNV in the front image, an indication that there is no CNV may be displayed instead of the CNV type.
In addition to determining the CNV type from the front image, the image evaluation unit 343 may determine the CNV type from the depth range of the front image. Although an example of generating and displaying an OCTA front image as the front image has been described in this modification, a luminance En-Face image may be generated and displayed as the front image, as in Example 2. Also in this modification, the image processing apparatus 300 may be configured so that the depth range of the generated front image can be adjusted manually if the operator considers it undesirable.
(Modification 2 of Example 2)
In Example 2, the structure of the neural network and the learning data for the trained model used by the image evaluation unit 343 were described with reference to FIGS. 10A and 10B, but the configuration is not limited to this. In this modification, a configuration using a U-net type convolutional neural network (CNN) as an example of the machine learning model will be described.
Hereinafter, a CNN will be described with reference to FIG. 14 as an example of the trained model according to this modification. The trained model shown in FIG. 14 is composed of a plurality of layer groups responsible for processing groups of input values and outputting the results. The types of layers included in the configuration 1401 of the trained model are a convolution layer, a downsampling layer, an upsampling layer, and a merger layer.
The convolution layer is a layer that performs convolution processing on the input value group according to parameters such as the kernel size of the filters, the number of filters, the stride value, and the dilation value. The number of dimensions of the filter kernel size may be changed according to the number of dimensions of the input image.
The downsampling layer is a layer that performs processing to make the number of output values smaller than the number of input values by thinning out or combining the input value group. A specific example of such processing is max pooling.
The upsampling layer is a layer that performs processing to make the number of output values larger than the number of input values by duplicating the input value group or adding values interpolated from the input value group. A specific example of such processing is linear interpolation.
The merger layer is a layer that receives value groups, such as the output value group of a certain layer or the pixel value group constituting an image, from a plurality of sources and combines them by concatenation or addition.
As the parameters set for the convolution layers included in the configuration 1401 shown in FIG. 14, for example, setting the filter kernel size to a width of 3 pixels and a height of 3 pixels and the number of filters to 64 enables processing with a certain level of accuracy. However, note that if the parameter settings for the layers and nodes constituting the neural network differ, the degree to which the tendency trained from the teacher data can be reproduced in the output data may also differ. In other words, in many cases the appropriate parameters differ depending on the embodiment, and they can be changed to preferable values as needed.
In addition to changing the parameters as described above, the CNN may obtain better characteristics by changing its configuration. Better characteristics include, for example, higher processing accuracy, shorter processing time, and shorter training time for the machine learning model.
The CNN configuration 1401 used in this modification is a U-net type machine learning model that has the function of an encoder composed of a plurality of levels including a plurality of downsampling layers and the function of a decoder composed of a plurality of levels including a plurality of upsampling layers. The U-net type machine learning model is configured (for example, by using skip connections) so that positional information (spatial information) made ambiguous in the plurality of levels constituting the encoder can be used in the levels of the same dimension (mutually corresponding levels) among the plurality of levels constituting the decoder.
Although not shown, as an example of a modification to the CNN configuration, a batch normalization layer or an activation layer using a rectified linear unit (ReLU) may be incorporated after the convolution layers, for example.
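The following is a minimal, illustrative sketch of a U-net-style CNN with the layer types described above (3x3 convolutions, max-pooling downsampling, interpolation upsampling, concatenation mergers, skip connections, plus the batch normalization and ReLU layers mentioned as a possible modification), written in PyTorch. The depth, channel counts, and output head are simplifications chosen for the sketch and do not reproduce the configuration 1401 of FIG. 14.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    """Illustrative U-net-style CNN: encoder, decoder, and skip connections."""
    def __init__(self, in_ch=1, out_ch=1, base=64):
        super().__init__()
        self.enc1 = self._block(in_ch, base)        # encoder level 1
        self.enc2 = self._block(base, base * 2)     # encoder level 2
        self.bottom = self._block(base * 2, base * 2)
        self.dec2 = self._block(base * 4, base)     # decoder level 2 (after concat)
        self.dec1 = self._block(base * 2, base)     # decoder level 1 (after concat)
        self.head = nn.Conv2d(base, out_ch, kernel_size=1)

    @staticmethod
    def _block(in_ch, out_ch):
        # 3x3 convolution layers followed by batch normalization and ReLU activation
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(F.max_pool2d(e1, 2))                 # downsampling (max pooling)
        b = self.bottom(F.max_pool2d(e2, 2))
        u2 = F.interpolate(b, scale_factor=2, mode="bilinear", align_corners=False)
        d2 = self.dec2(torch.cat([u2, e2], dim=1))          # merger layer (concatenation)
        u1 = F.interpolate(d2, scale_factor=2, mode="bilinear", align_corners=False)
        d1 = self.dec1(torch.cat([u1, e1], dim=1))          # skip connection to level 1
        return torch.sigmoid(self.head(d1))                 # per-pixel output map

# Example: a 1-channel front image of size 256x256.
model = TinyUNet()
output = model(torch.rand(1, 1, 256, 256))   # output shape: (1, 1, 256, 256)
```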
When data is input to the trained model of such a machine learning model, data according to the design of the machine learning model is output; for example, output data that is likely to correspond to the input data is output according to the trained tendency. In this modification, the learning data is composed of image pairs in which the input data is an OCTA front image of the layer where CNV occurs when age-related macular degeneration develops, and the output data is a binary image in which only the region of the OCTA front image where CNV exists is white and the rest is black. FIG. 15 shows an example of the learning data in this case.
For training, images of healthy eyes are learned together with images of various cases in which CNV has occurred. The binary images of healthy eyes are entirely black.
The trained model trained in this way can output a binary image in which only the region where CNV exists is segmented. Accordingly, by inputting an OCTA front image into the trained model, the image evaluation unit 343 can acquire a binary image indicating the region where CNV exists. The image evaluation unit 343 can calculate an evaluation value representing the possibility that CNV exists based on the white region in the acquired binary image. As a method of calculating the evaluation value, for example, the evaluation value may be set to 1 when the acquired binary image contains a white region, or when the total area (number of pixels) of the white region is equal to or larger than a threshold value. Alternatively, thresholds may be provided in stages, and the evaluation value may be determined according to which threshold the total area of the white region exceeds.
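A rough sketch of the white-area-based scoring just described follows; the staged threshold values are arbitrary, illustrative numbers.

```python
import numpy as np

def evaluation_from_mask(binary_mask, staged_thresholds=(50, 200, 800)):
    """Evaluation value from a binary CNV segmentation mask (True/1 = CNV).

    Staged thresholds on the white-pixel count map to increasing evaluation
    values, as one of the options described in the text.
    """
    white_pixels = int(np.count_nonzero(binary_mask))
    if white_pixels == 0:
        return 0.0
    # Map the exceeded threshold stage to an evaluation value in (0, 1].
    stage = sum(white_pixels >= t for t in staged_thresholds)
    return (stage + 1) / (len(staged_thresholds) + 1)

mask = np.zeros((256, 256), dtype=bool)
mask[100:110, 100:130] = True          # a hypothetical 300-pixel CNV region
print(evaluation_from_mask(mask))      # 300 pixels exceeds stages 50 and 200 -> 0.75
```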
Another use of such a trained model is to estimate the size of the CNV. In this case, the image evaluation unit 343 may calculate the area of the white region in the binary image acquired from the trained model as the size of the CNV. The display control unit 306 can then display, together with the OCTA front image to be displayed, the size of the CNV contained in that OCTA front image.
Although an example of using a binary image as the output data of the learning data has been described, the values of the image used as the output data are not limited to two. An image whose values are changed according to the evaluation value of the CNV may be used as the output data of the learning data. The CNV evaluation value in this case may be an evaluation value labeled by a doctor or the like, as in Example 2. Further, as described in Modification 1 of Example 2, the model may be trained so as to distinguish the CNV types.
In this modification, a binary image is acquired using a neural network, but the acquisition is not limited to this and can also be realized by so-called rule-based image processing. For example, after removing granular noise from the OCT front image, tubular regions may be enhanced with a Hessian filter and the enhanced image may be binarized. The method of calculating the evaluation value in this case may be the same as the method described above.
In this modification, an example of generating and displaying an OCTA front image as the front image has been described, but a luminance En-Face image may be generated and displayed as the front image, as in Example 2. Also in this modification, the image processing apparatus 300 may be configured so that the depth range of the generated front image can be adjusted manually if the operator considers it undesirable. Further, although the binary image in this modification is expressed with the two values white and black, the two values may be any two labels.
(Example 3)
In Example 2, by providing the image evaluation unit 343 in the image generation unit 304, evaluation values for a plurality of OCTA front images were acquired and displayed. In the image processing apparatus according to Example 3, a front image determination unit 344 is provided in the image generation unit 304, so that an OCTA front image with an optimum depth range is output automatically without the intervention of an operator.
Hereinafter, the image processing apparatus according to this embodiment will be described with reference to FIGS. 16 and 17. The configuration of the image processing apparatus according to this embodiment is the same as that of the image processing apparatus according to Example 2, except that the front image determination unit 344 is added to the image generation unit 304; therefore, the same reference numerals are used and the description thereof is omitted. The following description focuses on the differences from the image processing apparatus 300 according to Example 2.
FIG. 16 is a diagram for explaining the image generation unit 304 according to this embodiment. As shown in the figure, the image generation unit 304 according to this embodiment is provided with a front image determination unit 344 in addition to the projection range control unit 341, the front image generation unit 342, and the image evaluation unit 343.
The front image determination unit 344 according to this embodiment determines and selects, as the OCTA front image to be displayed, the OCTA front image corresponding to the maximum evaluation value (an evaluation value higher than the other evaluation values) among the evaluation values calculated for the plurality of OCTA front images.
Next, the front image generation process according to this embodiment will be described with reference to FIG. 17. FIG. 17 is a flowchart of the front image generation process according to this embodiment. Since the flow of the series of processes other than the front image generation process is the same as in Example 2, its description is omitted. Steps S1701 and S1702 are the same as steps S1101 and S1102 according to Example 2, so their descriptions are also omitted. When a plurality of evaluation values have been acquired for the plurality of OCTA front images in step S1702, the process proceeds to step S1703.
In step S1703, the front image determination unit 344 determines and selects, from among the generated plurality of OCTA front images, the OCTA front image with the maximum evaluation value (higher than the other evaluation values) as the OCTA front image to be displayed. The subsequent processing is the same as in Example 2, so its description is omitted.
In this embodiment, the front image determination unit 344 of the image generation unit 304 functions as an example of a determination unit that determines, as the front image to be displayed, the front image corresponding to an evaluation value higher than the other acquired evaluation values. This makes it possible to display an OCTA front image with the optimum depth range without the intervention of an operator, improving processing efficiency. The front image determination unit 344 may be configured as a component separate from the image generation unit 304.
When the plurality of evaluation values for the plurality of OCTA front images are all below a threshold value (for example, 0.2), it may be determined that no CNV is included, and a projection image of a predetermined depth range may be displayed. The threshold value in this case may be set arbitrarily according to the desired configuration. In this case, an indication that there is no CNV may be displayed together with, or instead of, the evaluation value.
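A compact sketch of step S1703 together with the below-threshold fallback just mentioned; the candidate list is assumed to hold (depth_range, front_image, evaluation) tuples as in the earlier sketch, and the 0.2 threshold is the example value from the text.

```python
def select_front_image(candidates, default_depth_range, threshold=0.2):
    """Pick the candidate with the highest evaluation value (step S1703).

    candidates: list of (depth_range, front_image, evaluation) tuples.
    Returns the chosen depth range and a flag indicating whether CNV was found;
    if every evaluation is below the threshold, the predetermined default
    depth range is used instead and the flag is False.
    """
    best_range, _, best_value = max(candidates, key=lambda c: c[2])
    if best_value < threshold:
        return default_depth_range, False
    return best_range, True
```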
Although an example of generating and displaying an OCTA front image as the front image has been described in this embodiment, a luminance En-Face image may be generated and displayed as the front image, as in Example 2. Also in this embodiment, the image processing apparatus 300 may be configured so that the depth range of the generated front image can be adjusted manually if the operator considers it undesirable. These processes can also be applied to the following modifications of this embodiment.
Further, in this embodiment the front image determination unit 344 selects the OCTA front image with the highest evaluation value, but the front image determination unit 344 may instead select an OCTA front image whose evaluation value exceeds a threshold value. In this case, when a plurality of evaluation values exceed the threshold value, the front image determination unit 344 may select the plurality of OCTA front images corresponding to those evaluation values. The display control unit 306 may then display the plurality of evaluation values and the corresponding plurality of OCTA front images while switching between them. The front image determination unit 344 may also select, from the plurality of OCTA front images, the OCTA front image to be displayed alone according to the operator's instruction.
(Modification 1 of Example 3)
In Example 3, by providing the front image determination unit 344, a front image with the optimum depth range is automatically selected as the front image (output image) to be displayed without the intervention of an operator. However, the method of determining the depth range corresponding to the front image to be displayed is not limited to this.
Hereinafter, a method of determining the depth range corresponding to the front image to be displayed according to Modification 1 of Example 3 will be described with reference to FIGS. 18 to 22. In Example 3, the front image determination unit 344 selects the front image corresponding to the depth range with the maximum evaluation value. This modification differs in that the front image determination unit 344 connects the depth ranges whose front-image evaluation values are equal to or greater than a threshold value and uses the result as the depth range of the front image to be displayed.
FIG. 18 is a flowchart of the front image generation process according to this modification. Since the flow of the series of processes other than the front image generation process is the same as in Example 3, its description is omitted. Steps S1801 and S1802 are the same as steps S1701 and S1702 according to Example 3, so their descriptions are also omitted. When a plurality of evaluation values have been acquired for the plurality of OCTA front images in step S1802, the process proceeds to step S1803.
In step S1803, the front image determination unit 344 connects the depth ranges whose evaluation values are equal to or greater than the threshold value and determines the depth range of the OCTA front image to be displayed. FIG. 19 is an example in which a plurality of evaluation values are calculated for a plurality of OCTA front images. With the Bruch's membrane denoted as BM, the example shown in FIG. 19 generates a plurality of OCTA front images while shifting the depth range toward the vitreous side in steps of 20 μm, from depth range (a) "BM+0 μm to BM+20 μm" to depth range (h) "BM-140 μm to BM-120 μm". In the example shown in FIG. 19, the image evaluation unit 343 acquires a plurality of evaluation values (the numbers on the right side of the figure) corresponding to the plurality of OCTA front images generated in this way.
In this example, assuming a threshold of 0.3 for the evaluation value, the evaluation values of the OCTA front images corresponding to the depth ranges from depth range (b) "BM-20 μm to BM+0 μm" to depth range (f) "BM-100 μm to BM-80 μm" are equal to or greater than the threshold. Therefore, the front image determination unit 344 connects the depth ranges from depth range (b) to depth range (f) and determines the depth range of the OCTA front image to be displayed. Specifically, the front image determination unit 344 determines the depth range of the OCTA front image to be displayed to be from BM+0 μm, the lower limit of depth range (b), to BM-100 μm, the upper limit of depth range (f).
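The depth-range connection in step S1803 could be sketched as follows. Depth positions are expressed as signed offsets from BM in μm (negative toward the vitreous side), and the evaluation values in the usage example are hypothetical except that the maximum of 0.7 for range (d) and the 0.3 threshold follow the text.

```python
def connect_depth_ranges(depth_ranges, evaluations, threshold=0.3):
    """Connect consecutive depth ranges whose evaluation value is >= threshold.

    Each depth range is a (top, bottom) pair of signed offsets from BM in
    micrometres, with negative offsets toward the vitreous (shallower) side.
    Returns a list of connected (top, bottom) blocks.
    """
    blocks = []
    for (top, bottom), value in zip(depth_ranges, evaluations):
        if value < threshold:
            continue
        # Extend the previous block if this range is directly adjacent to it.
        if blocks and blocks[-1][0] == bottom:
            blocks[-1] = (top, blocks[-1][1])
        else:
            blocks.append((top, bottom))
    return blocks

# The eight 20-um ranges of FIG. 19: (a) BM+0..+20 down to (h) BM-140..-120.
ranges = [(-20 * i, -20 * i + 20) for i in range(8)]
values = [0.1, 0.4, 0.6, 0.7, 0.5, 0.3, 0.1, 0.0]       # (b)-(f) at or above 0.3
print(connect_depth_ranges(ranges, values))              # -> [(-100, 0)], i.e. BM-100 um to BM+0 um
```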
In step S1804, the projection range control unit 341 and the front image generation unit 342 generate the OCTA front image to be displayed based on the depth range determined by the front image determination unit 344. The subsequent processing is the same as in Example 3, so its description is omitted.
For the example shown in FIG. 19, FIG. 20 shows the OCTA front image generated by the above processing, its depth range, and the tomographic image. In the OCTA front image shown in FIG. 20, it can be seen that the CNV appears in a form that can be easily confirmed.
Further, FIG. 21 is an example in which the processing of this modification is executed for a very small CNV. In the example shown in FIG. 21, for simplicity of description, the depth ranges run from depth range (a) "BM+0 μm to BM+20 μm" to depth range (e) "BM-80 μm to BM-60 μm". In this example, the only depth range whose evaluation value is equal to or greater than the threshold of 0.3 is depth range (b), so the front image determination unit 344 determines the final depth range to be the same range as depth range (b). FIG. 22 shows the OCTA front image corresponding to the determined depth range, its depth range, and the tomographic image. In the OCTA front image shown in FIG. 22, it can be seen that the CNV of the example shown in FIG. 21 appears in a form that can be easily confirmed. If the final depth range is the same as one of the plurality of depth ranges for which the evaluation values were calculated, the image generation unit 304 does not need to generate the OCTA front image for that depth range again.
As described above, in the image processing apparatus 300 according to this modification, the front image determination unit 344 of the image generation unit 304 functions as an example of a determination unit that determines a second depth range based on the plurality of calculated evaluation values and determines the image generated using the determined second depth range as the front image to be displayed. In particular, the image generation unit 304 determines the second depth range by connecting the depth ranges corresponding to evaluation values that are equal to or greater than the threshold value among the plurality of calculated evaluation values. The determination unit may be configured as a component separate from the image generation unit 304.
Compared with Example 3, the image processing apparatus 300 according to this modification generates front images for very thin depth ranges and estimates the presence or absence of the extraction target while continuously changing the depth range, so that the depth range of the front image to be displayed can be determined more precisely. Therefore, an image in which the target region can be easily confirmed can be generated more appropriately. In particular, by performing the evaluation for each very thin depth range, even a very small extraction target (for example, CNV) can be detected without being overlooked.
In this modification, an example was shown in which the depth of the exploratorily shifted depth range (the difference between its upper and lower limits) is fixed at 20 μm, but the depth of the depth range is not limited to this and may be set arbitrarily according to the desired configuration. When searching the same overall range, the thinner the exploratorily shifted depth range becomes, the more candidate images there are and the greater the amount of computation for calculating the evaluation values. In addition, if the exploratorily shifted depth range becomes too thin, noise appears more strongly in the OCTA front images. Conversely, the deeper the exploratorily shifted depth range becomes, the fewer candidate images there are and the smaller the amount of computation, while the resolution with which the optimum depth range can be determined becomes coarser. The depth of the depth range may be determined by balancing these considerations; an effective range is, for example, a thickness of 10 μm to 50 μm.
The range to be searched need only cover up to the outer retinal layers; as shown in this modification, the search may be performed over a predetermined range toward the vitreous side with the Bruch's membrane as a reference, or it may extend up to the OPL/ONL boundary, which is the upper limit of the outer retinal layers.
Further, in this modification, the boundary line used for the search is determined based on the shape of the Bruch's membrane (BM), but the shape of the boundary line used for the search is not limited to this. However, since the Bruch's membrane is located at the bottom of the retina and its shape is relatively flat, it is suitable for exploratorily evaluating the presence or absence of CNV. The boundary line used for the search may instead be determined based on the shape of the retinal pigment epithelium (RPE) or another layer. The boundary line used for the search may also be determined based on, for example, a straight line close to the retinal shape of the Bruch's membrane in the macula.
The threshold of the evaluation value used when the front image determination unit 344 determines the depth range to be displayed may be changed according to the operator's instruction. In this case, the degree to which the CNV is rendered can be adjusted according to the operator's preference.
(Modification 2 of Example 3)
In Modification 1 of Example 3, the depth range of the front image to be displayed was determined by connecting depth ranges whose evaluation values are equal to or greater than a certain value. However, the evaluation values of consecutive depth ranges may fall below the threshold once and then become equal to or greater than the threshold again; in other words, the evaluation values over the consecutive depth ranges may form two peaks. In Modification 2 of Example 3, to handle such a case, the upper limit and the lower limit of the ranges over which the evaluation is equal to or greater than a certain value are selected, and the range between this upper limit and lower limit is determined and selected as the depth range.
FIG. 23 is a flowchart showing the front image generation process according to this modification. Since the flow of the series of processes other than the front image generation process is the same as in Example 3, its description is omitted. Steps S2301, S2302, and S2304 are the same as steps S1801, S1802, and S1804 according to Modification 1 of Example 3, so their descriptions are omitted. When a plurality of evaluation values have been acquired for the plurality of OCTA front images in step S2302, the process proceeds to step S2303.
In step S2303, the front image determination unit 344 integrates the depth ranges whose evaluation values are equal to or greater than the threshold value and determines the depth range of the OCTA front image to be displayed. At this time, unlike step S1803 in Modification 1 of Example 3, even if a depth range whose evaluation value is below the threshold lies between depth ranges whose evaluation values are equal to or greater than the threshold, the front image determination unit 344 determines the depth range of the OCTA front image to be displayed based on the upper and lower limits of the plurality of depth ranges whose evaluation values are equal to or greater than the threshold.
For example, in the example shown in FIG. 19, if the evaluation value of depth range (h) is 0.3, then in addition to the evaluation values of depth range (b) through depth range (f), the evaluation value of depth range (h) is also equal to or greater than the threshold, while the evaluation value of depth range (g) is below the threshold. Even in this case, the front image determination unit 344 according to this modification determines the depth range of the OCTA front image to be displayed to be from BM+0 μm, the lower limit of depth range (b) whose evaluation value is equal to or greater than the threshold, to BM-140 μm, the upper limit of depth range (h) whose evaluation value is equal to or greater than the threshold. The subsequent processing is the same as in Modification 1 of Example 3, so its description is omitted.
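A sketch of step S2303 using the same (top, bottom) offset convention as the earlier sketch; unlike the contiguous-block connection of Modification 1, the overall shallowest and deepest qualifying boundaries are taken even across a gap. The evaluation values other than the 0.3 for range (h) named in the text are hypothetical.

```python
def integrate_depth_ranges(depth_ranges, evaluations, threshold=0.3):
    """Step S2303: span all qualifying ranges, even across below-threshold gaps.

    Each depth range is a (top, bottom) pair of signed offsets from BM in
    micrometres (negative toward the vitreous side). Returns the (top, bottom)
    span from the shallowest to the deepest qualifying boundary, or None.
    """
    selected = [r for r, e in zip(depth_ranges, evaluations) if e >= threshold]
    if not selected:
        return None
    return min(t for t, _ in selected), max(b for _, b in selected)

# The two-peak case from the text: (b)-(f) and (h) qualify, (g) does not.
ranges = [(-20 * i, -20 * i + 20) for i in range(8)]   # (a)..(h)
values = [0.1, 0.4, 0.6, 0.7, 0.5, 0.3, 0.1, 0.3]
print(integrate_depth_ranges(ranges, values))          # -> (-140, 0), i.e. BM-140 um to BM+0 um
```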
As described above, the front image determination unit 344 of the image generation unit 304 according to this modification determines the second depth range by taking, in the depth ranges corresponding to evaluation values that are equal to or greater than the threshold among the plurality of acquired evaluation values, a depth position shallower than the other depth positions as the upper limit and a depth position deeper than the other depth positions as the lower limit. With such processing, even when the evaluation values form two peaks, the depth range of the OCTA front image to be displayed can be determined appropriately in this modification. Although an example in which the evaluation values form two peaks has been described, the depth range of the OCTA front image to be displayed can be determined appropriately by the same processing even when there are three or more peaks. The front image determination unit 344 may be configured as a component separate from the image generation unit 304.
The threshold of the evaluation value used when the front image determination unit 344 determines the depth range to be displayed may be changed according to the operator's instruction. In this case, the degree to which the CNV is rendered can be adjusted according to the operator's preference.
(Modification 3 of Example 3)
In Modification 3 of Example 3, the front image determination unit 344 determines the depth range of the OCTA front image to be displayed by finely adjusting the upper and lower limits of the depth range whose evaluation value is the maximum (higher than the other evaluation values), using that depth range as a center.
FIG. 24 is a flowchart of the front image generation process according to this modification. The flow of the series of processes other than the front image generation process is the same as the series of processes according to Example 3, and its description is therefore omitted. Steps S2401 and S2402 are the same as steps S1701 and S1702 according to Example 3, and their description is also omitted. When a plurality of evaluation values have been acquired for the plurality of OCTA front images in step S2402, the process proceeds to step S2403.
In step S2403, the front image determination unit 344 selects, from among the depth ranges corresponding to the plurality of OCTA front images, the depth range corresponding to the OCTA front image with the maximum evaluation value as the central depth range. For example, in the example of FIG. 19, the depth range (d) "BM-60 μm to BM-40 μm", which corresponds to the maximum evaluation value 0.7, is selected.
Next, in step S2404, the front image determination unit 344 sets a plurality of depth ranges obtained by finely adjusting the upper and lower limits of the depth range, centered on the depth range with the maximum evaluation value. Here, "fine adjustment" means setting a plurality of depth ranges in which at least one of the upper and lower limits of the selected depth range is shifted by an amount smaller than the depth (the difference between the upper and lower limits) of the exploratory shift used when the evaluation values were first obtained.
For example, in the example of FIG. 19, the depth of the exploratory shift used when the evaluation values were first obtained is 20 μm. In this case, the front image determination unit 344 sets, for example, depth ranges in which the upper limit BM-60 μm of depth range (d) is moved toward the shallower or deeper side by, for example, 10 μm or 5 μm. Similarly, the front image determination unit 344 sets depth ranges in which the lower limit BM-40 μm of depth range (d) is moved toward the shallower or deeper side by, for example, 10 μm or 5 μm. The front image determination unit 344 may also set depth ranges in which both the upper and lower limits of depth range (d) are moved toward the shallower or deeper side by, for example, 10 μm or 5 μm. The numerical values in this example are illustrative and may be set arbitrarily according to the desired configuration, as may the number of depth ranges to be set.
In step S2405, the projection range control unit 341 and the front image generation unit 342 generate a plurality of OCTA front images based on the plurality of depth ranges set by the front image determination unit 344. The image evaluation unit 343 then calculates a plurality of evaluation values for the generated OCTA front images.
In step S2406, the front image determination unit 344 selects and determines, as the OCTA front image to be displayed, the OCTA front image corresponding to the maximum evaluation value (the evaluation value higher than the other evaluation values) among the plurality of evaluation values calculated in step S2405. The subsequent processing is the same as in Example 3, and its description is therefore omitted.
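A minimal sketch of the fine-adjustment flow of steps S2403 to S2406 is given below. The helper functions generate_front_image(volume, depth_range) and evaluate(image) are hypothetical stand-ins for the projection by the projection range control unit 341 / front image generation unit 342 and the scoring by the image evaluation unit 343; the shift amounts and the way candidates are enumerated are illustrative assumptions.

```python
from itertools import product

def fine_tune_depth_range(volume, coarse_ranges, coarse_scores,
                          generate_front_image, evaluate,
                          shifts_um=(-10.0, -5.0, 0.0, 5.0, 10.0)):
    """Refine the best coarse depth range by shifting its bounds.

    coarse_ranges: (upper_um, lower_um) pairs used in the coarse search
    coarse_scores: evaluation values for those ranges (same order)
    shifts_um:     candidate shifts, smaller than the coarse step (20 um in FIG. 19)
    """
    # S2403: take the coarse range with the highest evaluation value as the centre.
    best_idx = max(range(len(coarse_scores)), key=coarse_scores.__getitem__)
    upper0, lower0 = coarse_ranges[best_idx]

    # S2404: enumerate candidates with the upper and/or lower bound shifted;
    # keep only candidates that remain a non-empty range (upper shallower than lower).
    candidates = [(upper0 + du, lower0 + dl)
                  for du, dl in product(shifts_um, repeat=2)
                  if (upper0 + du) < (lower0 + dl)]

    # S2405: project and evaluate each candidate range.
    scored = [(evaluate(generate_front_image(volume, rng)), rng)
              for rng in candidates]

    # S2406: keep the candidate whose evaluation value is the highest.
    best_score, best_range = max(scored, key=lambda t: t[0])
    return best_range, best_score
```

Passing the projection and evaluation routines as arguments keeps the sketch self-contained; in the apparatus described here those roles would be played by the existing front image generation and image evaluation units.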
As described above, the front image determination unit 344 of the image generation unit 304 according to this modification functions as an example of a determination unit that determines the second depth range centered on the depth range corresponding to the evaluation value that is higher than the other acquired evaluation values. In particular, the image generation unit 304 according to this modification sets a plurality of depth ranges obtained by enlarging or reducing the depth range corresponding to the evaluation value higher than the other evaluation values, and generates a plurality of front images corresponding to those depth ranges. The image evaluation unit 343 then uses the plurality of front images corresponding to those depth ranges to acquire a plurality of evaluation values for them. Further, the front image determination unit 344 of the image generation unit 304 determines, as the front image to be displayed (the output image), the front image corresponding to the evaluation value that is higher than the other evaluation values among the evaluation values of the front images corresponding to those depth ranges. Note that the front image determination unit 344 may be configured as a component separate from the image generation unit 304.
By adjusting the depth range in this way and determining, as the front image to be displayed, the front image of the depth range whose evaluation value is higher than the other evaluation values, a front image in which the extraction target (target region) is easier to observe can be generated and displayed. The threshold for the evaluation value used when the front image determination unit 344 determines the depth range to be displayed may be changed in accordance with an instruction from the operator, in which case the degree to which the CNV is rendered can be adjusted according to the operator's preference. The depth (depth width) used when finely adjusting the depth range may also be changed in accordance with an instruction from the operator. In that case, the depth ranges for which evaluation values are calculated can be changed according to the operator's instruction, and a depth range suited to each individual eye to be examined can be set appropriately.
Further, although the front image determination unit 344 selects the OCTA front image with the highest evaluation value in this modification, the front image determination unit 344 may instead select an OCTA front image whose evaluation value exceeds a threshold. In that case, when a plurality of evaluation values exceed the threshold, the front image determination unit 344 may select the plurality of OCTA front images corresponding to those evaluation values. The display control unit 306 may then display the plurality of evaluation values and the corresponding OCTA front images while switching between them. The front image determination unit 344 may also select, from among the plurality of OCTA front images, the single OCTA front image to be displayed in accordance with an instruction from the operator.
(Example 4)
Examples 1 to 3 used OCTA front images to provide images in the optimum depth range for cases in which neovascularization (CNV) due to age-related macular degeneration has developed. The techniques described in Examples 1 to 3 are also applicable to confirming a structure called the lamina cribrosa below the optic nerve head. Example 4 describes processing in which the lamina cribrosa below the optic nerve head is the extraction target (target region).
The lamina cribrosa is a mesh-like structure below the optic nerve head that supports the optic nerve. Its morphology is known to correlate with the progression of glaucoma, and being able to display that morphology (particularly its thickness) is known to be highly significant for the diagnosis of glaucoma.
On a tomographic image, the lamina cribrosa does not produce a clear change in layer structure or an accompanying change in brightness, so it is difficult to recognize it by image processing of the tomographic image in the way that the layer structure of the retina is recognized. On the other hand, its mesh-like structure can be confirmed by using a front image, such as a luminance En-Face image, obtained by projecting tomographic data onto a two-dimensional plane.
Therefore, in this example, the target region to be observed is the region of the lamina cribrosa, and a luminance En-Face image in which the lamina cribrosa is easy to observe is generated. Since the configuration of the image processing apparatus according to this example is the same as that of the image processing apparatuses according to Examples 1 to 3, the same reference numerals are used and the description is omitted. The image processing apparatus according to this example is described below, focusing on the differences from the image processing apparatus 300 according to Examples 1 to 3.
The image processing apparatus according to this example is described below with reference to FIGS. 25 and 26. In this example, for instance, when the lamina cribrosa is selected from the pull-down menu above the luminance En-Face image 407 in the GUI 400 described in Example 1, the image generation unit 304 designates the lamina cribrosa as the target of the extraction process. The image generation unit 304 generates a plurality of corresponding luminance En-Face images based on a plurality of depth ranges stored in advance in the storage unit 305 for the lamina cribrosa designated as the extraction target. The display control unit 306 causes the display unit 310 to display the generated luminance En-Face images, the corresponding tomographic image, and the depth ranges in accordance with, for example, an instruction from the operator. Note that the luminance En-Face image generation process according to this example may be the same as in Examples 1 to 3, except that tomographic data is used instead of motion contrast data and that the depth ranges are those set for the lamina cribrosa.
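A minimal sketch of generating one luminance En-Face image by projecting an OCT volume over one of the stored depth ranges is shown below. It assumes the volume is a NumPy array indexed (z, y, x), that the depth axis is sampled at a known pitch in μm, and that the reference depth (the horizontal "Line" of FIG. 25) is given in μm; the mean projection, the pixel pitch, and the placeholder volume are illustrative assumptions rather than the patent's specific projection method.

```python
import numpy as np

def enface_projection(volume_zyx: np.ndarray,
                      reference_depth_um: float,
                      depth_range_um: tuple,
                      pixel_pitch_z_um: float) -> np.ndarray:
    """Project a 3-D OCT intensity volume onto a 2-D En-Face image.

    depth_range_um is (start, end) in micrometres from the horizontal reference
    line, e.g. (+50, +100) for the range of FIG. 25(a). A simple mean projection
    along the depth axis is used here.
    """
    z0 = int(round((reference_depth_um + depth_range_um[0]) / pixel_pitch_z_um))
    z1 = int(round((reference_depth_um + depth_range_um[1]) / pixel_pitch_z_um))
    z0, z1 = max(min(z0, z1), 0), min(max(z0, z1), volume_zyx.shape[0])
    if z1 <= z0:
        raise ValueError("depth range lies outside the volume")
    return volume_zyx[z0:z1].mean(axis=0)

# Usage: En-Face images for the three lamina cribrosa ranges shown in FIG. 25,
# with a placeholder volume and an assumed reference depth and pixel pitch.
stored_ranges = [(+50, +100), (+150, +200), (+300, +350)]
volume = np.random.rand(640, 300, 300).astype(np.float32)
images = [enface_projection(volume, reference_depth_um=1200.0,
                            depth_range_um=r, pixel_pitch_z_um=3.9)
          for r in stored_ranges]
```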
FIGS. 25(a) to 25(c) are examples of luminance En-Face images showing the morphology of the lamina cribrosa and of tomographic images on which the corresponding depth ranges are displayed. The luminance En-Face images shown in FIGS. 25(a) to 25(c) are En-Face images of the lamina cribrosa portion, each obtained by projecting three-dimensional tomographic data (a three-dimensional tomographic image) over a different depth range. In the tomographic images shown in FIGS. 25(a) to 25(c), the depth range of the corresponding En-Face image is indicated by white straight (solid) lines.
FIGS. 25(a) to 25(c) show En-Face images whose depth ranges are, respectively, +50 μm to +100 μm, +150 μm to +200 μm, and +300 μm to +350 μm from the depth position of the bottom of the retina-vitreous interface of the optic nerve head. The manner in which each depth range is indicated may be arbitrary. In the examples of FIGS. 25(a) to 25(c), notations such as Line+50 μm and Line+100 μm are shown on the tomographic image. "Line" in these notations indicates that the upper and lower limits determining the depth range are set as straight lines, and the numbers represent the distance from the position indicated by the dotted line in the figure (the depth position of the bottom of the retina-vitreous interface of the optic nerve head).
Note that each depth range for generating the plurality of En-Face images may be a depth range within 0 to 500 μm on the choroid side from the boundary (interface) between the retina and the vitreous body at the optic nerve head. For example, each depth range may be set arbitrarily within a depth range of approximately 100 μm to 500 μm on the choroid side from the boundary between the optic nerve head and the vitreous body.
FIGS. 25(a) to 25(c) show luminance En-Face images obtained when the depth range is shifted deeper from the vitreous side. In the luminance En-Face image shown in FIG. 25(b), a mesh-like structure can be seen at the position corresponding to the optic nerve head, whereas in the luminance En-Face images shown in FIGS. 25(a) and 25(c), such a structure cannot be clearly observed.
In this way, by designating the lamina cribrosa as the extraction target (target region) and performing the same processing as in Example 1, luminance En-Face images for a plurality of depth ranges are displayed on the GUI, and an image in which the lamina cribrosa is easy to observe can be provided to the operator. In this case, for example, as in Example 1, when the En-Face image 407 of the GUI 400 is double-clicked, the display control unit 306 can cause the display unit 310 to display a GUI including luminance En-Face images corresponding to a plurality of depth ranges for the lamina cribrosa.
Although a configuration has been described in which a plurality of luminance En-Face images are generated and displayed as the plurality of front images for the lamina cribrosa, OCTA front images may be generated and displayed side by side instead of luminance En-Face images. In that case, the front images may be generated using motion contrast data instead of tomographic data.
Further, as in the processing according to Example 2, the image evaluation unit 343 may use a trained model to calculate, for the plurality of luminance En-Face images, evaluation values that evaluate the presence of the lamina cribrosa in each image. In this case, the training data for the trained model can use a luminance En-Face image as the input data and, as the output data, an evaluation value that evaluates the presence of the mesh-like structure (lamina cribrosa) in that luminance En-Face image. The output data may be evaluation values assigned by a physician or the like who evaluated the presence of the mesh-like structure (lamina cribrosa) in the luminance En-Face images, for example according to a criterion such as giving a higher value to an image in which the holes of the lamina cribrosa are visible. In this case, as shown in FIG. 25, the display control unit 306 can cause the display unit 310 to display the evaluation value for each luminance En-Face image together with that luminance En-Face image.
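A minimal sketch of a trained model of the kind described here, assuming PyTorch is used: the network depth, channel counts, input size, and the sigmoid output mapping to an evaluation value in [0, 1] are illustrative choices, and the training loop over physician-scored En-Face images is omitted.

```python
import torch
import torch.nn as nn

class EnFaceScoreNet(nn.Module):
    """Small CNN that regresses an evaluation value in [0, 1] from a
    single-channel luminance En-Face image (presence of the lamina cribrosa)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Inference: one 304x304 En-Face image (placeholder tensor) -> one evaluation value.
model = EnFaceScoreNet().eval()
with torch.no_grad():
    score = model(torch.rand(1, 1, 304, 304)).item()
```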
Note that, as in Example 2, the image evaluation unit 343 may calculate the evaluation value that evaluates the presence of the lamina cribrosa in a luminance En-Face image by rule-based processing. OCTA front images may also be generated, evaluated, and displayed side by side instead of luminance En-Face images.
Further, as in the processing according to Example 3, the luminance En-Face image corresponding to the maximum evaluation value (an evaluation value higher than the other evaluation values), or a luminance En-Face image whose evaluation value is equal to or above a threshold, may be automatically selected and displayed.
The processing according to Modifications 1 to 3 of Example 3 may also use the lamina cribrosa as the extraction target (target region). For example, as shown in Modification 1 of Example 3, by finely adjusting the depth range while keeping the depth range used for projection very thin, it is possible to estimate over which depth range the mesh structure corresponding to the lamina cribrosa is distributed. This not only makes it possible to determine the optimum depth range of the front image to be displayed, but also to obtain the thickness of the lamina cribrosa from the depth range over which the mesh structure is distributed, and to segment the lamina cribrosa from tomographic data or the like.
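A minimal sketch of estimating the depth extent (and hence a thickness) of the mesh structure from per-slice evaluation values is shown below. It assumes very thin projection slices of uniform thickness ordered from shallow to deep and a simple contiguity rule; the threshold, the choice of the longest run, and the example scores are illustrative assumptions.

```python
def estimate_structure_thickness(slice_scores, slice_thickness_um, threshold=0.5):
    """Find the longest contiguous run of thin slices whose evaluation value is
    at or above the threshold and return (start_index, end_index_exclusive,
    thickness_um)."""
    best = (0, 0)
    start = None
    for i, s in enumerate(list(slice_scores) + [float("-inf")]):  # sentinel ends a run
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    thickness_um = (best[1] - best[0]) * slice_thickness_um
    return best[0], best[1], thickness_um

# Example: 10 um slices; the mesh structure is visible over 5 contiguous slices.
scores = [0.1, 0.2, 0.6, 0.7, 0.8, 0.75, 0.6, 0.3, 0.2, 0.1]
print(estimate_structure_thickness(scores, slice_thickness_um=10.0))  # (2, 7, 50.0)
```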
When observing the morphology of the lamina cribrosa, using an SS-OCT apparatus with a swept-source (SS) light source in the 1 μm band makes it possible to generate a front image in which the lamina cribrosa is easy to observe. It is also known that the morphology of the lamina cribrosa is easy to observe when a luminance En-Face image is used. However, the OCT apparatus used to image the eye to be examined and the front image to be generated are not limited to these; for example, as described above, an SD-OCT apparatus using an SD-type optical interference unit or an OCTA front image may also be used.
In this example, as shown in FIGS. 25(a) to 25(c), depth ranges determined by lines horizontal to the tomographic image were defined with reference to the depth corresponding to the bottom of the retina-vitreous interface of the optic nerve head, and front images with varied depth ranges were generated and evaluation values were calculated. However, the reference used to define the depth range is not limited to this. For example, the reference may be defined by a line connecting the ends of the Bruch's membrane.
Here, FIG. 26 is a diagram for explaining a reference line that defines a depth range, formed by the line connecting the ends of the Bruch's membrane (Bruch's membrane ends P1 and P2). FIG. 26 shows a tomographic image of the optic nerve head, in which the vitreous-inner limiting membrane (ILM) boundary L1, the GCL/IPL boundary L2, the retinal pigment epithelium (RPE) L3, and the Bruch's membrane L4 are shown.
As shown in FIG. 26, the Bruch's membrane L4 is generally continuous within the retina but is absent at the optic nerve head. The ends of the Bruch's membrane that terminate around the optic nerve head are called the Bruch's membrane ends and appear in the tomographic image as the Bruch's membrane ends P1 and P2.
Here, after the Bruch's membrane ends P1 and P2 are identified using the result of layer recognition performed by the layer recognition unit 303, the straight line Z connecting the Bruch's membrane ends P1 and P2 can be used as the reference line for the depth range. In this case, the depth range may be changed by moving the straight line Z up or down (toward the shallower or deeper side) by an operation such as dragging.
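A minimal sketch of deriving a straight reference line from the two Bruch's membrane ends detected by layer recognition and converting it, together with user offsets, into per-A-scan projection bounds: the coordinate convention (depth index increasing toward deeper positions) and the example coordinates and offsets are assumptions for illustration.

```python
import numpy as np

def reference_line_from_bmo(p1, p2, n_ascans):
    """Linearly interpolate the depth (z) of the line Z connecting the Bruch's
    membrane ends P1=(x1, z1) and P2=(x2, z2) across all A-scan positions."""
    (x1, z1), (x2, z2) = p1, p2
    xs = np.arange(n_ascans)
    return z1 + (z2 - z1) * (xs - x1) / float(x2 - x1)

def depth_bounds_from_line(line_z, offset_upper_px, offset_lower_px):
    """Shift the reference line to obtain per-A-scan upper/lower projection
    bounds; dragging the line up or down changes both offsets together."""
    return line_z + offset_upper_px, line_z + offset_lower_px

# Example: a B-scan with 512 A-scans and membrane ends at (60, 240) and (450, 252).
line_z = reference_line_from_bmo((60, 240), (450, 252), n_ascans=512)
upper, lower = depth_bounds_from_line(line_z, offset_upper_px=10, offset_lower_px=35)
```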
In this way, by determining the boundary that serves as the reference for the depth range used to generate a front image with respect to a retinal structure such as the Bruch's membrane, a front image projected substantially parallel to the retinal structure can be acquired even if, for example, the tomographic image is tilted due to the imaging conditions. A front image that is stable with respect to the retinal structure can therefore be acquired.
For processing similar to that of Example 2 described above, a configuration was described in which a plurality of luminance En-Face images are used as the input data of the training data for the trained model and as the input data at the time of operation. Alternatively, for example, an image in which the optic nerve head is extracted from the luminance En-Face image and everything other than the optic nerve head is masked may be used as the input data of the training data or as the input data at the time of operation. In this case, since the neural network is not made to learn information other than the optic nerve head, unnecessary information is not input, so faster training and more accurate inference results (evaluation values) can be expected.
(Example 5)
Examples 1 to 3 described configurations that use OCTA front images to provide a front image in the optimum depth range for cases in which neovascularization (CNV) due to age-related macular degeneration has developed. Example 4 described a configuration that provides a front image in the optimum depth range for the morphology of the lamina cribrosa of the optic nerve head.
A similar technique can also be applied to segmentation of the choroid (separation of the Sattler layer and the Haller layer). As with the lamina cribrosa, the boundary between the Sattler layer and the Haller layer is difficult to distinguish because the layer boundary is unclear on a tomographic image. On the other hand, by projecting onto a front image, structural differences in the blood vessels at the boundary between the Sattler layer and the Haller layer can be confirmed. By taking the place where the structural difference in the blood vessels clearly changes as the boundary, the Sattler layer and the Haller layer can be separated. In such a case, a luminance En-Face image and an OCTA front image can also be generated for each of the layers.
For this reason, in this example, as in Example 1, the blood vessel structure at the boundary between the Sattler layer and the Haller layer is designated as the extraction target (target region), and En-Face images are generated for a plurality of depth ranges corresponding to that extraction target. The plurality of depth ranges may be set arbitrarily, for example, within the range from the Bruch's membrane to the choroid or the sclera. By generating and displaying a front image in the optimum depth range for confirming the boundary between the Sattler layer and the Haller layer, the operator can easily confirm and identify that boundary. It is also possible to perform processing similar to that described in Examples 2 and 3 and their modifications to acquire a front image in which the blood vessel structure at the boundary between the Sattler layer and the Haller layer is easy to confirm.
When analyzing the layers of the choroid, using an SS-OCT apparatus with a swept-source (SS) light source in the 1 μm band makes it possible to generate a front image in which the choroidal layers are easy to observe. It is also known that the choroidal layers are easy to observe when a luminance En-Face image is used. However, the OCT apparatus used to image the eye to be examined and the front image to be generated are not limited to these; for example, as described above, an SD-OCT apparatus using an SD-type optical interference unit or an OCTA front image may also be used.
The boundary line that serves as the reference for the depth range may be a curve that follows the layer structure of the choroid. The layer shape of the Bruch's membrane or the RPE on the tomographic image, or their boundary lines, may also be used to determine the shape of the boundary line that defines the depth range. In this case, the configuration may allow the depth range to be moved by moving the reference boundary line with an operation such as dragging.
A similar technique can also be applied to providing a front image in the optimum depth range for confirming the morphology of capillary aneurysms of the retinal blood vessels. As in Example 1, a capillary aneurysm of the retinal blood vessels may be designated as the extraction target, and a plurality of En-Face images may be generated for a plurality of depth ranges corresponding to that extraction target. Capillary aneurysms generally exist within the superficial retinal layer (ILM to GCL/IPL+50 μm) or the deep retinal layer (GCL/IPL+50 μm to INL/OPL+70 μm). Therefore, each depth range for generating the plurality of En-Face images may be set arbitrarily within a depth range inside the superficial or deep retinal layer. By processing similar to that described in Example 1, a front image in the optimum depth range for confirming a capillary aneurysm of the retinal blood vessels is generated and displayed, allowing the operator to easily confirm a clear capillary aneurysm. It is also possible to perform processing similar to that described in Examples 2 and 3 and their modifications to acquire a front image in which a capillary aneurysm of the retinal blood vessels is easy to confirm.
(Modification 1)
The display control unit 306 in the various examples and modifications described above may display analysis results, such as the layer thickness of a desired layer and various blood vessel densities, on the report screen of the display screen. It may also display, as analysis results, the values (distributions) of parameters relating to a site of interest that includes at least one of the optic nerve head, the macula, a blood vessel region, a nerve fiber bundle, a vitreous region, a macular region, a choroid region, a sclera region, a lamina cribrosa region, a retinal layer boundary, a retinal layer boundary end, photoreceptor cells, blood cells, a blood vessel wall, a blood vessel inner wall boundary, a blood vessel outer boundary, ganglion cells, a corneal region, an angle region, Schlemm's canal, and the like. At this time, an accurate analysis result can be displayed, for example, by analyzing a medical image to which various artifact reduction processes have been applied. The artifact may be, for example, a false image region caused by light absorption by a blood vessel region or the like, a projection artifact, or a band-shaped artifact that appears in the main scanning direction of the measurement light in a front image due to the state of the eye to be examined (movement, blinking, and so on). The artifact may also be any imaging failure region that occurs randomly on each capture of a medical image of a predetermined site of the subject. The display control unit 306 may also cause the display unit 310 to display, as an analysis result, the values (distributions) of parameters relating to a region that includes at least one of the various artifacts (imaging failure regions) described above. The values (distributions) of parameters relating to a region that includes at least one abnormal site such as drusen, neovascularization, exudates (hard exudates), or pseudodrusen may also be displayed as an analysis result. Note that the image analysis processing may be performed by the image evaluation unit 343 or by an analysis unit in the image processing apparatus 300 that is separate from the image evaluation unit 343.
The analysis results may also be displayed as an analysis map, as sectors showing statistical values corresponding to divided regions, and so on. The analysis results may be generated using a trained model (an analysis result generation engine, a trained model for analysis result generation) obtained by the image evaluation unit 343 or another analysis unit learning analysis results of medical images as training data. The trained model may be obtained by training using training data that includes a medical image and the analysis result of that medical image, training data that includes a medical image and the analysis result of a medical image of a type different from that medical image, and so on.
The training data may also include region label images generated by segmentation processing and the analysis results of medical images obtained using them. In this case, the image evaluation unit 343 can function as an example of an analysis result generation unit that generates analysis results of tomographic images and front images from results obtained by executing the segmentation processing (for example, detection results of the retinal layers), using, for example, a trained model for analysis result generation. In other words, the image evaluation unit 343 can generate image analysis results for each of the different regions identified by the segmentation processing, using a trained model for analysis result generation that is different from the trained model for acquiring the evaluation results. The segmentation processing may be the layer recognition result obtained by the layer recognition unit 303, or may be performed separately from the processing of the layer recognition unit 303.
Further, the trained model may be obtained by training using training data that includes, as input data, a set of a plurality of medical images of different types of a predetermined site, such as a luminance front image and a motion contrast front image. Here, the luminance front image corresponds to the luminance En-Face image, and the motion contrast front image corresponds to the OCTA En-Face image.
The training data may also be data in which the input data is labeled (annotated) with, as correct answer data (for supervised learning), information including at least one of, for example, an analysis value obtained by analyzing the analysis region (for example, a mean or a median), a table containing the analysis values, an analysis map, and the position of the analysis region such as a sector in the image. The configuration may be such that the analysis result obtained using the trained model for analysis result generation is displayed in accordance with an instruction from the operator.
The display control unit 306 in the examples and modifications described above may also display various diagnosis results, such as glaucoma and age-related macular degeneration, on the report screen of the display screen. At this time, an accurate diagnosis result can be displayed, for example, by analyzing a medical image to which the various artifact reduction processes described above have been applied. As the diagnosis result, the position of an identified abnormal site or the like may be displayed on the image, or the state of the abnormal site or the like may be displayed as text. A classification result of the abnormal site or the like (for example, the Curtin classification) may also be displayed as the diagnosis result. As the classification result, for example, information indicating the likelihood of each abnormal site (for example, a numerical value indicating a percentage) may be displayed. Information that a physician needs in order to confirm the diagnosis may also be displayed as the diagnosis result. Such necessary information may be, for example, advice such as a recommendation for additional imaging. For example, when an abnormal site is detected in a blood vessel region in an OCTA image, a message may be displayed to the effect that fluorescence imaging using a contrast agent, which allows the blood vessels to be observed in more detail than OCTA, should additionally be performed.
The diagnosis result may be generated using a trained model (a diagnosis result generation engine, a trained model for diagnosis result generation) obtained by the image evaluation unit 343 learning diagnosis results of medical images as training data. The trained model may be obtained by training using training data that includes a medical image and the diagnosis result of that medical image, training data that includes a medical image and the diagnosis result of a medical image of a type different from that medical image, and so on.
The training data may also include region label images generated by segmentation processing and the diagnosis results of medical images obtained using them. In this case, the image evaluation unit 343 can function as an example of a diagnosis result generation unit that generates diagnosis results of front images and tomographic images from results obtained by executing the segmentation processing (for example, detection results of the retinal layers), using, for example, a trained model for diagnosis result generation. In other words, the image evaluation unit 343 can generate a diagnosis result for each of the different regions identified by the segmentation processing, using a trained model for diagnosis result generation that is different from the trained model for acquiring the evaluation results. Generation of the diagnosis result using the trained model may also be performed by a diagnosis unit in the image processing apparatus 300 that is separate from the image evaluation unit 343.
The training data may also be data in which the input data is labeled (annotated) with, as correct answer data (for supervised learning), information including at least one of, for example, the diagnosis name, the type and state (degree) of the lesion (abnormal site), the position of the lesion in the image, the position of the lesion relative to the region of interest, findings (such as interpretation findings), grounds for the diagnosis name (such as affirmative medical support information), and grounds for denying the diagnosis name (negative medical support information). The configuration may be such that the diagnosis result obtained using the trained model for diagnosis result generation is displayed in accordance with an instruction from the examiner.
Here, in Examples 2 to 5 described above, the image evaluation unit 343 may acquire an image analysis result obtained using the trained model for analysis result generation as the evaluation result (information indicating the evaluation) that evaluates the presence of an extraction target (target region), such as CNV, to be extracted from the front image. Similarly, the image evaluation unit 343 may acquire a diagnosis result obtained using the trained model for diagnosis result generation as the evaluation result that evaluates the presence of the extraction target. For example, the image evaluation unit 343 can use an analysis result or diagnosis result indicating that CNV exists, obtained for a front image using these trained models, as the evaluation result for that front image. The image evaluation unit 343 can also use an analysis result or diagnosis result relating to an artifact or a predetermined layer as the evaluation result for the front image.
For example, the image evaluation unit 343 can set the evaluation value to 1 when it acquires an analysis result or diagnosis result indicating that the site of interest or region of interest exists in the image. The image evaluation unit 343 may also calculate the evaluation value according to the numerical value or area of the analysis result or diagnosis result relating to the site of interest or region of interest. For example, the image evaluation unit 343 may set stepwise thresholds and calculate the evaluation value according to which thresholds the area of the region analyzed or diagnosed as the site of interest or region of interest exceeds. As in the above examples, the information indicating the evaluation acquired by the image evaluation unit 343 is not limited to an evaluation value and may be information indicating whether the extraction target exists or its likelihood. Note that the various sites of interest, regions of interest, and artifacts described above can each be an example of an extraction target (target region).
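A minimal sketch of the stepwise mapping from the area of an analyzed or detected region to an evaluation value is given below; the particular area breakpoints and the resulting scores are illustrative assumptions, not values from the patent.

```python
def area_to_evaluation_value(area_px,
                             steps=((0, 0.0), (50, 0.3), (200, 0.6), (500, 1.0))):
    """Return the evaluation value of the largest step whose area threshold the
    detected area meets or exceeds (stepwise thresholding of the region area)."""
    value = 0.0
    for threshold_px, step_value in steps:
        if area_px >= threshold_px:
            value = step_value
    return value

print(area_to_evaluation_value(0))     # 0.0 (nothing detected)
print(area_to_evaluation_value(120))   # 0.3
print(area_to_evaluation_value(800))   # 1.0
```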
The display control unit 306 according to the various examples and modifications described above may also display, on the report screen of the display screen, object recognition results (object detection results) and segmentation results for the sites of interest, regions of interest, artifacts, abnormal sites, and the like described above. At this time, for example, a rectangular frame or the like may be superimposed and displayed around an object in the image, or a color or the like may be superimposed and displayed on an object in the image. The object recognition results and segmentation results may be generated using a trained model (an object recognition engine, a trained model for object recognition, a segmentation engine, a trained model for segmentation) obtained by the layer recognition unit 303 or the image evaluation unit 343 learning training data in which medical images are labeled (annotated) with information indicating the object recognition or segmentation as correct answer data.
The image evaluation unit 343 may acquire the results of object recognition processing or segmentation processing using a trained model for object recognition or a trained model for segmentation as the evaluation result (information indicating the evaluation) that evaluates the presence of the extraction target, such as CNV, to be extracted from the front image. For example, the image evaluation unit 343 can use a label value or the like indicating CNV, obtained for a front image using these trained models, as the evaluation result for that front image. The image evaluation unit 343 can also, for example, set the evaluation value to 1 when an abnormal site is detected, or calculate the evaluation value according to the area of the region detected as the abnormal site; for instance, it may set stepwise thresholds and calculate the evaluation value according to which thresholds the area of the region detected as the abnormal site exceeds. As in the above examples, the information indicating the evaluation acquired by the image evaluation unit 343 is not limited to an evaluation value and may be information indicating whether the extraction target exists or its likelihood. Note that the various sites of interest, regions of interest, and artifacts described above can each be an example of an extraction target (target region).
The analysis result generation and diagnosis result generation described above may also be obtained by using the object recognition results and segmentation results described above. For example, analysis result generation or diagnosis result generation processing may be performed on a site of interest obtained by object recognition or segmentation processing. The object recognition processing and segmentation processing using a trained model for object recognition or a trained model for segmentation may also be performed by a segmentation unit in the image processing apparatus 300 that is separate from the image evaluation unit 343, or by the layer recognition unit 303.
When detecting an abnormal site, the image evaluation unit 343 may use a generative adversarial network (GAN) or a variational auto-encoder (VAE). For example, a DCGAN (Deep Convolutional GAN), consisting of a generator obtained by learning to generate front images and a discriminator obtained by learning to distinguish new front images generated by the generator from real front images, can be used as the machine learning model.
When a DCGAN is used, for example, the discriminator encodes the input front image into a latent variable, and the generator generates a new front image based on the latent variable. The difference between the input front image and the generated new front image can then be extracted as the abnormal site. When a VAE is used, for example, the input front image is encoded into a latent variable by the encoder, and the latent variable is decoded by the decoder to generate a new front image. The difference between the input front image and the generated new front image can then be extracted as the abnormal site.
Further, the image evaluation unit 343 may detect an abnormal site using a convolutional auto-encoder (CAE). When a CAE is used, the same image is learned as both the input data and the output data during training. As a result, when an image containing an abnormal site is input to the CAE at estimation time, an image without the abnormal site is output in accordance with the learned tendency. The difference between the image input to the CAE and the image output from the CAE can then be extracted as the abnormal site.
In these cases, the image evaluation unit 343 can generate, as information on the abnormal site, information on the difference between the image obtained for the front image using the generative adversarial network or auto-encoder (AE) and the front image input to that generative adversarial network or auto-encoder. The image evaluation unit 343 can thereby be expected to detect abnormal sites quickly and accurately. Here, auto-encoders include VAEs, CAEs, and the like.
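A minimal sketch of the CAE-based difference extraction, assuming PyTorch: the encoder/decoder widths and the input size are illustrative, the training step with identical input and target images is omitted, and the binarization threshold for the anomaly map is an assumption.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """Convolutional auto-encoder trained with the same front image as input and
    target; at inference the reconstruction lacks unlearned anomalies."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_map(model: ConvAE, image: torch.Tensor, diff_threshold: float = 0.2):
    """Difference between the input front image and its reconstruction; pixels
    whose difference exceeds the threshold are treated as abnormal."""
    model.eval()
    with torch.no_grad():
        recon = model(image)
    diff = (image - recon).abs()
    return diff, (diff > diff_threshold)

image = torch.rand(1, 1, 256, 256)          # placeholder front image
diff, mask = anomaly_map(ConvAE(), image)   # the area of `mask` can feed the evaluation value
```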
Through such processing, the image evaluation unit 343 can set the evaluation value to 1 when an abnormal site is detected. The image evaluation unit 343 may also calculate the evaluation value according to the area of the region detected as the abnormal site; for example, it may set stepwise thresholds and calculate the evaluation value according to which thresholds the area of the region detected as the abnormal site exceeds.
As the machine learning model for detecting abnormal sites, the image evaluation unit 343 can also use, for example, an FCN (Fully Convolutional Network), SegNet, or the like. A machine learning model that performs object recognition on a region basis may also be used according to the desired configuration. As a machine learning model for object recognition, for example, RCNN (Region CNN), fast RCNN, or faster RCNN can be used. Further, YOLO (You Only Look Once) or SSD (Single Shot Detector, or Single Shot MultiBox Detector) can also be used as a machine learning model that performs object recognition on a region basis.
The configuration for acquiring an evaluation value for an abnormal portion has been described above, but the processing using a GAN or an AE is not limited to this. For example, the image evaluation unit 343 may use, as the evaluation value (information indicating the evaluation), information on a difference such as a correlation value between the image obtained by using the GAN or AE and the image input to the GAN or AE. Even in this case, the image evaluation unit 343 can acquire information indicating an evaluation of the existence of a target region (such as a lesion site) in the front image.
In a diseased eye, the image features differ depending on the type of disease. Therefore, the trained models used in the various examples and modifications described above may be generated and prepared for each type of disease or for each abnormal site. In this case, for example, the image processing device 300 can select the trained model to be used for processing in accordance with input (instructions) from the operator, such as the type of disease of the eye to be examined or the abnormal site. The trained models prepared for each type of disease or abnormal site are not limited to trained models for object recognition or segmentation, and may be, for example, trained models used in an engine for image evaluation or an engine for analysis. At this time, the image processing device 300 may identify the type of disease or the abnormal site of the eye to be examined from an image by using a separately prepared trained model. In that case, the image processing device 300 can automatically select the trained model to be used for the above processing based on the type of disease or the abnormal site identified by the separately prepared trained model. The trained model for identifying the type of disease or the abnormal site of the eye to be examined may be trained using pairs of training data in which tomographic images, fundus images, front images, and the like are used as input data, and the type of disease or the abnormal site in these images is used as output data. Here, a tomographic image, a fundus image, a front image, or the like may be used alone as the input data of the training data, or a combination thereof may be used as the input data.
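The per-disease model selection described above could be pictured as a simple registry lookup; the dictionary keys, file names, and load_model() helper below are hypothetical illustrations, not part of the disclosure.

```python
# Hypothetical registry mapping disease types to trained evaluation models.
TRAINED_MODELS = {
    "diabetic_retinopathy": "dr_evaluation_model.pt",
    "age_related_macular_degeneration": "amd_evaluation_model.pt",
    "glaucoma": "glaucoma_evaluation_model.pt",
}

def select_trained_model(disease_type: str, load_model):
    """Select the evaluation model matching the disease type indicated by the
    operator (or by a separately prepared identification model)."""
    model_path = TRAINED_MODELS.get(disease_type)
    if model_path is None:
        raise ValueError(f"No trained model registered for: {disease_type}")
    return load_model(model_path)
```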
In particular, the trained model for generating a diagnosis result may be a trained model obtained by training with training data whose input data includes a set of a plurality of medical images of different types of a predetermined part of the subject. As the input data included in the training data, for example, a set of a motion contrast front image and a luminance front image (or a luminance tomographic image) of the fundus can be considered. As the input data included in the training data, for example, a set of a tomographic image (B-scan image) of the fundus and a color fundus image (or a fluorescence fundus image) can also be considered. The plurality of medical images of different types may be any images acquired with different modalities, different optical systems, different principles, or the like.
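As one way to picture the "set of different image types" used as input data, the sketch below stacks a motion contrast front image and a luminance front image as channels of a single array; the channel-first layout and the function name are assumptions made for illustration.

```python
import numpy as np

def build_multimodal_input(motion_contrast_front: np.ndarray,
                           luminance_front: np.ndarray) -> np.ndarray:
    """Stack a motion contrast front image and a luminance front image of the
    same fundus region into a two-channel array, a common way to feed an
    image set to a CNN (channel-first layout assumed)."""
    if motion_contrast_front.shape != luminance_front.shape:
        raise ValueError("The two front images must cover the same region and size.")
    return np.stack([motion_contrast_front, luminance_front], axis=0)
```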
The trained model for generating a diagnosis result may also be a trained model obtained by training with training data whose input data includes a set of a plurality of medical images of different parts of the subject. As the input data included in the training data, for example, a set of a tomographic image (B-scan image) of the fundus and a tomographic image (B-scan image) of the anterior segment can be considered. Further, as the input data included in the training data, for example, a set of a three-dimensional OCT image (three-dimensional tomographic image) of the macula of the fundus and a circle-scan (or raster-scan) tomographic image of the optic nerve head of the fundus can also be considered.
The input data included in the training data may also be a plurality of medical images of different parts and of different types of the subject. In that case, the input data included in the training data may be, for example, a set of a tomographic image of the anterior segment and a color fundus image. The trained model described above may also be a trained model obtained by training with training data whose input data includes a set of a plurality of medical images of a predetermined part of the subject captured at different angles of view. Further, the input data included in the training data may be an image obtained by stitching together a plurality of medical images acquired by dividing a predetermined part into a plurality of regions in a time-division manner, such as a panoramic image. By using a wide-angle-of-view image such as a panoramic image as training data, the feature amounts of the image may be acquired more accurately than with a narrow-angle-of-view image, for example because the amount of information is larger, so the processing result can be improved. For example, when abnormal portions are detected at a plurality of positions in a wide-angle-of-view image at the time of estimation (prediction), enlarged images of the respective abnormal portions can be displayed sequentially. This allows abnormal portions at a plurality of positions to be checked efficiently, which can improve convenience for the examiner. At this time, for example, the examiner may be allowed to select each position on the wide-angle-of-view image at which an abnormal portion was detected, and an enlarged image of the abnormal portion at the selected position may be displayed. The input data included in the training data may also be a set of a plurality of medical images of a predetermined part of the subject captured on different dates and times.
The display screen on which at least one of the above-described analysis result, diagnosis result, object recognition result, and segmentation result is displayed is not limited to the report screen. Such a result may be displayed on at least one display screen such as, for example, a shooting confirmation screen, a display screen for follow-up observation, or a preview screen for various adjustments before shooting (a display screen on which various live moving images are displayed). For example, by displaying at least one result obtained by using the above-described trained model on the shooting confirmation screen, the operator can confirm an accurate result even immediately after shooting.
Here, the various trained models described above can be obtained by machine learning using training data. Machine learning includes, for example, deep learning using a multi-layer neural network. For at least a part of the multi-layer neural network, for example, a convolutional neural network (CNN) can be used as the machine learning model. A technique related to autoencoders may also be used for at least a part of the multi-layer neural network, and a technique related to backpropagation (the error backpropagation method) may be used for training. However, the machine learning is not limited to deep learning; any learning that uses a model capable of extracting (representing) by itself the feature amounts of training data such as images may be used. Here, a machine learning model refers to a learning model based on a machine learning algorithm such as deep learning. A trained model is a machine learning model based on an arbitrary machine learning algorithm that has been trained in advance with appropriate training data. However, a trained model is not a model on which no further learning is performed; additional learning can also be performed. Training data consists of pairs of input data and output data (ground truth data). Here, the training data may be referred to as teacher data, or the ground truth data may be referred to as teacher data.
A GPU can perform efficient computation by processing more data in parallel. Therefore, when training is performed repeatedly with a learning model such as deep learning, it is effective to perform the processing on a GPU. In this modification, therefore, a GPU is used in addition to the CPU for processing by the image processing device 300, which is an example of a learning unit (not shown). Specifically, when a training program including the learning model is executed, training is performed with the CPU and the GPU cooperating in the computation. The processing of the learning unit may also be performed by the CPU or the GPU alone. A processing unit (estimation unit) that executes processing using the various trained models described above may likewise use a GPU, similarly to the learning unit. The learning unit may also include an error detection unit and an update unit (not shown). The error detection unit obtains the error between the ground truth data and the output data output from the output layer of the neural network in response to the input data input to the input layer. The error detection unit may calculate the error between the output data from the neural network and the ground truth data using a loss function. Based on the error obtained by the error detection unit, the update unit updates the connection weighting coefficients and the like between the nodes of the neural network so that the error becomes smaller. The update unit updates the connection weighting coefficients and the like using, for example, the error backpropagation method, which is a technique for adjusting the connection weighting coefficients between the nodes of each neural network so that the above error becomes smaller.
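A minimal sketch of one training update combining the roles of the error detection unit (loss computation) and the update unit (backpropagation and weight update); PyTorch is used purely as an illustrative framework, and the MSE loss and the passed-in model and optimizer are assumptions.

```python
import torch.nn as nn

def training_step(model: nn.Module, optimizer, inputs, targets):
    """One update of the 'error detection unit' (loss) and 'update unit'
    (backpropagation + weight update) described above."""
    criterion = nn.MSELoss()            # loss function used by the error detection unit
    optimizer.zero_grad()
    outputs = model(inputs)             # forward pass on the input data
    loss = criterion(outputs, targets)  # error between output data and ground truth
    loss.backward()                     # error backpropagation
    optimizer.step()                    # update connection weighting coefficients
    return loss.item()
```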
As a machine learning model used for evaluation processing, segmentation, and the like, a U-Net type machine learning model can be applied, which has an encoder function consisting of a plurality of layers including a plurality of downsampling layers, and a decoder function consisting of a plurality of layers including a plurality of upsampling layers. The U-Net type machine learning model is configured (for example, by using skip connections) so that positional information (spatial information) that becomes ambiguous in the plurality of layers configured as the encoder can be used in the layers of the same dimension (mutually corresponding layers) among the plurality of layers configured as the decoder.
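A minimal sketch of a U-Net-style encoder/decoder with a single skip connection, assuming PyTorch; the channel counts and depth are illustrative and far smaller than a practical evaluation or segmentation network.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style model showing how encoder spatial information is
    reused by the decoder via a skip connection (illustrative sizes only)."""
    def __init__(self, in_ch=1, out_ch=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)                        # downsampling layer
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # upsampling layer
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, out_ch, 1))

    def forward(self, x):
        e = self.enc(x)                      # encoder features (full resolution)
        b = self.bottleneck(self.down(e))    # reduced-resolution features
        u = self.up(b)                       # back to full resolution
        u = torch.cat([u, e], dim=1)         # skip connection restores spatial detail
        return self.dec(u)
```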
As a machine learning model used for evaluation, segmentation, and the like, for example, an FCN or SegNet can also be used. A machine learning model that performs region-based object recognition may also be used according to the desired configuration. As a machine learning model for object recognition, for example, an RCNN, Fast RCNN, or Faster RCNN can be used. Further, YOLO or SSD can also be used as a machine learning model that performs region-based object recognition.
The machine learning model may also be, for example, a capsule network (CapsNet). In a general neural network, each unit (each neuron) is configured to output a scalar value, so that spatial information on the spatial positional relationship (relative positions) between features in an image, for example, is reduced. This makes it possible to perform learning in which the influence of, for example, local distortion or translation of the image is reduced. In a capsule network, on the other hand, each unit (each capsule) is configured to output spatial information as a vector, so that the spatial information is retained. This makes it possible to perform learning that takes into account, for example, the spatial positional relationship between features in an image.
The trained model for evaluation may also be a trained model obtained by additionally training it with training data including at least one evaluation value generated by that trained model. In this case, whether or not to use an evaluation value as training data for additional learning may be made selectable according to instructions from the examiner. Note that these configurations are applicable not only to the trained model for evaluation but also to the various trained models described above. Further, in generating the ground truth data used to train the various trained models described above, a trained model for generating ground truth data, such as labeling (annotation), may be used. In this case, the trained model for generating ground truth data may be one obtained by (sequentially) additionally learning ground truth data obtained by labeling (annotation) by the examiner. That is, the trained model for generating ground truth data may be obtained by additionally learning training data in which the data before labeling is used as input data and the data after labeling is used as output data. Further, for a plurality of consecutive frames such as a moving image, the result of a frame determined to have low accuracy may be corrected in consideration of the object recognition or segmentation results of the preceding and following frames. At this time, the corrected result may be additionally learned as ground truth data in response to instructions from the examiner.
When a region of the eye to be examined is detected by using a trained model for object recognition or a trained model for segmentation, predetermined image processing can also be applied to each detected region. Consider, for example, the case of detecting at least two regions among the vitreous region, the retina region, and the choroid region. In this case, when image processing such as contrast adjustment is applied to the at least two detected regions, an adjustment suited to each region can be performed by using different image processing parameters for each region. By displaying an image adjusted appropriately for each region, the operator can more appropriately diagnose a disease or the like in each region. Note that the configuration of using different image processing parameters for each detected region may similarly be applied, for example, to regions of the eye to be examined that were detected without using a trained model.
(Modification 2)
In the various examples and modifications described above, while any of the trained models is undergoing additional learning, it may be difficult to produce output (inference/prediction) using the trained model itself that is being additionally trained. It is therefore preferable to prohibit input of front images or medical images to a trained model during additional learning. Another trained model identical to the trained model undergoing additional learning may also be prepared as a backup trained model. In this case, it is preferable that front images and medical images can be input to the backup trained model during the additional learning. Then, after the additional learning is completed, the additionally trained model is evaluated, and if there is no problem, the backup trained model is replaced with the additionally trained model. If there is a problem, the backup trained model may continue to be used.
Trained models obtained by learning for each imaging site may also be selectively used. Specifically, a plurality of trained models can be prepared, including a first trained model obtained using training data that includes a first imaging site (lung, eye to be examined, etc.) and a second trained model obtained using training data that includes a second imaging site different from the first imaging site. The image processing device 300 may then have a selection unit for selecting one of the plurality of trained models. At this time, the image processing device 300 may have a control unit that executes additional learning on the selected trained model. In response to instructions from the examiner, the control unit can search for data in which the imaging site corresponding to the selected trained model is paired with a captured image of that imaging site, and execute learning using the data obtained by the search as training data, as additional learning for the selected trained model. The imaging site corresponding to the selected trained model may be acquired from information in the header of the data, or may be manually input by the examiner. The data search may be performed via a network from, for example, a server of an external facility such as a hospital or a research institute. This makes it possible to perform additional learning efficiently for each imaging site by using captured images of the imaging site corresponding to the trained model.
The selection unit and the control unit may be configured as software modules executed by a processor such as a CPU or an MPU of the image processing device 300. The selection unit and the control unit may also be configured by circuits that perform specific functions, such as an ASIC, or by independent devices.
When training data for additional learning is acquired via a network from a server of an external facility such as a hospital or a research institute, it is useful to reduce the decrease in reliability caused by tampering, system trouble during additional learning, and the like. Therefore, the validity of the training data for additional learning may be verified by checking consistency using a digital signature or hashing. This protects the training data for additional learning. If the validity of the training data for additional learning cannot be confirmed as a result of the consistency check by digital signature or hashing, a warning to that effect is issued, and additional learning using that training data is not performed. Note that the server may take any form regardless of its installation location, for example a cloud server, a fog server, or an edge server.
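A minimal sketch of the hash-based consistency check, assuming the data provider publishes a SHA-256 digest alongside the training data; the function names and the warning step are illustrative assumptions.

```python
import hashlib
import hmac

def verify_training_data(payload: bytes, expected_digest: str) -> bool:
    """Check downloaded additional-learning data against a digest published
    by the data provider (SHA-256 chosen here as an example)."""
    actual_digest = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual_digest, expected_digest)

# Usage sketch: skip additional learning and warn the examiner on failure.
# if not verify_training_data(downloaded_bytes, digest_from_server):
#     warn("Training data for additional learning failed the integrity check.")
```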
(Modification 3)
In the various examples and modifications described above, instructions from the examiner may be given by voice or the like, in addition to manual instructions (for example, instructions using a user interface). In this case, for example, a machine learning model including a speech recognition model (speech recognition engine, trained model for speech recognition) obtained by machine learning may be used. Manual instructions may also be given by character input using a keyboard, a touch panel, or the like. In this case, for example, a machine learning model including a character recognition model (character recognition engine, trained model for character recognition) obtained by machine learning may be used. Instructions from the examiner may also be given by gestures or the like. In this case, a machine learning model including a gesture recognition model (gesture recognition engine, trained model for gesture recognition) obtained by machine learning may be used.
Instructions from the examiner may also be the result of detecting the examiner's line of sight on the display screen of the display unit 310, or the like. The line-of-sight detection result may be, for example, a pupil detection result using a moving image of the examiner captured from around the display screen of the display unit 310. In this case, the object recognition engine described above may be used for pupil detection from the moving image. Instructions from the examiner may also be given by brain waves, weak electric signals flowing through the body, or the like.
In such cases, the training data may be, for example, training data in which character data or voice data (waveform data) indicating an instruction to display the results of processing by the various trained models described above is used as input data, and an execution command for actually displaying those results on the display unit is used as ground truth data. The training data may also be, for example, training data in which character data or voice data indicating an instruction to designate the extraction target (target region) is used as input data, and an execution command for designating the extraction target and an execution command for selecting the selection button shown in FIG. 5 are used as ground truth data. Note that any training data may be used as long as the instruction content indicated by the character data, voice data, or the like corresponds to the execution command content. Voice data may also be converted into character data using an acoustic model, a language model, or the like. Processing for reducing noise data superimposed on the voice data may also be performed using waveform data obtained with a plurality of microphones. Further, whether instructions are given by characters or voice or by a mouse, a touch panel, or the like may be made selectable according to instructions from the examiner, and turning instructions by characters or voice on and off may also be made selectable according to instructions from the examiner.
According to such a configuration, the extraction target (target region) can be designated by the image generation unit 304 (target designation unit) using information obtained with at least one trained model among a trained model for generating character recognition results, a trained model for generating speech recognition results, and a trained model for generating gesture recognition results. This can improve the operability of the image processing device 300 for the examiner.
Here, machine learning includes deep learning as described above, and for at least a part of the multi-layer neural network, for example, a recurrent neural network (RNN) can be used. As an example of the machine learning model according to this modification, an RNN, which is a neural network that handles time-series information, will be described with reference to FIGS. 27A and 27B. In addition, Long Short-Term Memory (hereinafter, LSTM), which is a type of RNN, will be described with reference to FIGS. 28A and 28B.
FIG. 27A shows the structure of an RNN, which is a machine learning model. The RNN 2720 has a loop structure in the network; data xt 2710 is input at time t, and data ht 2730 is output. Because the RNN 2720 has a loop function in the network, the state at the current time can be carried over to the next state, so time-series information can be handled. FIG. 27B shows an example of the input and output of parameter vectors at time t. The data xt 2710 contains N pieces of data (Params1 to ParamsN). The data ht 2730 output from the RNN 2720 also contains N pieces of data (Params1 to ParamsN) corresponding to the input data.
However, since an RNN cannot handle long-term information during error backpropagation, an LSTM may be used. An LSTM can learn long-term information by including a forget gate, an input gate, and an output gate. FIG. 28A shows the structure of an LSTM. In the LSTM 2840, the information that the network carries over to the next time t is the internal state ct-1 of the network, called the cell, and the output data ht-1. Note that the lowercase letters (c, h, x) in the figure represent vectors.
Next, FIG. 28B shows the details of the LSTM 2840. In FIG. 28B, a forget gate network FG, an input gate network IG, and an output gate network OG are shown, each of which is a sigmoid layer. Each therefore outputs a vector in which every element takes a value between 0 and 1. The forget gate network FG determines how much past information is retained, and the input gate network IG determines which values to update. FIG. 28B also shows a cell update candidate network CU, which is a tanh activation layer. This creates a vector of new candidate values to be added to the cell. The output gate network OG selects the elements of the cell candidates and selects how much information to convey at the next time.
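A minimal sketch of one LSTM time step with the forget (FG), input (IG), output (OG) gates and the cell update candidate (CU) described above, assuming NumPy and hypothetical parameter dictionaries keyed by gate name; this is the textbook formulation rather than a specific implementation from the disclosure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts of weight matrices and biases."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate: how much past to keep
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate: which values to update
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate: what to emit
    cu = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate values for the cell
    c_t = f * c_prev + i * cu                              # new internal cell state
    h_t = o * np.tanh(c_t)                                 # output carried to the next time
    return h_t, c_t
```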
Note that the LSTM model described above is a basic form, and the model is not limited to the networks shown here. The connections between networks may be changed. A QRNN (Quasi-Recurrent Neural Network) may be used instead of an LSTM. Furthermore, the machine learning model is not limited to neural networks; boosting, support vector machines, and the like may also be used. Further, when instructions from the examiner are input by characters, voice, or the like, techniques related to natural language processing (for example, Sequence to Sequence) may be applied. A dialogue engine (dialogue model, trained model for dialogue) that responds to the examiner with character or voice output may also be applied.
(Modification 4)
In the various examples and modifications described above, front images, region label images generated by segmentation processing, and the like may be stored in the storage unit in response to instructions from the operator. At this time, for example, after an instruction from the operator to save a region label image, when the file name is registered, a file name containing, at some position (for example, at the beginning or at the end), information (for example, characters) indicating that the image was generated by processing using a trained model for segmentation may be displayed as a recommended file name, in a state that can be edited in response to instructions from the operator. Similarly, for images obtained with other trained models, a file name containing information indicating that the image was generated by processing using that trained model may be displayed.
Further, when a region label image is displayed on the display unit 310 on various display screens such as the report screen, a display indicating that the displayed image was generated by processing using a trained model for segmentation may be shown together with the image. In this case, the operator can easily identify from this display that the displayed image is not the image itself acquired by imaging, so misdiagnosis can be reduced and diagnostic efficiency can be improved. Note that the display indicating that the image was generated by processing using a trained model for segmentation may take any form as long as the input image and the image generated by that processing can be distinguished from each other. Further, not only for processing using the trained model for segmentation but also for processing using the various trained models described above, a display indicating that the result was generated by processing using that type of trained model may be shown together with the result. Also, when the analysis result of a segmentation result obtained with the trained model for segmentation processing is displayed, a display indicating that the analysis result is based on the result obtained with the trained model for segmentation may be shown together with the analysis result.
At this time, a display screen such as the report screen may be stored in the storage unit as image data in response to instructions from the operator. For example, the report screen may be stored in the storage unit as a single image in which the region label image and the like are arranged side by side with a display indicating that these images were generated by processing using a trained model.
Regarding the display indicating that an image was generated by processing using the trained model for segmentation, a display indicating what kind of training data the trained model for segmentation was trained with may be shown on the display unit. This display may include a description of the types of input data and ground truth data of the training data, and any display related to the ground truth data, such as the imaging site included in the input data and the ground truth data. Note that, for processing using the various trained models described above as well, a display indicating what kind of training data the trained model of that type was trained with may be shown on the display unit 310.
Information (for example, characters) indicating that an image was generated by processing using a trained model may also be displayed or stored in a state of being superimposed on the image or the like. At this time, the location superimposed on the image may be anywhere within a region that does not overlap the region in which the site of interest to be imaged is displayed (for example, an edge of the image). A non-overlapping region may also be determined and the information superimposed on the determined region. Note that not only images obtained by processing using the trained model for segmentation but also images obtained by processing using the other various trained models described above may be handled in the same manner.
Further, when the default setting is such that a predetermined extraction target (target region) is selected as the initial display screen of the report screen as shown in FIG. 4, a report image corresponding to the report screen including the image of the extraction target and the like may be transmitted to the server in response to instructions from the examiner. Also, when the default setting is such that a predetermined extraction target is selected, a report image corresponding to the report screen including the image of the extraction target and the like may be (automatically) transmitted to the server at the end of the examination (for example, when the shooting confirmation screen or the preview screen is changed to the report screen in response to instructions from the examiner). At this time, a report image generated based on various default settings (for example, settings related to at least one of the depth range for generating the En-Face image on the initial display screen of the report screen, whether or not an analysis map is superimposed, whether or not the image is the extraction target, and whether or not the screen is a display screen for follow-up observation) may be transmitted to the server.
(Modification 5)
In the various examples and modifications described above, among the various trained models described above, an image obtained with a first type of trained model (for example, an image showing an analysis result such as an analysis map, an image showing an object recognition result, or an image showing a segmentation result) may be input to a second type of trained model different from the first type. At this time, a result of processing by the second type of trained model (for example, an evaluation result, an analysis result, a diagnosis result, an object recognition result, or a segmentation result) may be generated.
Further, among the various trained models described above, a result of processing by a first type of trained model (for example, an analysis result, a diagnosis result, an object recognition result, or a segmentation result) may be used to generate, from the image input to the first type of trained model, an image to be input to a second type of trained model different from the first type. At this time, the generated image is highly likely to be suitable as an image to be processed with the second type of trained model. This can therefore improve the accuracy of the image obtained by inputting the generated image to the second type of trained model (for example, an image showing an analysis result such as an analysis map, an image showing an object recognition result, or an image showing a segmentation result).
A similar case image search using an external database stored in a server or the like may also be performed using the analysis result, diagnosis result, or the like obtained by processing with a trained model as described above as a search key. When the plurality of images stored in the database are already managed with the feature amounts of each image attached as supplementary information, for example by machine learning, a similar case image search engine (similar case image search model, trained model for similar case image search) that uses the image itself as a search key may be used. For example, the image processing device can search for similar case images for each of the different regions specified by segmentation processing or the like, using a trained model for similar case image search that is different from the trained model for acquiring evaluation results.
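A minimal sketch of a feature-based similar case search, assuming the database images already carry feature vectors as supplementary information; cosine similarity is an illustrative choice of metric, not one specified by the disclosure.

```python
import numpy as np

def search_similar_cases(query_feature: np.ndarray,
                         case_features: np.ndarray,
                         top_k: int = 5):
    """Rank stored case images by cosine similarity between feature vectors.
    case_features is a hypothetical (num_cases, dim) array of feature amounts
    attached to the database images as supplementary information."""
    q = query_feature / (np.linalg.norm(query_feature) + 1e-12)
    c = case_features / (np.linalg.norm(case_features, axis=1, keepdims=True) + 1e-12)
    similarity = c @ q                       # cosine similarity to each stored case
    order = np.argsort(similarity)[::-1]     # most similar first
    return order[:top_k], similarity[order[:top_k]]
```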
(Modification 6)
In the examples and modifications described above, three-dimensional volume data and front images relating to the fundus portion of the eye to be examined have been described, but the above image processing may also be performed on images relating to the anterior segment of the eye to be examined. In this case, the regions of the image to which different image processing should be applied include regions such as the crystalline lens, the cornea, the iris, and the anterior chamber. These regions may also include other regions of the anterior segment. Further, the regions of images relating to the fundus portion are not limited to the vitreous portion, the retina portion, and the choroid portion, and may include other regions relating to the fundus portion.
In the examples and modifications described above, the eye to be examined has been described as an example of the subject, but the subject is not limited to this. For example, the subject may be skin, another organ, or the like. In this case, the OCT apparatus according to the above examples and modifications can be applied to medical equipment such as an endoscope, in addition to ophthalmic apparatuses.
(Modification 7)
The images processed by the image processing apparatus or image processing method according to the various examples and modifications described above include medical images acquired using any modality (imaging apparatus, imaging method). The medical images to be processed can include medical images acquired with any imaging apparatus or the like, and images created by the image processing apparatus or image processing method according to the above examples and modifications.
Furthermore, a medical image to be processed is an image of a predetermined part of the subject, and the image of the predetermined part includes at least a part of the predetermined part of the subject. The medical image may also include other parts of the subject. The medical image may be a still image or a moving image, and may be a black-and-white image or a color image. Furthermore, the medical image may be an image representing the structure (morphology) of the predetermined part, or an image representing its function. Images representing function include, for example, images representing hemodynamics (blood flow volume, blood flow velocity, etc.), such as OCTA images, Doppler OCT images, fMRI images, and ultrasound Doppler images. The predetermined part of the subject may be determined according to the imaging target, and includes the human eye (eye to be examined), the brain, organs such as the lungs, intestines, heart, pancreas, kidneys, and liver, and any part such as the head, chest, legs, and arms.
The medical image may be a tomographic image of the subject or a front image. Front images include, for example, a front image of the fundus, a front image of the anterior segment, a fluorescence fundus image, and an En-Face image generated using data in at least a partial range in the depth direction of the imaging target from data acquired by OCT (three-dimensional OCT data). The En-Face image may be an OCTA En-Face image (motion contrast front image) generated from three-dimensional OCTA data (three-dimensional motion contrast data) using data in at least a partial range in the depth direction of the imaging target. Three-dimensional OCT data and three-dimensional motion contrast data are examples of three-dimensional medical image data.
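A minimal sketch of generating an En-Face image by projecting a three-dimensional OCT or OCTA volume within a depth range; the axis ordering and the projection methods shown are assumptions made for illustration.

```python
import numpy as np

def generate_en_face(volume: np.ndarray, z_start: int, z_end: int,
                     method: str = "mean") -> np.ndarray:
    """Project 3-D OCT/OCTA data within the depth range [z_start, z_end)
    onto a 2-D front (En-Face) image. The volume is assumed to be ordered
    as (depth, height, width)."""
    depth_slab = volume[z_start:z_end]
    if method == "mean":
        return depth_slab.mean(axis=0)     # average intensity projection
    if method == "max":
        return depth_slab.max(axis=0)      # maximum intensity projection
    raise ValueError(f"Unknown projection method: {method}")
```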
Here, an imaging apparatus is an apparatus for capturing images used for diagnosis. Imaging apparatuses include, for example, an apparatus that obtains an image of a predetermined part by irradiating the predetermined part of the subject with light, radiation such as X-rays, electromagnetic waves, ultrasound, or the like, and an apparatus that obtains an image of a predetermined part by detecting radiation emitted from the subject. More specifically, the imaging apparatuses according to the various examples and modifications described above include at least X-ray imaging apparatuses, CT apparatuses, MRI apparatuses, PET apparatuses, SPECT apparatuses, SLO apparatuses, OCT apparatuses, OCTA apparatuses, fundus cameras, and endoscopes.
Accordingly, with respect to the inventions described in the above examples and modifications, the image evaluation unit 343 may, for example, be configured to evaluate the presence of a target region (region of interest or target site) in each of a plurality of slice images corresponding to different positions of a subject acquired with a CT apparatus. In this case, the image generation unit 304 can determine the output image using the evaluation results (information indicating the evaluations) from the image evaluation unit 343. Such image processing is not limited to the field of ophthalmology, and can be applied to medical images of a target site acquired with any of the imaging apparatuses mentioned above. Since the position of a region of interest may vary between individuals, evaluating a plurality of images corresponding to different locations and determining the output image from the evaluation results makes it possible to obtain an image in which the region of interest is easy to confirm. The predetermined site of the subject described above can be regarded as an example of the extraction target (target region).
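As a purely illustrative sketch (not part of the disclosed apparatus), the slice-selection idea above can be written as follows. The `evaluate_slice` scorer is a hypothetical placeholder for whatever trained model the image evaluation unit 343 would actually use, and the brightness-threshold heuristic inside it is an assumption made only so the example runs.

```python
import numpy as np

def evaluate_slice(slice_image: np.ndarray) -> float:
    """Hypothetical stand-in for the trained evaluation model:
    returns a score for how likely the target region is present."""
    # Placeholder heuristic: fraction of bright pixels above a fixed threshold.
    return float((slice_image > 0.5).mean())

def select_output_slice(volume: np.ndarray) -> tuple[int, np.ndarray]:
    """Scores every slice of a 3-D volume (slices along axis 0) and
    returns the index and image of the highest-scoring slice."""
    scores = [evaluate_slice(s) for s in volume]
    best = int(np.argmax(scores))
    return best, volume[best]

# Usage: pick the slice most likely to contain the region of interest.
volume = np.random.rand(64, 256, 256)          # e.g. 64 CT slices
index, output_image = select_output_slice(volume)
```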
Medical images acquired with an imaging apparatus have different image features depending on the type of region of interest. Therefore, the trained models used in the various examples and modifications described above may be generated and prepared for each type of region of interest. In that case, the image processing apparatus 300 can, for example, select the trained model to be used for processing by the image evaluation unit 343 and the like according to the designated target region (region of interest).
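A minimal sketch of keeping one trained model per type of region of interest and selecting it from the designated target region is shown below; the region names, file paths, and the `load_model` helper are assumptions introduced only for illustration and do not correspond to any actual model files.

```python
from typing import Callable, Dict
import numpy as np

def load_model(path: str) -> Callable[[np.ndarray], float]:
    """Assumed loader returning a callable image -> evaluation score."""
    def score(image: np.ndarray) -> float:
        return float(image.mean())      # placeholder for real inference
    return score

# Hypothetical registry mapping a target-region name to its trained evaluator.
MODEL_REGISTRY: Dict[str, Callable[[np.ndarray], float]] = {
    "neovascularization": load_model("models/cnv.onnx"),
    "lamina_cribrosa":    load_model("models/lc.onnx"),
    "microaneurysm":      load_model("models/ma.onnx"),
}

def evaluate(front_image: np.ndarray, target_region: str) -> float:
    """Selects the evaluator registered for the designated target region."""
    model = MODEL_REGISTRY[target_region]
    return model(front_image)
```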
The display modes of the GUI and the like described in the above examples and modifications are not limited to those described above, and may be changed arbitrarily according to the desired configuration. For example, although the GUI 500 and the like have been described as displaying the OCTA front image, the tomographic image, and the depth range, motion contrast data may additionally be displayed on the tomographic image. In that case, it can also be confirmed at which depths the motion contrast values are distributed. Colors may also be used for displaying the images and the like.
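One possible way to render motion contrast data on top of a tomographic (B-scan) image is sketched below, assuming matplotlib is available; the colormap, transparency, and masking threshold are arbitrary choices for illustration, not values taken from the disclosure.

```python
import numpy as np
import matplotlib.pyplot as plt

def overlay_motion_contrast(bscan: np.ndarray, motion: np.ndarray, threshold: float = 0.2):
    """Draws a grayscale B-scan with motion contrast shown in color on top,
    so the depth distribution of the motion contrast values is visible."""
    masked = np.ma.masked_less(motion, threshold)   # hide weak decorrelation values
    plt.imshow(bscan, cmap="gray")
    plt.imshow(masked, cmap="inferno", alpha=0.6)   # colored flow signal
    plt.axis("off")
    plt.show()

# Usage with dummy data of matching shape (depth x A-scans).
overlay_motion_contrast(np.random.rand(320, 512), np.random.rand(320, 512))
```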
Furthermore, although the above examples and modifications display the generated image on the display unit 310, the image may instead be output to an external apparatus such as an external server. In addition, the different depth ranges corresponding to the plurality of front images may be depth ranges that partially overlap one another.
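A short sketch of how partially overlapping depth ranges, with one en-face projection per range, might be produced from a volume; the window size, step, and mean-intensity projection are assumptions chosen only for illustration.

```python
import numpy as np

def sliding_depth_ranges(depth: int, window: int, step: int):
    """Yields (start, end) depth ranges; with step < window the ranges overlap."""
    for start in range(0, max(depth - window, 0) + 1, step):
        yield start, start + window

def enface_projections(volume: np.ndarray, window: int = 32, step: int = 16):
    """volume has shape (depth, height, width); returns one mean-intensity
    en-face image per (partially overlapping) depth range."""
    return [
        (rng, volume[rng[0]:rng[1]].mean(axis=0))
        for rng in sliding_depth_ranges(volume.shape[0], window, step)
    ]

images = enface_projections(np.random.rand(128, 256, 256))
```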
In the trained model for evaluating the extraction target (target region) according to the above examples and modifications, it is considered that features such as the magnitude of the brightness values of the front image and the order, slope, position, distribution, and continuity of bright and dark areas are extracted as part of the feature quantities and used in the estimation processing.
Although the above examples and modifications describe a spectral-domain OCT (SD-OCT) apparatus using an SLD as the light source, the configuration of the OCT apparatus according to the present invention is not limited to this. For example, the present invention can also be applied to any other type of OCT apparatus, such as a swept-source OCT (SS-OCT) apparatus using a wavelength-swept light source capable of sweeping the wavelength of the emitted light. The present invention can also be applied to a Line-OCT apparatus (or SS-Line-OCT apparatus) using line light, and to a Full-Field OCT apparatus (or SS-Full-Field OCT apparatus) using area light. Furthermore, the present invention can be applied to an adaptive-optics OCT (AO-OCT) apparatus using a wavefront-correcting optical system, and to a polarization-sensitive OCT (PS-OCT) apparatus for visualizing information on polarization retardation and depolarization.
Although the above examples and modifications use a fiber-optic system with a coupler as the splitting means, a spatial optical system using a collimator and a beam splitter may also be used. The configurations of the optical interference unit 100 and the scanning optical system 200 are not limited to those described above, and some of the components included in the optical interference unit 100 and the scanning optical system 200 may be provided as separate units. In addition, although a Michelson interferometer is used as the interference system, a Mach-Zehnder interferometer may be used instead.
In the above examples and modifications, the image processing apparatus 300 acquires the interference signal acquired by the optical interference unit 100, the tomographic data generated by the reconstruction unit 301, and the like. However, the configuration by which the image processing apparatus 300 acquires these signals and images is not limited to this. For example, the image processing apparatus 300 may acquire these signals and data from a server or an imaging apparatus connected to the image processing apparatus 300 via a LAN, a WAN, the Internet, or the like.
The trained models according to the above examples and modifications can be provided in the image processing apparatus 300. A trained model may be implemented, for example, as a software module executed by a processor such as a CPU, MPU, GPU, or FPGA, or as a circuit that performs a specific function such as an ASIC. These trained models may also be provided in another apparatus, such as a server, connected to the image processing apparatus 300. In that case, the image processing apparatus 300 can use a trained model by connecting, via any network such as the Internet, to the server or the like that includes the trained model. The server that includes the trained model may be, for example, a cloud server, a fog server, or an edge server. The training data of a trained model is not limited to data obtained with the ophthalmic apparatus that actually performs the imaging, and may, depending on the desired configuration, be data obtained with an ophthalmic apparatus of the same model or of the same type.
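A minimal sketch of querying a trained model hosted on a separate server over a network; the endpoint URL, the JSON request and response fields, and the use of plain HTTP are all invented for illustration and are not part of the disclosure.

```python
import json
import urllib.request
import numpy as np

def remote_evaluate(front_image: np.ndarray,
                    url: str = "http://example.com/evaluate") -> float:
    """Sends a front image to a hypothetical evaluation endpoint and
    returns the evaluation value from the JSON response."""
    payload = json.dumps({"image": front_image.tolist()}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return float(json.loads(response.read())["score"])
```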
In this connection, the image evaluation unit 343 may be provided outside the image processing apparatus 300. In that case, the image evaluation unit 343 is constituted by an external apparatus, such as an external server, connected to the image processing apparatus 300, and the image processing apparatus 300 transmits the acquired three-dimensional volume data, the generated front images, and information on the extraction target (target region) to the external apparatus. The image processing apparatus 300 may then determine or generate the front image to be output using the evaluation results acquired from the external apparatus. In this case, an image processing system including the image processing apparatus 300 and the external apparatus (evaluation apparatus) can be configured. When the image evaluation unit 343 is provided outside the image processing apparatus 300, the determination unit that determines the image to be output using the information indicating the evaluations may be provided in the same apparatus as the image evaluation unit 343.
According to the various examples and modifications of the present invention described above, the target region can be confirmed easily.
(Other Examples)
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described examples and modifications is supplied to a system or an apparatus via a network or a storage medium, and a computer of the system or apparatus reads and executes the program. The computer has one or more processors or circuits, and may include a network of separate computers or separate processors or circuits in order to read and execute the computer-executable instructions.
The processor or circuit may include a central processing unit (CPU), a micro processing unit (MPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The processor or circuit may also include a digital signal processor (DSP), a data flow processor (DFP), or a neural processing unit (NPU).
The present invention is not limited to the above-described examples, and various changes and modifications can be made without departing from the spirit and scope of the present invention. Therefore, the following claims are appended in order to make the scope of the present invention public.
This application claims priority based on Japanese Patent Application No. 2019-211862 filed on November 22, 2019, the entire contents of which are incorporated herein by reference.
300: Image processing apparatus, 304: Image generation unit (determination unit), 343: Image evaluation unit (evaluation unit)
Claims (28)

1. An image processing apparatus comprising: an evaluation unit configured to acquire, using a plurality of front images corresponding to different depth ranges of three-dimensional volume data of an eye to be examined, a plurality of pieces of information corresponding to the plurality of front images, each piece of information indicating an evaluation of the presence of a target region; and a determination unit configured to determine, using the plurality of pieces of information, at least one of the plurality of front images as an output image.
2. The image processing apparatus according to claim 1, further comprising a display control unit configured to control display on a display unit, wherein the display control unit causes the display unit to display the plurality of pieces of information side by side with the plurality of front images.
3. The image processing apparatus according to claim 1 or 2, wherein the plurality of pieces of information are a plurality of evaluation values, and the determination unit determines, as the output image, the front image corresponding to an evaluation value higher than the other evaluation values among the plurality of evaluation values.
4. An image processing apparatus comprising: an evaluation unit configured to acquire, using a plurality of front images corresponding to different depth ranges of three-dimensional volume data of an eye to be examined, a plurality of pieces of information corresponding to the plurality of front images, each piece of information indicating an evaluation of the presence of a target region; and a determination unit configured to determine a depth range using the plurality of pieces of information and to determine, as an output image, an image generated using the determined depth range.
5. The image processing apparatus according to claim 4, wherein the plurality of pieces of information are a plurality of evaluation values, and the determination unit determines the depth range by connecting depth ranges corresponding to evaluation values that are equal to or greater than a threshold among the plurality of evaluation values.
6. The image processing apparatus according to claim 4, wherein the plurality of pieces of information are a plurality of evaluation values, and the determination unit determines the depth range of the output image by taking, among the depth ranges corresponding to evaluation values equal to or greater than a threshold among the plurality of evaluation values, a depth position shallower than the other depth positions as an upper limit and a depth position deeper than the other depth positions as a lower limit.
7. The image processing apparatus according to claim 4, wherein the plurality of pieces of information are a plurality of evaluation values, and the determination unit determines the depth range of the output image centered on the depth range corresponding to an evaluation value higher than the other evaluation values among the plurality of evaluation values.
8. The image processing apparatus according to claim 4, further comprising an image generation unit configured to generate front images, wherein the plurality of pieces of information are a plurality of evaluation values, the image generation unit determines a plurality of depth ranges obtained by increasing or decreasing the depth range corresponding to an evaluation value higher than the other evaluation values among the plurality of evaluation values, and generates front images corresponding to the plurality of depth ranges, the evaluation unit acquires a plurality of evaluation values of the plurality of front images corresponding to the plurality of depth ranges using the front images corresponding to the plurality of depth ranges, and the determination unit determines, as the output image, the front image corresponding to an evaluation value higher than the other evaluation values among the plurality of evaluation values of the plurality of front images corresponding to the plurality of depth ranges.
9. The image processing apparatus according to any one of claims 1 to 8, wherein each depth range for generating the plurality of front images is a depth range within 0 to 50 μm from the outer retinal layer or the Bruch's membrane toward the choroid side.
10. The image processing apparatus according to any one of claims 1 to 8, wherein each depth range for generating the plurality of front images is a depth range within 0 to 500 μm from the boundary between the retina and the vitreous body at the papilla toward the choroid side.
11. The image processing apparatus according to any one of claims 1 to 8, wherein each depth range for generating the plurality of front images is a depth range within the superficial retinal layer or the deep retinal layer.
12. The image processing apparatus according to any one of claims 1 to 11, wherein the evaluation unit acquires the plurality of pieces of information from the plurality of front images using a trained model obtained by training with training data including a front image and information indicating an evaluation of the presence of a target region in that front image.
13. The image processing apparatus according to claim 12, wherein the training data includes a plurality of front images corresponding to different depth ranges and information indicating evaluations of the presence of a target region in the plurality of front images.
14. The image processing apparatus according to any one of claims 1 to 13, wherein the evaluation unit acquires the plurality of pieces of information from the plurality of front images using a trained model for generating a segmentation result or an object recognition result from a front image.
15. The image processing apparatus according to any one of claims 1 to 14, wherein the evaluation unit acquires the plurality of pieces of information using, for each of the plurality of front images, information on a difference between a front image obtained using a generative adversarial network or an autoencoder and the front image input to the generative adversarial network or the autoencoder.
16. The image processing apparatus according to any one of claims 1 to 15, wherein the evaluation unit acquires the plurality of pieces of information from the plurality of front images using a trained model for generating an analysis result or a diagnosis result from a front image.
17. An image processing apparatus comprising: an evaluation unit configured to acquire, using a plurality of medical images corresponding to different positions of three-dimensional volume data of a subject, a plurality of pieces of information corresponding to the plurality of medical images, each piece of information indicating an evaluation of the presence of a target region; and a determination unit configured to determine, using the plurality of pieces of information, at least one of the plurality of medical images as an output image.
18. The image processing apparatus according to any one of claims 1 to 17, further comprising a target designation unit configured to designate the target region.
19. An image processing apparatus comprising: a target designation unit configured to designate a target region from three-dimensional volume data of an eye to be examined; a display control unit configured to cause a display unit to display, side by side, a plurality of front images corresponding to different depth ranges of the three-dimensional volume data and generated using information on the designated target region; and a determination unit configured to determine, using the information on the designated target region, at least one of a type of the three-dimensional volume data for generating the plurality of front images, a layer or depth range corresponding to the target region, the number of front images to be generated, a depth range for generating a front image, and an interval between depth ranges for generating front images.
20. The image processing apparatus according to claim 18 or 19, wherein the target region is designated using at least one of a trained model for generating a character recognition result, a trained model for generating a speech recognition result, and a trained model for generating a gesture recognition result.
21. The image processing apparatus according to any one of claims 1 to 20, wherein the target region is a region of neovascularization, and the three-dimensional volume data is three-dimensional motion contrast data.
22. The image processing apparatus according to any one of claims 1 to 20, wherein the target region is a region of the lamina cribrosa, and the three-dimensional volume data is three-dimensional tomographic data of the papilla.
23. The image processing apparatus according to any one of claims 1 to 20, wherein the target region is a region of a capillary aneurysm, and the three-dimensional volume data is three-dimensional motion contrast data of the macula.
24. An image processing method comprising: acquiring, using a plurality of front images corresponding to different depth ranges of three-dimensional volume data of an eye to be examined, a plurality of pieces of information corresponding to the plurality of front images, each piece of information indicating an evaluation of the presence of a target region; and determining, using the plurality of pieces of information, at least one of the plurality of front images as an output image.
25. An image processing method comprising: acquiring, using a plurality of front images corresponding to different depth ranges of three-dimensional volume data of an eye to be examined, a plurality of pieces of information corresponding to the plurality of front images, each piece of information indicating an evaluation of the presence of a target region; and determining a depth range using the plurality of pieces of information and determining, as an output image, an image generated using the determined depth range.
26. An image processing method comprising: acquiring, using a plurality of medical images corresponding to different positions of three-dimensional volume data of a subject, a plurality of pieces of information corresponding to the plurality of medical images, each piece of information indicating an evaluation of the presence of a target region; and determining, using the plurality of pieces of information, at least one of the plurality of medical images as an output image.
27. An image processing method comprising: designating a target region from three-dimensional volume data of an eye to be examined; causing a display unit to display, side by side, a plurality of front images corresponding to different depth ranges of the three-dimensional volume data and generated using information on the designated target region; and determining, using the information on the designated target region, at least one of a type of the three-dimensional volume data for generating the plurality of front images, a layer or depth range corresponding to the target region, the number of front images to be generated, a depth range for generating a front image, and an interval between depth ranges for generating front images.
28. A program that, when executed by a computer, causes the computer to execute each step of the image processing method according to any one of claims 24 to 27.
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019211862A (JP7254682B2) | 2019-11-22 | 2019-11-22 | Image processing device, image processing method, and program |
| JP2019-211862 | 2019-11-22 | | |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2021100694A1 (en) | 2021-05-27 |
Family
ID=75961865
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/042764 (WO2021100694A1) | Image processing device, image processing method, and program | | |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7254682B2 (en) |
WO (1) | WO2021100694A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2023062361A (en) * | 2021-10-21 | 2023-05-08 | 株式会社日立製作所 | Operation command generation device and operation command generation method |
WO2023199848A1 (en) * | 2022-04-13 | 2023-10-19 | 株式会社ニコン | Image processing method, image processing device, and program |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016010658A (en) * | 2014-06-30 | 2016-01-21 | 株式会社ニデック | Optical coherence tomography device, optical coherence tomography calculation method and optical coherence tomography calculation program |
JP2017077414A (en) * | 2015-10-21 | 2017-04-27 | 株式会社ニデック | Ophthalmic analysis apparatus and ophthalmic analysis program |
US20180012359A1 (en) * | 2016-07-06 | 2018-01-11 | Marinko Venci Sarunic | Systems and Methods for Automated Image Classification and Segmentation |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4294853B2 (en) | 2000-12-19 | 2009-07-15 | アルパイン株式会社 | Operation instruction device |
JP2013251623A (en) | 2012-05-30 | 2013-12-12 | Kyocera Document Solutions Inc | Image processing apparatus, portable terminal, and image processing system |
JP6471593B2 (en) | 2015-04-09 | 2019-02-20 | 株式会社ニデック | OCT signal processing apparatus and OCT signal processing program |
JP6828295B2 (en) | 2016-08-01 | 2021-02-10 | 株式会社ニデック | Optical coherence tomography equipment and optical coherence tomography control program |
JP7182350B2 (en) | 2016-09-07 | 2022-12-02 | 株式会社ニデック | Ophthalmic analysis device, ophthalmic analysis program |
US9943225B1 (en) | 2016-09-23 | 2018-04-17 | International Business Machines Corporation | Early prediction of age related macular degeneration by image reconstruction |
WO2018181714A1 (en) | 2017-03-31 | 2018-10-04 | 株式会社ニデック | Ophthalmological information processing system |
JP6883463B2 (en) | 2017-04-26 | 2021-06-09 | 株式会社トプコン | Ophthalmic equipment |
CN107506770A (en) | 2017-08-17 | 2017-12-22 | 湖州师范学院 | Diabetic retinopathy eye-ground photography standard picture generation method |
US10878574B2 (en) | 2018-02-21 | 2020-12-29 | Topcon Corporation | 3D quantitative analysis of retinal layers with deep learning |
- 2019-11-22: JP application JP2019211862A — patent JP7254682B2 (Active)
- 2020-11-17: WO application PCT/JP2020/042764 — WO2021100694A1 (Application Filing)
Also Published As
Publication number | Publication date |
---|---|
JP7254682B2 (en) | 2023-04-10 |
JP2021079042A (en) | 2021-05-27 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20890481; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20890481; Country of ref document: EP; Kind code of ref document: A1 |