WO2020008834A1 - Image processing device, method, and endoscopic system - Google Patents

Image processing device, method, and endoscopic system

Info

Publication number: WO2020008834A1
Authority: WIPO (PCT)
Prior art keywords: image, images, light, endoscope, observation
Application number: PCT/JP2019/023492
Other languages: French (fr), Japanese (ja)
Inventors: 慧 内藤, 駿平 加門
Original Assignee: FUJIFILM Corporation (富士フイルム株式会社)
Application filed by FUJIFILM Corporation
Priority to JP2020528760A (granted as patent JP7289296B2)
Publication of WO2020008834A1
Priority to JP2022168121A (published as JP2022189900A)

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 1/00: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B 1/04: Instruments for performing medical examinations of the interior of cavities or tubes of the body combined with photographic or television appliances
    • A61B 1/045: Control thereof
    • G: PHYSICS
    • G02: OPTICS
    • G02B: OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 23/00: Telescopes, e.g. binoculars; Periscopes; Instruments for viewing the inside of hollow bodies; Viewfinders; Optical aiming or sighting devices
    • G02B 23/24: Instruments or systems for viewing the inside of hollow bodies, e.g. fibrescopes
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Definitions

  • the present invention relates to an image processing apparatus, an image processing method, and an endoscope system, and more particularly, to a technology that can be used for assisting a doctor in endoscopy.
  • Patent Literature 1 proposes an information processing apparatus including an acquisition unit that acquires a plurality of images of cells photographed in time series, an assigning unit that assigns time-series evaluation values to the acquired images for each of one or more predetermined evaluation items, and an evaluation unit that evaluates the cells based on the temporal change of the assigned evaluation values.
  • The evaluation unit assigns the time-series evaluation values to the plurality of images according to a machine learning algorithm and evaluates the observed cells based on the temporal change of those values. This enables an evaluation that comprehensively considers the time-series behavior of the cells.
  • Patent Literature 2 proposes a processing apparatus that acquires time-series input data, which is a data sequence in a moving image, supplies a plurality of input values corresponding to the input data at one time point to a plurality of nodes of a trained model (a model constituting a Boltzmann machine) corresponding to the time-series input data, calculates, based on the input data series preceding the prediction target time point and the weight parameters between each of the input values and each of the nodes, the conditional probability that each input value corresponding to the prediction target time point occurs under the condition that the input data series has occurred, and, based on these conditional probabilities, calculates the conditional probability that the next input data takes a predetermined value under the condition that the time-series input data has occurred.
  • As an example, this processing apparatus can predict the one image arrayed at the next time from T-1 images arranged in time series, generating a moving image including T images in total.
  • The information processing apparatus of Patent Literature 1 evaluates various changes and the like in the culture process of the imaged cells (fertilized eggs) based on a plurality of images captured in time series, and these images are captured under the same imaging conditions. This is because, unless the images are captured under the same imaging conditions, the change in the fertilized egg cannot be evaluated from the acquired images. That is, the plurality of images are not images sequentially acquired using different observation lights.
  • The processing apparatus described in Patent Literature 2 enables prediction of the image data at the next time using a trained model that receives time-series input data, and the time-series input data is captured under the same imaging conditions. This is because, unless the data is captured under the same imaging conditions, the image data at the next time cannot be predicted from the input time-series data. That is, the time-series input data is not input data sequentially acquired using different observation lights.
  • Moreover, the inventions described in Patent Documents 1 and 2 both input a plurality of time-series images in order to predict objects that change over time (cells, a future moving image); they do not input multiple images for the purpose of improving recognition accuracy in a recognizer.
  • The present invention has been made in view of such circumstances, and an object thereof is to provide an image processing apparatus, an image processing method, and an endoscope system that can improve recognition accuracy based on a plurality of images and can present a good observation image together with the recognition result.
  • In order to achieve the above object, an image processing apparatus according to one aspect of the present invention includes a recognizer that receives an image set including a plurality of images sequentially acquired using a plurality of different observation lights and outputs a recognition result for the image set, and a display control unit that causes a display unit to display the recognition result together with an observation image calculated using a part or all of the plurality of images.
  • Since an image set sequentially acquired using a plurality of different observation lights is input and a recognition result for the image set is acquired, the recognition accuracy can be improved compared with the case where recognition is performed on a single image acquired with a single observation light, and the recognition result can be appropriately presented.
  • Preferably, the recognizer has a trained model learned from a plurality of learning images and correct data, and outputs a recognition result based on the trained model each time it receives a plurality of images for recognition.
  • Preferably, the trained model is configured by a convolutional neural network; the convolutional neural network is excellent at recognizing images.
  • Preferably, the plurality of images include a first endoscope image and a second endoscope image acquired using observation light different from that of the first endoscope image. In endoscopy, a plurality of images may be acquired using a plurality of different observation lights, and the present invention can be applied to such an examination.
  • Preferably, the first endoscope image is a normal light image captured with normal light, and the second endoscope image is a special light image captured with special light.
  • In general, a normal light image is used as the image for observation, and a special light image is used when it is desired to observe, for example, a surface structure.
  • Preferably, the special light image includes two or more special light images captured with two or more different special lights. Two or more special light images can be captured according to the observation purpose, for example when the depths of the surface structures to be observed differ.
  • The first endoscope image may be a first special light image captured with first special light, and the second endoscope image may be a second special light image captured with second special light different from the first special light. That is, the plurality of endoscope images need not include a normal light image.
  • Preferably, the display control unit causes the display unit to display, as a moving image, an observation image calculated using a part or all of the plurality of images.
  • Preferably, the recognizer recognizes a region of interest included in the plurality of images, and the display control unit superimposes an index indicating the recognized region of interest on the image displayed on the display unit. This supports the inspection so that the region of interest in the observation image is not overlooked.
  • Preferably, the recognizer recognizes a region of interest included in the plurality of images, and the display control unit displays information indicating the presence or absence of the region of interest so as not to overlap the image on the display unit. This makes it possible to notify the operator that a region of interest exists in the observation image without the displayed information disturbing observation of the image.
  • Preferably, the recognizer executes discrimination regarding a lesion based on the plurality of images and outputs the discrimination result, and the display control unit causes the display unit to display the discrimination result. This makes it possible to visually inspect the observation image while referring to the discrimination result obtained by the recognizer.
  • An endoscope system according to another aspect of the present invention includes a light source device that sequentially generates first observation light and second observation light different from the first observation light, an endoscope scope that captures a plurality of images by sequentially imaging the observation target illuminated by the first observation light and the second observation light, a display unit, and the image processing apparatus described above; the recognizer receives an image set including the plurality of images captured by the endoscope scope.
  • Preferably, the endoscope system includes an endoscope processor that receives the plurality of images captured by the endoscope scope and performs image processing on them, and the recognizer receives the plurality of images after the image processing by the endoscope processor.
  • Since the endoscope processor has a function of performing image processing on the images captured by the endoscope scope, the recognizer can detect and discriminate a lesion area using the plurality of images after the image processing.
  • The recognizer may be separate from the endoscope processor, or may be built into the endoscope processor.
  • An image processing method according to still another aspect of the present invention includes a first step of receiving an image set including a plurality of images sequentially acquired using a plurality of different observation lights, a second step in which a recognizer outputs a recognition result for the image set, and a third step in which a display control unit causes a display unit to display the recognition result together with an observation image calculated using a part or all of the plurality of images; the first to third steps are repeatedly executed.
  • Preferably, in the second step, the recognizer, which has a trained model learned from learning image sets and correct data, outputs a recognition result based on the trained model each time it receives an image set for recognition.
  • the learned model is configured by a convolutional neural network.
  • Preferably, the plurality of images include a first endoscope image and a second endoscope image acquired using observation light different from that of the first endoscope image.
  • Preferably, the first endoscope image is a normal light image captured with normal light, and the second endoscope image is a special light image captured with special light.
  • According to the present invention, recognition is performed based on an image set including a plurality of images sequentially acquired using a plurality of different observation lights, so that the recognition accuracy can be improved and the recognition result can be appropriately presented.
  • FIG. 1 is a perspective view showing an appearance of an endoscope system 10 according to the present invention.
  • FIG. 2 is a block diagram illustrating an electrical configuration of the endoscope system 10.
  • FIG. 3 is a diagram illustrating an example of a multi-frame image captured in the multi-frame shooting mode and of an image set.
  • FIG. 4 is a schematic diagram showing a typical configuration example of a convolutional neural network which is one of the learning models constituting the recognizer 15.
  • FIG. 5 is a schematic diagram showing a configuration example of the intermediate layer 15B of the CNN 15 shown in FIG. 4.
  • FIG. 6 is a block diagram showing a main configuration used for explaining the operation of the endoscope system 10 according to the present invention.
  • FIG. 7 is a diagram illustrating an example of an image set including an R image, a G image, a B image, and a V image captured in a frame sequential manner.
  • FIG. 8 is a flowchart showing an embodiment of the image processing method according to the present invention.
  • FIG. 1 is a perspective view showing an appearance of an endoscope system 10 according to the present invention.
  • As shown in FIG. 1, the endoscope system 10 mainly comprises an endoscope scope (here, a flexible endoscope) 11 for imaging an observation target in a subject, a light source device 12, an endoscope processor 13, a display unit (display) 14 such as a liquid crystal monitor, and a recognizer 15.
  • the light source device 12 supplies the endoscope 11 with various kinds of observation light such as white light for capturing a normal light image and light in a specific wavelength band for capturing a special light image.
  • The endoscope processor 13 has an image processing function of generating image data of a normal light image, a special light image, or an observation image for display/recording based on the image signal obtained by the endoscope scope 11, a function of controlling the light source device 12, and a function of displaying the normal image or observation image and the recognition result of the recognizer 15 on the display 14. Although the details of the recognizer 15 will be described later, it is a part that accepts an endoscope image and performs recognition such as detecting the position of a region of interest (lesion, surgical scar, treatment scar, treatment tool, etc.) in the endoscope image and discriminating the type of lesion.
  • the display 14 displays a normal image, a special light image or an image for observation, and a recognition result by the recognizer 15 based on display image data input from the endoscope processor 13.
  • The endoscope scope 11 includes a flexible insertion portion 16 to be inserted into the subject, a hand operation unit 17 connected to the base end of the insertion portion 16 and used for gripping the endoscope scope 11 and operating the insertion portion 16, and a universal cord 18 that connects the hand operation unit 17 to the light source device 12 and the endoscope processor 13.
  • The illumination lens 42, the objective lens 44, the imaging element 45, and the like are built into the insertion portion distal end portion 16a, which is the tip of the insertion portion 16 (see FIG. 2).
  • A freely bendable bending portion 16b is connected to the rear end of the insertion portion distal end portion 16a.
  • A flexible tube portion 16c having flexibility is connected to the rear end of the bending portion 16b.
  • the hand operation unit 17 is provided with an angle knob 21, an operation button 22, a forceps inlet 23, and the like.
  • the angle knob 21 is rotated when adjusting the bending direction and the bending amount of the bending portion 16b.
  • the operation button 22 is used for various operations such as air supply / water supply and suction.
  • the forceps inlet 23 communicates with a forceps channel in the insertion section 16.
  • the hand operation unit 17 is provided with an endoscope operation unit 46 (see FIG. 2) for performing various settings.
  • the universal cord 18 incorporates an air / water channel, a signal cable, a light guide, and the like.
  • The distal end of the universal cord 18 is provided with a connector portion 25a connected to the light source device 12 and a connector portion 25b connected to the endoscope processor 13. Thereby, observation light is supplied from the light source device 12 to the endoscope scope 11 via the connector portion 25a, and the image signal obtained by the endoscope scope 11 is input to the endoscope processor 13 via the connector portion 25b.
  • The light source device 12 is provided with a light source operation unit 12a including a power button, a lighting button for turning on the light source, a brightness adjustment button, and the like.
  • The endoscope processor 13 is provided with a processor operation unit 13a including a power button and an input unit that receives input from a pointing device such as a mouse (not shown).
  • Although the endoscope processor 13 and the light source device 12 of this example are of a separate type, the endoscope processor may be of a type with a built-in light source device.
  • FIG. 2 is a block diagram illustrating an electrical configuration of the endoscope system 10.
  • The endoscope scope 11 roughly comprises a light guide 40, an illumination lens 42, an objective lens 44, an imaging element 45, an endoscope operation unit 46, an endoscope control unit 47, and a ROM (Read Only Memory) 48.
  • the light guide 40 uses a large-diameter optical fiber, a bundle fiber, or the like.
  • The light guide 40 has an incident end inserted into the light source device 12 via the connector portion 25a, and an emission end that passes through the insertion portion 16 and faces the illumination lens 42 provided in the insertion portion distal end portion 16a.
  • the illumination light supplied from the light source device 12 to the light guide 40 is applied to the observation target through the illumination lens 42. Then, the illumination light reflected and / or scattered by the observation target enters the objective lens 44.
  • the objective lens 44 forms reflected light or scattered light (that is, an optical image of an observation target) of the incident illumination light on the imaging surface of the imaging element 45.
  • The imaging element 45 is a complementary metal oxide semiconductor (CMOS) type or charge coupled device (CCD) type image sensor, and is positioned and fixed relative to the objective lens 44 on the far side of the objective lens 44.
  • a plurality of pixels constituted by a plurality of photoelectric conversion elements (photodiodes) for photoelectrically converting an optical image are two-dimensionally arranged on an imaging surface of the imaging element 45.
  • Red (R), green (G), and blue (B) color filters are arranged for the respective pixels on the incident surface side of the plurality of pixels of the imaging element 45 of this example, thereby forming R pixels, G pixels, and B pixels. The filter array of the RGB color filters is generally a Bayer array, but is not limited to this.
  • the imaging element 45 converts the optical image formed by the objective lens 44 into an electric image signal and outputs it to the endoscope processor 13.
  • When the imaging element 45 is a CMOS type, an A/D (Analog/Digital) converter is typically built in, and a digital image signal is output directly from the imaging element 45 to the endoscope processor 13.
  • When the imaging element 45 is a CCD type, the image signal output from the imaging element 45 is converted into a digital image signal by an A/D converter or the like (not shown) and then output to the endoscope processor 13.
  • The endoscope operation unit 46 has a still image shooting button (not shown) and a shooting mode setting unit that sets any one of a normal light image shooting mode, a special light image shooting mode, and a multi-frame shooting mode.
  • the photographing mode setting unit may be provided in the processor operation unit 13a of the endoscope processor 13.
  • the endoscope control unit 47 sequentially executes various programs and data read from the ROM 48 or the like in accordance with an operation on the endoscope operation unit 46, and mainly controls the driving of the imaging element 45.
  • In the normal light image capturing mode, the endoscope control unit 47 controls the imaging element 45 so as to read out the signals of the R pixels, G pixels, and B pixels of the imaging element 45. In the special light image capturing mode or the multi-frame shooting mode, when the V-LED 32a emits violet light or the B-LED 32b emits blue light as the observation light in order to acquire a specific special light image, the endoscope control unit 47 controls the imaging element 45 so as to read out only the signals of the B pixels, which have spectral sensitivity in the wavelength bands of violet light and blue light, or to read out only one or two of the three color pixels (R, G, and B pixels).
  • The endoscope control unit 47 communicates with the processor control unit 61 of the endoscope processor 13, and transmits to the endoscope processor 13 the operation information of the endoscope operation unit 46 and identification information, stored in the ROM 48, for identifying the type of the endoscope scope 11.
  • the light source device 12 has a light source control unit 31 and a light source unit 32.
  • The light source control unit 31 controls the light source unit 32 and communicates with the processor control unit 61 of the endoscope processor 13 to exchange various information.
  • the light source unit 32 has, for example, a plurality of semiconductor light sources.
  • The light source unit 32 has LEDs of four colors: a V-LED (Violet Light Emitting Diode) 32a, a B-LED (Blue Light Emitting Diode) 32b, a G-LED (Green Light Emitting Diode) 32c, and an R-LED (Red Light Emitting Diode) 32d.
  • The V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d are semiconductor light sources that emit violet (V) light, blue (B) light, green (G) light, and red (R) light as observation light, having peak wavelengths at, for example, 410 nm, 450 nm, 530 nm, and 615 nm, respectively.
  • the light source control unit 31 individually controls the on / off of the four LEDs of the light source unit 32, the light emission amount at the time of lighting, and the like for each LED according to the shooting mode set by the shooting mode setting unit.
  • In the normal light image capturing mode, the light source control unit 31 turns on all of the V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d, so that white light including V light, B light, G light, and R light is used as the observation light.
  • In the special light image capturing mode, the light source control unit 31 turns on any one of the V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d, or an appropriate combination thereof, and controls the light emission amount (light amount ratio) of each light source, whereby images of a plurality of layers at different depths of the subject can be captured.
  • The multi-frame shooting mode is a shooting mode in which a normal light image and one or more special light images, or two or more special light images, are captured while the observation light is switched for each frame. In the multi-frame shooting mode, the light source control unit 31 causes the light source unit 32 to emit different observation light for each frame.
  • Light of each color emitted from the LEDs 32a to 32d enters the light guide 40 inserted into the endoscope scope 11 via an optical path coupling portion formed by dichroic mirrors, lenses, and the like, and a diaphragm mechanism (not shown).
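  • As a rough sketch of this per-frame light switching, the light source control can be thought of as cycling through per-LED light-amount ratios before each frame exposure. The names and numeric ratios below are illustrative assumptions, not values from the patent, which describes the ratios only qualitatively:

```python
# Hypothetical sketch of the multi-frame shooting mode: the observation light
# is switched every frame by cycling through (V, B, G, R) emission ratios.
from itertools import cycle

# Relative emission amounts per observation light -- illustrative values only.
LIGHT_RATIOS = {
    "WL":  (1.0, 1.0, 1.0, 1.0),   # all four LEDs on -> white light
    "BLI": (1.0, 0.8, 0.2, 0.1),   # high V ratio, suppressed G ratio
    "LCI": (1.0, 0.6, 0.5, 0.5),   # higher V ratio than WL
}

def multi_frame_light_sequence():
    """Yield the (name, ratios) to apply before each frame exposure."""
    for name in cycle(["WL", "BLI", "LCI"]):
        yield name, LIGHT_RATIOS[name]

seq = multi_frame_light_sequence()
for _ in range(6):                  # two full WL/BLI/LCI cycles
    name, (v, b, g, r) = next(seq)
    print(f"frame: {name}  V={v} B={b} G={g} R={r}")
```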
  • As the observation light of the light source device 12, white light (light in the white wavelength band or light in a plurality of wavelength bands), light having a peak in one or more specific wavelength bands (special light), or a combination thereof is selected according to the observation purpose.
  • a first example of the specific wavelength band is, for example, a blue band or a green band in a visible region.
  • The wavelength band of the first example includes the wavelength band of 390 nm to 450 nm or 530 nm to 550 nm, and the light of the first example has a peak wavelength in the wavelength band of 390 nm to 450 nm or 530 nm to 550 nm.
  • the second example of the specific wavelength band is, for example, a red band in a visible region.
  • The wavelength band of the second example includes the wavelength band of 585 nm to 615 nm or 610 nm to 730 nm, and the light of the second example has a peak wavelength in the wavelength band of 585 nm to 615 nm or 610 nm to 730 nm.
  • The third example of the specific wavelength band includes a wavelength band in which the extinction coefficient differs between oxyhemoglobin and reduced hemoglobin, and the light of the third example has a peak wavelength in such a wavelength band.
  • The wavelength band of the third example includes 400±10 nm, 440±10 nm, 470±10 nm, or the wavelength band of 600 nm to 750 nm, and the light of the third example has a peak wavelength in one of these wavelength bands.
  • the fourth example of the specific wavelength band is a wavelength band (390 nm to 470 nm) of excitation light used for observation of fluorescence emitted from a fluorescent substance in a living body (fluorescence observation) and for exciting this fluorescent substance.
  • the fifth example of the specific wavelength band is a wavelength band of infrared light.
  • the wavelength band of the fifth example includes a wavelength band of 790 nm to 820 nm or 905 nm to 970 nm, and the light of the fifth example has a peak wavelength in a wavelength band of 790 nm to 820 nm or 905 nm to 970 nm.
  • The endoscope processor 13 includes a processor operation unit 13a, a processor control unit 61, a ROM 62, a digital signal processing circuit (DSP: Digital Signal Processor) 63, an image processing unit 65, a display control unit 66, a storage unit 67, and the like.
  • The processor operation unit 13a includes a power button and an input unit that receives inputs such as a coordinate position indicated on the screen of the display unit 14 with a mouse and a click (execution instruction).
  • The processor control unit 61 reads out necessary programs and data from the ROM 62 according to operation information from the processor operation unit 13a and operation information from the endoscope operation unit 46 received via the endoscope control unit 47, and processes them sequentially, thereby controlling each part of the endoscope processor 13 and controlling the light source device 12.
  • The processor control unit 61 may also receive necessary instruction inputs from another external device such as a keyboard connected via an interface (not shown).
  • Under the control of the processor control unit 61, the DSP 63, which functions as one mode of an image acquisition unit that acquires the image data of each frame of the moving image output from the endoscope scope 11 (the imaging element 45), performs various signal processing such as defect correction processing, offset processing, white balance correction, gamma correction, and demosaicing processing (also referred to as "synchronization processing") on the image data for one frame of the moving image input from the endoscope scope 11, and generates image data for one frame.
  • The image processing unit 65 receives the image data from the DSP 63, performs image processing such as color conversion processing, color emphasis processing, and structure emphasis processing on the input image data as necessary, and generates image data representing an endoscope image in which the observation target is captured.
  • the color conversion process is a process of performing color conversion on image data by 3 ⁇ 3 matrix processing, gradation conversion processing, three-dimensional lookup table processing, or the like.
  • the color emphasis process is a process of emphasizing the color of the image data that has been subjected to the color conversion process, for example, in a direction that makes a difference in the color of blood vessels and mucous membranes.
  • the structure emphasis process is a process of emphasizing a specific tissue or structure included in an observation target such as a blood vessel or a pit pattern, and is performed on image data after the color emphasis process.
  • When a still image or moving image shooting instruction is issued, the image data of each frame of the moving image processed by the image processing unit 65 is recorded in the storage unit 67 as the instructed still image or moving image.
  • The display control unit 66 generates display data for displaying the normal light image or the special light image on the display unit 14 based on the image data input from the image processing unit 65, outputs the generated display data to the display unit 14, and causes the display unit 14 to display a display image (such as a moving image captured by the endoscope scope 11).
  • In the multi-frame shooting mode, the display control unit 66 causes the display unit 14 to display any one of the plurality of images (a part of the images), or causes the display unit 14 to display an observation image calculated by the image processing unit 65 using the plurality of images.
  • The display control unit 66 also causes the display unit 14 to display a recognition result input from the recognizer 15 via the image processing unit 65, or a recognition result input directly from the recognizer 15.
  • When the recognizer 15 detects a region of interest, the display control unit 66 displays an index indicating the region of interest so as to be superimposed on the image displayed on the display unit 14. As the index, highlighting such as changing the color of the region of interest in the display image, displaying a marker, or displaying a bounding box can be considered.
  • the display control unit 66 can display information indicating the presence or absence of the attention area based on the detection result of the attention area by the recognizer 15 so as not to overlap the image displayed on the display 14.
  • As the information indicating the presence or absence of the region of interest, for example, changing the color of the frame of the endoscope image depending on whether a region of interest is detected, or displaying the text "region of interest present!" in a display area different from the endoscope image, can be considered.
  • the display controller 66 causes the display 14 to display the discrimination result.
  • As a display method of the discrimination result, for example, displaying text indicating the result on the display image of the display unit 14 can be considered.
  • The text need not be on the display image, and the method is not particularly limited as long as the correspondence between the text and the display image is understood.
  • the recognizer 15 receives the image after the image processing by the endoscope processor 13. First, the recognition image received by the recognizer 15 will be described.
  • the recognizer 15 of this example is applied when the multi-frame shooting mode is set.
  • When the multi-frame shooting mode is set, the light source device 12 sequentially generates white light, which includes violet light, blue light, green light, and red light, and light (special light) of one or more specific wavelength bands obtained by controlling the lighting of the V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d, and the endoscope processor 13 sequentially acquires from the endoscope scope 11 an image under white light (normal light image) and images under special light (special light images).
  • In this example, a normal light image (WL (White Light) image) is captured as the first endoscope image, and two types of special light images, a BLI (Blue Light Imaging or Blue LASER Imaging) image and an LCI (Linked Color Imaging) image, are captured as the second endoscope images. The BLI image and the LCI image are images captured with the observation light for BLI and the observation light for LCI, respectively.
  • The observation light for BLI is observation light in which the ratio of V light, which has a high absorptance in surface blood vessels, is high and the ratio of G light, which has a high absorptance in middle-layer blood vessels, is suppressed; it is suitable for generating an image (BLI image) suited to structure enhancement.
  • The observation light for LCI has a higher ratio of V light than the observation light for WL, and is suitable for capturing minute changes in color tone compared with the observation light for WL. An LCI image is an image that has been subjected to color enhancement processing using the R component signal so that reddish colors become redder and whitish colors become whiter, centering on colors near that of the mucous membrane.
  • the recognizer 15 receives an image set Sa including a plurality of images (in this example, a WL image, a BLI image, and an LCI image) sequentially acquired by the endoscope processor 13 as images for recognition.
  • The recognizer 15 sequentially receives the image sets Sa. Since each image set Sa is composed of three chronologically consecutive frames (a WL image, a BLI image, and an LCI image), the time interval between the image set Sa received by the recognizer 15 at time t_n and the image set Sa received at the preceding time t_{n-1} corresponds to the time of three frames captured in the multi-frame shooting mode.
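  • As a concrete illustration of this input format, a minimal sketch follows: three consecutive RGB frames are stacked along the channel axis into one 9-channel image set. The shapes and random data are illustrative stand-ins, not the patent's actual interface:

```python
# Sketch of assembling the image set Sa: the WL, BLI, and LCI frames (each
# RGB, i.e. 3 channels) are concatenated into one 9-channel recognizer input.
import numpy as np

H, W = 480, 640
wl_img  = np.random.rand(H, W, 3).astype(np.float32)  # WL frame  (RGB)
bli_img = np.random.rand(H, W, 3).astype(np.float32)  # BLI frame (RGB)
lci_img = np.random.rand(H, W, 3).astype(np.float32)  # LCI frame (RGB)

image_set_sa = np.concatenate([wl_img, bli_img, lci_img], axis=-1)
print(image_set_sa.shape)  # (480, 640, 9) -> N = 9 channels
```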
  • FIG. 4 is a schematic diagram showing a typical configuration example of a convolutional neural network (CNN: Convolutional Neural Network), which is one of the learning models constituting the recognizer 15.
  • The CNN 15 is, for example, a learning model for detecting the position of a region of interest (lesion, surgical scar, treatment scar, treatment tool, etc.) in an endoscope image and discriminating the type of lesion; it has a multilayer structure and holds a plurality of weight parameters.
  • The CNN 15 becomes a trained model when the weight parameters are set to optimal values, and functions as a recognizer.
  • The CNN 15 includes an input layer 15A, an intermediate layer 15B having a plurality of convolutional layers and a plurality of pooling layers, and an output layer 15C; in each layer, a plurality of "nodes" are connected by "edges".
  • The CNN 15 of the present example is a learning model that performs segmentation for recognizing the position of a region of interest in an endoscope image, and a fully convolutional network (FCN), which is a type of CNN, is applied to it.
  • With the FCN, the position of the region of interest in the endoscope image can be grasped at the pixel level.
  • the image set Sa for recognition (FIG. 3) is input to the input layer 15A.
  • the intermediate layer 15B is a part for extracting features from the image set Sa input from the input layer 15A.
  • Each convolutional layer in the intermediate layer 15B performs filtering processing on nearby nodes of the image set Sa or of the previous layer (a convolution operation using a filter) to obtain a "feature map".
  • the pooling layer reduces (or enlarges) the feature map output from the convolutional layer to create a new feature map.
  • the “convolution layer” has a role of extracting features such as edge extraction from an image, and the “pooling layer” has a role of providing robustness so that the extracted features are not affected by translation or the like.
  • The intermediate layer 15B is not limited to one in which a convolutional layer and a pooling layer form one set; it may include consecutive convolutional layers or a normalization layer.
  • the output layer 15C is a part that outputs a recognition result for detecting the position of the attention area in the endoscope image and classifying (discriminating) the type of lesion based on the features extracted by the intermediate layer 15B.
  • The CNN 15 is trained using a large number of pairs of a learning image set Sa and correct data for the image set Sa; the coefficients and offset values of the filters applied to each convolutional layer of the CNN 15 are set to optimal values by this training data set.
  • the correct answer data is preferably a region of interest or a discrimination result specified by a doctor with respect to an endoscopic image (in this example, at least one image of the image set Sa).
  • FIG. 5 is a schematic diagram showing a configuration example of the intermediate layer 15B of the CNN 15 shown in FIG. 4.
  • In the first (1st) convolutional layer, a convolution operation is performed between the image set Sa for recognition and a filter F_1.
  • The image set Sa consists of N images (N channels) each having an image size of H vertically and W horizontally; in this example, since the WL image, the BLI image, and the LCI image are each RGB images, the image set Sa is an image of 9 channels.
  • Since the image set Sa has N channels (N images), the filter F_1 convolved with it has, for a filter of size 5, a filter size of 5 × 5 × N.
  • The filter used in the second convolutional layer has, for a filter of size 3, a filter size of 3 × 3 × M, where M is the number of feature maps output from the first convolutional layer.
  • The size of the "feature map" in the n-th convolutional layer is smaller than the size of the "feature map" in the second convolutional layer, because downscaling has been performed by the convolutional and pooling layers in the preceding stages.
  • the convolutional layer in the first half of the intermediate layer 15B is responsible for extraction of feature values, and the convolutional layer in the second half is responsible for segmentation of the object (region of interest).
  • In the latter-half convolutional layers, upscaling is performed, and in the last convolutional layer, one "feature map" having the same size as the input image set Sa is obtained.
  • The output layer 15C (FIG. 4) of the CNN 15 grasps the position of the region of interest in the images of the image set Sa at the pixel level using the "feature map" obtained from the intermediate layer 15B; that is, it can detect whether each pixel of the endoscope image belongs to the region of interest and output the detection result.
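  • To make the structure above concrete, here is a minimal FCN sketch, assuming PyTorch (the patent does not name a framework): a 9-channel image set is downscaled by convolution and pooling layers, then upscaled back to the input size so that one feature map of the same size as Sa can be read per pixel as region-of-interest membership. Layer counts and channel sizes are illustrative, not the patent's actual architecture:

```python
# Minimal fully convolutional network (FCN) sketch for pixel-level
# region-of-interest segmentation over a 9-channel image set Sa.
import torch
import torch.nn as nn

class MiniFCN(nn.Module):
    def __init__(self, in_channels=9):
        super().__init__()
        # First half: feature extraction with downscaling (conv + pooling)
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, padding=2),  # 5x5xN filters
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),           # 3x3xM filters
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Second half: upscaling back to the input size for segmentation
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=2, stride=2),
        )

    def forward(self, x):
        # Output: one "feature map" the same size as the input image set,
        # interpreted per pixel as region-of-interest membership.
        return torch.sigmoid(self.decoder(self.encoder(x)))

model = MiniFCN()
sa = torch.randn(1, 9, 480, 640)   # batch of one 9-channel image set Sa
roi_map = model(sa)
print(roi_map.shape)               # torch.Size([1, 1, 480, 640])
```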
  • Since recognition is performed using a plurality of images (a WL image, a BLI image, and an LCI image) sequentially acquired in the multi-frame shooting mode, the recognition accuracy can be improved compared with the case where recognition is performed using any one (one type) of the WL image, the BLI image, and the LCI image.
  • The CNN 15 of the present example recognizes the position of the region of interest in the endoscope image, but the recognizer (CNN) according to the present invention is not limited to this and may execute discrimination regarding a lesion and output the discrimination result.
  • For example, the recognizer may classify the endoscope image into three categories of "neoplastic", "non-neoplastic", and "other", and output the discrimination result as three scores corresponding to these categories (the three scores summing to 100%), or output a classification result if the three scores can be clearly classified.
  • When the CNN outputs a classification result in this way, a CNN having one or more fully connected layers as the last layers of the intermediate layer is preferable, instead of the fully convolutional network (FCN).
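  • As a sketch of this discrimination variant (again assuming PyTorch, with illustrative sizes), the FCN tail is replaced by fully connected layers and a softmax so that the three category scores sum to 100%:

```python
# Classification head sketch: fully connected layers over intermediate-layer
# features, producing three scores ("neoplastic", "non-neoplastic", "other").
import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 128),   # fully connected layers replacing the FCN tail
    nn.ReLU(),
    nn.Linear(128, 3),            # one logit per category
)

features = torch.randn(1, 64, 8, 8)          # stand-in for intermediate-layer output
scores = torch.softmax(head(features), dim=1) * 100
print(scores)        # e.g. tensor([[72.1, 21.4, 6.5]]) -- three scores
print(scores.sum())  # ~100.0 -- the three scores sum to 100%
```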
  • FIG. 6 is a block diagram showing a main configuration used for explaining the operation of the endoscope system 10 according to the present invention.
  • In the multi-frame shooting mode, observation lights (V light, B light, G light, and R light) having different peak wavelengths are each applied to the subject 20 via the light guide 40. Since the V light, B light, G light, and R light reach layers at different depths of the subject 20, images of the subject 20 at different depths can be captured with these observation lights.
  • In this example, a WL image, a BLI image, and an LCI image are sequentially acquired using a plurality of different observation lights (for example, first observation light for WL, second observation light for BLI, and third observation light for LCI); as described above, the observation lights for WL, BLI, and LCI differ in the light intensity ratios of V light, B light, G light, and R light.
  • In the multi-frame shooting mode, a WL image, a BLI image, and an LCI image are sequentially and repeatedly captured by irradiation with the plurality of different observation lights. Since the WL image, the BLI image, and the LCI image are each color images, the endoscope processor 13 generates RGB three-channel WL, BLI, and LCI images.
  • the recognizer 15 receives an image set Sa (images of 9 channels in total) including a WL image, a BLI image, and an LCI image as images for recognition.
  • the recognizer 15 detects the position of the region of interest (in this example, the lesion region) in the endoscope image, and outputs position information (recognition result) indicating the lesion region to the endoscope processor 13.
  • the image processing unit 65 of the endoscope processor 13 generates a WL image, a BLI image, and an LCI image from an image signal input from the endoscope 11 and also generates an observation image.
  • A part of the plurality of images (for example, the WL image among the WL image, the BLI image, and the LCI image) may be used as the observation image, or an image calculated using the plurality of images (an image obtained by combining two or more of the WL image, the BLI image, and the LCI image) may be used as the observation image. In either case, the observation image is preferably one type of image.
  • the display control unit 66 inputs the observation image from the image processing unit 65, inputs the position information indicating the lesion area from the recognizing device 15, and causes the display device 14 to display the observation image and the recognition result.
  • The display control unit 66 displays the observation image 26 on the display unit 14 and performs emphasis processing for emphasizing the recognized region of interest (lesion area).
  • In this example, the lesion area is highlighted by superimposing an index 28 indicating the lesion area on the observation image 26 displayed on the display unit 14.
  • As the display of the index 28, in addition to highlighting such as changing the color of the lesion area, display of a boundary line indicating the outline of the lesion area, display of a marker indicating the lesion area, and display of a bounding box can be considered; an illustrative sketch follows.
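  • As a sketch of such index display (using OpenCV as an assumed drawing library; the coordinates stand in for a lesion region reported by the recognizer):

```python
# Superimposing an index on the observation image: a bounding box, a marker,
# and a text label drawn over the frame at the recognized lesion region.
import cv2
import numpy as np

observation_img = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in WL frame
x0, y0, x1, y1 = 200, 150, 320, 260                        # illustrative recognizer output

cv2.rectangle(observation_img, (x0, y0), (x1, y1), (0, 255, 0), 2)   # bounding box
cv2.drawMarker(observation_img, ((x0 + x1) // 2, (y0 + y1) // 2),
               (0, 0, 255), markerType=cv2.MARKER_CROSS, markerSize=20)
cv2.putText(observation_img, "ROI", (x0, y0 - 8),
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 1)
```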
  • The recognizer 15 of the present example recognizes the position of the region of interest in the endoscope image, but the present invention is not limited to this; the recognizer 15 may execute discrimination regarding a lesion and output the discrimination result.
  • As a display method of the discrimination result, for example, displaying text indicating the result on the image of the display unit 14 is conceivable. The display position of the text need not be on the image and may be a window separate from the image, as long as the correspondence between the text and the image is understood, and is not particularly limited.
  • In a case where a monochrome imaging element is used, R light, G light, B light, and V light are emitted sequentially, and R, G, B, and V images of the corresponding colors are captured in a frame-sequential manner.
  • FIG. 7 is a diagram showing an example of an image set including an R image, a G image, a B image, and a V image, which are imaged in a frame sequential manner.
  • The endoscope processor 13 can generate observation images such as a WL image, a BLI image, and an LCI image from the plurality of images (R image, G image, B image, and V image) sequentially acquired using the plurality of different observation lights (R light, G light, B light, and V light); these observation images can be generated by adjusting the synthesis ratio of the R, G, B, and V images.
  • The image set may include an image obtained by multiplying at least two of the R, G, B, and V images by preset coefficients and combining them (four arithmetic operations). For example, an image obtained by dividing, pixel by pixel, the image having a center wavelength of 410 nm (V image) by the image having a center wavelength of 450 nm (B image), or an image obtained by multiplying them pixel by pixel, may be used; a sketch follows.
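  • A minimal sketch of such a four-arithmetic-operation image, assuming NumPy arrays as stand-ins for the V (410 nm) and B (450 nm) frames, with illustrative coefficients:

```python
# Per-pixel arithmetic between the 410 nm (V) and 450 nm (B) frames.
import numpy as np

v_img = np.random.rand(480, 640).astype(np.float32) + 1e-3  # 410 nm frame
b_img = np.random.rand(480, 640).astype(np.float32) + 1e-3  # 450 nm frame

a, c = 1.0, 1.0                          # preset coefficients (illustrative)
ratio_img   = (a * v_img) / (c * b_img)  # per-pixel division
product_img = (a * v_img) * (c * b_img)  # per-pixel multiplication
```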
  • the recognizer 15 can receive the WL image, the BLI image, and the LCI image generated by the endoscope processor 13 as an image set Sb, and return a recognition result for the endoscope image to the endoscope processor 13.
  • The recognizer 15 of this example accepts an image set Sa (images of 9 channels in total) including a WL image, a BLI image, and an LCI image as images for recognition, but is not limited to this; it may accept an image set including an R image, a G image, a B image, and a V image, and output a recognition result for the endoscope image.
  • FIG. 8 is a flowchart showing an embodiment of the image processing method according to the present invention, and shows a processing procedure of each unit of the endoscope system 10 shown in FIG.
  • the multi-frame shooting mode is set, and the endoscope 11 sequentially captures multi-frame images using a plurality of different observation lights (step S10).
  • the endoscope processor 13 acquires an image set constituting a multi-frame image captured by the endoscope 11 (step S12, first step).
  • The image set may be a WL image, a BLI image, and an LCI image captured by the endoscope scope 11 with the observation lights for WL, BLI, and LCI, or a WL image, a BLI image, and an LCI image generated from the R, G, B, and V images captured in a frame-sequential manner. The special light image may be only one of the BLI image and the LCI image, or may be a special light image captured with other special light.
  • The image set may also include no WL image (normal light image) and instead include two or more special light images, such as a first special light image captured with first special light and a second special light image captured with second special light. In short, any image set may be used as long as it includes a plurality of images sequentially acquired using a plurality of different observation lights.
  • the image processing unit 65 of the endoscope processor 13 generates an observation image based on the acquired image set (Step S14).
  • The observation image is an image calculated using a part of the plurality of images (for example, the WL image among the WL image, the BLI image, and the LCI image) or using the plurality of images.
  • Based on the image set received via the endoscope processor 13, the recognizer 15 performs detection of the position of the region of interest shown in the endoscope image, discrimination of the type of lesion, and the like, and outputs the recognition result (step S16, second step).
  • the display controller 66 causes the display 14 to display the generated observation image and the recognition result obtained by the recognizer 15 (step S18, third step).
  • step S20 it is determined whether or not the imaging of the multi-frame image is to be ended. If the imaging of the multi-frame image is to be continued (in the case of “No”), the process transits to step S10 and proceeds to step S10. To step S20 are repeatedly performed. Thereby, the observation image is displayed as a moving image, and the recognition result of the recognizer 15 is also continuously displayed.
  • In the above embodiment, the endoscope system 10 including the endoscope scope 11 and the like has been described; however, the present invention is not limited to the endoscope system 10 and may be an image processing device including the endoscope processor 13 and the recognizer 15. In this case, the endoscope processor 13 and the recognizer 15 may be integrated or may be separate.
  • The different observation lights are not limited to those emitted from the four-color LEDs. For example, a blue laser diode that emits blue laser light having a center wavelength of 445 nm and a blue-violet laser diode that emits blue-violet laser light having a center wavelength of 405 nm may be used as light sources, and the laser light of the blue laser diode and the blue-violet laser diode may be applied to a YAG (Yttrium Aluminum Garnet) based phosphor to cause it to emit light.
  • The blue-violet laser light is transmitted without exciting the phosphor. Therefore, by adjusting the intensities of the blue laser light and the blue-violet laser light, the observation light for WL, the observation light for BLI, and the observation light for LCI can be emitted; when only the blue-violet laser diode emits light, observation light having a center wavelength of 405 nm can be emitted.
  • the observation image according to the present invention is not limited to a moving image, but may be a still image stored in the storage unit 67 or the like, and the recognizer may output a recognition result based on a still image set.
  • The recognizer is not limited to a CNN and may be a machine learning model other than a CNN, such as a DBN (Deep Belief Network) or an SVM (Support Vector Machine).
  • The hardware structure of the endoscope processor 13 and/or the recognizer 15 is realized by various processors as described below. The various processors include a CPU (Central Processing Unit), which is a general-purpose processor that executes software (programs) and functions as various control units; a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), whose circuit configuration can be changed after manufacturing; and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed specifically to execute a specific process.
  • One processing unit may be configured by one of these various processors, or by two or more processors of the same or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). A plurality of control units may also be configured by one processor. As a first example of configuring a plurality of control units with one processor, as represented by a computer such as a client or a server, one processor is configured by a combination of one or more CPUs and software, and this processor functions as the plurality of control units.
  • As a second example, as represented by a system-on-chip (SoC), a processor that realizes the functions of the entire system including the plurality of control units with one IC (Integrated Circuit) chip is used.
  • the various control units are configured by using one or more of the various processors described above as a hardware structure.

Abstract

Provided are an image processing device, method, and an endoscopic system with which it is possible to improve recognition accuracy on the basis of a plurality of images and with which it is possible to present preferable observation images and recognition results. The present invention is provided with: a recognizer (15) which receives an image set comprising a plurality of images acquired sequentially using a plurality of different observation beams and which outputs a recognition result with respect to the image set; and a display control unit (66), of an endoscope processor (13), for causing both a recognition result and an observation image calculated using all or a portion of a plurality of images, to be displayed on a display unit (14). Since the recognizer (15) receives a set of images acquired sequentially using a plurality of different observation beams and acquires a recognition result with respect to the image set, it is possible to improve recognition accuracy as compared with the case where image recognition is performed on the basis of a single image acquired using a single observation beam.

Description

Image processing apparatus, method, and endoscope system
The present invention relates to an image processing apparatus, an image processing method, and an endoscope system, and more particularly to technology that can be used to assist a doctor in endoscopy.
In the medical field, examinations using endoscope apparatuses are performed. In recent years, it has become known to support an examination by recognizing the position and type of a lesion included in an endoscope image through image analysis and reporting the recognition result.
In image analysis for recognition, machine learning of images, including deep learning, is widely used.
Patent Literature 1 proposes an information processing apparatus including an acquisition unit that acquires a plurality of images of cells photographed in time series, an assigning unit that assigns time-series evaluation values to the acquired images for each of one or more predetermined evaluation items, and an evaluation unit that evaluates the cells based on the temporal change of the assigned evaluation values. Here, the evaluation unit assigns the time-series evaluation values to the plurality of images according to a machine learning algorithm and evaluates the observed cells based on the temporal change of those values. This enables an evaluation that comprehensively considers the time-series behavior of the cells.
 また、特許文献2には、動画中のデータ列である時系列入力データを取得し、時系列入力データにおける一の時点の入力データに対応する複数の入力値を、時系列入力データに対応する学習済みのモデル(ボルツマンマシンを構成するモデル)が有する複数のノードに供給し、時系列入力データにおける予測対象時点より前の入力データ系列と、モデルにおける入力データ系列中の入力データに対応する複数の入力値のそれぞれと複数のノードのそれぞれとの間の重みパラメータとに基づいて、入力データ系列が発生した条件下において予測対象時点に対応する各入力値となる条件付確率を算出し、予測対象時点に対応する各入力値の条件付確率に基づいて、時系列入力データが発生した条件の下で次の入力データが予め定められた値となる条件付確率を算出する処理装置が提案されている。 Further, in Patent Document 2, time-series input data which is a data sequence in a moving image is acquired, and a plurality of input values corresponding to input data at one time point in the time-series input data correspond to the time-series input data. It is supplied to a plurality of nodes of a trained model (a model constituting a Boltzmann machine), and a plurality of nodes corresponding to the input data series before the prediction target time in the time-series input data and the input data in the input data series in the model Calculating a conditional probability to be each input value corresponding to a prediction target time point under a condition in which an input data sequence occurs, based on a weight parameter between each of the input values and each of the plurality of nodes; Based on the conditional probability of each input value corresponding to the target time point, the next input data becomes a predetermined value under the condition that the time-series input data occurs. Processing apparatus has been proposed to calculate the conditional probability.
 この処理装置は、一例として時系列に並ぶT-1個の画像データに基づき、次の時刻に配列される1つの画像データを予測して、合計T個の画像を含む動画を生成することができる。 As an example, the processing apparatus can generate a moving image including a total of T images by predicting one image data arrayed at the next time based on T-1 image data arranged in time series. it can.
Patent Literature 1: JP 2018-22216 A
Patent Literature 2: JP 2016-71697 A
 The information processing apparatus described in Patent Literature 1 evaluates various changes in the culture process of the imaged cells (fertilized eggs) from a plurality of images captured in time series, and those images are captured under the same imaging conditions. This is because, unless the images are captured under the same imaging conditions, changes in the fertilized eggs cannot be evaluated from the acquired images. That is, the plurality of images are not images sequentially acquired using different observation lights.
 The processing apparatus described in Patent Literature 2 enables prediction of the image data at the next time by a trained model that receives time-series input data, and that time-series input data is likewise captured under the same imaging conditions. This is because, unless the data are captured under the same imaging conditions, the image data at the next time cannot be predicted from the input time-series data. That is, the time-series input data is not data sequentially acquired using different observation lights.
 Moreover, the inventions described in Patent Literatures 1 and 2 both input a plurality of time-series images in order to predict an object that changes over time (cells, a future moving image); they do not input a plurality of images for the purpose of improving recognition accuracy in a recognizer.
 The present invention has been made in view of such circumstances, and an object thereof is to provide an image processing apparatus, an image processing method, and an endoscope system capable of improving recognition accuracy based on a plurality of images and of presenting a good observation image and recognition result.
 To achieve the above object, an image processing apparatus according to one aspect of the present invention comprises: a recognizer that receives an image set composed of a plurality of images sequentially acquired using a plurality of different observation lights and outputs a recognition result for the image set; and a display control unit that causes a display unit to display the recognition result together with an observation image that is a part of the plurality of images or is calculated using the plurality of images.
 According to this aspect of the present invention, an image set sequentially acquired using a plurality of different observation lights is input and a recognition result for the image set is acquired, so that recognition accuracy can be improved compared with the case where recognition is performed based on a single image acquired with a single observation light. In addition, by displaying the recognition result on the display unit together with the observation image obtained from the plurality of images, the recognition result can be presented appropriately.
 In an image processing apparatus according to another aspect of the present invention, it is preferable that the recognizer has a trained model learned from sets of a plurality of learning images and correct-answer data, and outputs a recognition result based on the trained model each time it receives a plurality of images for recognition.
 In an image processing apparatus according to still another aspect of the present invention, it is preferable that the trained model is constituted by a convolutional neural network. Convolutional neural networks excel at image recognition.
 In an image processing apparatus according to still another aspect of the present invention, it is preferable that the plurality of images include a first endoscope image and a second endoscope image acquired using observation light different from that of the first endoscope image. In endoscopy, a plurality of images may be acquired using a plurality of different observation lights, and the present invention can be applied to endoscopy in such a case.
 In an image processing apparatus according to still another aspect of the present invention, it is preferable that the first endoscope image is a normal light image captured with normal light and the second endoscope image is a special light image captured with special light. In general, a normal light image is used as an observation image, and a special light image is used when it is desired to observe surface structure.
 In an image processing apparatus according to still another aspect of the present invention, the special light image includes two or more special light images captured with two or more different special lights. Two or more special light images can be captured according to the observation purpose, for example when the depths of the surface structures to be observed differ.
 In an image processing apparatus according to still another aspect of the present invention, the first endoscope image is a first special light image captured with first special light, and the second endoscope image is a second special light image captured with second special light different from the first special light. That is, the plurality of endoscope images may not include a normal light image.
 In an image processing apparatus according to still another aspect of the present invention, it is preferable that the display control unit causes the display unit to display, as a moving image, the observation image that is a part of the plurality of images or is calculated using the plurality of images. This makes it possible to perform an examination in real time while viewing the observation image and the recognition result displayed as a moving image.
 In an image processing apparatus according to still another aspect of the present invention, it is preferable that the recognizer recognizes a region of interest included in the plurality of images, and the display control unit superimposes an index indicating the recognized region of interest on the image displayed on the display unit. This makes it possible to support the examination so that a region of interest in the observation image is not overlooked.
 In an image processing apparatus according to still another aspect of the present invention, it is preferable that the recognizer recognizes a region of interest included in the plurality of images, and the display control unit displays information indicating the presence or absence of the region of interest so as not to overlap the image displayed on the display unit. This makes it possible to report that a region of interest exists in the observation image, while ensuring that observation of the image is not hindered by the information displayed on the display unit.
 In an image processing apparatus according to still another aspect of the present invention, it is preferable that the recognizer performs discrimination regarding a lesion based on the plurality of images and outputs a discrimination result, and the display control unit causes the display unit to display the discrimination result. This enables visual inspection of the observation image while referring to the discrimination result obtained by the recognizer.
 An endoscope system according to still another aspect of the present invention comprises: a light source device that sequentially generates first observation light and second observation light different from the first observation light; an endoscope scope that captures a plurality of images by sequentially imaging an observation target sequentially illuminated with the first observation light and the second observation light; a display unit; and the image processing apparatus described above, wherein the recognizer receives an image set composed of the plurality of images captured by the endoscope scope.
 An endoscope system according to still another aspect of the present invention preferably comprises an endoscope processor that receives the plurality of images captured by the endoscope scope and performs image processing on them, and the recognizer receives the plurality of images after the image processing by the endoscope processor. The endoscope processor has a function of processing the plurality of images captured by the endoscope scope, and the recognizer can detect and discriminate a lesion area using the processed images. The recognizer may be separate from the endoscope processor or may be built into it.
 An image processing method according to still another aspect of the present invention includes: a first step of receiving an image set composed of a plurality of images acquired using a plurality of different observation lights; a second step in which a recognizer outputs a recognition result for the image set; and a third step in which a display control unit causes a display unit to display the recognition result together with an observation image that is a part of the plurality of images or is calculated using the plurality of images, wherein the processing of the first to third steps is repeatedly executed.
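As a rough, non-authoritative sketch of this repeated three-step flow (all names below are hypothetical illustrations, not elements of the claimed apparatus), the loop could be pictured as:

```python
# Minimal sketch of the three-step image processing method, assuming a
# hypothetical frame source, recognizer, and display (names illustrative).

def inspection_loop(frame_source, recognizer, display):
    while frame_source.is_active():
        # Step 1: receive an image set of images acquired using a
        # plurality of different observation lights.
        image_set = frame_source.next_image_set()   # e.g. (WL, BLI, LCI)

        # Step 2: the recognizer outputs a recognition result for the set.
        result = recognizer.recognize(image_set)

        # Step 3: display an observation image (a part of the images, or
        # one calculated from them) together with the recognition result.
        observation_image = image_set[0]            # e.g. the WL image
        display.show(observation_image, result)
```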
 In an image processing method according to still another aspect of the present invention, it is preferable that, in the second step, a recognizer having a trained model learned from learning image sets and correct-answer data outputs a recognition result based on the trained model each time it receives an image set for recognition.
 In an image processing method according to still another aspect of the present invention, it is preferable that the trained model is constituted by a convolutional neural network.
 In an image processing method according to still another aspect of the present invention, it is preferable that the plurality of images include a first endoscope image and a second endoscope image acquired using observation light different from that of the first endoscope image.
 In an image processing method according to still another aspect of the present invention, it is preferable that the first endoscope image is a normal light image captured with normal light and the second endoscope image is a special light image captured with special light.
 According to the present invention, recognition is performed on an image set composed of a plurality of images sequentially acquired using a plurality of different observation lights, so that recognition accuracy can be improved. In addition, by displaying the recognition result on the display unit together with the observation image obtained from the plurality of images, the recognition result can be presented appropriately.
FIG. 1 is a perspective view showing the appearance of an endoscope system 10 according to the present invention.
FIG. 2 is a block diagram showing the electrical configuration of the endoscope system 10.
FIG. 3 is a diagram showing an example of multi-frame images and image sets captured mainly in the multi-frame shooting mode.
FIG. 4 is a schematic diagram showing a representative configuration example of a convolutional neural network, one of the learning models constituting the recognizer 15.
FIG. 5 is a schematic diagram showing a configuration example of the intermediate layer 15B of the CNN 15 shown in FIG. 4.
FIG. 6 is a block diagram showing the main configuration used to explain the operation of the endoscope system 10 according to the present invention.
FIG. 7 is a diagram showing an example of R, G, B, and V images and image sets captured in a frame-sequential manner.
FIG. 8 is a flowchart showing an embodiment of the image processing method according to the present invention.
 Hereinafter, preferred embodiments of the image processing apparatus, image processing method, and endoscope system according to the present invention will be described with reference to the accompanying drawings.
 [Overall configuration of endoscope system]
 FIG. 1 is a perspective view showing the appearance of an endoscope system 10 according to the present invention.
 As shown in FIG. 1, the endoscope system 10 is mainly composed of an endoscope scope (here, a flexible endoscope) 11 that images an observation target inside a subject, a light source device 12, an endoscope processor 13, a display unit (display) 14 such as a liquid crystal monitor, and a recognizer 15.
 The light source device 12 supplies the endoscope scope 11 with various kinds of observation light, such as white light for capturing normal light images and light of specific wavelength bands for capturing special light images.
 The endoscope processor 13 has a function of generating image data of a normal light image, special light image, or observation image for display/recording based on the image signal obtained by the endoscope scope 11, a function of controlling the light source device 12, and a function of causing the display 14 to show the normal image or observation image and the recognition result from the recognizer 15. Although the recognizer 15 will be described in detail later, it is the part that receives an endoscopic image and performs recognition on it, such as detecting the position of a region of interest (a lesion, surgical scar, treatment scar, treatment tool, or the like) and discriminating the type of lesion.
 The display 14 displays the normal image, the special light image or observation image, and the recognition result from the recognizer 15 based on the display image data input from the endoscope processor 13.
 The endoscope scope 11 comprises a flexible insertion section 16 to be inserted into the subject, a handheld operation section 17 connected to the proximal end of the insertion section 16 and used for gripping the endoscope scope 11 and operating the insertion section 16, and a universal cord 18 that connects the handheld operation section 17 to the light source device 12 and the endoscope processor 13.
 An illumination lens 42, an objective lens 44, an imaging element 45, and the like are built into the insertion section distal end portion 16a, which is the distal end of the insertion section 16 (see FIG. 2). A freely bendable bending portion 16b is connected to the rear end of the insertion section distal end portion 16a, and a flexible tube portion 16c is connected to the rear end of the bending portion 16b.
 The handheld operation section 17 is provided with an angle knob 21, operation buttons 22, a forceps inlet 23, and the like. The angle knob 21 is rotated to adjust the bending direction and bending amount of the bending portion 16b. The operation buttons 22 are used for various operations such as air supply, water supply, and suction. The forceps inlet 23 communicates with a forceps channel in the insertion section 16. The handheld operation section 17 is also provided with an endoscope operation unit 46 (see FIG. 2) for performing various settings.
 The universal cord 18 incorporates an air/water supply channel, a signal cable, a light guide, and the like. The distal end of the universal cord 18 is provided with a connector section 25a connected to the light source device 12 and a connector section 25b connected to the endoscope processor 13. With this arrangement, observation light is supplied from the light source device 12 to the endoscope scope 11 via the connector section 25a, and the image signal obtained by the endoscope scope 11 is input to the endoscope processor 13 via the connector section 25b.
 The light source device 12 is provided with a light source operation unit 12a including a power button, a lighting button for turning on the light source, a brightness adjustment button, and the like, and the endoscope processor 13 is provided with a processor operation unit 13a including a power button and an input unit that receives input from a pointing device such as a mouse (not shown). Although the endoscope processor 13 and the light source device 12 of this example are separate units, the endoscope processor may instead have a built-in light source device.
 [Electrical configuration of endoscope system]
 FIG. 2 is a block diagram showing the electrical configuration of the endoscope system 10.
 As shown in FIG. 2, the endoscope scope 11 roughly comprises a light guide 40, an illumination lens 42, an objective lens 44, an imaging element 45, an endoscope operation unit 46, an endoscope control unit 47, and a ROM (Read Only Memory) 48.
 A large-diameter optical fiber, a bundle fiber, or the like is used for the light guide 40. The entrance end of the light guide 40 is inserted into the light source device 12 via the connector section 25a, and its exit end passes through the insertion section 16 and faces the illumination lens 42 provided in the insertion section distal end portion 16a. The illumination light supplied from the light source device 12 to the light guide 40 is applied to the observation target through the illumination lens 42, and the illumination light reflected and/or scattered by the observation target enters the objective lens 44.
 The objective lens 44 forms the reflected or scattered light of the incident illumination light (that is, the optical image of the observation target) on the imaging surface of the imaging element 45.
 The imaging element 45 is a CMOS (complementary metal oxide semiconductor) or CCD (charge coupled device) imaging element, positioned and fixed relative to the objective lens 44 at a position behind it. A plurality of pixels, each composed of a photoelectric conversion element (photodiode) that photoelectrically converts the optical image, are two-dimensionally arranged on the imaging surface of the imaging element 45. On the entrance-surface side of the pixels of the imaging element 45 of this example, red (R), green (G), and blue (B) color filters are arranged pixel by pixel, forming R pixels, G pixels, and B pixels. The filter array of the RGB color filters is typically a Bayer array, but is not limited to this.
 The imaging element 45 converts the optical image formed by the objective lens 44 into an electrical image signal and outputs it to the endoscope processor 13.
 When the imaging element 45 is a CMOS type, an A/D (Analog/Digital) converter is built in, and a digital image signal is output directly from the imaging element 45 to the endoscope processor 13. When the imaging element 45 is a CCD type, the image signal output from the imaging element 45 is converted into a digital image signal by an A/D converter or the like (not shown) and then output to the endoscope processor 13.
 The endoscope operation unit 46 includes a still image capture button (not shown) and a shooting mode setting unit that sets one of the normal light image shooting mode, the special light image shooting mode, and the multi-frame shooting mode. The shooting mode setting unit may instead be provided in the processor operation unit 13a of the endoscope processor 13.
 The endoscope control unit 47 sequentially executes various programs and data read from the ROM 48 or the like in response to operations on the endoscope operation unit 46, and mainly controls driving of the imaging element 45. For example, in the normal light image shooting mode, the endoscope control unit 47 controls the imaging element 45 to read out the signals of its R, G, and B pixels. In the special light image shooting mode or the multi-frame shooting mode, when violet light is emitted from the V-LED 32a or blue light is emitted from the B-LED 32b as observation light for acquiring a specific special light image, the endoscope control unit 47 controls the imaging element 45 to read out only the signals of the B pixels, which have spectral sensitivity in the wavelength bands of the violet and blue light, or to read out any one or two of the three color pixels (R, G, and B pixels).
 The endoscope control unit 47 also communicates with the processor control unit 61 of the endoscope processor 13 and transmits, to the endoscope processor 13, operation information from the endoscope operation unit 46 and identification information, stored in the ROM 48, for identifying the type of the endoscope scope 11.
 The light source device 12 has a light source control unit 31 and a light source unit 32. The light source control unit 31 controls the light source unit 32 and communicates with the processor control unit 61 of the endoscope processor 13 to exchange various information.
 The light source unit 32 has, for example, a plurality of semiconductor light sources. In this embodiment, the light source unit 32 has LEDs of four colors: a V-LED (Violet Light Emitting Diode) 32a, a B-LED (Blue Light Emitting Diode) 32b, a G-LED (Green Light Emitting Diode) 32c, and an R-LED (Red Light Emitting Diode) 32d. The V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d are semiconductor light sources that emit violet (V), blue (B), green (G), and red (R) light as observation light, with peak wavelengths at, for example, 410 nm, 450 nm, 530 nm, and 615 nm, respectively.
 The light source control unit 31 individually controls, for each LED, the turning on and off of the four LEDs of the light source unit 32, the emission amount when lit, and so on, according to the shooting mode set by the shooting mode setting unit. In the normal light image shooting mode, the light source control unit 31 lights all of the V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d, so that white light including V light, B light, G light, and R light is used as the observation light.
 In the special light image shooting mode, on the other hand, the light source control unit 31 lights any one of the V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d, or an appropriate combination of them, and, when lighting a plurality of light sources, controls the emission amount (light amount ratio) of each light source, thereby enabling imaging of layers of the subject at different depths.
 The multi-frame shooting mode is a shooting mode in which a normal light image and one or more special light images, or two or more special light images, are captured while switching frame by frame. In the multi-frame shooting mode, the light source control unit 31 causes the light source unit 32 to emit a different observation light for each frame.
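As a hedged illustration of this per-frame switching, the sketch below cycles through a table of light-amount ratios for the four LEDs; the ratio values and mode names are placeholders for illustration only, not values disclosed in this specification:

```python
from itertools import cycle

# Hypothetical (V, B, G, R) light-amount ratios per observation light;
# the real ratios are set by the light source control unit 31.
OBSERVATION_LIGHTS = {
    "WL":  (1.0, 1.0, 1.0, 1.0),   # white light: all four LEDs lit
    "BLI": (1.0, 0.7, 0.2, 0.0),   # higher V ratio, suppressed G ratio
    "LCI": (1.0, 0.8, 0.6, 0.5),   # higher V ratio than WL
}

def multi_frame_schedule(order=("WL", "BLI", "LCI")):
    """Yield, frame by frame, the observation light to emit next."""
    for name in cycle(order):
        yield name, OBSERVATION_LIGHTS[name]
```

Each yielded tuple would drive one frame's exposure before the schedule switches to the next observation light.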
 The light of each color emitted from the LEDs 32a to 32d enters the light guide 40 inserted through the endoscope scope 11 via an optical path coupling section formed by dichroic mirrors, lenses, and the like, and a diaphragm mechanism (not shown).
 As the observation light of the light source device 12, light of various wavelength bands is selected according to the observation purpose: white light (light of the white wavelength band or light of a plurality of wavelength bands), light having a peak in one or more specific wavelength bands (special light), or a combination of these.
 A first example of the specific wavelength band is the blue band or green band of the visible range. The wavelength band of this first example includes a wavelength band of 390 nm to 450 nm or 530 nm to 550 nm, and the light of the first example has a peak wavelength within the wavelength band of 390 nm to 450 nm or 530 nm to 550 nm.
 A second example of the specific wavelength band is the red band of the visible range. The wavelength band of this second example includes a wavelength band of 585 nm to 615 nm or 610 nm to 730 nm, and the light of the second example has a peak wavelength within the wavelength band of 585 nm to 615 nm or 610 nm to 730 nm.
 A third example of the specific wavelength band includes a wavelength band in which the absorption coefficient differs between oxyhemoglobin and deoxyhemoglobin, and the light of the third example has a peak wavelength in a wavelength band in which the absorption coefficient differs between oxyhemoglobin and deoxyhemoglobin. The wavelength band of this third example includes 400 ± 10 nm, 440 ± 10 nm, 470 ± 10 nm, or 600 nm to 750 nm, and the light of the third example has a peak wavelength within one of these bands.
 A fourth example of the specific wavelength band is the wavelength band (390 nm to 470 nm) of excitation light that is used for observing fluorescence emitted by a fluorescent substance in a living body (fluorescence observation) and that excites the fluorescent substance.
 A fifth example of the specific wavelength band is the wavelength band of infrared light. The wavelength band of this fifth example includes a wavelength band of 790 nm to 820 nm or 905 nm to 970 nm, and the light of the fifth example has a peak wavelength within the wavelength band of 790 nm to 820 nm or 905 nm to 970 nm.
 The endoscope processor 13 has a processor operation unit 13a, a processor control unit 61, a ROM 62, a digital signal processing circuit (DSP: Digital Signal Processor) 63, an image processing unit 65, a display control unit 66, a storage unit 67, and the like.
 The processor operation unit 13a includes a power button and an input unit that receives inputs such as a coordinate position indicated on the screen of the display 14 with the mouse and clicks (execution instructions).
 The processor control unit 61 reads necessary programs and data from the ROM 62 according to operation information from the processor operation unit 13a and operation information from the endoscope operation unit 46 received via the endoscope control unit 47, and processes them sequentially, thereby controlling each part of the endoscope processor 13 and controlling the light source device 12. The processor control unit 61 may also receive necessary instruction inputs from other external devices, such as a keyboard, connected via an interface (not shown).
 The DSP 63, which functions as one form of an image acquisition unit that acquires the image data of each frame of the moving image output from the endoscope scope 11 (imaging element 45), performs, under the control of the processor control unit 61, various kinds of signal processing such as defect correction processing, offset processing, white balance correction, gamma correction, and demosaicing (also called "synchronization processing") on one frame's worth of image data of the moving image input from the endoscope scope 11, and generates one frame of image data.
 The image processing unit 65 receives the image data from the DSP 63, applies image processing such as color conversion processing, color enhancement processing, and structure enhancement processing to it as necessary, and generates image data representing an endoscopic image in which the observation target appears. The color conversion processing performs color conversion on the image data by 3 × 3 matrix processing, gradation conversion processing, three-dimensional look-up table processing, and the like. The color enhancement processing enhances colors of the color-converted image data, for example in a direction that increases the color difference between blood vessels and mucous membranes. The structure enhancement processing emphasizes specific tissues or structures included in the observation target, such as blood vessels and pit patterns, and is performed on the image data after the color enhancement processing.
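The 3 × 3 matrix step of the color conversion processing can be pictured with a short NumPy sketch; the matrix coefficients below are illustrative placeholders, not the processor's actual values:

```python
import numpy as np

def color_convert(rgb, matrix):
    """Apply a 3x3 color conversion matrix to an H x W x 3 float image."""
    h, w, _ = rgb.shape
    flat = rgb.reshape(-1, 3)
    converted = flat @ matrix.T            # per-pixel 3x3 matrix product
    return converted.reshape(h, w, 3).clip(0.0, 1.0)

# Illustrative coefficients only (roughly identity with small cross terms).
M = np.array([[ 1.05, -0.03, -0.02],
              [-0.02,  1.04, -0.02],
              [-0.01, -0.04,  1.05]], dtype=np.float32)
```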
 When a still image or moving image capture instruction is given, the image data of each frame of the moving image processed by the image processing unit 65 is recorded in the storage unit 67 as the instructed still image or moving image.
 The display control unit 66 generates display data for displaying the normal light image or special light image on the display 14 based on the image data input from the image processing unit 65, outputs the generated display data to the display 14, and causes the display 14 to show a display image (such as the moving image captured by the endoscope scope 11).
 In the multi-frame shooting mode, sequentially displaying as-is the plurality of images sequentially acquired with different observation lights would cause the appearance to change and flicker. The display control unit 66 therefore causes the display 14 to show one of the plurality of images (a part of the images) or an observation image calculated by the image processing unit 65 using the plurality of images.
 The display control unit 66 also causes the display 14 to show the recognition result input from the recognizer 15 via the image processing unit 65, or input directly from the recognizer 15.
 When the recognizer 15 detects a region of interest, the display control unit 66 superimposes an index indicating that region of interest on the image displayed on the display 14. For example, highlighting such as changing the color of the region of interest in the display image, displaying a marker, or displaying a bounding box can serve as the index.
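A minimal sketch of superimposing such an index, here a bounding box drawn with OpenCV (the drawing style and color are assumptions; only the overlay idea comes from the text):

```python
import cv2

def overlay_index(observation_image, box, color=(0, 255, 255)):
    """Superimpose a bounding box indicating a detected region of interest.

    box: (x, y, w, h) in pixel coordinates of the displayed image.
    """
    x, y, w, h = box
    shown = observation_image.copy()
    cv2.rectangle(shown, (x, y), (x + w, y + h), color, thickness=2)
    return shown
```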
 The display control unit 66 can also display information indicating the presence or absence of a region of interest, based on the detection result of the recognizer 15, so that it does not overlap the image displayed on the display 14. For example, the frame color of the endoscopic image may be changed depending on whether a region of interest is detected, or text such as "Region of interest present!" may be displayed in a display area separate from the endoscopic image.
 When the recognizer 15 performs discrimination regarding a lesion, the display control unit 66 causes the display 14 to show the discrimination result. The discrimination result may be displayed, for example, as text representing the result on the display image of the display 14. The text need not be on the display image itself and is not particularly limited as long as its correspondence with the display image is clear.
 [Recognizer 15]
 Next, the recognizer 15 according to the present invention will be described.
 The recognizer 15 receives images after image processing by the endoscope processor 13. First, the recognition images received by the recognizer 15 will be described.
 The recognizer 15 of this example is applied when the multi-frame shooting mode is set.
 When the multi-frame shooting mode is set, the light source device 12 sequentially generates white light including violet, blue, green, and red light, and light of one or more specific wavelength bands (special light) produced by controlling the lighting of the V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d, and the endoscope processor 13 sequentially acquires from the endoscope scope 11 an image under the white light (a normal light image) and an image under the special light (a special light image).
 In the multi-frame shooting mode of this example, as shown in FIG. 3, a normal light image (WL (White Light) image) as the first endoscope image and two kinds of special light images (a BLI (Blue Light Imaging or Blue LASER Imaging) image and an LCI (Linked Color Imaging) image) as second endoscope images are repeatedly acquired while switching sequentially frame by frame.
 Here, the BLI image and the LCI image are images captured with observation light for BLI and observation light for LCI, respectively.
 The observation light for BLI has a high proportion of V light, which is strongly absorbed by surface-layer blood vessels, and a suppressed proportion of G light, which is strongly absorbed by middle-layer blood vessels; it is suitable for generating an image (BLI image) suited to enhancing the blood vessels and structures of the mucosal surface layer of the subject.
 The observation light for LCI has a higher proportion of V light than the observation light for WL and is suited to capturing subtle changes in color tone compared with the observation light for WL. An LCI image is an image that has undergone color enhancement processing, also using the R-component signal, such that reddish colors become redder and whitish colors become whiter, centered on colors near the mucous membrane.
 The recognizer 15 receives, as recognition images, an image set Sa composed of the plurality of images sequentially acquired by the endoscope processor 13 (in this example, a WL image, a BLI image, and an LCI image).
 Since the WL image, the BLI image, and the LCI image are each color images, each has an R image, a G image, and a B image (three color channels). Accordingly, the image set Sa input to the recognizer 15 is an image with 9 (= 3 × 3) channels.
 The recognizer 15 sequentially receives image sets Sa. Since each image set Sa is composed of three consecutive frames in time-series order (a WL image, a BLI image, and an LCI image), the time interval at which successive image sets Sa are input corresponds to three frame periods of the frames captured in the multi-frame shooting mode. That is, the time interval between the image set Sa at time t_n received by the recognizer 15 and the image set Sa at the immediately preceding time t_(n-1) corresponds to three frame periods of the frames captured in the multi-frame shooting mode.
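A hedged NumPy sketch of how such 9-channel image sets could be assembled from the frame stream, grouping consecutive WL/BLI/LCI frames into non-overlapping triples (the array layout and helper names are assumptions):

```python
import numpy as np

def to_image_set(wl, bli, lci):
    """Stack three H x W x 3 RGB frames into one H x W x 9 array."""
    return np.concatenate([wl, bli, lci], axis=-1)

def image_sets(frame_stream):
    """Group a WL/BLI/LCI-ordered frame stream into image sets Sa.

    Non-overlapping triples, so each set Sa arrives once every three
    frame periods, matching the interval described above.
    """
    it = iter(frame_stream)
    for wl, bli, lci in zip(it, it, it):
        yield to_image_set(wl, bli, lci)
```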
 FIG. 4 is a schematic diagram showing a representative configuration example of a convolutional neural network (CNN: Convolutional Neural Network), one of the learning models constituting the recognizer 15.
 The CNN 15 is, for example, a learning model that detects the position of a region of interest (a lesion, surgical scar, treatment scar, treatment tool, or the like) appearing in an endoscopic image and discriminates the type of lesion; it has a multilayer structure and holds a plurality of weight parameters. When the weight parameters are set to optimal values, the CNN 15 becomes a trained model and functions as a recognizer.
 As shown in FIG. 4, the CNN 15 comprises an input layer 15A, an intermediate layer 15B having a plurality of convolutional layers and a plurality of pooling layers, and an output layer 15C, each layer having a structure in which a plurality of "nodes" are connected by "edges".
 The CNN 15 of this example is a learning model that performs segmentation to recognize the position of a region of interest appearing in an endoscopic image; a fully convolutional network (FCN: Fully Convolutional Network), a type of CNN, is applied so that the position of the region of interest in the endoscopic image can be grasped at the pixel level.
 The recognition image set Sa (FIG. 3) is input to the input layer 15A.
 The intermediate layer 15B is the part that extracts features from the image set Sa input from the input layer 15A. Each convolutional layer in the intermediate layer 15B applies filter processing to nearby nodes in the image set Sa or the preceding layer (performs a convolution operation using a filter) to obtain a "feature map". Each pooling layer reduces (or enlarges) the feature map output from the convolutional layer to produce a new feature map. The "convolutional layers" play the role of feature extraction, such as edge extraction from the image, and the "pooling layers" provide robustness so that the extracted features are not affected by translation and the like. The intermediate layer 15B is not limited to alternating convolutional and pooling layers; there may be consecutive convolutional layers, and normalization layers may also be included.
 The output layer 15C is the part that outputs the recognition result, detecting the position of the region of interest appearing in the endoscopic image and classifying (discriminating) the type of lesion based on the features extracted by the intermediate layer 15B.
 The CNN 15 has been trained with a large number of sets of learning image sets Sa and correct-answer data for those image sets, and the filter coefficients and offset values applied to each convolutional layer of the CNN 15 are set to optimal values by the learning data sets. Here, the correct-answer data is preferably a region of interest or discrimination result specified by a doctor for the endoscopic image (in this example, at least one image of the image set Sa).
 FIG. 5 is a schematic diagram showing a configuration example of the intermediate layer 15B of the CNN 15 shown in FIG. 4.
 In the first (1st) convolutional layer, a convolution operation is performed between the recognition image set Sa and a filter F1. Here, the image set Sa consists of N images (N channels) with an image size of H vertically and W horizontally. In this example, as shown in FIG. 3, the image set Sa is a 9-channel image.
 Since the image set Sa has N channels (N images), the filter F1 convolved with it is, for example in the case of a size-5 filter, a 5 × 5 × N filter.
 The convolution operation using this filter F1 generates a one-channel (single) "feature map" per filter F1. In the example shown in FIG. 5, using M filters F1 generates an M-channel "feature map".
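The filter and feature-map shapes described above can be checked numerically; a short PyTorch sketch, taking N = 9 input channels and M = 64 size-5 filters as illustrative values:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=9, out_channels=64, kernel_size=5, padding=2)
print(conv1.weight.shape)         # torch.Size([64, 9, 5, 5]): M filters, each 5 x 5 x N
x = torch.randn(1, 9, 240, 320)   # one image set Sa with H = 240, W = 320
print(conv1(x).shape)             # torch.Size([1, 64, 240, 320]): M-channel feature map
```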
 The filter F2 used in the second convolutional layer is, for example in the case of a size-3 filter, a 3 × 3 × M filter.
 The size of the "feature map" in the n-th convolutional layer is smaller than the size of the "feature map" in the second convolutional layer because it has been downscaled by the preceding convolutional layers.
 The convolutional layers in the first half of the intermediate layer 15B are responsible for feature extraction, and the convolutional layers in the second half are responsible for segmentation of the object (region of interest). In the second half, the maps are upscaled, and in the last convolutional layer a single "feature map" of the same size as the input image set Sa is obtained. From this "feature map" obtained from the intermediate layer 15B, the output layer 15C of the CNN 15 (FIG. 4) grasps the position of the region of interest appearing in the images of the image set Sa at the pixel level. That is, it can detect, for each pixel of the endoscopic image, whether the pixel belongs to the region of interest, and output the detection result.
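Putting the pieces together, a toy fully convolutional sketch with a 9-channel input and a per-pixel region-of-interest score map as output (layer counts and widths are arbitrary; the specification does not fix a concrete architecture):

```python
import torch
import torch.nn as nn

class ToyFCN(nn.Module):
    """Toy FCN: a 9-channel image set in, a 1-channel per-pixel
    region-of-interest score map of the same H x W out."""

    def __init__(self, in_channels=9):
        super().__init__()
        # First half: feature extraction with downscaling.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # downscale by 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Second half: upscaling back to the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),      # per-pixel score
        )

    def forward(self, x):                         # x: (B, 9, H, W), H and W even
        return torch.sigmoid(self.decoder(self.encoder(x)))
```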
 According to this embodiment, since recognition uses the plurality of images sequentially acquired in the multi-frame shooting mode (the image set of WL, BLI, and LCI images), recognition accuracy can be improved compared with recognition using only one (one kind) of the WL, BLI, and LCI images.
 Although the CNN 15 of this example recognizes the position of the region of interest appearing in the endoscopic image, the recognizer (CNN) according to the present invention is not limited to this and may perform discrimination regarding a lesion and output a discrimination result. For example, the recognizer may classify the endoscopic image into three categories, "neoplastic", "non-neoplastic", and "other", and output as the discrimination result three scores corresponding to these categories (the three scores summing to 100%), or output a classification result when the three scores allow a clear classification. In the case of a CNN that outputs such a discrimination result, it is preferable to have one or more fully connected layers as the last layer(s) of the intermediate layer instead of a fully convolutional network (FCN).
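For this discrimination variant, the tail of the network would be a fully connected head whose three scores sum to 100%; a minimal sketch under the same illustrative assumptions:

```python
import torch
import torch.nn as nn

class DiscriminationHead(nn.Module):
    """Toy head mapping pooled CNN features to three scores
    ('neoplastic', 'non-neoplastic', 'other') summing to 100%."""

    def __init__(self, feature_channels=64, num_classes=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                  # global average pool
        self.fc = nn.Linear(feature_channels, num_classes)   # fully connected layer

    def forward(self, feature_map):                 # (B, C, H, W)
        pooled = self.pool(feature_map).flatten(1)  # (B, C)
        return torch.softmax(self.fc(pooled), dim=1) * 100.0
```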
 [Operation of endoscope system]
 FIG. 6 is a block diagram showing the main configuration used to explain the operation of the endoscope system 10 according to the present invention.
 光源ユニット32のV-LED32a、B-LED32b、G-LED32c、及びR-LED32dからは、それぞれ異なるピーク波長をもつ観察光(V光、B光、G光、及びR光)が、ライトガイド40を介して被検体20に照射される。V光、B光、G光、及びR光は、それぞれ被検体20の深度の異なる複数の層に到達するため、これらの観察光により被検体20の深度の異なる画像の撮像が可能である。 From the V-LED 32a, B-LED 32b, G-LED 32c, and R-LED 32d of the light source unit 32, observation light (V light, B light, G light, and R light) having different peak wavelengths are respectively transmitted to the light guide 40. The subject 20 is irradiated via the. Since the V light, the B light, the G light, and the R light respectively reach a plurality of layers at different depths of the subject 20, images of the subject 20 at different depths can be captured by these observation lights.
 As described with reference to FIG. 3, in the multi-frame shooting mode, a WL image, a BLI image, and an LCI image are sequentially acquired using a plurality of different observation lights (for example, first observation light for WL, second observation light for BLI, and third observation light for LCI); as noted above, the observation lights for WL, BLI, and LCI differ in the light quantity ratios of the V light, B light, G light, and R light.
 In the endoscope scope 11, the WL image, the BLI image, and the LCI image are sequentially and repeatedly captured under irradiation with the plurality of different observation lights. Since the WL image, the BLI image, and the LCI image are each color images, the endoscope processor 13 generates the WL image, the BLI image, and the LCI image as three-channel RGB images.
 The recognizer 15 receives an image set Sa consisting of the WL image, the BLI image, and the LCI image (nine channels of images in total) as the images for recognition.
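 For illustration, stacking the three RGB frames into a single nine-channel input might look like the following sketch; the array names are hypothetical:

```python
import numpy as np

# Three RGB frames captured under different observation lights,
# each of shape (H, W, 3); the names are hypothetical placeholders.
wl_image = np.zeros((256, 256, 3), dtype=np.float32)
bli_image = np.zeros((256, 256, 3), dtype=np.float32)
lci_image = np.zeros((256, 256, 3), dtype=np.float32)

# Concatenate along the channel axis to form the (H, W, 9) image set Sa.
image_set_sa = np.concatenate([wl_image, bli_image, lci_image], axis=-1)
assert image_set_sa.shape == (256, 256, 9)
```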
 The recognizer 15 detects the position of the region of interest (in this example, a lesion region) in the endoscope image and outputs position information (the recognition result) indicating the lesion region to the endoscope processor 13.
 The image processing unit 65 of the endoscope processor 13 generates the WL image, the BLI image, and the LCI image from the image signal input from the endoscope scope 11, and also generates an observation image. The observation image may be a part of the plurality of images (for example, the WL image among the WL image, the BLI image, and the LCI image), or an image calculated using the plurality of images (an image obtained by combining two or more of the WL image, the BLI image, and the LCI image). If a plurality of images sequentially acquired with different observation lights were displayed in sequence as observation images without modification, their appearance would change and flicker; the observation image is therefore preferably a single type of image.
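 One simple way to realize the "image calculated using the plurality of images" mentioned above is a weighted blend; a minimal sketch follows, where the weights are illustrative assumptions rather than values specified here:

```python
import numpy as np

def make_observation_image(wl, bli, lci, weights=(0.6, 0.2, 0.2)):
    """Blend two or more frames into one stable observation image.
    The weights are illustrative assumptions, not values from the patent."""
    w_wl, w_bli, w_lci = weights
    blended = w_wl * wl.astype(np.float32) \
            + w_bli * bli.astype(np.float32) \
            + w_lci * lci.astype(np.float32)
    return np.clip(blended, 0.0, 255.0).astype(np.uint8)
```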
 The display control unit 66 receives the observation image from the image processing unit 65 and the position information indicating the lesion region from the recognizer 15, and causes the display 14 to display the observation image and the recognition result.
 In this example, the display control unit 66 displays the observation image 26 on the display 14 and applies emphasis processing that emphasizes the recognized region of interest (lesion region). In the emphasis processing by the display control unit 66, an index 28 indicating the lesion region is superimposed on the observation image 26 displayed on the display 14, thereby highlighting the lesion region. Here, the index 28 may be displayed as a highlight such as a change in the color of the lesion region, a boundary line showing the outline of the lesion region, a marker indicating the lesion region, or a bounding box.
 By superimposing the index 28 indicating the region of interest on the observation image 26 displayed on the display 14 in this way, the examination can be supported so that the region of interest is not overlooked.
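 A bounding-box overlay of this kind can be sketched with OpenCV as follows; the box coordinates stand in for the recognizer's output and are hypothetical:

```python
import cv2

def draw_lesion_index(observation_image, bbox, color=(0, 255, 0)):
    """Superimpose a bounding box (the index 28) on the observation image.
    bbox = (x, y, width, height) as output by a position-detecting recognizer."""
    x, y, w, h = bbox
    overlaid = observation_image.copy()
    cv2.rectangle(overlaid, (x, y), (x + w, y + h), color, thickness=2)
    return overlaid

# Hypothetical usage: frame is a BGR observation image, box from the recognizer.
# highlighted = draw_lesion_index(frame, (120, 80, 60, 40))
```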
 The recognizer 15 of this example recognizes the position of the region of interest in the endoscope image, but it is not limited to this and may instead perform discrimination of a lesion and output a discrimination result. One way to display the discrimination result is, for example, to display text representing it on the image on the display 14. The display position of the text is not particularly limited: it need not be on the image, and may be in a window separate from the image as long as the correspondence with the image is clear.
 [Another embodiment of multi-frame shooting]
 When a color endoscope image is acquired with an endoscope scope that has a monochrome image sensor without color filters instead of the image sensor 45 (a color image sensor), the subject is sequentially illuminated with observation lights of different colors and an image is captured for each observation light (frame-sequential imaging).
 For example, by sequentially emitting observation lights of different colors (R light, G light, B light, and V light) from the light source unit 32, an R image, a G image, a B image, and a V image corresponding to the R light, G light, B light, and V light are captured frame-sequentially with the monochrome image sensor.
 FIG. 7 is a diagram showing an example of an R image, a G image, a B image, and a V image captured frame-sequentially, and of an image set.
 The endoscope processor 13 can generate observation images such as the WL image, the BLI image, and the LCI image from the plurality of images (the R image, G image, B image, and V image) sequentially acquired using the plurality of different observation lights (R light, G light, B light, and V light). These observation images can be generated by adjusting the synthesis ratios of the R, G, B, and V images.
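 For instance, synthesizing a color image from the four monochrome frames can be sketched as a per-pixel mixing matrix. The coefficients below are placeholder assumptions, since the actual ratios for WL, BLI, and LCI are not given here:

```python
import numpy as np

def synthesize_color_image(r, g, b, v, mix):
    """Combine four monochrome frames (H, W) into one RGB image (H, W, 3).
    'mix' is a 3x4 matrix of synthesis ratios: rows = output R, G, B channels;
    columns = contributions from the R, G, B, V frames."""
    stack = np.stack([r, g, b, v], axis=-1)    # (H, W, 4)
    color = stack @ np.asarray(mix).T          # (H, W, 3)
    return np.clip(color, 0.0, 255.0).astype(np.uint8)

# Placeholder ratios emphasizing short wavelengths, e.g. for a BLI-like image.
bli_mix = [[0.9, 0.1, 0.0, 0.0],
           [0.0, 0.8, 0.1, 0.1],
           [0.0, 0.0, 0.5, 0.5]]
```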
 An image obtained by multiplying at least two of the R image, G image, B image, and V image by preset coefficients and combining them (by the four arithmetic operations) may also be included in the image set. For example, an image obtained by dividing each pixel of the image with a center wavelength of 410 nm (the V image) by the corresponding pixel of the image with a center wavelength of 450 nm (the B image), or an image obtained by multiplying each pixel of the V image (410 nm) by the corresponding pixel of the B image (450 nm), may be used.
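 A pixel-wise ratio image of this kind can be sketched as follows; the small epsilon guarding against division by zero is an implementation assumption:

```python
import numpy as np

def ratio_image(v_image, b_image, coeff=1.0, eps=1e-6):
    """Per-pixel division of the 410 nm (V) image by the 450 nm (B) image.
    'coeff' is the preset coefficient; 'eps' avoids division by zero."""
    v = v_image.astype(np.float32)
    b = b_image.astype(np.float32)
    return coeff * v / (b + eps)
```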
 The recognizer 15 can receive the WL image, the BLI image, and the LCI image generated by the endoscope processor 13 as an image set Sb, and return a recognition result for the endoscope image to the endoscope processor 13.
 The recognizer 15 of this example receives the image set Sa consisting of the WL image, the BLI image, and the LCI image (nine channels of images in total) as the images for recognition, but it is not limited to this; for example, it may receive an image set consisting of the above R image, G image, B image, and V image and output a recognition result for the endoscope image.
 [Image processing method]
 FIG. 8 is a flowchart showing an embodiment of the image processing method according to the present invention, illustrating the processing procedure of each unit of the endoscope system 10 shown in FIG. 2.
 In FIG. 8, the multi-frame shooting mode is set, and the endoscope scope 11 sequentially captures multi-frame images using a plurality of different observation lights (step S10).
 The endoscope processor 13 acquires the image set constituting the multi-frame images captured by the endoscope scope 11 (step S12, first step).
 The image set may be the WL image, BLI image, and LCI image captured by the endoscope scope 11 with the observation lights for WL, BLI, and LCI, or the WL image, BLI image, and LCI image generated from the R image, G image, B image, and V image captured frame-sequentially. The special-light image may be only one of the BLI image and the LCI image, or may be a special-light image captured with other special light. The image set may also contain no WL image (normal-light image) but instead two or more special-light images, including a first special-light image captured with first special light and a second special-light image captured with second special light. In short, any image set will do, as long as it consists of a plurality of images sequentially acquired using a plurality of different observation lights.
 The image processing unit 65 of the endoscope processor 13 generates the observation image based on the acquired image set (step S14). The observation image is a part of the plurality of images (for example, the WL image among the WL image, the BLI image, and the LCI image) or an image calculated using the plurality of images.
 Meanwhile, based on the image set received via the endoscope processor 13, the recognizer 15 detects the position of the region of interest in the endoscope image, discriminates the type of lesion, and so on, and outputs the recognition result (step S16, second step).
 The display control unit 66 then causes the display 14 to display the generated observation image and the recognition result from the recognizer 15 (step S18, third step).
 Subsequently, it is determined whether to end the capture of multi-frame images (step S20). When the capture of multi-frame images continues ("No"), the process returns to step S10, and the processing from step S10 to step S20 is repeated. As a result, the observation image is displayed as a moving image, and the recognition result of the recognizer 15 is also displayed continuously.
 When the capture of multi-frame images is to end ("Yes"), the present processing is terminated.
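 As a hedged sketch, the loop of FIG. 8 might be expressed as follows; every function name is a hypothetical stand-in for the units described above, not an API defined by this disclosure:

```python
def run_multiframe_inspection(scope, processor, recognizer, display):
    """Sketch of the FIG. 8 loop; all callables are hypothetical stand-ins."""
    while not scope.capture_finished():              # step S20
        image_set = scope.capture_multiframe()       # step S10
        image_set = processor.acquire(image_set)     # step S12 (first step)
        observation = processor.make_observation_image(image_set)  # step S14
        result = recognizer.recognize(image_set)     # step S16 (second step)
        display.show(observation, result)            # step S18 (third step)
```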
 [Others]
 In the present embodiment, the endoscope system 10 including the endoscope scope 11 and the like has been described, but the present invention is not limited to the endoscope system 10 and may be an image processing apparatus consisting of the endoscope processor 13 and the recognizer 15. In this case, the endoscope processor 13 and the recognizer 15 may be integrated or separate.
 The different observation lights are not limited to those emitted from the four-color LEDs. For example, the light sources may be a blue laser diode that emits blue laser light with a center wavelength of 445 nm and a blue-violet laser diode that emits blue-violet laser light with a center wavelength of 405 nm, with the laser light from these diodes irradiating a YAG (Yttrium Aluminum Garnet) phosphor to produce the emission. When the phosphor is irradiated with the blue laser light, it is excited and emits broadband fluorescence, while part of the blue laser light passes through the phosphor unchanged. The blue-violet laser light passes through without exciting the phosphor. Therefore, by adjusting the intensities of the blue laser light and the blue-violet laser light, the observation light for WL, the observation light for BLI, and the observation light for LCI can be emitted; and when only the blue-violet laser light is emitted, observation light with a center wavelength of 405 nm can be emitted.
 The observation image according to the present invention is not limited to a moving image and may be a still image stored in the storage unit 67 or the like, and the recognizer may output a recognition result based on an image set of still images.
 Furthermore, the recognizer is not limited to a CNN and may be a machine learning model other than a CNN, such as a DBN (Deep Belief Network) or an SVM (Support Vector Machine).
 The hardware structure of the endoscope processor 13 and/or the recognizer 15 is realized by various processors as follows. The various processors include a CPU (Central Processing Unit), which is a general-purpose processor that executes software (programs) to function as various control units; a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), whose circuit configuration can be changed after manufacture; and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed specifically to execute a specific process.
 One processing unit may be configured by one of these various processors, or by two or more processors of the same or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). A plurality of control units may also be configured by one processor. As a first example of configuring a plurality of control units with one processor, one processor may be configured by a combination of one or more CPUs and software, as typified by computers such as clients and servers, and this processor may function as the plurality of control units. As a second example, as typified by a system on chip (SoC), a processor may be used that realizes the functions of the entire system including the plurality of control units with a single IC (Integrated Circuit) chip. In this way, the various control units are configured, as a hardware structure, using one or more of the various processors described above.
 Furthermore, the present invention is not limited to the embodiment described above, and it goes without saying that various modifications are possible without departing from the spirit of the present invention.
Reference Signs List
10 endoscope system
11 endoscope scope
12 light source device
12a light source operation unit
13 endoscope processor
13a processor operation unit
14 display
15 recognizer (CNN)
15A input layer
15B intermediate layer
15C output layer
16 insertion section
16a insertion section distal end
16b bending section
16c flexible tube section
17 handheld operation section
18 universal cord
20 subject
21 angle knob
22 operation button
23 forceps inlet
25a connector section
25b connector section
26 observation image
28 index
31 light source control unit
32 light source unit
32a V-LED
32b B-LED
32c G-LED
32d R-LED
40 light guide
42 illumination lens
44 objective lens
45 image sensor
46 endoscope operation unit
47 endoscope control unit
48, 62 ROM
61 processor control unit
65 image processing unit
66 display control unit
67 storage unit
F1 filter
S image set
S10 to S20 steps

Claims (18)

  1.  An image processing apparatus comprising: a recognizer that receives an image set consisting of a plurality of images sequentially acquired using a plurality of different observation lights and outputs a recognition result for the image set; and a display control unit that causes a display unit to display the recognition result together with an observation image that is a part of the plurality of images or an image calculated using the plurality of images.
  2.  The image processing apparatus according to claim 1, wherein the recognizer has a trained model trained on sets of the plurality of images for learning and correct-answer data, and outputs the recognition result based on the trained model each time it receives the plurality of images for recognition.
  3.  The image processing apparatus according to claim 2, wherein the trained model is configured by a convolutional neural network.
  4.  The image processing apparatus according to any one of claims 1 to 3, wherein the plurality of images include a first endoscope image and a second endoscope image acquired using observation light different from that of the first endoscope image.
  5.  The image processing apparatus according to claim 4, wherein the first endoscope image is a normal-light image captured with normal light, and the second endoscope image is a special-light image captured with special light.
  6.  The image processing apparatus according to claim 5, wherein the special-light image includes two or more special-light images captured with two or more different special lights.
  7.  The image processing apparatus according to claim 4, wherein the first endoscope image is a first special-light image captured with first special light, and the second endoscope image is a second special-light image captured with second special light different from the first special light.
  8.  The image processing apparatus according to any one of claims 1 to 6, wherein the display control unit causes the display unit to display, as a moving image, the observation image that is a part of the plurality of images or an image calculated using the plurality of images.
  9.  The image processing apparatus according to any one of claims 1 to 8, wherein the recognizer recognizes a region of interest included in the plurality of images, and the display control unit displays an index indicating the recognized region of interest superimposed on the image displayed on the display unit.
  10.  The image processing apparatus according to any one of claims 1 to 8, wherein the recognizer recognizes a region of interest included in the plurality of images, and the display control unit displays information indicating the presence or absence of the region of interest so as not to overlap the image displayed on the display unit.
  11.  The image processing apparatus according to any one of claims 1 to 10, wherein the recognizer performs discrimination of a lesion based on the plurality of images and outputs a discrimination result, and the display control unit causes the display unit to display the discrimination result.
  12.  An endoscope system comprising: a light source device that sequentially generates first observation light and second observation light different from the first observation light; an endoscope scope that captures the plurality of images by sequentially imaging an observation target sequentially illuminated by the first observation light and the second observation light; the display unit; and the image processing apparatus according to any one of claims 1 to 11, wherein the recognizer receives the image set consisting of the plurality of images captured by the endoscope scope.
  13.  The endoscope system according to claim 12, further comprising an endoscope processor that receives the plurality of images captured by the endoscope scope and performs image processing on the plurality of images, wherein the recognizer receives the plurality of images after the image processing by the endoscope processor.
  14.  An image processing method comprising: a first step of receiving an image set consisting of a plurality of images acquired using a plurality of different observation lights; a second step in which a recognizer outputs a recognition result for the image set; and a third step in which a display control unit causes a display unit to display the recognition result together with an observation image that is a part of the plurality of images or an image calculated using the plurality of images, wherein the processing from the first step to the third step is repeatedly executed.
  15.  The image processing method according to claim 14, wherein in the second step, the recognizer, which has a trained model trained on the image set for learning and correct-answer data, outputs the recognition result based on the trained model each time it receives the image set for recognition.
  16.  The image processing method according to claim 15, wherein the trained model is configured by a convolutional neural network.
  17.  The image processing method according to any one of claims 14 to 16, wherein the plurality of images include a first endoscope image and a second endoscope image acquired using observation light different from that of the first endoscope image.
  18.  The image processing method according to claim 17, wherein the first endoscope image is a normal-light image captured with normal light, and the second endoscope image is a special-light image captured with special light.
PCT/JP2019/023492 2018-07-05 2019-06-13 Image processing device, method, and endoscopic system WO2020008834A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2020528760A JP7289296B2 (en) 2018-07-05 2019-06-13 Image processing device, endoscope system, and method of operating image processing device
JP2022168121A JP2022189900A (en) 2018-07-05 2022-10-20 Image processing device, endoscope system, and operation method of image processing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-128168 2018-07-05
JP2018128168 2018-07-05

Publications (1)

Publication Number Publication Date
WO2020008834A1 true WO2020008834A1 (en) 2020-01-09

Family

ID=69059546

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/023492 WO2020008834A1 (en) 2018-07-05 2019-06-13 Image processing device, method, and endoscopic system

Country Status (2)

Country Link
JP (2) JP7289296B2 (en)
WO (1) WO2020008834A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021125056A (en) * 2020-02-07 2021-08-30 カシオ計算機株式会社 Identification device, identification equipment learning method, identification method, and program
JP2022038390A (en) * 2020-08-26 2022-03-10 株式会社東芝 Inference device, method, program, and learning device
WO2023281607A1 (en) * 2021-07-05 2023-01-12 オリンパスメディカルシステムズ株式会社 Endoscope processor, endoscope device, and method of generating diagnostic image
WO2023007896A1 (en) * 2021-07-28 2023-02-02 富士フイルム株式会社 Endoscope system, processor device, and operation method therefor
WO2023026538A1 (en) * 2021-08-27 2023-03-02 ソニーグループ株式会社 Medical assistance system, medical assistance method, and evaluation assistance device
JP7411515B2 (en) 2020-07-16 2024-01-11 富士フイルム株式会社 Endoscope system and its operating method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017175282A1 (en) * 2016-04-04 2017-10-12 オリンパス株式会社 Learning method, image recognition device, and program
WO2019088121A1 (en) * 2017-10-30 2019-05-09 公益財団法人がん研究会 Image diagnosis assistance apparatus, data collection method, image diagnosis assistance method, and image diagnosis assistance program

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5308815B2 (en) * 2006-04-20 2013-10-09 オリンパスメディカルシステムズ株式会社 Biological observation system
JP6140056B2 (en) * 2013-09-26 2017-05-31 富士フイルム株式会社 Endoscope system, processor device for endoscope system, method for operating endoscope system, method for operating processor device
JP6602969B2 (en) * 2016-05-23 2019-11-06 オリンパス株式会社 Endoscopic image processing device
US10803582B2 (en) * 2016-07-04 2020-10-13 Nec Corporation Image diagnosis learning device, image diagnosis device, image diagnosis method, and recording medium for storing program
CN110049709B (en) * 2016-12-07 2022-01-11 奥林巴斯株式会社 Image processing apparatus

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017175282A1 (en) * 2016-04-04 2017-10-12 オリンパス株式会社 Learning method, image recognition device, and program
WO2019088121A1 (en) * 2017-10-30 2019-05-09 公益財団法人がん研究会 Image diagnosis assistance apparatus, data collection method, image diagnosis assistance method, and image diagnosis assistance program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
IWAHORI, YUJI ET AL.: "Classification and size & shape recovery from endoscope image for supporting medical diagnosis", SOGO KOGAKU, vol. 30, 31 March 2018 (2018-03-31), pages 18-36, ISSN: 0915-3292 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021125056A (en) * 2020-02-07 2021-08-30 カシオ計算機株式会社 Identification device, identification equipment learning method, identification method, and program
JP7044120B2 (en) 2020-02-07 2022-03-30 カシオ計算機株式会社 Discriminator, discriminator learning method, discriminating method and program
US11295443B2 (en) 2020-02-07 2022-04-05 Casio Computer Co., Ltd. Identification apparatus, identifier training method, identification method, and recording medium
JP7411515B2 (en) 2020-07-16 2024-01-11 富士フイルム株式会社 Endoscope system and its operating method
JP2022038390A (en) * 2020-08-26 2022-03-10 株式会社東芝 Inference device, method, program, and learning device
WO2023281607A1 (en) * 2021-07-05 2023-01-12 オリンパスメディカルシステムズ株式会社 Endoscope processor, endoscope device, and method of generating diagnostic image
WO2023007896A1 (en) * 2021-07-28 2023-02-02 富士フイルム株式会社 Endoscope system, processor device, and operation method therefor
WO2023026538A1 (en) * 2021-08-27 2023-03-02 ソニーグループ株式会社 Medical assistance system, medical assistance method, and evaluation assistance device

Also Published As

Publication number Publication date
JP2022189900A (en) 2022-12-22
JPWO2020008834A1 (en) 2021-06-24
JP7289296B2 (en) 2023-06-09

Similar Documents

Publication Publication Date Title
WO2020008834A1 (en) Image processing device, method, and endoscopic system
JP7346285B2 (en) Medical image processing device, endoscope system, operating method and program for medical image processing device
JP7135082B2 (en) Endoscope device, method of operating endoscope device, and program
JPWO2018159363A1 (en) Endoscope system and operation method thereof
US20210343011A1 (en) Medical image processing apparatus, endoscope system, and medical image processing method
US11948080B2 (en) Image processing method and image processing apparatus
JP7015385B2 (en) Endoscopic image processing device, operation method of endoscopic device, and program
JP7335399B2 (en) MEDICAL IMAGE PROCESSING APPARATUS, ENDOSCOPE SYSTEM, AND METHOD OF OPERATION OF MEDICAL IMAGE PROCESSING APPARATUS
JP7374280B2 (en) Endoscope device, endoscope processor, and method of operating the endoscope device
JP2021086350A (en) Image learning device, image learning method, neural network, and image classification device
WO2020170809A1 (en) Medical image processing device, endoscope system, and medical image processing method
JP7387859B2 (en) Medical image processing device, processor device, endoscope system, operating method and program for medical image processing device
WO2021199910A1 (en) Medical image processing system and method for operating medical image processing system
US20230389774A1 (en) Medical image processing apparatus, endoscope system, medical image processing method, and medical image processing program
WO2023007896A1 (en) Endoscope system, processor device, and operation method therefor
WO2021153471A1 (en) Medical image processing device, medical image processing method, and program
US20240013392A1 (en) Processor device, medical image processing device, medical image processing system, and endoscope system
WO2019202982A1 (en) Endoscope device, endoscope operating method, and program
CN114627045A (en) Medical image processing system and method for operating medical image processing system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19830637

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2020528760

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19830637

Country of ref document: EP

Kind code of ref document: A1