WO2023188169A1 - Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program - Google Patents

Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program

Info

Publication number
WO2023188169A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
inference
unit
learning
Prior art date
Application number
PCT/JP2022/016192
Other languages
French (fr)
Japanese (ja)
Inventor
祐介 山本
大 伊藤
貴行 清水
浩一 新谷
修 野中
賢一 森島
優 齋藤
学 市川
Original Assignee
オリンパス株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by オリンパス株式会社
Priority to PCT/JP2022/016192
Publication of WO2023188169A1

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B1/00: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B1/04: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor combined with photographic or television appliances
    • A61B1/045: Control thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • The present invention relates to an endoscope used when generating, through learning, an inference model for the medical field; to a training data candidate image acquisition device and a training data candidate image acquisition method for acquiring training data candidate images; and to a program.
  • Patent Document 1 discloses a technique for efficiently transferring, from among a plurality of medical images, only those medical images in which a desired body part is photographed.
  • The technology described in Patent Document 1 efficiently transfers medical images in which a desired region is photographed. It therefore appears difficult to select "medical images that are difficult to identify by machine learning" using the technique disclosed in Patent Document 1. That is, with conventional technology, it is difficult to collect only "medical images that are difficult to identify using machine learning" as useful data.
  • The present invention has been made in view of the above circumstances, and its purpose is to provide an endoscope, a training data candidate image acquisition device, a training data candidate image acquisition method, and a program that make it possible to accurately select medical images that are difficult for previously prepared machine learning to identify.
  • The endoscope according to the first invention comprises: a similar image group determination unit that compares the contents of endoscopic images obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, the group possibly including dissimilar images; an inference unit that infers, using machine learning, a specific object image included in an endoscopic image obtained by imaging; an inference result difference calculation unit that calculates a difference between the inference results produced by the inference unit for each image included in the similar image group determined by the similar image group determination unit; and a learning image selection unit that selects an image to be used for learning based on the difference between the inference results.
  • In the endoscope according to a second invention, in the first invention, the similar image group includes results determined according to the similarity of the images before and after endoscopic images obtained by consecutively capturing a group of images corresponding to a sudden change in viewpoint position.
  • The endoscope according to a third aspect of the invention, in the first aspect, includes an imaging information acquisition section capable of acquiring imaging information corresponding to each image selected by the learning image selection section.
  • In the endoscope according to another aspect, the similar image group determination section converts the pattern of the same object in the endoscopic image into a numerical value that allows that pattern to be tracked within the image, and determines whether the images are similar using that numerical value.
  • In the endoscope according to another aspect, the learning image selection section excludes images captured under extremely poor conditions.
  • A training data candidate image acquisition device according to another invention is a device for acquiring images for inference model learning, comprising: a similar image group determination unit that compares the contents of images obtained by temporally continuous imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, the group possibly including dissimilar images; an inference unit that infers, using machine learning, a specific object image included in the captured images; an inference result difference calculation unit that calculates the difference between the inference results produced by the inference unit for each image included in the similar image group determined by the similar image group determination unit; and a learning image selection unit that selects an image to be used for learning based on the difference between the inference results calculated by the inference result difference calculation unit.
  • A training data candidate image acquisition method according to another invention is a method for acquiring images for inference model learning, in which: the contents of images obtained by temporally continuous imaging are compared to determine a group of similar images in which the same region is observed over a predetermined number of frames, the group possibly including dissimilar images; a specific object image included in the captured images is inferred using a machine learning model; for each image determined to be included in the similar image group, the difference between the inference results inferred using the machine learning model is calculated; and an image to be used for learning is selected based on the calculated difference between the inference results.
  • The program according to the eighth invention causes a computer that acquires training data candidate images for inference model learning to: compare the contents of images obtained by temporally consecutive imaging; determine a group of similar images in which the same region is observed over a predetermined number of frames, even if the group contains dissimilar images; infer a specific object image included in the captured images using a machine learning model; calculate, for each image determined to be included in the similar image group, the difference between the inference results inferred using the machine learning model; and select an image to be used for learning based on the calculated difference between the inference results. A sketch of this overall flow is given below.
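For illustration, the following is a minimal Python sketch of the flow described in the aspects above: similar image group determination, per-frame inference, inference result difference calculation, and learning image selection. All names here (select_learning_images, infer, group_similar_frames) are hypothetical, and the difference measure stands in for whichever concrete measure an implementation uses.

```python
from typing import Callable, List, Tuple
import numpy as np

def select_learning_images(
    frames: List[np.ndarray],
    infer: Callable[[np.ndarray], Tuple[int, float]],   # -> (label, confidence)
    group_similar_frames: Callable[[List[np.ndarray]], List[List[int]]],
    diff_threshold: float = 0.2,                          # hypothetical value
) -> List[int]:
    """Sketch of the claimed flow: determine similar image groups, infer each
    frame, compute differences between inference results within each group,
    and select frames whose results differ by more than a threshold."""
    results = [infer(f) for f in frames]              # inference unit
    selected: List[int] = []
    for group in group_similar_frames(frames):        # similar image group determination
        for a, b in zip(group, group[1:]):            # inference result difference
            (label_a, conf_a), (label_b, conf_b) = results[a], results[b]
            if label_a != label_b or abs(conf_a - conf_b) > diff_threshold:
                selected.append(b)                    # learning image selection
    return selected
```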
  • A training data candidate image acquisition device according to a ninth invention includes: an inference unit that infers an input image using a machine learning model; an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair; and a learning image selection unit that selects a training data candidate image to be used for learning based on the amount of change calculated by the inference result change calculation unit.
  • A training data candidate image acquisition device according to a tenth invention, in the ninth invention, includes an image identification section that determines that temporally continuous input pair images are substantially unchanged.
  • In the above device, the image identification unit determines that the input pair images are substantially unchanged based on at least one of the amount of movement of corresponding points and the amounts of change in brightness, saturation, and contrast of the images.
  • A training data candidate image acquisition device according to another invention is the training data candidate image acquisition device according to the ninth invention, wherein the input image is a medical image.
  • In the above device, the inference section infers at least one of classification, detection, and region extraction of the input image.
  • A training data candidate image acquisition device according to another invention is based on the ninth invention, wherein the inference section outputs the reliability of the inference.
  • In the above device, the inference result change calculation section uses a differential value of at least one of the inference result and the reliability.
  • In the above device, the learning image selection section selects the training data candidate image to be used for learning when the amount of change exceeds a specified value.
  • A training data candidate image acquisition method according to another invention infers an input image using a machine learning model, calculates the amount of change in the inference results of an identified image pair, and selects training data candidate images to be used for learning based on the calculated amount of change.
  • The program according to the eighteenth invention causes a computer that acquires training data candidate images for inference model learning to infer an input image using a machine learning model, calculate the amount of change in the inference results of an identified image pair, and select, based on the calculated amount of change, a training data candidate image to be used for learning.
  • A training data candidate image acquisition device according to a nineteenth invention includes: an inference unit that infers an input image using a machine learning model; an inference result change calculation unit that calculates the amount of change in the inference results of a pair of temporally consecutive images; and a learning image selection unit that selects training data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit.
  • A training data candidate image acquisition device according to another invention is the nineteenth invention, wherein the input image is a medical image.
  • In the above device, the inference section infers at least one of classification, detection, and region extraction of the input image.
  • A training data candidate image acquisition device according to another invention is the nineteenth invention, wherein the inference section outputs the reliability of the inference.
  • In the above device, the inference result change calculation section uses a differential value of at least one of the inference result and the reliability.
  • In the above device, the learning image selection unit selects the training data candidate image to be used for learning when the amount of change exceeds a specified value.
  • A training data candidate image acquisition method according to another invention infers an input image using a machine learning model, calculates the amount of change in the inference results of a temporally consecutive pair of images, and selects training data candidate images to be used for learning based on the calculated amount of change.
  • The program according to the twenty-sixth invention causes a computer that acquires training data candidate images for inference model learning to infer an input image using a machine learning model, calculate the amount of change in the inference results of temporally consecutive image pairs, and select, based on the calculated amount of change, a training data candidate image to be used for learning.
  • According to the present invention, it is possible to provide an endoscope, a training data candidate image acquisition device, a training data candidate image acquisition method, and a program that make it possible to accurately select medical images that are difficult for machine learning prepared in advance to identify.
  • FIG. 1 is a block diagram mainly showing the electrical configuration of an endoscope system according to an embodiment of the present invention.
  • FIG. 2 is a flowchart showing the imaging operation in the endoscope system according to an embodiment of the present invention.
  • FIG. 3 is a flowchart showing the operation of similar image group determination in the endoscope system according to an embodiment of the present invention.
  • FIG. 4 is a diagram illustrating the determination of a similar image group in the endoscope system according to an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating a modified example of the similar image group determination operation in the endoscope system according to an embodiment of the present invention.
  • FIG. 6 is a diagram illustrating the selection of a similar image group in the endoscope system according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating the flow of data between blocks in the endoscope system according to an embodiment of the present invention.
  • FIG. 8 is a diagram showing an example of the data structure in a recording unit of the endoscope system according to an embodiment of the present invention.
  • FIG. 9 is a diagram showing a modification of the data structure in the recording unit of the endoscope system according to an embodiment of the present invention.
  • The present invention can be applied to a device that acquires candidate images to serve as training data for learning, for example when creating an inference model, improving it, or maintaining its performance using image data based on an imaging signal from an endoscope or the like.
  • Equipment such as endoscopes is used by professionals such as doctors: the user brings the device close to an object, illuminates it as necessary, and displays the continuously captured image data as images on a display device. While viewing these images visually or with the aid of an inference model, the expert carefully observes areas of concern. Therefore, when the same object is observed, a plurality of similar image frames can be obtained from the sequentially acquired image frames.
  • The degree of image change can be determined by converting the content of images obtained by temporally continuous imaging into numerical values and comparing the values for each image, or by quantifying differences in the image data.
  • Embodiments according to the present invention include a similar image group determination unit that determines a similar image group in which the image change is less than or equal to a predetermined value, making it possible to determine whether the same object is being observed.
  • An inference unit inputs each sequentially obtained image frame into a machine learning model and performs inference, and a determination unit determines the difference between the inference results for each image determined by the similar image group determination unit to be included in a similar image group. A sketch of one possible frame-change metric and grouping rule follows below.
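As an illustration only, the following Python sketch quantifies frame-to-frame change with simple brightness and pixel-difference statistics and groups frames whose change stays below a threshold. The function names (frame_change, group_similar_frames) and the 0.05 threshold are hypothetical, and the real determination may tolerate dissimilar frames inside a group, which this simplified version does not.

```python
import numpy as np

def frame_change(prev: np.ndarray, curr: np.ndarray) -> float:
    """Quantify the change between two 8-bit grayscale frames: one possible
    'numerical value' combining mean-brightness change and mean absolute
    pixel difference."""
    brightness_delta = abs(float(curr.mean()) - float(prev.mean())) / 255.0
    pixel_delta = float(
        np.abs(curr.astype(np.float32) - prev.astype(np.float32)).mean()
    ) / 255.0
    return 0.5 * brightness_delta + 0.5 * pixel_delta

def group_similar_frames(frames, threshold=0.05):
    """Split a frame sequence into groups whose frame-to-frame change stays
    at or below `threshold` (a stand-in for the similar image group
    determination unit)."""
    groups, current = [], [0]
    for i in range(1, len(frames)):
        if frame_change(frames[i - 1], frames[i]) <= threshold:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups  # lists of frame indices forming similar image groups
```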
  • FIG. 1 is a block diagram mainly showing the electrical configuration of the endoscope system.
  • This endoscope system includes an endoscope device 10 and a learning device 30.
  • each block in the endoscope device 10 does not need to be provided in a single device, and may be divided into a plurality of devices.
  • the imaging unit 11 and each other block may be provided in separate devices. In this case, each unit other than the imaging unit 11 may be connectable to an intranet within a hospital or the like.
  • The endoscope system may also be linked with an in-hospital system, an electronic medical record system, and the like. If images and other data recorded in the recording unit 18, described later, are subject to confidentiality obligations, the endoscope device 10 may refrain from outputting them to the learning device 30. Furthermore, if the learning device 30 is located within a hospital, or if a contract allows data exchange, the learning device 30 may be an in-hospital system. The following explanation assumes a situation in which an organization outside the hospital develops an inference model using the learning device, but the present invention is not limited to this. Each block within the learning device 30 may also be divided among a plurality of devices, and the learning section 34 may be located within a server.
  • The endoscope apparatus 10 includes an imaging section 11, an image processing section 12, a display section 12a, an inference section 13, a difference determination section 14, an image determination section 15, a situation determination section 16, an information association section 17, a recording section 18, a communication section 19, and a control section 20.
  • Each block in the endoscope device 10 may be separated from the endoscope device 10 and provided as a different device; here, an example is shown in which the endoscope device 10 is equipped with all of this hardware. However, some functional blocks can be omitted if roles are shared, for example by the endoscope device 10 cooperating with the learning device 30.
  • The endoscope device 10 includes an insertion portion formed of a long cylindrical tube for insertion into a body cavity or a tubular object to observe the inside; some insertion portions are flexible and some are not. The imaging section 11 is often provided at the distal end of the insertion portion.
  • the imaging unit 11 is assumed to have an imaging unit, a light source unit, an operation unit, a treatment unit, and the like.
  • the imaging unit includes an optical lens for imaging, an imaging element, an imaging control circuit, and the like.
  • If the camera has an automatic focus adjustment function, it includes a focus detection circuit, an automatic focus adjustment device, and the like.
  • Optical lenses form optical images of objects.
  • the image sensor is arranged near a position where an optical image is formed, converts the optical image into an image signal, performs AD conversion on this image signal, and then outputs it to the image processing section 12.
  • When the imaging control circuit receives an instruction to start imaging from the control unit 20, it performs readout control of the image signals from the imaging device at a predetermined rate.
  • the light source section in the imaging section 11 provides illumination to brighten the walls of the digestive tract in the body cavity, etc., in order to facilitate observation.
  • the light source section includes a light source such as a laser light source, an LED light source, a xenon lamp, and a halogen lamp, and also has an optical lens for illumination. Since tissue detection characteristics change depending on the wavelength of the illumination light, the light source unit may have a function of changing the wavelength of the light source, and image processing may be changed using a known method in accordance with the wavelength change. Detection does not necessarily need to be performed visually by a doctor.
  • The operation unit in the imaging unit 11 includes operation members for instructing the capture of endoscopic still images and the start and end of endoscopic movie recording, as well as function execution sections such as operating sections that work in conjunction with these operations and treatment sections. It may further include an operation member for instructing switching of the focus of the endoscopic image. Changes caused by these operations can become parameters of image change.
  • The operating section in the imaging section 11 has an angle knob for bending the distal end of the insertion section of the endoscope. It also has function execution sections for supplying air and water into the body cavity through the flexible tube and for suctioning air and liquid.
  • The treatment section may have function execution sections such as treatment tools, for example biopsy forceps for collecting a tissue sample, or a snare or high-frequency scalpel for removing affected areas such as polyps.
  • The function execution units of these treatment tools and the like (which may be broadly classified as operation units) are operated via corresponding operation members.
  • the above operations may change the shape of the affected area or cause bleeding, and if heat or the like is used during the operation, steam, smoke, water spray, etc. may be generated.
  • the image processing section 12 has an image processing circuit, receives image signals from the imaging section 11, and performs various image processing such as development processing in accordance with instructions from the control section 20.
  • the image processing section 12 outputs the image data subjected to image processing to the display section 12a, the inference section 13, the image determination section 15, and the communication section 19.
  • Image processing in the image processing unit 12 includes adjustment of the color and brightness of the image, as well as enhancement processing such as contrast enhancement and edge enhancement to improve visibility, and gradation processing to create natural gradations. Furthermore, processing such as HDR (High Dynamic Range) processing and super resolution processing, which improve image quality using a plurality of image frames, may be performed.
  • the image processing unit 12 functions as an image processing unit (image processing circuit) that processes image frame information into visible information.
  • the image processing section 12 may be omitted from the endoscope apparatus 10 by entrusting the above-mentioned functions to the image processing section 32 in the learning device 30.
  • If the endoscope device 10 requires independence as an IoT device, providing the image processing unit 12 within the endoscope device 10 increases the degree of freedom, for example by making it possible to transmit images to the outside.
  • the display unit 12a has a display device such as a display monitor and a display control circuit.
  • the display section 12a receives image data processed by the image processing section 12 according to control signals from the control section 20, and displays endoscopic images and the like. Further, an endoscopic image or the like on which the inference result by the inference unit 13 is superimposed may be displayed.
  • the inference unit 13 includes an inference engine, receives image data of image frames from the imaging unit 11, and performs inference.
  • An inference model for inferring this advice is set in the inference engine.
  • the inference engine may be configured by hardware or software.
  • The inference unit 13 may include a forward propagation neural network or the like; this forward propagation neural network will be explained in connection with the learning unit 34.
  • the data input to the inference engine is not limited to the image data of the image frame, but situational information (related information, auxiliary information) at the time of obtaining the frame image may be input to perform inference. By making inferences using situational information, more reliable inference results can be obtained.
  • Inference is performed frame by frame, but it is not limited to every frame; it may be performed every few frames according to requirements such as the visibility of the inference results, or several consecutive frames may be input to the inference section together.
  • the inference unit 13 inputs image data and infers using an inference model generated by machine learning, thereby inferring advice for providing support to a doctor during diagnosis, for example.
  • the inference unit 13 also specifies the object (affected part, organ, etc.) shown in the image acquired by the imaging unit 11, specifies its position, and determines the range of the affected part, etc. It is also possible to perform inferences such as specifying the image, segmenting the image, classifying it, and determining whether it is good or bad.
  • The inference unit 13 performs inference according to instructions from the control unit 20 and outputs the inference result to the control unit 20. Furthermore, when inference is made, the inference unit 13 calculates the reliability of the inference and outputs the calculated reliability value to the difference determination unit 14.
  • The inference result changes over time, and various methods can be used to detect differences between inference results. For example, when inferring a pass/fail judgment, the pass/fail display changes over time; when inferring a position, the coordinates of the specified position on the screen change over time; and when inferring a range (segmentation), the shape and area of each segment within the screen change over time. Any of these can be used as the change in the inference result, as in the sketch below.
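The following hedged Python sketch shows one way each kind of difference mentioned above could be quantified; the helper names and the specific measures (confidence delta, box-center shift, mask overlap) are illustrative assumptions, not definitions from the patent.

```python
import numpy as np

def classification_diff(prev_label, curr_label, prev_conf, curr_conf):
    """Difference for pass/fail style inference: label flip plus confidence change."""
    label_changed = float(prev_label != curr_label)
    conf_delta = abs(curr_conf - prev_conf)
    return label_changed, conf_delta

def detection_diff(prev_box, curr_box):
    """Difference for position inference: shift of the box center and size change.
    Boxes are (x, y, w, h)."""
    (px, py, pw, ph), (cx, cy, cw, ch) = prev_box, curr_box
    center_shift = np.hypot((cx + cw / 2) - (px + pw / 2),
                            (cy + ch / 2) - (py + ph / 2))
    size_change = abs(cw * ch - pw * ph)
    return center_shift, size_change

def segmentation_diff(prev_mask, curr_mask):
    """Difference for range inference: change in segment area and region overlap
    (masks are boolean arrays)."""
    area_change = abs(int(curr_mask.sum()) - int(prev_mask.sum()))
    union = np.logical_or(prev_mask, curr_mask).sum()
    iou = np.logical_and(prev_mask, curr_mask).sum() / union if union else 1.0
    return area_change, 1.0 - iou  # larger values mean larger change
```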
  • the image input to the inference unit 13 is an image from the imaging unit 11, and here it is assumed that it is a medical image from an endoscope or the like.
  • Medical images are not limited to endoscopic images acquired by endoscope devices; they may also be images acquired by medical equipment such as X-ray machines, MRI machines, ultrasound diagnostic machines, dermatology cameras, and dental cameras.
  • this embodiment can deal with sudden changes in visual field that are unique to an endoscope.
  • the input image input to the inference unit 13 does not have to be limited to a medical image.
  • Sudden changes in field of view can also occur with industrial endoscopes and with cameras mounted on robots, vehicles, and drones.
  • an endoscope is a device that is inserted and used through a narrow insertion hole. For this reason, it has a tubular tip portion with a narrow outer diameter, and the tip portion tends to shake, and this shaking greatly affects the acquired image. When the object is close, the change becomes particularly large. Shaking the tip vertically and horizontally changes the composition, and shaking the tip in the direction of insertion leads to a change in the size of the object.
  • Changes in the vertical and horizontal directions may cause unnecessary objects to appear in the image, and changes in relative distance may cause the size of the object to change. Further, when the image is rotated by twisting, or when the imaging unit does not directly face the object and observes it from an oblique angle, distortion or the like may occur in the image. These situational changes are also parameters that can alter the image, and as they repeat, the image may change significantly in complex ways. Such changes can occur not only with endoscopes but with many inspection devices that use images.
  • The inference unit 13 functions as an inference unit (inference engine) that infers a specific object image included in a captured image (which may be an endoscopic image) using a machine learning model (for example, see S1 in FIGS. 2 and 5 and the inference unit 44 in FIG. 7). Further, the inference unit 13 functions as an inference unit (inference engine) that infers an input image using a machine learning model (for example, see S1 in FIGS. 2 and 5 and the inference unit 44 in FIG. 7).
  • the input image described above is a medical image.
  • the above-mentioned inference unit infers at least one of classification, detection, and region extraction of the input image (for example, see S1 in FIGS. 2 and 5).
  • the inference unit also outputs the reliability of the inference (for example, see S1 in FIGS. 2 and 5).
  • the difference determination unit 14 may include a difference determination circuit, or may be realized by a processor including a CPU or the like executing a program.
  • the difference determining unit 14 calculates the difference between the results of the inference by the inference unit 13 (this difference has already been explained using an example). That is, since the inference unit 13 performs inference for each frame and outputs the inference result, the difference determination unit 14 calculates the difference between the inference results. In calculating the difference (change), the differential value of the inference result may be used. Furthermore, in calculating the difference (change), a change in the value indicating reliability calculated by the inference unit 13 may be calculated.
  • Since the inference unit 13 performs inference every time image data is input and calculates the reliability of the inference result, the difference may be determined based on the change in this reliability value. Furthermore, as will be described later, since the image determination unit 15 determines whether an image belongs to a similar image group, the difference determination unit 14 determines the difference between the images included in the similar image group.
  • The difference determination unit 14 functions as an inference result difference calculation unit that calculates the difference between the inference results produced by the inference unit for each image included in the similar image group determined by the similar image group determination unit (for example, see S5 in FIGS. 2 and 5).
  • To calculate the difference between the inference results, for example, at least one of a change in the position of the specific object, a change in its size, and a change in the inferred range may be calculated.
  • the image determination unit 15 may include an image determination circuit, or may be realized by a processor including a CPU or the like executing a program.
  • the image determination unit 15 determines whether or not the image data inputted in time series from the image processing unit 12 belongs to a similar image group.
  • When a doctor or the like inserts the distal end of the endoscope device, in which the imaging device is arranged, into a body cavity, the imaging device acquires endoscopic images. When the distal end approaches a specific object such as an affected area, many images of the vicinity of the affected area are acquired so that the doctor or the like can carefully observe it.
  • the image determination unit 15 may determine whether the image group includes this specific target object.
  • the specific object is an object to be observed or inspected with an endoscope or the like
  • the specific object image refers to the portion of the image of the specific object within the screen.
  • Various objects appear in an endoscopic image, and among these objects, an object that has been determined as a detection target according to the specifications of the AI (inference unit) is called a specific object.
  • In the acquired images, the object often changes due to minute vibrations of the tip: the position of a specific object in the image may change suddenly, or unnecessary objects may enter the image. As a result, the position and size of the object can change drastically.
  • The image determination unit 15 converts the content of the images into numerical values and determines whether the images are similar and whether temporally continuous input pair images are substantially unchanged. The numerical value is chosen so that the pattern of the same object can be tracked within the image. It may be determined by detecting changes in results over time, as described above, or temporally adjacent images (or their inference results) may be compared. A known method may be used as appropriate to quantify the similarity of images. Here, numerical values such as composition, color, and brightness are assumed, and it is assumed that changes in the size of the target object, whether the target object is within the screen, and the like can also be determined using these values.
  • When the image determination unit 15 determines similar image groups, it converts the content of the images into numerical values; a similar image group may include dissimilar images that cannot strictly be called similar. As will be described later with reference to FIG. 6, the distal end of the endoscope makes small movements, which makes it difficult to stare at the object and makes it easy for the object to fall out of the image. Therefore, when the same region is observed over a predetermined number of frames, the similar image group may include dissimilar images captured during the observation. The determination of similar image groups will be described later using FIG. 3.
  • The image determination unit 15 may determine that two input images are substantially unchanged by, for example, calculating the amount of movement of corresponding points, or based on at least one of the amounts of change in brightness, saturation, and contrast of the images; a sketch of such a pair check follows below.
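A minimal sketch of such a pair check, assuming OpenCV is available: corresponding-point movement is approximated by dense optical flow, and brightness, saturation, and contrast changes are taken from HSV and grayscale statistics. The threshold values are hypothetical.

```python
import cv2
import numpy as np

def pair_substantially_unchanged(prev_bgr, curr_bgr,
                                 motion_thresh=1.0,   # px, hypothetical
                                 tone_thresh=0.05):   # hypothetical
    """Decide whether a temporally consecutive image pair is substantially
    unchanged, using corresponding-point movement (dense optical flow) and
    changes in brightness, saturation, and contrast."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)

    # Amount of movement of corresponding points (mean flow magnitude).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_motion = float(np.linalg.norm(flow, axis=2).mean())

    # Changes in brightness, saturation, and contrast (HSV/grayscale statistics).
    prev_hsv = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    curr_hsv = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    d_brightness = abs(curr_hsv[..., 2].mean() - prev_hsv[..., 2].mean()) / 255.0
    d_saturation = abs(curr_hsv[..., 1].mean() - prev_hsv[..., 1].mean()) / 255.0
    d_contrast = abs(float(curr_gray.std()) - float(prev_gray.std())) / 255.0

    return (mean_motion < motion_thresh and
            max(d_brightness, d_saturation, d_contrast) < tone_thresh)
```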
  • The image determination unit 15 functions as a similar image group determination unit that compares the contents of images (which may be endoscopic images) obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images (for example, see S3 in FIGS. 2 and 5, and the image identification unit 43 in FIGS. 3 and 7). Further, the similar image group determination unit described above converts the pattern of the same object in the endoscopic image into a numerical value that allows the pattern to be tracked, and uses that value to determine whether the images are similar (for example, see FIG. 3).
  • When performing an endoscopy, an observer may stare at the same object or change the observation method in order to sufficiently confirm a specific object with the naked eye. The number of frames corresponding to the time required for such observation is the predetermined number of frames. Furthermore, as described above, when observing with an endoscope, the distal end cannot be fixed in space and moves slightly, making it difficult to stare at the object and making it easy for the observed image to deviate from it. Similar image groups should include images that may contain training image candidates important for diagnosis (for example, images in which the same body part is observed), corresponding to the weak or missed images described above. Therefore, a group of images that may include such images is also determined to be a similar image group, even if it includes dissimilar images.
  • The similar image group described above may include results determined according to the similarity of the images before and after endoscopic images obtained by consecutively capturing a group of images corresponding to a sudden change in viewpoint position.
  • That is, the similar image group may include, before and after the endoscopic images obtained by consecutive imaging, results determined according to the similarity of the images, where the group of rapidly changing images corresponds to the blurring of vision caused by the unconscious eye movements called saccades (in humans, this blurring is corrected by the brain and goes unnoticed).
  • the image corresponding to the above-mentioned saccade will be described later using FIGS. 4 and 6, but it is an image corresponding to eyeball movement.
  • The image change equivalent to a saccade is caused by a change in the relative positional relationship between the tip of the endoscope and the object; it conceptually represents the sudden change in the image that also depends on the positional relationship with the light source.
  • the above-mentioned image change equivalent to a saccade is related to the special characteristics of an examination using an endoscope or the like.
  • the image will change significantly even though the relative change is relatively small.
  • Even when an object is captured within the screen, its size and position within the screen are likely to change suddenly, and its exposure and brightness distribution are likely to change suddenly.
  • Such images cannot be captured on purpose and may or may not occur depending on the situation, so it is difficult to prepare them intentionally as training data.
  • a situation in which a sudden image change similar to the image change equivalent to a saccade as described above occurs is likely to occur in endoscopic images.
  • the object may be immersed in body fluids or cleaning fluids, may be subjected to operations during endoscopic observation such as suction or air supply, special light observation, or staining.
  • various methods are used in combination to observe the object, such as moving the object and operating the treatment instruments in conjunction. In some cases, these things occur simultaneously, leading to complex changes, and unintended (or unconscious) sudden changes in the image are extremely likely to occur. In response to such situations, it is desirable to correctly confirm the target object.
  • dissimilar images that satisfy the above conditions are treated as images that form a group of similar images.
  • In this way, the contents of endoscopic images obtained by temporally consecutive imaging are compared, and a similar image group determination is performed on a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images.
  • Here, the content of an image refers to characteristics (image features) such as the shape, pattern, shadow, color, size, and position of the imaged object (including rotation and distortion, which may also be corrected).
  • the situation determination unit 16 may include a situation determination circuit, or may be realized by a processor including a CPU or the like executing a program.
  • The situation determination unit 16 determines information regarding the usage status of the endoscope apparatus 10 used by a doctor or the like. For this purpose, the situation determination unit 16 may determine, based on the image data acquired by the image processing unit 12, whether the distal end portion provided with the imaging device directly faces the wall surface in the body cavity or faces it obliquely. It may also determine whether the image is in focus, the depth of the scene, and so on. Further, the usage status of the light source section in the imaging section 11 may be determined.
  • For example, if the wavelength of the light source light is known, it can be determined whether narrow band imaging (NBI) is being performed. Furthermore, the usage status of the treatment section within the imaging section 11 may be determined. For example, if a doctor or the like performs a water injection operation, the effect of the water injection appears in the image, and if a suction operation is performed, the effect of the suction appears.
  • the determination result of the situation determination section 16 is output to the control section 20.
  • The image data is associated with the situation determination result at that time. If the distal end is not directly facing the affected area but is at an oblique angle, the affected area is difficult to see, making inference difficult. Likewise, when a water injection operation is performed, the screen is affected by the water, making the affected area difficult to see and inference difficult. If such situation information is available and images are inferred together with it, the reliability of inference for finding affected areas can be improved.
  • the situation determination unit 16 functions as an imaging information acquisition unit that can acquire imaging information corresponding to each image selected by the learning image selection unit (for example, see S6 in FIG. 5).
  • the information association unit 17 may include an information association circuit, or may be realized by a processor including a CPU or the like executing a program.
  • The information association unit 17 associates, with the image data processed by the image processing unit 12, at least one piece of information such as the situation information determined by the situation determination unit 16, the similar image information determined by the image determination unit 15, and information regarding the difference in inference confidence values determined by the difference determination unit 14.
  • The image data associated with information by the information association section 17 is recorded in the recording section 18. Further, when the difference determination unit 14 determines that the inference reliability values of image data differ by more than a predetermined value, the communication unit 19 is notified of this fact. As will be described later, the communication unit 19 transmits the notified image data to the learning device 30, where it is used as training data for relearning.
  • the recording unit 18 includes electrically rewritable nonvolatile memory and/or volatile memory. This recording unit 18 stores a program for operating the CPU and the like in the control unit 20. It also stores model information of the endoscope device 10, various characteristic values and adjustment values within the endoscope 10, and the like. Furthermore, image data processed by the image processing section 12 and associated with information by the information association section 17 may be recorded. Details of the data structure recorded in the recording unit 18 will be explained using FIGS. 8 and 9. Note that the recording unit 18 may not record all of the information as shown in FIGS. 8 and 9, and may be shared with the recording unit 33 in the learning device 30.
  • the recording unit 18 stores the image data of the original moving image acquired by the imaging unit 11 and processed by the image processing unit 12, and the image data with information associated with it by the information association unit 17. Record.
  • The reason the original moving image is recorded in the recording unit 18 is so that the inference model can later be improved based on the recorded content. Therefore, what is recorded need not be the selected image data itself; it is sufficient that the selected image data can be easily found by searching.
  • For example, the original examination video may be recorded as-is together with information that allows a specific frame to be selected from among the recorded images, such as information indicating the image at the start of an examination or the image at the time a specific object was detected. This information will be described later using FIGS. 8 and 9.
  • Such records are useful, for example, when improving an inference model by relearning, customizing it for a specific situation, increasing its versatility, or creating a new inference model.
  • Data representing the background at the time of acquisition, such as which hospital and which doctor performed the examination, the presence or absence of personal information, and the presence or absence of contracts such as informed consent and terms of use, is useful when actually using the data, and such information may also be recorded. One possible record structure is sketched below.
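As a purely illustrative assumption (the actual structures are those shown in FIGS. 8 and 9), a record for one candidate image might bundle frame-locating information with the associated metadata described above:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CandidateImageRecord:
    """Hypothetical record structure for one training data candidate image,
    illustrating the kinds of associated information described above."""
    video_id: str                              # identifies the original examination video
    frame_index: int                           # locates the selected frame in that video
    similar_group_id: Optional[int] = None     # similar image group membership
    confidence: Optional[float] = None         # inference reliability for this frame
    confidence_diff: Optional[float] = None    # difference vs. adjacent frames
    situation_info: dict = field(default_factory=dict)  # e.g. NBI, water injection
    hospital: Optional[str] = None             # acquisition background
    informed_consent: bool = False             # contract/consent flag
```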
  • the communication unit 19 has a communication circuit (transmission/reception circuit), and communicates with the communication unit 31 in the learning device 30.
  • The communication unit 19 can reduce the communication and recording load by selecting and transmitting only the necessary information from the information recorded in the recording unit 18. Security can also be increased by not sending unnecessary data. For example, among the images acquired by the imaging unit 11, those whose inference results differ are transmitted to the learning device 30 for learning. However, if there are no restrictions, or where a contract allows it, all information (including images) may be sent. At this time, the communication unit 19 may transmit, to the communication section 31 in the learning device 30, image data that has been associated with information by the information association unit 17 and for which the difference determination unit 14 found a difference between the inference results of the images included in a similar image group.
  • Communication can also be used when replacing an inference model, and the communication unit 19 receives the inference model generated by the learning unit 34 in the learning device 30 through the communication unit 31.
  • each request signal is sent and received, and information that satisfies the conditions is sent and received in accordance with the request signal.
  • the control unit 20 is a processor that includes a CPU (Central Processing Unit), its peripheral circuits, memory, and the like. This processor may be one, or may be composed of multiple chips.
  • The CPU controls each part within the endoscope apparatus 10 according to a program stored in the memory, thereby operating the endoscope apparatus 10 as a whole. Each part within the endoscope apparatus 10 is realized through software control by the CPU. All or part of the difference determination section 14, image determination section 15, situation determination section 16, and information association section 17 described above may be realized by a processor within the control section 20.
  • the above-mentioned processor may realize all or part of the similar image group determination section, the image identification section, the inference result difference calculation section, the inference result change calculation section, the learning image selection section, and the imaging information acquisition section.
  • The control section 20 may realize all or part of the functions of the imaging section 11, the image processing section 12, and the inference section 13. Further, the control unit 20 may operate in cooperation with the control unit 35 in the learning device 30, so that the endoscope device 10 and the learning device 30 operate as one.
  • the control unit 20 performs a difference determination on the inference result (reliability etc.) inferred by the inference unit 13 for the image determined by the image determination unit 15 to be an image forming part of a similar image group.
  • Based on this difference determination, an image to be used for relearning is selected (for example, see S7 in FIG. 2 and S8 in FIG. 5). For example, consider the case where the situation changes from high confidence to low confidence: an image with high confidence is generally likely to resemble the images used when the inference model was created, whereas images with low confidence are likely not to have been used when creating the inference model.
  • Images included in a similar image group may show the same object even when they are dissimilar. Such images are weak points for inference models, and it is desirable to be able to make accurate inferences even for such difficult images. Therefore, in this embodiment, when the inference result (reliability or the like) changes in this way, the image at that time is selected as a candidate image for learning.
  • An expert such as a doctor may then determine whether a candidate image can actually be adopted as training data. Without this selection, deciding what to adopt as training data would take considerable time and effort, whereas narrowing the images down to training data candidates makes it possible to use them immediately for improving the inference model.
  • The control unit 20 functions as a learning image selection unit that selects an image to be used for relearning based on the difference between the inference results calculated by the inference result difference calculation unit (for example, see S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). Images are selected based on differences in the inference results: for example, when the reliability changes by more than a predetermined value, when the size of a specific object changes suddenly, or when the position of the object in the image suddenly changes by more than a predetermined value, the changed image is selected for relearning.
  • The selection here may record the image frame itself, but is not limited to this; it may instead record information that allows an immediate search for the image frame. A sketch of such a selection rule follows below.
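A hedged sketch of such a threshold-based selection rule; the record keys (confidence, center_shift, size_change_ratio, video_id, frame_index) and the threshold values are hypothetical stand-ins for the measures discussed above. Note that only frame-locating information, not the frame itself, is stored in the output.

```python
def select_for_relearning(records, conf_thresh=0.2, shift_thresh=20.0,
                          size_thresh=0.3):
    """Select training data candidates from a similar image group: a frame is
    chosen when its inference result differs from the previous frame's by more
    than a specified value (all thresholds here are hypothetical)."""
    selected = []
    for prev, curr in zip(records, records[1:]):
        conf_jump = abs(curr["confidence"] - prev["confidence"]) > conf_thresh
        pos_jump = curr.get("center_shift", 0.0) > shift_thresh
        size_jump = curr.get("size_change_ratio", 0.0) > size_thresh
        if conf_jump or pos_jump or size_jump:
            # Record only what is needed to find the frame again later.
            selected.append({"video_id": curr["video_id"],
                             "frame_index": curr["frame_index"]})
    return selected
```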
  • The learning image selection unit described above may omit images that do not satisfy specific conditions or whose conditions are extremely poor, for example images whose visibility remains too low even after image correction or the like.
  • images to be used for relearning are selected from among the images included in the similar image group based on the difference in the inference results by the inference result calculation unit. Therefore, at first glance, images that are dissimilar to images observing the same region may also be selected as candidate images for learning.
  • However, in images with extremely poor conditions, such as images in which the entire screen has become completely white due to water injection, or completely black due to the position of the light source, there is very little possibility that the specific object is included.
  • Depending on the shooting conditions, the image may also become extremely poor. This is because, when the exposure time or the like becomes long, the relative positional relationship between the specific object and the imaging unit 11 changes during that time, resulting in a degraded image, for example one with blurring.
  • Therefore, the learning image selection unit excludes from selection images whose conditions are so extremely poor that they cannot possibly contain an image of the specific object. This may be determined based on the contrast of the image data acquired by the imaging unit 11, the amount of image blur, and so on. Alternatively, without directly evaluating image quality, the image may be judged to be in extremely poor condition from the shooting conditions or the like. In addition, when multiple image frames are combined, as in HDR (High Dynamic Range) shooting or depth compositing, an image with extremely poor conditions is easily produced, so in this case as well the image may be judged to be in extremely poor condition from the shooting conditions. A sketch of such a quality check follows below.
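One way such a check might look, as a hedged sketch using OpenCV on an 8-bit grayscale frame: nearly all-white or all-black screens, very low contrast, and heavy blur (estimated by the variance of the Laplacian) are excluded. All thresholds are hypothetical.

```python
import cv2
import numpy as np

def extremely_poor_condition(gray: np.ndarray,
                             sat_ratio=0.98,        # hypothetical thresholds
                             contrast_thresh=10.0,
                             blur_thresh=30.0) -> bool:
    """Return True for frames to exclude from selection: screens that are
    almost entirely white (e.g. water injection) or black (light source
    position), or frames with very low contrast or heavy blur."""
    if (gray > 240).mean() > sat_ratio or (gray < 15).mean() > sat_ratio:
        return True
    if gray.std() < contrast_thresh:                 # little usable structure
        return True
    blur = cv2.Laplacian(gray, cv2.CV_64F).var()     # common sharpness estimate
    return blur < blur_thresh                        # small variance means blurry
```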
  • The control unit 20 also functions as a learning image selection unit that selects training data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit (for example, see S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7).
  • This learning image selection unit selects a training data candidate image to be used for learning when the amount of change exceeds a specified value. In other words, if the amount of change is larger than the predetermined value, the image may differ even though the pair was identified as substantially the same, and such an image is likely not to have been used for learning before, so it is selected as a learning candidate image.
  • the learning device 30 is assumed to be a part that controls the creation and improvement of an inference model, and is assumed to be located outside a hospital (examination facility).
  • The learning device 30 includes a communication unit 31 for exchanging data through communication and, in order to efficiently perform highly specialized learning, also includes an image processing unit 32, a recording unit 33, a learning unit 34, and a control section 35.
  • If the recording unit 33 is located externally, communication may become a constraint during learning, but the learning device 30 may cooperate through communication with a recording unit located elsewhere.
  • the learning device 30 may be located on a server or the like, and in that case, it is connected to the endoscope device 10 through a communication network such as the Internet.
  • the learning device 30 is connected to a large number of devices, receives a large amount of teacher data from these devices, performs learning using this teacher data, and generates an inference model.
  • the learning device 30 may receive teacher data candidates such as image data, perform annotation to create teacher data, and use this teacher data to generate an inference model.
  • The communication unit 31 has a communication circuit (transmission/reception circuit) and communicates with the communication unit 19 in the endoscope device 10. As described above, the communication unit 31 receives image data for which the inference results of the images included in a similar image group differ. The communication unit 31 also transmits to the communication unit 19 the inference model generated by the learning unit 34, including an inference model generated by relearning using the relearning image data selected by the endoscope apparatus 10 based on differences between inference results. As mentioned above, if the recording section 33 is located externally, communication and the like may become a constraint during learning, but cooperation with a recording section located elsewhere through the communication section 31 is also possible.
  • the image processing unit 32 has an image processing circuit, receives image data from the endoscope device 10, and performs various image processing such as development processing in accordance with instructions from the control unit 35.
  • the image processing may be the same as the processing in the image processing section 12, or the processing contents may be changed as appropriate.
  • the processed image data may be recorded in the recording unit 33, or may be displayed on a display device or the like.
  • image processing may be performed during learning.
  • The learning images recorded in the recording unit 33 may be subjected to image processing that makes them easier to learn from, such as changing the image size or applying emphasis processing to make them easier to process.
  • Alternatively, the image processing section 32 may intentionally generate images that are difficult to judge, and the learning section 34 may use these images for testing during learning; a sketch of such difficult-image generation follows below.
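A hedged illustration of how intentionally difficult test images might be generated; the specific degradations (blur, darkening, washed-out highlights) are assumptions chosen to roughly mimic poor endoscopic conditions, not processing prescribed by the patent.

```python
import cv2

def make_difficult_variants(image_bgr):
    """Intentionally generate images that are difficult to judge, for testing
    an inference model during learning (a hypothetical illustration)."""
    variants = {}
    # Heavy blur, as from tip movement during a long exposure.
    variants["blurred"] = cv2.GaussianBlur(image_bgr, (15, 15), 0)
    # Strong darkening, as when the light source faces away from the wall.
    variants["dark"] = cv2.convertScaleAbs(image_bgr, alpha=0.3, beta=0)
    # Washed-out highlights, as from water injection or reflections.
    variants["overexposed"] = cv2.convertScaleAbs(image_bgr, alpha=1.0, beta=120)
    return variants
```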
  • The recording unit 33 includes electrically rewritable nonvolatile memory and/or volatile memory. This recording unit 33 stores a program for operating the CPU and the like in the control unit 35, as well as various characteristic values, adjustment values, and the like within the learning device 30. Furthermore, the recording unit 33 records the training data (including image data for relearning selected based on differences between inference results) transmitted from the endoscope apparatus 10. Details of the data structure recorded in the recording unit 33 will be explained using FIGS. 8 and 9. Note that the recording unit 33 need not record all of the information shown in FIGS. 8 and 9, and the recording may be shared with the recording unit 18 in the endoscope apparatus 10.
  • the learning unit 34 includes an inference engine, and performs machine learning such as deep learning using the learning teacher data recorded in the recording unit 33 to generate an inference model.
  • Relearning is also performed; by relearning, it is possible to generate an inference model that can perform highly reliable inference even on weak or missed images.
  • Deep learning is a multilayered version of the "machine learning" process that uses neural networks.
  • a typical example is a forward propagation neural network, which sends information from front to back to make decisions.
  • the above-mentioned inference section 13 also includes a forward propagation neural network.
  • the simplest forward propagation neural network (the inference engine 13A in FIG. 6 has a similar structure) need only have three layers: an input layer composed of N1 neurons, an intermediate layer composed of N2 neurons given by parameters, and an output layer composed of N3 neurons corresponding to the number of classes to be discriminated. The neurons of the input layer and the intermediate layer, and of the intermediate layer and the output layer, are connected by connection weights, and bias values are added to the intermediate layer and the output layer, thereby easily forming a logic gate.
  • a neural network may have three layers if it performs simple discrimination, but by having a large number of intermediate layers, it is also possible to learn how to combine multiple features in the process of machine learning. In recent years, systems with 9 to 152 layers have become practical in terms of learning time, judgment accuracy, and energy consumption.
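As a rough, non-normative illustration of the three-layer forward propagation network described above, the following sketch (plain NumPy; the dimensions N1, N2, N3 and the activation choice are assumptions, not part of the disclosure) passes an input through connection weights and biases to obtain class probabilities:

```python
import numpy as np

def forward_propagation(x, w1, b1, w2, b2):
    """Input layer (N1) -> intermediate layer (N2) -> output layer (N3).
    w1, w2 are the connection weights; b1, b2 are the bias values."""
    hidden = np.tanh(x @ w1 + b1)        # intermediate layer
    logits = hidden @ w2 + b2            # output layer, one score per class
    exp = np.exp(logits - logits.max())  # softmax over the N3 classes
    return exp / exp.sum()

# Illustrative dimensions: 16 input neurons, 8 intermediate, 3 classes.
rng = np.random.default_rng(0)
N1, N2, N3 = 16, 8, 3
probs = forward_propagation(rng.normal(size=N1),
                            rng.normal(size=(N1, N2)), np.zeros(N2),
                            rng.normal(size=(N2, N3)), np.zeros(N3))
print(probs)  # three class probabilities summing to 1
```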
  • a "convolutional neural network” that performs a process called “convolution” that compresses image features, operates with minimal processing, and is strong in pattern recognition may be used.
  • a "recurrent neural network" (fully connected recurrent neural network), in which information can flow in both directions and which can handle more complex information, may be used to support the analysis of information whose meaning changes depending on order and sequence.
  • processors dedicated to AI (artificial intelligence) processing, such as NPUs (neural network processing units), may be used for these calculations.
  • machine learning methods include, for example, support vector machines and support vector regression.
  • the learning here involves calculating the weights, filter coefficients, and offsets of the classifier, and there is also a method that uses logistic regression processing in addition to this.
  • when having a machine make a decision, humans need to teach the machine how the decision should be made.
  • in this embodiment, a method of deriving image judgments by machine learning is adopted, but a rule-based method that applies rules acquired by humans through empirical rules and heuristics may also be used.
  • the control unit 35 is a processor that includes a CPU (Central Processing Unit), its peripheral circuits, memory, and the like. There may be a single processor, or it may be composed of multiple chips.
  • the CPU controls each part within the learning device 30 according to a program stored in the memory, thereby running the learning device 30 as a whole; each part within the learning device 30 is realized by software control by the CPU. The control unit 35 may also operate in cooperation with the control unit 20 in the endoscope device 10, so that the endoscope device 10 and the learning device 30 operate as one.
  • the recording unit 50 can be applied to the recording unit 18 in the endoscope device 10 and/or the recording unit 33 in the learning device 30, and the recording unit 50 has an electrically rewritable nonvolatile memory. Although all of the data as shown in FIG. 8 may be recorded in the recording unit 18 or the recording unit 33, the data may be recorded in a divided manner in both recording units. That is, data required by the endoscope device 10 and the learning device 30 may be recorded respectively.
  • the recording section 50 has two recording areas: an inspection image recording section 51 and a utilization information recording section 52.
  • the test image recording section 51 is an area for recording test videos in the same way as recording medical records, and test videos A51a and B51b are recorded therein.
  • although FIG. 8 shows only two test videos, video A and video B, three or more test videos can of course be recorded. Recordings are sometimes important as evidence for patient diagnosis, and in some cases a still image corresponding to one frame of a video is recorded for a report, but the display of still images is omitted in FIG. 8.
  • the test image recording unit 51 may record not only images but also test results and the like.
  • the recording unit 50 is provided with a utilization information recording unit 52.
  • in order to use the videos recorded in the examination image recording unit 51 as teaching data (for evidence reports, for example), procedures such as informed consent are required, and depending on the video, doctors and patients may not agree to its use. Furthermore, when learning, it is desirable to take into account various related information from the time of the examination. It is therefore preferable to provide a usage information recording section 52 that records such usage conditions and to organize and record which videos can be used for what purposes. In this case, the video utilization information folder and the teacher data folder may be recorded separately. Note that the usage information recording units do not necessarily need to be located in the same recording device.
  • video usage information A53 and video usage information B56 are provided in the usage information recording unit 52.
  • although FIG. 8 shows only two sets of video usage information, usage information A and usage information B, three or more sets can of course be recorded depending on the number of inspection videos. Since the information recorded in the video usage information A53 and the video usage information B56 is similar, the information recorded in the video usage information A53 will mainly be explained here.
  • the original video information 53a is information indicating which video this information corresponds to.
  • the video usage information A53 corresponds to the test video A51a
  • the video usage information B56 corresponds to the test video B51b.
  • the acquired information 53b includes information such as the date and time of the test, the testing institution, the name of the doctor who conducted the test, and the model name of the device used.
  • the inference model type 53c indicates the type of inference model used by the endoscope apparatus 10 during the examination. If the inference model used is not known, it is impossible to know which inference model should be retrained.
  • the usage condition information 53d indicates under what conditions this video can be used. For example, if the scope of use is determined by informed consent, etc., that fact is recorded.
  • the profile information 53e is information such as the sex, age, ID, and medical examination results of the subject (patient) who underwent the test.
  • the inference result information 53f is information regarding the result of inference for each frame of the image.
  • a first teacher data folder 54 and a second teacher data folder 55 are provided in the usage information recording section 52.
  • the first and second teacher data folders 54 and 55 record, respectively, the first teacher data candidates 54a and the second teacher data candidates 55a, which are images selected as teacher data candidates by the control unit 20 based on differences in inference results using the test video A51a.
  • for each candidate, the candidate image data itself may be recorded, or information that can specify the candidate image from among the images recorded in the inspection image recording section 51 (for example, the frame number, or the time at which the image was taken) may be recorded instead.
  • related information (first related information 54b, second related information 55b) is also recorded in the teacher data folders; for example, situation information, similar image information, and information regarding differences in reliability values are recorded.
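Purely to make the record layout above concrete, here is a hypothetical sketch of one video usage information entry as a Python dictionary; every key name and value is invented for illustration and is not taken from the patent:

```python
# Hypothetical layout of video usage information A53 (all names illustrative).
video_usage_info_A = {
    "original_video": "test_video_A_51a",            # which examination video this describes (53a)
    "acquired_info": {                               # 53b
        "date": "2022-04-01", "institution": "Hospital X",
        "doctor": "Dr. Y", "device_model": "Scope Z",
    },
    "inference_model_type": "model-v1",              # 53c: needed to know what to retrain
    "usage_conditions": "informed consent obtained", # 53d
    "profile": {"sex": "F", "age": 63, "id": "P-0001"},  # 53e
    "inference_results_per_frame": [],               # 53f
    "teacher_data_folders": [                        # 54, 55
        {
            "candidates": [1203, 1207],              # frame numbers instead of raw image data
            "related_info": {"situation": "oblique tip angle",
                             "reliability_delta": 0.42},
        },
    ],
}
```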
  • the video usage information B56 similarly records original video information 56a, acquisition time information 56b, inference model type 56c, usage condition information 56d, profile information 56e, inference result information 56f, a first teacher data folder 57 (first teacher data candidates 57a, first related information 57b), and a second teacher data folder 58 (second teacher data candidates 58a, second related information 58b); since these items are similar to those in the video usage information A53, detailed explanation is omitted.
  • FIG. 9 shows a modification of the way data are organized and recorded in the recording section 50, in which the moving image recording section A61 and the moving image recording section B65 are recorded together with the moving image utilization information A and the moving image utilization information B. Data are managed centrally, and there is no need to search for video recording locations when using the information, but the file size may become too large.
  • the video recording unit A61 also records acquisition time information 62a, usage condition information 62b, inference model type 62c, profile information 62d, inference result information 62e, a first teacher data folder 63 (containing a first teacher data candidate 63a and first related information 63b), and a second teacher data folder 64.
  • similarly, the video recording unit B65 records acquisition time information 66a, usage condition information 66b, inference model type 66c, profile information 66d, inference result information 66e, a first teacher data folder 67 (first teacher data candidate 67a, first related information 67b), and a second teacher data folder 68 (second teacher data candidate 68a, second related information 68b).
  • in FIGS. 8 and 9, various arrangements are possible as to which recording unit is placed in which recording device; which folder holds which data, and which recording unit records it, may be determined as appropriate depending on the situation and environment. The teacher data folders may also be divided according to the characteristics of the teacher data.
  • FIGS. 6A and 6B show an example when the distal end of the endoscope device 10 is inserted into a patient's body cavity, and show examples of images acquired at this time.
  • the black-painted images (P1, P2, P11, P12) are the images at the time of insertion
  • the black-painted images (P8, P9, P18, P19) are the images at the time of removal
  • no specific object such as an affected area appears in these transition images P1, P2, P8, and P9.
  • the diagonally shaded images (P5, P6, P15, P16) are observation images when a doctor or the like finds a specific object such as an affected area and carefully observes it.
  • Intermediate images P3, P4, and P7 are acquired between the transition images P1, P2, P8, and P9 and the observation images P5 and P6.
  • intermediate images P13, P14, and P17 are acquired between transition images P11, P12, P18, and P19 and observation images P15 and P16.
  • images P5 to P7 and P15 to P17 within the range of the broken line frame Is are images that depict similar objects, are similar to each other, and belong to a similar image group.
  • a known method may be used to determine whether the images are similar.
  • the image determination unit 15 may calculate feature amounts from the images, and if the feature amounts of two images are within a predetermined range, it may be determined that the images are similar.
  • the image determination unit 15 may determine whether the images are similar based on the composition, color distribution, etc. of the images.
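A minimal sketch of such a similarity judgment follows, assuming a color-histogram feature amount and a Euclidean-distance "predetermined range" (both are assumptions; the patent leaves the concrete feature and threshold open):

```python
import numpy as np

def color_histogram(image, bins=8):
    """A simple feature amount reflecting the color distribution:
    a normalized per-channel histogram of an RGB image."""
    return np.concatenate([
        np.histogram(image[..., c], bins=bins, range=(0, 255), density=True)[0]
        for c in range(3)
    ])

def are_similar(feature_a, feature_b, threshold=0.01):
    """Two images are judged similar when their feature amounts
    fall within a predetermined range of each other."""
    return float(np.linalg.norm(feature_a - feature_b)) <= threshold
```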
  • the image data acquired by the image sensor of the imaging unit 11 is input to the input layer of the inference engine 13A of the inference unit 13 after being subjected to image processing.
  • the inference engine 13A performs inference using a neural network, and the inference result is output from the output layer.
  • the inference engine 13A infers, for example, whether or not an image of a specific object such as an affected area is present in the endoscopic image. Based on this inference result, images P5, P6, and P16 can be shown to be specific object images: in FIG. 6(a), images P5 and P6 are surrounded by a thick frame, and in FIG. 6(b), image P16 is surrounded by a thick frame. The display method may be changed as appropriate; the most suitable method may simply be chosen.
  • the inference engine 13A calculates the reliability of the inference result. In FIG. 6A, a reliability value higher than a predetermined value is indicated by a circle (○), and a reliability value lower than the predetermined value is indicated by a cross (×).
  • the situation determining unit 16 determines the usage situation of the endoscope device 10.
  • the situation determining unit 16 determines, for example, whether the distal end of the endoscope device 10 is directly facing the wall surface in the body cavity, facing it diagonally, and so on.
  • FIG. 6(a) shows the acquired angle information.
  • although FIG. 6 shows only angle information, the situation information is not limited to this; other information may also be acquired and used when selecting an image for relearning from a group of similar images.
  • the inference engine 13A infers that images P5 and P6 are images of a specific target such as an affected area; the reliability is high, and the endoscope tip is directly facing the specific target.
  • likewise, the inference engine 13A infers that image P16 is an image of a specific target such as an affected area; the reliability is high, and the endoscope tip is directly facing the specific target.
  • since the reliability values of the inferences for images P5, P6, and P16 are high, it can be said that there is a high possibility that these images include specific objects such as affected areas.
  • images determined to have low reliability may also include specific objects such as affected areas.
  • although image P15 includes a specific target such as an affected area, the reliability of the inference is low, and it is displayed as not including the specific target.
  • when the endoscope device 10 is directly facing a specific object such as an affected area, the reliability of the inference is high; when it is oblique to the specific object, the reliability is low. That is, the inference model set in the inference engine 13A is unlikely to provide highly reliable inference when the tip of the endoscope is oblique to the wall surface within the body cavity. In such a case, it is preferable to perform relearning using the images with differing inference results (weak images) to generate a more reliable inference model.
  • images that are candidates for teacher data are selected from a group of similar images that are similar to images of a specific target such as an affected area.
  • the similar image group may also include dissimilar images acquired while observing the same region.
  • imaging and inference are performed (S1).
  • the control unit 20 instructs the image sensor and the imaging control circuit in the imaging unit 11 to start an imaging operation.
  • imaging signals for one screen are sequentially read out at time intervals of a predetermined frame rate.
  • the imaging operation continues repeatedly until it is determined in step S9 that the imaging operation has ended.
  • the imaging operation is started, the imaging element in the imaging section 11 outputs an image signal, and the image processing section 12 processes the image signal so that it becomes visually recognizable information.
  • the image data subjected to this image processing is output to the display unit 12a and displayed as an endoscopic image.
  • the inference unit 13 receives the image data that has been subjected to image processing and performs inference; that is, it infers, using a machine learning model, the specific object image included in the captured endoscopic image. The inference result is superimposed on the endoscopic image in the image processing section 12, and this image is displayed on the display section 12a. This inference is performed, for example, to display advice assisting diagnosis to a user of the endoscope apparatus 10, such as a doctor. If a specific object such as an affected area is included in the endoscopic image, a display to that effect may be provided; advice regarding endoscope operation for finding a specific object may also be inferred.
  • the inference unit 13 may infer at least one of classification, detection, and region extraction of the input image. That is, if there is an affected area in the input image, the affected area may be detected, and the type of affected area, the detected position, the range of the affected area, etc. may be inferred. After performing these inferences, the inference unit 13 calculates the reliability of this inference.
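To fix ideas, here is a hypothetical container for what the inference unit is described as producing per frame; the exact field set is an assumption for illustration, not the disclosed data format:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class InferenceResult:
    """One frame's inference output: presence of a specific object
    (e.g. an affected area), its inferred type, the detected position
    and range, and the reliability of the inference."""
    has_target: bool
    label: str = ""
    box: Optional[Tuple[int, int, int, int]] = None  # (x, y, width, height)
    reliability: float = 0.0
```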
  • a similar image group is determined (S3).
  • the image determination unit 15 compares the contents of endoscopic images obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, even if dissimilar images are included.
  • an image for relearning is selected from among the images in the similar image group Is.
  • in other words, the contents of endoscopic images obtained by temporally consecutive imaging are compared, and a group of images in which the same part is observed over a predetermined number of frames is determined as a similar image group, including dissimilar images.
  • the content of an image refers to image features such as the shape, pattern, shadow, color, size, and position (including rotation and distortion, which may also be corrected) of the object being imaged.
  • FIG. 4 shows an example of images acquired by the imaging unit 11, and images P21, P22, and P29 are transition images when the endoscope device 10 is inserted and removed, similar to FIGS. 6(a) and 6(b).
  • P23 and P24 are intermediate images.
  • Images P25 and P28 are images when the distal end of the endoscope is directly facing a specific target area such as an affected area. These images P25 and P28 clearly show specific objects such as the affected area, and the reliability of the inference is high.
  • Images P26 and P27 between image P25 and image P28 are images corresponding to a saccade (involuntary eye movement), and are expressed as image micromovement.
  • a saccade is an involuntary change in visual field
  • an image change equivalent to a saccade refers to a sudden change in the image due to an object at close range or the positional relationship between the light source and the object.
  • this image change equivalent to a saccade is related to the special nature of examinations using an endoscope or the like, as described above. Since images P26 and P27 correspond to saccades, they are unintended, unexpected images and may take various forms of change.
  • images corresponding to saccades are not always included in the training data during learning, which tends to make inference difficult.
  • because the image quality changes in these frames, images P26 and P27 are likely to be determined to be dissimilar to images P25 and P28.
  • even if an image has such a quality change, as long as the image quality is not extremely degraded it may still show a specific object such as an affected area, and depending on the specifications of the inference model it may be an object that should not be overlooked. Therefore, in this embodiment, dissimilar images included in a similar image group are also treated as part of the similar image group. The detailed operation of the similar image group determination in step S3 will be described later using a flowchart.
  • the difference determination unit 14 calculates the difference between the inference results by the inference unit for each image included in the similar image group determined by the image determination unit 15.
  • in step S5, reliability may be used as the difference in the inference results for the determination. In addition to reliability, a difference in inference results may be determined when, for example, the detection range or position of a specific object such as an affected area varies, or when the appearance state changes, such as when a specific object appears in one frame and not in another. The determination may also be based on how much the difference in reliability values has changed. Just as the visual image is corrected in the brain even when the human eye makes a saccade, which is an unconscious movement, an inference model whose results are not disrupted by sudden image changes is easier to use and friendlier to those who perform examinations with endoscopic equipment. Many parameters cause such sudden image changes, and they can occur in combination, so it is often difficult to prepare all patterns during machine learning.
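A sketch of the step S5 criteria listed above, reusing the hypothetical InferenceResult fields from the earlier sketch (the thresholds are illustrative assumptions, not disclosed values):

```python
def box_iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x0, y0 = max(ax, bx), max(ay, by)
    x1, y1 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def inference_differs(r1, r2, reliability_delta=0.3, iou_floor=0.5):
    """Difference in inference results between two frames of a similar
    image group: appearance/disappearance of the target, a reliability
    gap, or a varying detection range or position."""
    if r1.has_target != r2.has_target:                 # appearance state changed
        return True
    if abs(r1.reliability - r2.reliability) >= reliability_delta:
        return True
    if r1.has_target and r1.box and r2.box and box_iou(r1.box, r2.box) < iou_floor:
        return True                                    # detection range varies
    return False
```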
  • a learning image is selected (S7).
  • the control unit 20 selects images to be used for learning (which may include relearning), that is, teacher data candidate images, based on the difference in inference results (the amount of change) calculated by the difference determination unit 14.
  • the control unit 20 selects such an image for relearning.
  • the selected images may include images that do not contain a specific object such as an affected area.
  • inappropriate images can be eliminated when performing annotation to create training data.
  • a better inference model can be generated by selecting weak images or missed images for relearning.
  • the selected images themselves, or information that allows them to be retrieved, are organized and recorded in the recording section 18 so that the inference model can be improved promptly using these images and information, and they are transmitted to the learning device 30 through the communication section 19. Each time an image is selected it may be transmitted through the communication unit 19 and recorded in the recording unit 33, and when a predetermined number of relearning images have been collected, relearning may be performed to generate the inference model. When selecting learning images in step S7, it is preferable to exclude images with extremely poor image quality; images not suitable for learning may be defined as those with poor image quality.
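For the exclusion of extremely poor-quality frames mentioned above, a minimal sketch of a quality gate; the brightness and contrast thresholds are invented for illustration:

```python
import numpy as np

def usable_for_learning(image, min_mean=10.0, max_mean=245.0, min_contrast=5.0):
    """Reject frames that are almost entirely dark, blown out, or flat,
    before recording them as teacher data candidates."""
    gray = np.asarray(image, dtype=np.float32).mean(axis=-1)
    return (min_mean < float(gray.mean()) < max_mean
            and float(gray.std()) > min_contrast)
```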
  • next, it is determined whether or not to end (S9).
  • when a doctor or the like performs an operation to end the endoscopy, it is determined that the examination has ended. If it has not ended, the process returns to step S1; if it has ended, the flow ends after end processing is performed.
  • next, the similar image group determination operation in step S3 (see FIG. 2) will be described using the flowchart.
  • image features are temporarily recorded (S11).
  • the image processing section 12 or the image determination section 15 calculates the feature amount of the image acquired by the imaging section 11.
  • the feature amount of the image may be calculated using a known method.
  • the calculated image feature amount is temporarily recorded in a memory provided inside the recording section 18, the control section 20, or the like; in this memory, a history of image feature amounts is temporarily recorded in chronological order.
  • the difference determination unit 14 compares the image feature amount of the latest image frame with that of the immediately preceding image frame, based on the image feature amounts temporarily recorded in step S11, and determines whether or not the image features are similar. If the image feature amounts of the two images are outside a predetermined range, the two images are determined not to be similar.
  • if it is determined in step S13 that the features of the image are not similar to those of the immediately preceding frame, it is then determined whether the features are similar to images before the immediately preceding frame (S15).
  • in this case, the difference determination unit 14 compares the image feature amount of the latest image frame with the image feature amounts of images before the immediately preceding frame, based on the image feature amounts temporarily recorded in step S11, and determines whether the features are similar to an image before the immediately preceding frame.
  • the number of previous frames to be compared as the immediately previous image may be determined according to the design concept, and may be changed as appropriate depending on the state of the image.
  • the number of frames to look back before the immediately preceding frame may be set to cover the time during which a specialist such as a doctor might miss a specific object during an examination such as an endoscopy, or during which weak images might occur.
  • for example, it may correspond to the time it takes for camera shake to subside, for slight vibration of the endoscope tip to subside during endoscope operation, or for the operator's hand blur to subside.
  • further, in images acquired during a specific procedure, only the images of that procedure may appear to be different images; it is better to treat such images as part of the similar image group. If the type of treatment is known, this can be detected and used to decide the number of "previous" frames and to identify the images before the immediately preceding frame. Similarly, if the situation can be determined, such as bleeding or water vapor generated by the treatment, the look-back frame count may be determined depending on the situation.
  • if the result of the determination in step S13 is that the features of the latest frame and the immediately preceding frame are similar, or if the result of step S15 is that the features of the latest frame are similar to images before the immediately preceding frame, the image is determined to be a similar image (S17). Similarity can be determined by pattern matching or by similarity determination on the digitized images.
  • an image within a run of temporally consecutive similar images may itself be determined to be not similar (dissimilar); however, such images are highly likely to include important content. Therefore, as described above, dissimilar images that satisfy the conditions are treated as images constituting the similar image group.
  • if, in step S15, the features are not similar to the images before the immediately preceding frame, or once the image is determined to be a similar image in step S17, the similar image group determination flow ends and the process returns to the original flow.
  • in this similar image group determination flow, even if an image is dissimilar as a result of the feature amount determination, it is determined to belong to the similar image group if its features are similar to an earlier frame image; that is, the similar image group may include dissimilar images.
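The S11-S17 flow above might be sketched as follows, assuming a numeric feature vector per frame and a fixed look-back depth (both are design choices the text explicitly leaves open):

```python
from collections import deque
import numpy as np

class SimilarImageGroupJudge:
    """Keeps a short chronological history of feature amounts (S11) and
    judges a frame part of the similar image group when it matches the
    immediately preceding frame (S13) or any frame within the look-back
    window before that (S15)."""

    def __init__(self, lookback=10, threshold=0.01):
        self.history = deque(maxlen=lookback)
        self.threshold = threshold

    def judge(self, feature):
        feature = np.asarray(feature, dtype=np.float32)
        similar = any(np.linalg.norm(feature - past) <= self.threshold
                      for past in self.history)
        self.history.append(feature)   # temporary record for later frames
        return similar                 # True -> similar image (S17)
```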
  • images are acquired by temporally continuous imaging, and the images obtained by this imaging are inferred using a machine learning model (S1). Further, the content of the acquired images is digitized, the numeric values are compared, and images whose image change is less than or equal to a predetermined value are determined as a group of similar images (S3). The difference between the inference results of each image included in the determined similar image group is calculated, and it is determined whether or not there is a difference (S5). Based on the difference in the inference results, images to be used for learning are selected (S7). In this manner, in this embodiment, teacher data candidate images for inference model learning can be efficiently acquired.
  • the flowchart shown in FIG. 5 differs from that in FIG. 2 only in that step S6 is added and step S7 is replaced with step S8, so only this difference will be explained.
  • a situation difference determination is performed (S6).
  • the situation determination unit 16 performs a difference determination in the usage status of the endoscope apparatus 10 and the like.
  • the information acquired as the situation determination includes, for example, angle information (angle information between the wall surface in the body cavity and the tip of the endoscope) as shown in FIGS. 4 and 6(a) and (b).
  • the angle information may be determined from changes in the image, or may be detected using a built-in sensor. Detection is also possible based on the uniformity and distribution of illumination. Sensor data, illumination light brightness distribution detection results, etc. may be recorded as they are.
  • information used for situation determination also includes focus information and depth information in the imaging unit, information on the light source in the imaging unit 11, and treatment information such as water injection and suction operations.
  • a learning image is selected (S8).
  • the control unit 20 selects images to be used for learning (which may include relearning) based on the difference in the inference results calculated by the difference determining unit 14; that is, similarly to step S7, images within the similar image group for which a difference in the inference results was determined in step S5 are selected.
  • further, the situation difference information acquired in step S6 is organized, that is, recorded in association with the selected images. Based on this recorded information, learning can be customized for each situation.
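As an illustration of associating situation-difference information with a selected image (S6/S8), a sketch whose key names are all invented for the example:

```python
def associate_situation(image_id, inference_reliability, situation):
    """Record a selected learning image together with the situation
    information determined at acquisition time (e.g. tip angle, focus,
    light source, treatment such as water injection or suction)."""
    return {
        "image": image_id,
        "reliability": inference_reliability,
        "situation": {
            "tip_angle_deg": situation.get("tip_angle_deg"),  # from image change or a built-in sensor
            "focus": situation.get("focus"),
            "light_source": situation.get("light_source"),
            "treatment": situation.get("treatment"),
        },
    }
```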
  • next, it is determined whether or not to end (S9).
  • when a doctor or the like performs an operation to end the endoscopy, it is determined that the examination has ended. If it has not ended, the process returns to step S1; if it has ended, the flow ends after end processing is performed.
  • a situation difference determination is performed (see S6), and this situation difference determination information is acquired and associated with the image data (see S8).
  • FIG. 7 shows the flow of data among the imaging section 41, the developing section 42, the image identification section 43, the inference section 44, the inference result change calculation section 45, the learning image selection section 46, and the recording section 47.
  • Each of these parts corresponds to each part of FIG. 1, as described later. All or part of the functions of these units may be realized by a processor, or may be realized by hardware and software.
  • the imaging unit 41 corresponds to the imaging unit 11 in FIG. 1, and acquires an image of a specific object such as an affected area as RAW data, and outputs it to the developing unit 42.
  • the image data output from the imaging unit 41 are temporally continuous: when the latest RAW data 2 is output, it forms an image pair with the immediately preceding RAW data 1, and when the next RAW data 3 is output after RAW data 2, RAW data 2 and RAW data 3 form an image pair. In other words, temporally consecutive images are paired.
  • the developing unit 42 develops the RAW data output from the imaging unit 41.
  • the image processing section 12 performs the function of the developing section 42.
  • the developed image data is output to the image identification section 43.
  • the image identification unit 43 determines whether or not there is a large change between image pairs, and extracts one or more identical image pairs that are determined (identified) as having no large change.
  • the extracted identical image pair is output to the inference section 44. Note that in FIG. 1, the image determination section 15 functions as the image identification section 43.
  • the image identification unit 43 functions as an image identification unit that determines that temporally continuous input pair images have hardly changed (see, for example, S3 in FIGS. 2 and 5, and the image determination unit 15 in FIG. 1).
  • input pair images are formed, for example, when a plurality of images (a first image, a second image, a third image, and so on) are sequentially input to the image identification unit: the first and second images form a pair, and the second and third images form a pair.
  • the image identification unit determines that the input pair images are substantially unchanged based on at least one of the amount of movement of the corresponding points and the amount of change in brightness, saturation, and contrast of the images.
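A hedged approximation of that identification step: the text names corresponding-point motion and brightness/saturation/contrast change as the criteria; in this sketch corresponding-point motion is stood in for by the mean absolute pixel difference, which also grows when points move (a real implementation would track features), and all limits are assumptions:

```python
import numpy as np

def pair_substantially_unchanged(img_a, img_b, brightness_limit=6.0,
                                 contrast_limit=6.0, motion_proxy_limit=10.0):
    """Judge a temporally consecutive image pair 'substantially unchanged'."""
    a = np.asarray(img_a, dtype=np.float32)
    b = np.asarray(img_b, dtype=np.float32)
    if abs(a.mean() - b.mean()) > brightness_limit:   # brightness change
        return False
    if abs(a.std() - b.std()) > contrast_limit:       # contrast change
        return False
    return float(np.abs(a - b).mean()) <= motion_proxy_limit  # motion proxy
```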
  • the inference unit 44 inputs the same image pair from the image identification unit 43, performs inference on each image, and outputs a pair of inference results (inference result pair) to the inference result change calculation unit 45. When outputting this inference result, the reliability of each inference result is also output to the inference result change calculation unit 45.
  • in FIG. 1, the inference section 13 fulfills the function of the inference section 44. The inference unit 13 does not directly receive the same image pair from the image identification unit 43, but it may receive information from the control unit 20 that the images form a same-image pair.
  • the inference result change calculation unit 45 calculates the inference result for the same image pair and the amount of change in its reliability. That is, the inference result for the latest image and the inference result for the immediately previous image are compared to calculate the amount of change, and the reliability of both inference results is compared to calculate the amount of change. This calculated amount of change is output to the learning image selection section 46. Note that in FIG. 1, the difference determination unit 14 functions as the inference result change calculation unit 45.
  • the inference result change calculation unit 45 functions as an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair (see, for example, S5 in FIGS. 2 and 5, and the difference determination unit 14 in FIG. 1).
  • the identified image pair is, for example, an image in which two temporally consecutive input images (image pair) are almost unchanged.
  • the inference result change calculation unit uses at least one differential value of the inference result and reliability (for example, see S5 in FIGS. 2 and 5, and the difference determination unit 14 in FIG. 1).
  • the inference result change calculation unit 45 also functions as an inference result change calculation unit that calculates the amount of change in the inference results of temporally consecutive image pairs (see, for example, S5 in FIGS. 2 and 5, and the difference determination unit 14 in FIG. 1).
  • in step S5 of FIG. 2 described above, it is determined whether or not there is a difference in the inference results within the image group; this processing corresponds to the inference result change calculation unit 45 calculating the amount of change in the inference results.
  • as the difference in the inference results, for example, if the reliability values of the inferences differ by a predetermined value or more, it may be determined that there is a difference. In other words, the inference result change calculation unit 45 calculates the amount of change in the inference results of temporally consecutive image pairs.
  • when the amount of change is large, the learning image selection unit 46 selects the same image pair as images to be used for learning (learning image candidates) and records them in the recording unit 47. If the amount of change in the inference result (or reliability) is large, the image is likely to be difficult for machine learning to identify and is therefore selected as a learning candidate image. Whether it is actually used for learning can be decided by experts such as doctors when creating the teacher data. Note that in FIG. 1, the control section 20 functions as the learning image selection section 46.
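Putting the last two units together, a sketch of the change-amount calculation and the selection rule, again reusing the hypothetical InferenceResult fields from the earlier sketch (the threshold is an assumption):

```python
def change_amount(prev_result, latest_result):
    """Amount of change for an identified image pair: differentials of
    the inference result itself and of its reliability."""
    result_delta = float(prev_result.has_target != latest_result.has_target)
    reliability_delta = abs(prev_result.reliability - latest_result.reliability)
    return result_delta, reliability_delta

def select_as_learning_candidate(deltas, reliability_threshold=0.3):
    """A pair whose inference result or reliability changed sharply is
    likely hard for the current model, so keep it as a candidate."""
    result_delta, reliability_delta = deltas
    return result_delta > 0 or reliability_delta >= reliability_threshold
```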
  • the recording unit 47 is an electrically rewritable nonvolatile memory, and sequentially records the learning image candidates input from the learning image selection unit 46.
  • Teacher data for learning is created from among the learning image candidates recorded in the recording unit 47, and an inference model is generated by machine learning or the like. As described above, if an image is inappropriate as a learning image, it is sufficient to exclude the image and create training data using an appropriate image. Note that in FIG. 1, the recording section 18 and/or the recording section 33 fulfills the function of the recording section 47.
  • both images of the same image pair may be selected, or an image may be selected based on comparison results with the pairs before and after the same image pair.
  • the teacher data candidate image acquisition device acquires a teacher data candidate image based on an endoscope-obtained image for learning an inference model for an endoscope.
  • This training data candidate image acquisition device compares the contents of endoscopic images obtained by temporally consecutive imaging, and selects a group of similar images in which the same region is observed over a predetermined number of frames. , has a similar image group determination unit (for example, see the image determination unit 15 in FIG. 1, S3 in FIGS. 2 and 5, and the image identification unit 43 in FIGS. 3 and 7) that performs determination including dissimilar images.
  • the teacher data candidate image acquisition device also includes an inference unit that infers, using machine learning, a specific object image included in the captured images (for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, the inference unit 44 in FIG. 7), and an inference result difference calculation unit that calculates, for each image included in the similar image group, the difference between the inference results by the inference unit (for example, the difference determination unit 14 in FIG. 1, S5 in FIGS. 2 and 5).
  • the teacher data candidate image acquisition device further includes a learning image selection unit that selects images to be used for learning based on the calculated difference in inference results (for example, the control unit 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, the learning image selection section 46 in FIG. 7). Therefore, in this embodiment, it is possible to accurately select medical images that are difficult to identify using machine learning: from among the image group determined to be a similar image group, images to be used for relearning are selected based on differences in the inference results by the inference unit, so the inference model can be improved by relearning using weak images.
  • from another viewpoint, the teacher data candidate image acquisition device includes an inference unit that infers an input image using a machine learning model (for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, the inference unit 44 in FIG. 7), an image identification unit that determines that temporally consecutive input pair images are substantially unchanged (for example, S3 in FIGS. 2 and 5, the image identification unit 43 in FIG. 7), and a learning image selection unit (for example, the control section 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, the learning image selection section 46 in FIG. 7). Therefore, in this embodiment, images to be used for learning are selected based on the difference in the inference results between the images of the identified image pair, and the inference model can be improved by relearning using weak images.
  • the teacher data candidate image acquisition device further includes an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair (for example, the difference determination unit 14 in FIG. 1, S5 in FIGS. 2 and 5) and a learning image selection section that selects teacher data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation section (for example, the control section 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, the learning image selection section 46 in FIG. 7). Therefore, in this embodiment, it is possible to accurately select medical images that are difficult to identify using machine learning.
  • in the embodiment described above, the explanation has assumed that the endoscope apparatus 10 includes, in addition to the imaging section 11, an image processing section 12, an inference section 13, a difference determination section 14, an image determination section 15, a situation determination section 16, an information association section 17, a recording section 18, and so on.
  • the endoscope device 10 may be provided with other configurations, such as a display unit.
  • conversely, some blocks may be arranged externally, as with the learning section 34 within the learning device 30.
  • also, in this embodiment, the control units 20 and 35 have been described as devices composed of a CPU, memory, and the like.
  • some or all of the functions of each part may instead be configured as hardware circuits, for example gate circuits generated based on a hardware description language such as Verilog or VHDL, or as a hardware configuration using software, such as a DSP (Digital Signal Processor). These may of course be combined as appropriate.
  • the control units 20 and 35 are not limited to CPUs; they may be any elements that function as controllers, and the processing of each unit described above may be performed by one or more processors configured as hardware.
  • each unit may be a processor configured as an electronic circuit, or each unit may be a circuit unit in a processor configured with an integrated circuit such as an FPGA (Field Programmable Gate Array).
  • a processor including one or more CPUs may execute the functions of each unit by reading and executing a computer program recorded on a recording medium.
  • the endoscope device 10 and the learning device 30 have been described as having blocks that each perform respective functions. However, these need not be provided in a single device; for example, the above-mentioned units may be distributed as long as they are connected via a communication network such as the Internet.
  • the explanation is based on an endoscope, since it is easy to explain an inspection scene using an endoscope.
  • however, as long as a device performs some kind of inference when observing an object using image data, the invention can be widely applied.
  • even cameras built into mobile terminals and consumer cameras are sometimes used for determination purposes in the manner of medical devices; in such cases, unstable holding may cause the saccade-like image movement described above.
  • similar image fluctuations are likely to occur in situations where the positional relationship with the object is unstable.
  • the invention of the present application can be applied to these devices.
  • in the embodiment described above, logic-based determination has mainly been described, with determination partially performed by inference using machine learning.
  • either logic-based determination or inference-based determination may be appropriately selected and used.
  • a hybrid type determination may be performed by partially utilizing the merits of each.
  • control mainly explained in the flowcharts can often be set by a program, and may be stored in a recording medium or a recording unit.
  • as for the method of recording on this recording medium or recording unit, the program may be recorded at the time of product shipment, distributed on a recording medium, or downloaded via the Internet.
  • the present invention is not limited to the above-mentioned embodiment as it is, and can be embodied by modifying the constituent elements within the scope of the invention at the implementation stage.
  • various inventions can be formed by appropriately combining the plurality of components disclosed in the above embodiments. For example, some components may be deleted from all the components shown in the embodiments. Furthermore, components of different embodiments may be combined as appropriate.

Abstract

Provided are an endoscope, a training data candidate-image acquisition device, a training data candidate-image acquisition method, and a program that make it possible to accurately select medical images that machine learning prepared in advance has difficulty identifying. Training data candidate-image acquisition is performed using endoscope-acquired images for endoscopic inference model learning, and comprises: comparing the content of endoscopic images obtained by imaging consecutively in time, and determining a similar image group in which the same site is observed over a predetermined number of frames, even though non-similar images may be included (S3); using machine learning to infer a specific target image included in the endoscopic images obtained by imaging (S1); calculating the difference in inference results from making inferences with regard to each of the images determined to be included in the similar image group (S5); and selecting images to use for learning on the basis of the calculated difference in the inference results (S7).

Description

Endoscope, training data candidate image acquisition device, training data candidate image acquisition method, and program

The present invention relates to an endoscope used when generating an inference model for the medical field through learning, a teacher data candidate image acquisition device and a teacher data candidate image acquisition method for acquiring teacher data candidate images, and a program.

In image diagnosis in the medical field, various systems have been developed to take medical images of various anatomical structures of individual patients in order to classify and evaluate medical conditions. Known examples of these imaging systems include endoscope systems, CT (computed tomography) systems, MRI (magnetic resonance imaging) systems, X-ray systems, ultrasound systems, and PET (positron emission tomography) systems. There is also a known method for realizing a lesion detection function using CADe/x (Computer-Aided Detection/Diagnosis), so-called computer detection/diagnosis support, by applying machine learning to data annotated by medical personnel such as doctors.

It is generally said that a large amount of data is required to improve the performance of a classifier based on machine learning as described above. Therefore, in a system whose classifier requires machine learning, the amount of data handled is expected to grow. However, a large amount of data requires enormous storage capacity, data transfer occupies network lines, and checking each piece of data takes a huge amount of time, so "efficient data collection" is expected to become necessary. As such efficient data collection, it is conceivable, for example, to collect only "medical images that are difficult to identify by machine learning" as useful data. As a technique for selecting useful medical images from a large number of medical images, Patent Document 1, for example, discloses a technique for efficiently transferring only medical images in which a desired body part is photographed from among a plurality of medical images.

Patent No. 5048286

The technology described in Patent Document 1 is, however, merely a technology for efficiently transferring medical images in which a desired region is photographed. It therefore seems difficult to select "medical images that are difficult to identify by machine learning" using the technique disclosed in Patent Document 1. That is, with conventional technology, it is difficult to collect only "medical images that are difficult to identify using machine learning" as useful data.

The present invention has been made in view of these circumstances, and its purpose is to provide an endoscope, a teacher data candidate image acquisition device, a teacher data candidate image acquisition method, and a program that make it possible to accurately select medical images that machine learning prepared in advance has difficulty identifying.

To achieve the above purpose, an endoscope according to a first invention comprises: a similar image group determination unit that compares the contents of endoscopic images obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images; an inference unit that infers, using machine learning, a specific object image included in the captured endoscopic images; an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, the difference between the inference results by the inference unit; and a learning image selection unit that selects images to be used for learning based on the difference in the inference results calculated by the inference result difference calculation unit.
In the endoscope according to a second invention, in the first invention, the similar image group includes the result of judging an image group corresponding to suddenly changing viewpoint movement according to the similarity of the preceding and following images among the endoscopic images obtained by temporally consecutive imaging.

The endoscope according to a third invention, in the first invention, includes an imaging information acquisition section capable of acquiring imaging information corresponding to each image selected by the learning image selection section.
In the endoscope according to a fourth invention, in the first invention, the similar image group determination section digitizes the endoscopic images into numerical values such that the pattern of the same object within the images can be tracked, and determines whether images are similar using these numerical values.

In the endoscope according to a fifth invention, in the first invention, the learning image selection section excludes images with extremely poor conditions.
A teacher data candidate image acquisition device according to a sixth invention is a device for acquiring images for inference model learning, comprising: a similar image group determination unit that compares the contents of images obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images; an inference unit that infers, using machine learning, a specific object image included in the captured images; an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, the difference between the inference results by the inference unit; and a learning image selection unit that selects images to be used for learning based on the difference in the inference results calculated by the inference result difference calculation unit.
A teacher data candidate image acquisition method according to a seventh invention is a method for acquiring images for inference model learning, comprising: comparing the contents of images obtained by temporally consecutive imaging and determining a group of similar images in which the same region is observed even if dissimilar images are included over a predetermined number of frames; inferring, using a machine learning model, a specific object image included in the captured images; calculating, for each image determined to be included in the similar image group, the difference between the inference results inferred using the machine learning model; and selecting images to be used for learning based on the calculated difference in the inference results.

A program according to an eighth invention causes a computer that acquires teacher data candidate images for inference model learning to execute: comparing the contents of images obtained by temporally consecutive imaging and determining a group of similar images in which the same region is observed even if dissimilar images are included over a predetermined number of frames; inferring, using a machine learning model, a specific object image included in the captured images; calculating, for each image determined to be included in the similar image group, the difference between the inference results inferred using the machine learning model; and selecting images to be used for learning based on the calculated difference in the inference results.
The training data candidate image acquisition device according to the ninth invention comprises: an inference unit that performs inference on an input image using a machine learning model; an inference result change calculation unit that calculates the amount of change between the inference results of an identified image pair; and a learning image selection unit that selects training data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit.
The training data candidate image acquisition device according to the tenth invention, in the ninth invention, further comprises an image identification unit that determines that a temporally consecutive pair of input images is substantially unchanged.
In the training data candidate image acquisition device according to the eleventh invention, in the tenth invention, the image identification unit determines that the input image pair is substantially unchanged based on at least one of the amount of movement of corresponding points and the amounts of change in brightness, saturation, and contrast of the images.
In the training data candidate image acquisition device according to the twelfth invention, in the ninth invention, the input image is a medical image.
In the training data candidate image acquisition device according to the thirteenth invention, in the ninth invention, the inference unit infers at least one of classification, detection, and region extraction of the input image.
In the training data candidate image acquisition device according to the fourteenth invention, in the ninth invention, the inference unit outputs the reliability of the inference.
In the training data candidate image acquisition device according to the fifteenth invention, in the ninth invention, the inference result change calculation unit uses a differential value of at least one of the inference result and the reliability.
In the training data candidate image acquisition device according to the sixteenth invention, in the ninth invention, the learning image selection unit selects a training data candidate image to be used for learning when the amount of change exceeds a specified value.
The training data candidate image acquisition method according to the seventeenth invention comprises: performing inference on an input image using a machine learning model; calculating the amount of change between the inference results of an identified image pair; and selecting training data candidate images to be used for learning based on the calculated amount of change.
The program according to the eighteenth invention causes a computer that acquires training data candidate images for inference model learning to execute: performing inference on an input image using a machine learning model; calculating the amount of change between the inference results of an identified image pair; and selecting training data candidate images to be used for learning based on the calculated amount of change.
The training data candidate image acquisition device according to the nineteenth invention comprises: an inference unit that performs inference on an input image using a machine learning model; an inference result change calculation unit that calculates the amount of change between the inference results of a temporally consecutive image pair; and a learning image selection unit that selects training data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit.
In the training data candidate image acquisition device according to the twentieth invention, in the nineteenth invention, the input image is a medical image.
In the training data candidate image acquisition device according to the twenty-first invention, in the nineteenth invention, the inference unit infers at least one of classification, detection, and region extraction of the input image.
In the training data candidate image acquisition device according to the twenty-second invention, in the nineteenth invention, the inference unit outputs the reliability of the inference.
In the training data candidate image acquisition device according to the twenty-third invention, in the nineteenth invention, the inference result change calculation unit uses a differential value of at least one of the inference result and the reliability.
In the training data candidate image acquisition device according to the twenty-fourth invention, in the nineteenth invention, the learning image selection unit selects a training data candidate image to be used for learning when the amount of change exceeds a specified value.
The training data candidate image acquisition method according to the twenty-fifth invention comprises: performing inference on an input image using a machine learning model; calculating the amount of change between the inference results of a temporally consecutive image pair; and selecting training data candidate images to be used for learning based on the calculated amount of change.
The program according to the twenty-sixth invention causes a computer that acquires training data candidate images for inference model learning to execute: performing inference on an input image using a machine learning model; calculating the amount of change between the inference results of a temporally consecutive image pair; and selecting training data candidate images to be used for learning based on the calculated amount of change.
According to the present invention, it is possible to provide an endoscope, a training data candidate image acquisition device, a training data candidate image acquisition method, and a program that make it possible to accurately select the medical images that machine learning prepared in advance has difficulty identifying.
FIG. 1 is a block diagram mainly showing the electrical configuration of an endoscope system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the imaging operation in the endoscope system according to the embodiment.
FIG. 3 is a flowchart showing the similar image group determination operation in the endoscope system according to the embodiment.
FIG. 4 is a diagram illustrating the determination of a similar image group in the endoscope system according to the embodiment.
FIG. 5 is a flowchart showing a modified example of the similar image group determination operation in the endoscope system according to the embodiment.
FIG. 6 is a diagram illustrating the selection of a similar image group in the endoscope system according to the embodiment.
FIG. 7 is a diagram illustrating the flow of data between blocks in the endoscope system according to the embodiment.
FIG. 8 is a diagram showing an example of the data structure in the recording unit of the endoscope system according to the embodiment.
FIG. 9 is a diagram showing a modified example of the data structure in the recording unit of the endoscope system according to the embodiment.
The present invention will be described assuming that it is applied to a device that acquires candidate images to serve as training data for learning when creating, improving, or maintaining the performance of an inference model that uses image data based on imaging signals from, for example, an endoscope. Equipment such as an endoscope is used by a specialist such as a physician; it approaches an object, illuminates it as necessary, and can display continuously captured image data as images on a display device. While viewing these images, either directly or with the assistance of an inference model, the specialist carefully observes areas of concern. Accordingly, when the same object is being observed, a plurality of similar image frames can be obtained from the sequentially acquired frames. The degree of image change can be determined by converting the content of temporally consecutive images into numerical values and comparing the value of each image, or by quantifying the difference between the image data. An embodiment of the present invention has a similar image group determination unit that determines a group of similar images whose image change is at or below a predetermined value, and can thereby determine whether the same object is being observed.
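As a concrete illustration of this idea, the following is a minimal sketch that quantifies frame-to-frame change as a mean pixel difference and groups consecutive frames whose change stays below a threshold. It assumes grayscale frames supplied as NumPy arrays in a non-empty sequence; the function names and the threshold value are illustrative and not taken from the specification.

```python
import numpy as np

def frame_change(prev: np.ndarray, curr: np.ndarray) -> float:
    """Quantify the change between two consecutive frames as the
    mean absolute pixel difference, normalized to [0, 1]."""
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    return float(diff.mean() / 255.0)

def group_similar_frames(frames, threshold=0.05):
    """Split a frame sequence into runs whose frame-to-frame change
    stays at or below the threshold (candidate similar image groups)."""
    groups, current = [], [0]
    for i in range(1, len(frames)):
        if frame_change(frames[i - 1], frames[i]) <= threshold:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups  # lists of frame indices
```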
Further, an embodiment of the present invention has an inference unit that inputs each sequentially obtained image frame into machine learning and performs inference, and a determination unit that determines the differences between the inference results of the respective images determined by the similar image determination unit to be included in a similar image group. With this configuration, when differences arise in the inference results even though the same object is being observed, analyzing the cause of those differences makes it possible to analyze the situations that the inference unit handles poorly. In other words, if images to be used for learning are selected based on the differences in the inference results, the selected images become clues for generating an effective improved inference model. For this reason, a learning image selection unit is provided in one embodiment of the present invention.
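The selection step could look like the following sketch, which flags frames whose inference confidence falls well below the best result in the same similar image group. Here `infer` is a placeholder for the trained model's per-frame prediction returning a (label, confidence) pair, and `min_delta` is an illustrative threshold, not a value from the specification.

```python
def select_learning_candidates(frames, groups, infer, min_delta=0.3):
    """Within each similar group, run inference per frame and keep
    frames whose confidence deviates strongly from the group's best:
    the same scene was shown, yet the model responded differently."""
    candidates = []
    for group in groups:
        results = [infer(frames[i]) for i in group]  # (label, confidence)
        best = max(conf for _, conf in results)
        for idx, (_, conf) in zip(group, results):
            if best - conf >= min_delta:
                candidates.append(idx)
    return candidates  # indices of training data candidate frames
```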
Next, an embodiment in which the present invention is applied to an endoscope system will be described with reference to FIGS. 1 to 9. This endoscope system functions as a training data candidate image acquisition device (endoscope) for acquiring images for inference model learning. FIG. 1 is a block diagram mainly showing the electrical configuration of the endoscope system. This endoscope system includes an endoscope device 10 and a learning device 30. However, the blocks in the endoscope device 10 need not be provided within a single device and may be divided among a plurality of devices. For example, the imaging unit 11 and the other blocks may be provided in separate devices. In this case, the units other than the imaging unit 11 may be connectable to an intranet within a hospital or the like.
The endoscope system may also be linked with an in-hospital system, an electronic medical record system, or the like. If the images and other data recorded in the recording unit 18 (described later) are subject to confidentiality obligations, they may be withheld from output from the endoscope device 10 to, for example, the learning device 30. If the learning device 30 is located within the hospital, or if a contract permits the exchange of data, the learning device 30 may be operated as an in-hospital system. The following description assumes a situation in which an organization outside the hospital develops the inference model using the learning device, but the invention is not limited to this. Each block within the learning device 30 may also be divided among a plurality of devices, and the learning unit 34 may be located in a server.
The endoscope device 10 has an imaging unit 11, an image processing unit 12, a display unit 12a, an inference unit 13, a difference determination unit 14, an image determination unit 15, a situation determination unit 16, an information association unit 17, a recording unit 18, a communication unit 19, and a control unit 20. As mentioned above, each block in the endoscope device 10 may be separated from the endoscope device 10 and used as another device; the example shown here is equipped with hardware similar to a terminal device, anticipating its development as a so-called IoT device. However, some functional blocks can be omitted if roles are shared, for example by having the endoscope device 10 cooperate with the learning device 30. The endoscope device 10 has an insertion portion consisting of a long cylindrical tube for insertion into a cavity or tubular object to observe its interior; this insertion portion may or may not be flexible. The imaging unit 11 is often provided at the distal end of the insertion portion.
The imaging unit 11 is assumed to be a section having an imaging section, a light source section, an operation section, a treatment section, and the like. The imaging section has an optical lens for imaging, an image sensor, an imaging control circuit, and so on. If it has an autofocus function, it also has a focus detection circuit, an automatic focus adjustment device, and the like. The optical lens forms an optical image of the object. The image sensor is arranged near the position where the optical image is formed, converts the optical image into an image signal, AD-converts this image signal, and then outputs it to the image processing unit 12. On receiving an imaging start instruction from the control unit 20, the imaging control circuit controls the readout of image signals from the image sensor at a predetermined rate.
The light source section in the imaging unit 11 provides illumination to brighten the walls of the digestive tract and the like in the body cavity in order to facilitate observation. The light source section includes a light source such as a laser light source, an LED light source, a xenon lamp, or a halogen lamp, and also has an optical lens for illumination. Since the detection characteristics of tissue change depending on the wavelength of the illumination light, the light source section may have a function for changing the wavelength of the light source, and the image processing may be changed by a known method in accordance with the wavelength change. Detection does not necessarily have to be performed visually by a physician. The operation section in the imaging unit 11 has operation members for instructing the capture of endoscopic still images and the start and end of endoscopic video recording, as well as action sections that operate in connection with these operations, a treatment section, a function execution section, and so on. It may also have an operation member for instructing switching of the focus of the endoscopic image. Changes caused by these operations can become parameters of image change.
Furthermore, the operation section in the imaging unit 11 has an angle knob for bending the distal end of the insertion portion of the endoscope. It also has function execution sections for supplying air and water into the body cavity through the flexible tube and for suctioning air and liquid. The treatment section has function execution sections such as treatment tools, for example biopsy forceps for performing a biopsy to collect a piece of tissue, and may also have treatment tools such as a snare or a high-frequency scalpel for removing an affected area such as a polyp. The function execution sections of these treatment tools and the like (which may broadly be classified as the operation section) can be operated by operation members that actuate them. The above operations may change the shape of the affected area or cause bleeding, and when heat or the like is used during an operation, water vapor, smoke, water spray, and the like may be generated. There are also jigs that are left in the body during treatment. Changes occurring when these operations are switched, and changes in the state of the object accompanying the operations, can become parameters of image change. When an operation is changed, changes in the brightness, saturation, and contrast of the image may occur.
The image processing unit 12 has an image processing circuit, receives the image signal from the imaging unit 11, and performs various kinds of image processing, such as development processing, in accordance with instructions from the control unit 20. The image processing unit 12 outputs the processed image data to the display unit 12a, the inference unit 13, the image determination unit 15, and the communication unit 19. The image processing performed by the image processing unit 12 may include, in addition to adjusting the color and brightness of the image, enhancement processing such as contrast enhancement and edge enhancement to improve visibility, gradation processing to produce natural gradations, and multi-frame composition processing such as HDR (High Dynamic Range) processing and super-resolution processing that improve image quality using a plurality of image frames. The image processing unit 12 functions as an image processing unit (image processing circuit) that processes image frame information into visually recognizable information. Changes at the time of operation changes become parameters of image change. Note that the image processing unit 12 may be omitted from the endoscope device 10 by entrusting the above-described functions to the image processing unit 32 in the learning device 30. However, if the endoscope device 10 is to be independent as an IoT device, providing the image processing unit 12 within the endoscope device 10 increases the degree of freedom, for example by making it possible to transmit images to the outside.
The display unit 12a has a display device such as a display monitor and a display control circuit. In accordance with control signals from the control unit 20, the display unit 12a receives the image data processed by the image processing unit 12 and displays endoscopic images and the like. An endoscopic image on which the inference result from the inference unit 13 is superimposed may also be displayed.
The inference unit 13 has an inference engine, receives the image data of image frames from the imaging unit 11, and performs inference. An inference model for inferring the advice described below is set in the inference engine. The inference engine may be implemented in hardware or in software. The inference unit 13 may include a feed-forward neural network or the like; this feed-forward neural network will be explained in connection with the learning unit 34. The data input to the inference engine is not limited to the image data of image frames; situation information (related information, auxiliary information) at the time the frame image was acquired may also be input for inference. By performing inference using situation information, more reliable inference results can be obtained. Inference is performed frame by frame, but it need not be performed for every frame; it may be performed every few frames according to requirements such as the visibility of the inference results, or several consecutive frames may be input to the inference unit for a single inference.
The inference unit 13 receives image data and performs inference using an inference model generated by machine learning, thereby inferring, for example, advice to support a physician during diagnosis. The inference unit 13 may also identify what the object (affected area, organ, etc.) shown in the image acquired by the imaging unit 11 is, identify its position, identify the extent of the affected area or the like, segment or classify the image, and perform inferences such as pass/fail determination. The inference unit 13 performs inference in accordance with instructions from the control unit 20 and outputs the inference results to the control unit 20. When performing inference, the inference unit 13 also calculates the reliability of that inference and outputs the calculated reliability value to the difference determination unit 14.
Depending on the inference specifications of the inference unit 13, how the inference results are used (how they are displayed, etc.) varies, and various methods can be used to detect differences in the inference results. For example, when pass/fail determination is inferred, the pass/fail indication may change over time; when position identification is inferred, the coordinates of the identified position on the screen may change over time; and when segmentation (the extent occupied by the object) is inferred, the shape and area of each segment on the screen change over time, so any of these can be used as changes in the inference results.
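One way to turn these task-dependent changes into a single numeric difference is sketched below. The dictionary keys ('label', 'box', 'mask_area') are illustrative conventions for the three inference types named above, not a format defined by the specification.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def result_difference(prev, curr):
    """Task-dependent difference between two per-frame inference results.
    Each result is a dict with a 'label', 'box', or 'mask_area' key."""
    if 'label' in prev:                      # pass/fail or class flip
        return 0.0 if prev['label'] == curr['label'] else 1.0
    if 'box' in prev:                        # detection: 1 - IoU of boxes
        return 1.0 - iou(prev['box'], curr['box'])
    if 'mask_area' in prev:                  # segmentation: relative area change
        denom = max(prev['mask_area'], 1)
        return abs(curr['mask_area'] - prev['mask_area']) / denom
    return 0.0
```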
Note that the images input to the inference unit 13 are images from the imaging unit 11, and here medical images from an endoscope or the like are assumed. However, medical images are not limited to endoscopic images acquired by an endoscope device; they may be images acquired by medical equipment such as X-ray machines, MRI machines, ultrasound diagnostic machines, dermatology cameras, and dental cameras. When endoscopic images are used as input images, this embodiment can deal with the sudden changes in field of view that are peculiar to endoscopes.
Furthermore, the input images fed to the inference unit 13 need not be limited to medical images. Sudden changes in field of view can also occur, for example, with industrial endoscopes or with cameras mounted on robots, vehicles, or drones. An endoscope is also a device that is inserted through a narrow insertion opening. It therefore has a tubular distal end with a small outer diameter, which shakes easily, and this shaking greatly affects the acquired images. When the object is at close range, the resulting change is especially large. Vertical and horizontal movement of the distal end changes the composition, while movement along the insertion direction changes the apparent size of the object. Vertical and horizontal shifts can cause extraneous objects to appear in the image, and changes in relative distance can change the size of the object. In addition, when the image rotates due to twisting, or when the imaging section does not face the observation target squarely and observes it obliquely, distortion and the like can occur in the image. These changes in circumstances also become parameters that can change the image and, as noted above, the image can change greatly in a compound manner. Such changes can occur not only with endoscopes but with many inspection devices that use images.
The inference unit 13 functions as an inference unit (inference engine) that infers a specific object image included in a captured image (which may be an endoscopic image) using a machine learning model (see, for example, S1 in FIGS. 2 and 5 and the inference unit 44 in FIG. 7). The inference unit 13 also functions as an inference unit (inference engine) that performs inference on an input image using a machine learning model (see, for example, S1 in FIGS. 2 and 5 and the inference unit 44 in FIG. 7). The above input images are medical images. The inference unit infers at least one of classification, detection, and region extraction of the input image (see, for example, S1 in FIGS. 2 and 5). The inference unit also outputs the reliability of the inference (see, for example, S1 in FIGS. 2 and 5).
The difference determination unit 14 may include a difference determination circuit, or may be realized by a processor including a CPU or the like executing a program. The difference determination unit 14 calculates the differences between the inference results produced by the inference unit 13 (examples of such differences have already been given). That is, since the inference unit 13 performs inference for each frame and outputs an inference result, the difference determination unit 14 calculates the difference between those inference results. In calculating the difference (change), a differential value of the inference results may be used. The change in the reliability value calculated by the inference unit 13 may also be used to calculate the difference (change). That is, since the inference unit 13 performs inference each time image data is input and calculates the reliability of the inference result, the difference may be determined based on the change in this reliability value. As described later, the image determination unit 15 determines whether an image belongs to a similar image group, so the difference determination unit 14 determines the differences between the individual images included in the similar image group.
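The differential-value approach could be realized as follows: a minimal sketch that takes the per-frame reliability series, differentiates it frame to frame, and flags frames where the reliability jumps by more than a specified value. The threshold of 0.25 is an illustrative assumption.

```python
import numpy as np

def confidence_deltas(confidences):
    """Frame-to-frame derivative of the inference reliability series;
    large magnitudes mark frames where the model's certainty jumps."""
    c = np.asarray(confidences, dtype=np.float32)
    return np.diff(c)  # delta[i] = c[i+1] - c[i]

def flag_unstable_frames(confidences, limit=0.25):
    """Indices of the later frame of each pair whose reliability
    changed by more than the specified value."""
    return [i + 1 for i, d in enumerate(confidence_deltas(confidences))
            if abs(d) > limit]
```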
The difference determination unit 14 functions as an inference result difference calculation unit that calculates the differences between the inference results produced by the inference unit for the respective images included in the similar image group determined by the similar image group determination unit (see, for example, S5 in FIGS. 2 and 5). As the difference between inference results, it is sufficient to calculate, for example, at least one of a change in the position of the specific object, a change in its size, a change in the inferred region, and so on.
The image determination unit 15 may include an image determination circuit, or may be realized by a processor including a CPU or the like executing a program. The image determination unit 15 determines whether the image data input in time series from the image processing unit 12 belongs to a similar image group. When a physician or the like inserts the distal end of the endoscope device, where the image sensor is arranged, into a body cavity, the image sensor acquires endoscopic images. When the distal end approaches a specific object such as an affected area, many images of the vicinity of the affected area are acquired because the physician or the like observes it carefully. The image determination unit 15 may determine whether the image group is one that includes this specific object. The specific object is the target observed or examined with the endoscope or the like, and the specific object image means the portion of the on-screen image occupied by the specific object. Various objects appear in an endoscopic image; among them, the object determined as the detection target in the specifications of the AI (inference unit) is called the specific object. For example, as a characteristic of endoscopic images, minute shaking of the distal end often changes the objects in the acquired image, causes the position of the specific object in the image to change suddenly, or causes extraneous objects to enter the image. As a result, the position and size of the object change drastically.
The image determination unit 15 converts the content of the images into numerical values and determines whether images are similar and whether a consecutive pair of input images is substantially unchanged. These numerical values are such that the pattern of the same object in the images can be tracked. As described above, changes in the results and the like may be detected in time series from these values. Temporally adjacent images (or their inference results) may also be compared. A known method may be adopted as appropriate for quantifying image similarity. Here, numerical values for composition, color, brightness, and the like are assumed, and it is assumed that changes in the size of the object, whether the object is within the screen, and so on can also be determined from these values.
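One possible quantification, sketched below under the assumption of RGB frames as NumPy arrays, reduces each frame to a mean brightness value and a coarse per-channel histogram, then compares those signatures. The tolerances are illustrative; the specification leaves the concrete metric to known methods.

```python
import numpy as np

def image_signature(img: np.ndarray):
    """Reduce a frame (H x W x 3, RGB) to a small numeric signature:
    mean brightness plus a normalized 8-bin histogram per channel."""
    brightness = float(img.mean())
    hists = []
    for ch in range(3):
        h = np.histogram(img[..., ch], bins=8, range=(0, 256))[0]
        hists.append(h / h.sum())
    return brightness, np.concatenate(hists)

def is_similar(img_a, img_b, b_tol=20.0, h_tol=0.05):
    """Similarity test on the numeric signatures rather than raw pixels."""
    b_a, h_a = image_signature(img_a)
    b_b, h_b = image_signature(img_b)
    return abs(b_a - b_b) <= b_tol and np.abs(h_a - h_b).mean() <= h_tol
```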
When the image determination unit 15 determines a similar image group, it does so by converting the image content into numerical values; however, the temporally consecutive images, that is, a series of images, may include dissimilar images whose numerical values cannot be said to be similar. As described later with reference to FIG. 6, the distal end of the endoscope makes small movements, which makes steady fixation difficult and means the object easily moves out of the image. Therefore, when the same region is observed over a predetermined number of frames, the similar image group may include dissimilar images partway through the observation. The determination of similar image groups will be described later with reference to FIG. 3. The image determination unit 15 may determine that two input images are substantially unchanged by, for example, calculating the amount of movement of corresponding points, or it may make this determination based on at least one of the amounts of change in the brightness, saturation, and contrast of the images.
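A sketch of the corresponding-point approach follows, assuming OpenCV is available: sparse optical flow measures how far tracked feature points move between the pair, and a mean HSV difference stands in for the brightness/saturation change check. The tolerance values are assumptions for illustration.

```python
import cv2
import numpy as np

def substantially_unchanged(prev_bgr, curr_bgr, move_tol=3.0, hsv_tol=10.0):
    """Judge an input image pair as substantially unchanged from
    (a) corresponding-point movement via sparse optical flow and
    (b) mean per-channel change in hue/saturation/value."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return False
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    if not ok.any():
        return False
    movement = np.linalg.norm((nxt - pts)[ok], axis=2).mean()
    hsv_delta = np.abs(
        cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV).astype(np.float32) -
        cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    ).mean(axis=(0, 1))
    return movement <= move_tol and (hsv_delta <= hsv_tol).all()
```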
The image determination unit 15 functions as a similar image group determination unit that compares the contents of images (which may be endoscopic images) obtained by temporally consecutive imaging and determines, including dissimilar images, a group of similar images in which the same region is observed over a predetermined number of frames (see, for example, S3 in FIGS. 2 and 5, FIG. 3, and the image identification unit 43 in FIG. 7). The similar image group determination unit converts the endoscopic images into numerical values such that the pattern of the same object in the image can be tracked, and uses these values to determine whether images are similar (see, for example, FIG. 3). When performing an endoscopy, the observer stares at the same object, or confirms the specific object while changing the observation method, in order to sufficiently confirm the object with the naked eye. The predetermined number of frames therefore corresponds to the time required for such observation. As described above, when observing with an endoscope, the distal end cannot be fixed in space and moves slightly, so steady fixation is difficult and the observed image easily deviates from the object. The similar image group is defined to include images that may contain training image candidates important for diagnosis, corresponding to the difficult or missed images described above (for example, images in which the same region is observed). For this reason, dissimilar images found within a group of images that may contain similar images are also included in the similar image group determination; a gap-tolerant grouping sketch is shown below.
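The following sketch shows one way such a gap-tolerant grouping could behave: a run of frames observing the same region stays open across up to `max_gap` dissimilar frames in a row, so the brief drop-outs caused by distal-end shake remain inside the group. `same_scene` is a placeholder predicate (for example, the `is_similar` function above); `max_gap` is an assumed parameter.

```python
def group_with_gap_tolerance(frames, same_scene, max_gap=3):
    """Group temporally consecutive frames observing the same region;
    up to max_gap dissimilar frames in a row are kept inside the group
    (brief drop-outs caused by shake of the distal end)."""
    groups, current, anchor, gap = [], [], None, 0
    for i, frame in enumerate(frames):
        if anchor is None or same_scene(frames[anchor], frame):
            current.append(i)       # similar to the group's anchor frame
            anchor, gap = i, 0
        elif gap < max_gap:
            current.append(i)       # dissimilar, but tolerated
            gap += 1
        else:                       # too many dissimilar frames: new group
            groups.append(current)
            current, anchor, gap = [i], i, 0
    if current:
        groups.append(current)
    return groups
```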
The similar image group described above may include a result of determining a group of images corresponding to abrupt viewpoint movement according to the similarity of the preceding and following images among the endoscopic images obtained by temporally consecutive imaging. The similar image group may also include abruptly changing images corresponding to the blur of the visual image caused by the unconscious eye movements called saccades (in humans, the brain corrects for this, so it is not consciously perceived), determined according to the similarity of the images before and after the temporally consecutive endoscopic images. The images corresponding to such saccades, described later with reference to FIGS. 4 and 6, are images corresponding to eye movement. The saccade-equivalent image change referred to here is a conceptual expression of the way the image changes abruptly due to changes in the relative positional relationship between the distal end of the endoscope and the object, that is, due to the positional relationship between an object at close range and the light source mounted alongside the distal end of the endoscope.
The saccade-equivalent image change described above is related to the peculiarities of examinations using an endoscope or the like. That is, even if one tries to observe in the same way, movement of the object or the imaging section produces a large image change despite a relatively small relative change. In other words, even if the object is captured within the screen, its size and position on the screen, and the exposure and brightness distribution, are liable to change suddenly. Moreover, such images cannot be captured deliberately; they may or may not occur depending on the situation, and have therefore been images that are difficult to turn into training data intentionally.
Situations in which sudden image changes similar to the saccade-equivalent image changes described above occur are common in endoscopic images. In addition to the object's tendency to move, the object may be immersed in body fluids or cleaning fluids; operations associated with endoscopic observation, such as suction or air supply, may be performed; special light observation or staining may be performed; and treatment tools may be operated in conjunction. Various techniques are thus often used together to observe the object. These events may occur simultaneously and change in a compound manner, making unintended (or unnoticed) sudden image changes extremely likely. It is desirable to confirm the object correctly in response to such situations.
As described above, saccade-equivalent image changes (sudden changes) have many causal parameters and can occur in a compound manner, so it is often difficult to prepare patterns for all compound causes at the time of machine learning. Images at the time of such sudden changes may be determined not to be similar images when similarity determination using pattern matching, or similarity determination based on quantified images, is performed. In other words, they may be determined to be dissimilar images with different content. Note that pattern matching handles numerical data two-dimensionally, per pixel or per on-screen coordinate, so it can be said to compare data numerically.
When similar images are acquired consecutively in time, there is a high possibility that important images are included even among images that would naively be judged dissimilar; in this embodiment, therefore, dissimilar images satisfying the above conditions are treated as images constituting a group of similar images. That is, in acquiring training data candidate images for obtaining endoscopic images to train an endoscopic inference model, the contents of endoscopic images obtained by temporally consecutive imaging are compared, and the similar image group determination is performed on a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images. The content of an image refers to what characterizes the image (image features), such as the shape, pattern, shading, color, size, and position of the imaged object (rotation, distortion, and the like also occur, but corrected versions may be used).
The situation determination unit 16 may include a situation determination circuit, or may be realized by a processor including a CPU or the like executing a program. The situation determination unit 16 determines information about the usage status of the endoscope device 10 by the physician or other user. To this end, based on the image data acquired by the image processing unit 12, the situation determination unit 16 may determine, for example, whether the positional relationship between the distal end, where the image sensor is provided, and the wall surface in the body cavity is straight-on or oblique. It may also determine whether the image is in focus, how much depth there is, and so on. It may also determine the usage status of the light source section in the imaging unit 11. For example, if the wavelength of the light source is known, it can be determined whether narrow band imaging (NBI) is being performed. Furthermore, it may determine the usage status of the treatment section in the imaging unit 11. For example, if the physician performs a water supply operation, the effect of the water appears in the image; if a suction operation is performed, the effect of the suction appears.
The determination result of the situation determination unit 16 is output to the control unit 20. As described later, the situation determination result at the time is associated with the image data. If the distal end is not squarely facing the affected area or the like but is at an oblique position, the affected area is difficult to see and inference tends to become difficult. Likewise, when a water supply operation has been performed, the screen is affected by the water, the affected area is difficult to see, and inference tends to become difficult. If such situation information is available and images are inferred together with it, the reliability of inference for finding the affected area can be improved. The situation determination unit 16 functions as an imaging information acquisition unit capable of acquiring imaging information corresponding to each image selected by the learning image selection unit (see, for example, S6 in FIG. 5).
The information association unit 17 may include an information association circuit, or may be realized by a processor including a CPU or the like executing a program. The information association unit 17 associates, with the image data processed by the image processing unit 12, at least one of the situation information determined by the situation determination unit 16, the similar image information determined by the image determination unit 15, and information about differences in the inference reliability values determined by the difference determination unit 14. The image data with which information has been associated by the information association unit 17 is recorded in the recording unit 18. For image data for which the difference determination unit 14 has determined that the inference reliability values differ by a predetermined amount or more, the communication unit 19 is notified to that effect. As described later, the communication unit 19 transmits the notified image data to the learning device 30, where it is used as training data for relearning.
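A per-frame record produced by this association step might look like the following sketch. The field names and the contents of the situation dictionary are assumptions for illustration; the actual data structure is the subject of FIGS. 8 and 9.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FrameRecord:
    """One frame of the examination video plus the metadata that the
    information association step attaches to it."""
    frame_index: int                           # position in the original video
    group_id: Optional[int] = None             # similar image group membership
    confidence: Optional[float] = None         # inference reliability for the frame
    confidence_delta: Optional[float] = None   # change vs. the neighbouring frame
    situation: dict = field(default_factory=dict)  # e.g. {'nbi': True, 'water': False}
    relearn_candidate: bool = False            # flagged when the delta is large
```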
The recording unit 18 has electrically rewritable nonvolatile memory and/or volatile memory. The recording unit 18 stores the programs for operating the CPU and other elements in the control unit 20. It also stores the model information of the endoscope device 10 and various characteristic values and adjustment values within the endoscope device 10. Furthermore, it may record the image data processed by the image processing unit 12 and associated with information by the information association unit 17. Details of the data structure recorded in the recording unit 18 will be explained with reference to FIGS. 8 and 9. Note that the recording unit 18 need not record all of the information shown in FIGS. 8 and 9; the recording may be shared with the recording unit 33 in the learning device 30.
As described above, the recording unit 18 records the image data of the original moving image acquired by the imaging unit 11 and processed by the image processing unit 12, together with the image data associated with information by the information association unit 17. The original moving image is recorded in the recording unit 18 so that the inference model can later be improved based on the recorded content. Therefore, the recorded data need not be the selected image data itself; it is sufficient to make the image data easy to find, for example by search. In other words, the original examination video may be recorded as-is, together with information that allows a specific frame to be selected from the recorded images. For example, the information may indicate a specific frame, such as the image at the start of the examination or the image at the time a specific object was detected, expressed for instance as a frame count from a specific timing or as a time position in minutes and seconds. This information will be described later with reference to FIGS. 8 and 9.
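In other words, a lightweight pointer into the as-recorded video can stand in for the frame itself, along the lines of the sketch below. `open_video` is a hypothetical placeholder for whatever decoder the recording unit provides; the field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class FrameReference:
    """Pointer into the as-recorded examination video, so a selected
    candidate frame can be re-extracted later instead of being stored."""
    video_id: str      # which examination recording
    frame_index: int   # frame count from the start of the recording
    reason: str        # e.g. 'large reliability change within a similar group'

def load_candidate(open_video, ref: FrameReference):
    """Recover the candidate frame from the original recording."""
    video = open_video(ref.video_id)   # assumed to return an indexable frame sequence
    return video[ref.frame_index]
```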
There are various specifications and directions for inference model creation, such as improving an inference model by relearning, customizing it for a specific situation, increasing its versatility, or creating a new inference model. For this reason, it is preferable to record the necessary related data together, in accordance with the specifications and direction of the inference model. For example, when creating or improving an inference model tailored to a specific profile (sex, age group, ethnicity, medical history, etc.), organizing and recording which profile the examination results belong to makes it easier to select the optimal training data candidates. Data representing the various circumstances at the time of data acquisition is also useful when the data is actually used, and it may be made possible to record information such as which physician performed the examination at which hospital, the presence or absence of personal information, and the presence or absence of various agreements and conditions of use such as informed consent.
As described above, when recording an image in the recording unit 18, it is advisable to record related information in association with it. For example, in the example of FIG. 4, when an image marked with a cross (×) as the reliability determination result is to be used as learning data, it may be possible to annotate it with reference to the information of the circles (〇), so the coordinates and/or the determination results may be recorded as reference values. In the example of FIG. 4, the reliability determinations are 〇××〇; if the two 〇 frames received the same determination, the × frames between them can potentially be turned into training data with reference to the 〇 images. After the video has been acquired, an annotator or a robot can easily and appropriately annotate the × images with reference to the related information (the reliability determination results), as in the sketch below.
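The following sketch illustrates this 〇××〇 propagation: for each run of low-reliability frames enclosed by two agreeing high-reliability frames, the neighbours' result is copied as a provisional annotation for the annotator to confirm. The record format ('ok' flag plus a 'result' such as a lesion class label) is an assumed convention.

```python
def propagate_annotations(records):
    """For each low-reliability (×) frame lying between two agreeing
    high-reliability (〇) frames, propose the neighbours' result as a
    provisional annotation. Each record is a dict with 'ok' (bool)
    and 'result' (e.g. a lesion class label)."""
    proposals = {}
    ok_idx = [i for i, r in enumerate(records) if r['ok']]
    for a, b in zip(ok_idx, ok_idx[1:]):
        if records[a]['result'] == records[b]['result']:
            for i in range(a + 1, b):        # the × frames in between
                proposals[i] = records[a]['result']
    return proposals  # frame index -> suggested annotation
```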
 The communication unit 19 has a communication circuit (transmission/reception circuit) and communicates with the communication unit 31 in the learning device 30. By selecting and transmitting only the necessary information from the information recorded in the recording unit 18, the communication unit 19 can reduce the communication and recording load. Security can also be improved by not transmitting unnecessary data. For example, among the images acquired by the imaging unit 11, those whose inference results show a difference are transmitted to the learning device 30 for learning. However, when there are no restrictions, or when a contract permits it, all of the information (including the images) may be transmitted. In doing so, the communication unit 19 may transmit to the communication unit 31 in the learning device 30 the image data to which the information association unit 17 has associated information and for which the difference determination unit 14 has found a difference in the inference results of the images included in the similar image group. Communication can also be used when replacing an inference model: the communication unit 19 receives, through the communication unit 31, the inference model generated by the learning unit 34 in the learning device 30. In addition, the two units exchange request signals and transmit and receive information that satisfies the conditions in response to those request signals.
 The control unit 20 is a processor that includes a CPU (Central Processing Unit), its peripheral circuits, a memory, and the like. There may be a single processor, or it may be composed of a plurality of chips. The CPU runs the endoscope apparatus 10 as a whole by controlling each part within the endoscope apparatus 10 in accordance with a program stored in the memory. Each part within the endoscope apparatus 10 is realized through software control by the CPU. All or part of the difference determination unit 14, the image determination unit 15, the situation determination unit 16, and the information association unit 17 described above may be realized by the processor in the control unit 20. This processor may also realize all or part of a similar image group determination unit, an image identification unit, an inference result difference calculation unit, an inference result change calculation unit, a learning image selection unit, and an imaging information acquisition unit. Likewise, the processor in the control unit 20 may realize all or part of the functions of the imaging unit 11, the image processing unit 12, and the inference unit 13. The control unit 20 may also operate in cooperation with the control unit 35 in the learning device 30 so that the endoscope apparatus 10 and the learning device 30 operate as one.
 For the images that the image determination unit 15 has judged to form part of a similar image group, the control unit 20 uses the differences calculated by the difference determination unit 14 in the inference results (which may be reliability values or the like) produced by the inference unit 13 to select the images to be used for relearning (see, for example, S7 in FIG. 2 and S8 in FIG. 5). Consider, for example, a case where the situation changes from high reliability to low reliability. An image with high reliability is generally likely to resemble the images used when the inference model was created, whereas an image with low reliability is likely not to have been used in creating the model. Yet images contained in a similar image group may show the same object even when they are dissimilar. Such images are "weak" images for the inference model, and it is desirable that accurate inference be possible even for them. In this embodiment, therefore, when the inference result (or its reliability, or the like) changes in this way, the image at that time is selected and kept as a candidate image for learning. When teacher data are created, an expert such as a doctor can judge whether a candidate image can be adopted as teacher data. In other words, if such an image is left unselected, adopting it as teacher data later takes effort and improving the model takes time, whereas if it is selected it becomes a teacher data candidate and, in some cases, can be used immediately to improve the inference model.
 The control unit 20 functions as a learning image selection unit that selects images to be used for relearning based on the differences in the inference results calculated by the inference result difference calculation unit (see, for example, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). Selecting images based on differences in the inference results means, for example, that when a difference arises in the inference results, such as when the reliability changes by a predetermined value or more, when the size of the specific object changes suddenly, or when the judged position of the object within the image changes suddenly by a predetermined value or more, the changed image is selected for relearning. Here, "selection" may mean selecting the image frame itself for recording, but it is not limited to this; it may also mean selecting information that allows the image frame to be retrieved immediately.
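 As one way to make these selection criteria concrete, the following minimal sketch flags a frame for relearning when, between adjacent frames in a similar image group, the reliability changes by more than a threshold, or the detected position or size of the object jumps. The threshold values and record layout are illustrative assumptions, not values from the embodiment.

```python
def select_learning_candidates(group, conf_delta=0.3, pos_delta=50, size_ratio=2.0):
    """group: list of per-frame inference records
       {"confidence": float, "center": (x, y), "area": float}.
    Returns indices of frames whose inference result differs sharply from
    the previous frame; all thresholds are assumed example values."""
    selected = []
    for i in range(1, len(group)):
        prev, cur = group[i - 1], group[i]
        dconf = abs(cur["confidence"] - prev["confidence"])
        dpos = max(abs(a - b) for a, b in zip(cur["center"], prev["center"]))
        ratio = max(cur["area"], prev["area"]) / max(min(cur["area"], prev["area"]), 1e-6)
        if dconf >= conf_delta or dpos >= pos_delta or ratio >= size_ratio:
            selected.append(i)  # keep the frame, or an index that lets us find it later
    return selected
```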
 The learning image selection unit described above may omit images that do not satisfy specific conditions, that is, images taken under extremely poor conditions. One such condition is, for example, that visibility remains far too low even after image correction or the like. As described above, the images to be used for relearning are selected from among the images included in the similar image group based on the differences in the inference results. For this reason, images that at first glance appear dissimilar to the images observing the same site may also be selected as candidate images for learning. However, images taken under extremely poor conditions, such as an image in which the entire screen has turned white because of a water injection operation or an image in which the entire screen has turned black because of the positional relationship of the light source, are quite unlikely to contain the specific object that is actually being sought.
 Furthermore, the longer the time spent imaging the object, the more likely the image is to be taken under poor conditions. That is, the acquired image can become extremely poor (1) when the exposure time is long, (2) when the emission time of the light source is long in order to obtain a sufficient amount of light inside the body cavity, and (3) when the gain in the image sensor or its circuits is high. This is because, as the exposure time or the like becomes longer, the relative positional relationship between the specific object and the imaging unit 11 changes in the meantime, resulting in a poor image, for example one degraded by image blur.
 The learning image selection unit therefore excludes from the selection targets images taken under such extremely poor conditions that they cannot possibly contain an image of the specific object. This may be judged based on the contrast of the image data acquired by the imaging unit 11, the amount of image blur, and the like. Alternatively, without judging the image quality directly, an image may be judged to be under extremely poor conditions based on the shooting conditions or the like. In addition, when multiple frames are combined, as in HDR (High Dynamic Range) shooting or focus stacking, the result tends to be an image under extremely poor conditions, so in this case as well the image may be judged to be under extremely poor conditions according to the shooting conditions.
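 A minimal sketch of the kind of exclusion test just described follows, using simple grayscale statistics; the numeric thresholds are assumptions, and in practice the judgment could instead be driven by the shooting conditions (exposure time, gain, HDR mode, and so on).

```python
import numpy as np

def is_extremely_poor(gray, bright_lo=10, bright_hi=245, min_contrast=8.0):
    """gray: 2-D uint8 array for one frame. Returns True for frames that
    are nearly all-black, nearly all-white, or almost contrast-free,
    which cannot usefully contain the specific object. Thresholds are
    illustrative assumptions."""
    mean = float(gray.mean())
    std = float(gray.std())
    if mean <= bright_lo or mean >= bright_hi:
        return True   # e.g. whited out by water injection, or blacked out by the light source
    return std < min_contrast  # virtually no level differences between pixels
```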
 The control unit 20 also functions as a learning image selection unit that selects teacher data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit (see, for example, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). This learning image selection unit selects a teacher data candidate image to be used for learning when the amount of change exceeds a specified value. That is, when the amount of change becomes larger than a predetermined value, the image, although identified as almost the same, differs from the preceding images and may not have been used for learning so far, so it is selected as a learning candidate image.
 The learning device 30 is assumed to be the part that handles the creation and improvement of the inference model, and is assumed to be located outside the hospital (examination facility). The learning device 30 has a communication unit 31 because data are exchanged by communication, and, in order to carry out highly specialized learning efficiently, it also has an image processing unit 32, a recording unit 33, a learning unit 34, and a control unit 35. If, for example, the recording unit 33 were located elsewhere, communication could become a constraint during learning, but the learning device 30 may also cooperate by communication with a recording unit located at another site. The learning device 30 may be located on a server or the like, in which case it is connected to the endoscope apparatus 10 through a communication network such as the Internet. Furthermore, the learning device 30 is connected to a large number of devices, receives a large amount of teacher data from these devices, performs learning using these teacher data, and generates an inference model. Alternatively, the learning device 30 may receive teacher data candidates such as image data, perform annotation to create teacher data, and generate an inference model using these teacher data.
 The communication unit 31 has a communication circuit (transmission/reception circuit) and communicates with the communication unit 19 in the endoscope apparatus 10. As described above, the communication unit 31 receives the image data for which the inference results of the images included in the similar image group show a difference. The communication unit 31 also transmits the inference model generated by the learning unit 34 to the communication unit 19. Furthermore, the communication unit 31 transmits to the communication unit 19 an inference model generated by relearning using the relearning image data selected, based on the differences in the inference results, in the endoscope apparatus 10. As mentioned above, when the recording unit 33 is located externally, communication and the like can become a constraint during learning, but the learning device may cooperate through the communication unit 31 with a recording unit located at another site.
 The image processing unit 32 has an image processing circuit, receives image data from the endoscope apparatus 10, and applies various kinds of image processing, such as development processing, in accordance with instructions from the control unit 35. The image processing may be equivalent to the processing in the image processing unit 12, or the processing content may be changed as appropriate. The processed image data may be recorded in the recording unit 33 or displayed on a display device or the like. Image processing may also be performed at learning time. For example, the learning images recorded in the recording unit 33 may be made easier to learn from by applying image processing such as changing the image size or applying emphasis processing to make them easier to handle. The image processing unit 32 may also deliberately generate images that are difficult to judge, and the learning unit 34 may use these images for testing during learning.
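 As an illustration of the kind of preprocessing the image processing unit 32 might apply before learning, a minimal sketch follows; the resizing method, target size, and gamma value are assumptions chosen for brevity, not the processing actually prescribed by the embodiment.

```python
import numpy as np

def prepare_for_learning(img, size=(224, 224), gamma=0.8):
    """img: H x W x 3 uint8 frame. Resize (nearest-neighbor, for brevity)
    and apply a gamma-based emphasis so low-contrast structures stand out.
    A sketch only; the actual processing content may be changed as appropriate."""
    h, w = img.shape[:2]
    ys = np.linspace(0, h - 1, size[0]).astype(int)   # row indices to sample
    xs = np.linspace(0, w - 1, size[1]).astype(int)   # column indices to sample
    resized = img[ys][:, xs]
    emphasized = (255.0 * (resized / 255.0) ** gamma).astype(np.uint8)
    return emphasized
```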
 The recording unit 33 has an electrically rewritable nonvolatile memory and/or a volatile memory. The recording unit 33 stores the program for operating the CPU and the like in the control unit 35. The recording unit 33 also stores various characteristic values, adjustment values, and the like of the learning device 30. Furthermore, the recording unit 33 records the teacher data for learning transmitted from the endoscope apparatus 10 (including the relearning image data selected based on the differences in the inference results). The details of the data structure recorded in the recording unit 33 will be explained with reference to FIGS. 8 and 9. Note that the recording unit 33 need not record all of the information shown in FIGS. 8 and 9; the recording may be shared with the recording unit 18 in the endoscope apparatus 10.
 The learning unit 34 includes an inference engine, performs machine learning such as deep learning using the teacher data for learning recorded in the recording unit 33, and generates an inference model. When a predetermined number of relearning image data selected based on differences in the inference results have accumulated from the endoscope apparatus 10, relearning is performed. By performing relearning, it is possible to generate an inference model capable of highly reliable inference even for weak images, missed images, and the like.
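 This accumulate-then-retrain behavior can be sketched as follows; the class name, the `train_model` callback, and the threshold of 1000 samples are all placeholders for whatever learning routine and count the operator chooses, not values from the embodiment.

```python
class RelearnBuffer:
    """Collects relearning images selected for inference-result differences
    and triggers retraining once a predetermined number has accumulated."""

    def __init__(self, train_model, threshold=1000):
        self.train_model = train_model  # e.g. a deep-learning training routine
        self.threshold = threshold
        self.samples = []

    def add(self, image, related_info):
        self.samples.append((image, related_info))
        if len(self.samples) >= self.threshold:
            model = self.train_model(self.samples)  # relearn on the accumulated data
            self.samples.clear()
            return model  # new inference model to send back to the endoscope
        return None
```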
 Deep learning will now be explained. "Deep learning" is a multilayered version of the "machine learning" process that uses neural networks. A typical example is the forward propagation neural network, which sends information from front to back to make a judgment. The inference unit 13 described above also includes a forward propagation neural network. In its simplest form, a forward propagation neural network (the inference engine 13A in FIG. 6, for example, has a similar structure) needs only three layers: an input layer composed of N1 neurons, an intermediate layer composed of N2 neurons given by parameters, and an output layer composed of N3 neurons corresponding to the number of classes to be discriminated. The neurons of the input and intermediate layers, and of the intermediate and output layers, are connected by connection weights, and bias values are added to the intermediate and output layers, so that logic gates can easily be formed.
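 The simplest three-layer forward propagation network just described can be written down directly. The following NumPy sketch uses illustrative sizes (N1 = 16 inputs, N2 = 8 hidden neurons, N3 = 3 classes) and random weights purely for demonstration; learning would adjust the connection weights and biases.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Three-layer feedforward pass: input (N1) -> hidden (N2) -> output (N3).
    The connection weights W1, W2 and biases b1, b2 are what learning adjusts."""
    h = np.maximum(0.0, W1 @ x + b1)              # hidden layer with ReLU activation
    logits = W2 @ h + b2                          # output layer, one score per class
    return np.exp(logits) / np.exp(logits).sum()  # softmax over the N3 classes

# illustrative sizes: N1 = 16 input features, N2 = 8 hidden, N3 = 3 classes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 16)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
probs = forward(rng.normal(size=16), W1, b1, W2, b2)  # class probabilities summing to 1
```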
 A neural network may have only three layers if it performs simple discrimination, but by providing many intermediate layers it becomes possible to learn, in the course of machine learning, how to combine multiple feature quantities. In recent years, networks with 9 to 152 layers have become practical from the standpoint of learning time, judgment accuracy, and energy consumption. A "convolutional neural network," which performs a process called "convolution" that compresses image features, operates with minimal processing, and is strong at pattern recognition, may also be used. Alternatively, a "recurrent neural network" (a fully connected recurrent neural network), in which information flows bidirectionally, may be used to handle more complex information and to support analysis of information whose meaning changes depending on order and sequence.
 To realize these techniques, conventional general-purpose arithmetic processing circuits such as a CPU or an FPGA (Field Programmable Gate Array) may be used. However, since much of the processing in a neural network is matrix multiplication, processors specialized for matrix calculation, known as GPUs (Graphic Processing Units) or TPUs (Tensor Processing Units), may be used instead. In recent years, "neural network processing units (NPUs)," hardware dedicated to such artificial intelligence (AI), have been designed so that they can be integrated and embedded together with the CPU and other circuits, and they are sometimes part of the processing circuitry.
 Other machine learning methods include, for example, support vector machines and support vector regression. The learning here involves calculating the weights, filter coefficients, and offsets of a classifier; besides this, there are also methods that use logistic regression processing. When a machine is to make some judgment, a human must teach the machine how to judge. In this embodiment, a method of deriving the image judgment by machine learning is adopted, but a rule-based method that applies rules humans have acquired through empirical rules and heuristics may also be used.
 The control unit 35 is a processor that includes a CPU (Central Processing Unit), its peripheral circuits, a memory, and the like. There may be a single processor, or it may be composed of a plurality of chips. The CPU runs the learning device 30 as a whole by controlling each part within the learning device 30 in accordance with a program stored in the memory. Each part within the learning device 30 is realized through software control by the CPU. The control unit 35 may also operate in cooperation with the control unit 20 in the endoscope apparatus 10 so that the endoscope apparatus 10 and the learning device 30 operate as one.
 Next, the structure of the data recorded in the recording unit 50 will be explained with reference to FIG. 8. The recording unit 50 can be applied to the recording unit 18 in the endoscope apparatus 10 and/or the recording unit 33 in the learning device 30, and has an electrically rewritable nonvolatile memory. All of the data shown in FIG. 8 may be recorded in the recording unit 18 or the recording unit 33, or the data may be divided between the two recording units. That is, the data required by the endoscope apparatus 10 and by the learning device 30 may each be recorded accordingly. The recording unit 50 has two recording areas: an examination image recording unit 51 and a utilization information recording unit 52.
 The examination image recording unit 51 is an area that records examination videos in the same way that medical charts are kept, and examination video A 51a and examination video B 51b are recorded there. Although FIG. 8 shows only two examination videos, video A and video B, three or more examination videos can of course be recorded. Such records are sometimes important as evidence for patient diagnosis, and a still image corresponding to one frame of a video is sometimes recorded as a report, but the still images are omitted from FIG. 8. The examination image recording unit 51 may record not only images but also examination results and the like.
 The recording unit 50 is also provided with a utilization information recording unit 52. To use the videos recorded in the examination image recording unit 51 for evidence reports as, for example, teacher data, the processing procedures required for that purpose (for example, informed consent) are necessary, and for some videos the doctor or the patient has not consented to such use. Furthermore, when learning, it is desirable to take various related information from the time of the examination into account. It is therefore advisable to provide a utilization information recording unit 52 that records such conditions of use, and to organize and record information on which videos can be used for which purposes. In this case, the video utilization information folders and the teacher data folders may be recorded separately. Note that the utilization information recording units do not necessarily have to reside in the same recording device.
 In the example shown in FIG. 8, video utilization information A 53 and video utilization information B 56 are provided in the utilization information recording unit 52. Although FIG. 8 shows only two sets of video utilization information, utilization information A and utilization information B, three or more sets can of course be recorded according to the number of examination videos. Since the information recorded in video utilization information A 53 and in video utilization information B 56 is of the same kind, the information recorded in video utilization information A 53 will mainly be described here.
 The original video information 53a indicates which video this information corresponds to. For example, video utilization information A 53 corresponds to examination video A 51a, and video utilization information B 56 corresponds to examination video B 51b. The acquisition information 53b includes information such as the date and time of the examination, the examining institution, the name of the doctor in charge of the examination, and the model name of the apparatus used. The inference model type 53c indicates the type of inference model used by the endoscope apparatus 10 in the examination. If the inference model that was used is unknown, it becomes impossible to know which inference model should be used for relearning and the like.
 The utilization condition information 53d indicates under what conditions this video can be used. For example, when the scope of use has been determined by informed consent or the like, that fact is recorded. The profile information 53e includes information such as the sex, age, ID, and examination results of the subject (patient) who underwent the examination. The inference result information 53f is information on the inference results for each frame of the images.
 A first teacher data folder 54 and a second teacher data folder 55 are also provided in the utilization information recording unit 52. The first and second teacher data folders 54 and 55 record the images that the control unit 20 has selected as teacher data candidates based on differences in the inference results and the like obtained using examination video A 51a, namely the first teacher data candidate 54a and the second teacher data candidate 55a. For each candidate image, the candidate image data itself may be recorded, or information that can designate the candidate image data from among the images recorded in the examination image recording unit 51 (for example, which frame it is, or at what time it was captured) may be recorded. The related information associated by the information association unit 17 (first related information 54b, second related information 55b) is also recorded. As the first related information 54b and the second related information 55b, for example, situation information, similar image information, and information on differences in reliability values are recorded.
 The video utilization information B 56 likewise records original video information 56a, acquisition information 56b, an inference model type 56c, utilization condition information 56d, profile information 56e, inference result information 56f, a first teacher data folder 57, a first teacher data candidate 57a, first related information 57b, a second teacher data folder 58, a second teacher data candidate 58a, and second related information 58b; since these items of information are the same as those in video utilization information A 53, a detailed description is omitted.
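 Purely for orientation, the folder layout of FIG. 8 can be summarized as a nested structure, as in the following sketch. The key names mirror the reference numerals in the description; they are not a prescribed schema, and the values are placeholders.

```python
# A sketch of the FIG. 8 layout as nested dictionaries; keys mirror the
# reference numerals of the description and values are placeholders.
recording_unit_50 = {
    "examination_images_51": {
        "video_A_51a": "examination_A.mp4",   # placeholder path to the original video
        "video_B_51b": "examination_B.mp4",
    },
    "utilization_info_52": {
        "video_info_A_53": {
            "original_video_53a": "video_A_51a",      # which video this record refers to
            "acquisition_53b": {"date": "...", "facility": "...", "device": "..."},
            "inference_model_type_53c": "model-v1",   # needed to know what to retrain
            "usage_conditions_53d": {"informed_consent": True},
            "profile_53e": {"sex": "...", "age": "...", "id": "..."},
            "inference_results_53f": [],              # per-frame inference results
            "teacher_folder_1_54": {
                "candidate_54a": {"frame": 125},      # frame reference instead of pixels
                "related_info_54b": {"situation": "...", "confidence_delta": 0.4},
            },
        },
    },
}
```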
 Next, a modification of the structure of the data recorded in the recording unit 50 will be described with reference to FIG. 9. FIG. 9 shows a modification of the way the data organized in the recording unit 50 are recorded, in which video utilization information A and video utilization information B are recorded together with the videos in a video recording unit A 61 and a video recording unit B 65, respectively. The data are managed in one place, and there is no need to search for where a video is recorded when using the information, but the file size may become too large.
 That is, in this modification, an examination image 61a is recorded in the video recording unit A 61, and an examination image 65a is recorded in the video recording unit B 65. These examination videos are the same as those recorded collectively in the examination image recording unit 51 in FIG. 8. In this modification, the video recording unit A 61 also records acquisition information 62a, utilization condition information 62b, an inference model type 62c, profile information 62d, inference result information 62e, a first teacher data folder 63 (including a first teacher data candidate 63a and first related information 63b), and a second teacher data folder 64 (including a second teacher data candidate 64a and second related information 64b). Similarly, the video recording unit B 65 records acquisition information 66a, utilization condition information 66b, an inference model type 66c, profile information 66d, inference result information 66e, a first teacher data folder 67 (including a first teacher data candidate 67a and first related information 67b), and a second teacher data folder 68 (including a second teacher data candidate 68a and second related information 68b). Although only two video recording units, the video recording unit A 61 and the video recording unit B 65, are depicted in the recording unit of FIG. 9, the recording areas can be increased as appropriate according to the number of examination videos.
 As shown in FIGS. 8 and 9, there are various possibilities for which recording section is placed in which recording device. Likewise, in which folder these data are placed and which recording unit records them may be decided as appropriate according to the situation and environment at the time. The teacher data folders may also be divided according to the characteristics of the teacher data.
 Next, in explaining the operation of an embodiment of the present invention, the method of selecting, from a similar image group, the images to be used for relearning will be described first. FIGS. 6(a) and 6(b) show an example in which the distal end of the endoscope apparatus 10 is inserted into a patient's body cavity, together with examples of the images acquired at that time. The black-filled images (P1, P2, P11, P12) are images from the beginning of insertion, and the black-filled images (P8, P9, P18, P19) are images from the time of withdrawal; no specific object such as an affected area appears in these transition images P1, P2, P8, and P9. The hatched images (P5, P6, P15, P16) are observation images taken while a doctor or the like has found a specific object such as an affected area and is observing it carefully. Between the transition images P1, P2, P8, P9 and the observation images P5, P6, intermediate images P3, P4, and P7 are acquired. Similarly, between the transition images P11, P12, P18, P19 and the observation images P15, P16, intermediate images P13, P14, and P17 are acquired.
 The images P5 to P7 and P15 to P17 within the broken-line frame Is show similar objects, are similar to one another, and belong to a similar image group. Whether images are similar may be determined using a known method. For example, the image determination unit 15 may calculate feature quantities from the images and judge two images to be similar if their feature quantities are within a predetermined range. The image determination unit 15 may also judge whether images are similar based on image composition, color distribution, and the like.
 The image data acquired by the image sensor of the imaging unit 11 are, after image processing, input to the input layer of the inference engine 13A of the inference unit 13. The inference engine 13A performs inference using a neural network, and the inference result is output from the output layer. The inference engine 13A infers, for example, whether an image of a specific object such as an affected area is present in the endoscopic image. Based on this inference result, it can be indicated that images P5, P6, and P16 are specific object images. In FIG. 6(a), images P5 and P6 are surrounded by bold frames, and in FIG. 6(b) image P16 is surrounded by a bold frame, indicating that they are specific object images; an optimal display method may be chosen as appropriate. The inference engine 13A also calculates the reliability of the inference result. In FIG. 6(a), a circle (〇) indicates that the reliability value is higher than a predetermined value, and a cross (×) indicates that the reliability value is lower than the predetermined value.
 As described above, the situation determination unit 16 determines the usage situation of the endoscope apparatus 10. In the example shown in FIG. 6, the situation determination unit 16 acquires angle information, such as whether the distal end of the endoscope apparatus 10 squarely faces the wall surface in the body cavity, that is, whether it is directly opposed to it or is at an angle. FIG. 6(a) shows the acquired angle information. Although only angle information is shown in FIG. 6, the information is not limited to this; other information may also be acquired and used when selecting, from the similar image group, the images to be used for relearning.
 In the example shown in FIG. 6(a), for images P5 and P6, the inference engine 13A infers that they are images of a specific object such as an affected area, the reliability is high, and the endoscope is directly facing the specific object. Likewise, in the example shown in FIG. 6(b), for image P16 the inference engine 13A infers that it is an image of a specific object such as an affected area, the reliability is high, and the endoscope is directly facing the specific object.
 Since the inference results for images P5, P6, and P16, which contain the specific object, are highly reliable, these images can be said to be very likely to contain a specific object such as an affected area. However, within the similar image group Is, images judged to have low reliability (for example, images P7, P15, and P17) may also contain a specific object such as an affected area. In the example shown in FIG. 6(b), image P15 contains a specific object such as an affected area, yet the reliability of the inference is low and the image is displayed as not containing a specific object. In the examples shown in FIGS. 6(a) and 6(b), the reliability of the inference is high when the endoscope apparatus 10 directly faces the specific object such as an affected area, whereas the reliability is low when the endoscope apparatus 10 is at an angle to the specific object. That is, the inference model set in the inference engine 13A appears unable to make highly reliable inferences when the endoscope tip is oblique to the wall surface in the body cavity. In such a case, it is advisable to relearn using the images for which the inference results differed (the weak images) and generate a more reliable inference model.
 Thus, when images from the endoscope apparatus 10 are input to an inference model and inference is performed, there is a problem peculiar to endoscope apparatuses: the following special characteristics of endoscopes and the like make it difficult to create a highly reliable inference model. Namely: (1) because the distal end of the endoscope apparatus is not spatially fixed, the direction of the distal end wavers and the direction of the obtainable image changes abruptly (image changes corresponding to saccades); the viewing direction of the image can thus jump. (2) Because the distance between the endoscope tip and the wall surface of the body cavity is inherently short (they are at close range), even a slight movement of the tip greatly changes what is observed. (3) Because the light source that illuminates the body cavity and the image sensor move together, the brightness can change abruptly and the image can change abruptly; changes in the brightness, saturation, and contrast of the image can also occur.
 As described above, the tip of the endoscope makes fine movements, and therefore fixation is difficult and the tip easily strays from the object. At such times there can be important images (teacher image candidates) corresponding to weak images or missed images. By learning from such teacher-candidate images as teacher data, images that contain a specific object such as an affected area can also be extracted as teacher candidates. For this reason, in this embodiment, images that are candidates for teacher data are selected from the similar image group that resembles the specific object image such as an affected area. Note that the similar image group may also include dissimilar images acquired while the same site is being observed. Furthermore, in selecting teacher data candidates from the similar image group, the differences between the inference results of adjacent images are used.
 Next, the imaging operation in this embodiment will be described with reference to the flowchart shown in FIG. 2. This flow is realized by the CPU of the control unit 20 in the endoscope apparatus 10 controlling each part of the endoscope apparatus 10 based on a program stored in a memory such as the recording unit 18.
 When the operation of the flow shown in FIG. 2 starts, imaging and inference are first performed (S1). Here, the control unit 20 instructs the image sensor and the imaging control circuit in the imaging unit 11 to start the imaging operation. In the imaging operation, imaging signals for one screen (one frame) are read out sequentially at time intervals determined by a predetermined frame rate. The imaging operation continues repeatedly until it is judged in step S9 that the imaging operation has ended. When the imaging operation starts, the image sensor in the imaging unit 11 outputs an image signal, and the image processing unit 12 processes the image signal into visually recognizable information. The image data subjected to this image processing are output to the display unit 12a and displayed as an endoscopic image.
 Also in step S1, the inference unit 13 receives the image data that have undergone image processing and performs inference. That is, the inference unit 13 uses a machine learning model to infer the specific object image contained in the endoscopic image obtained by imaging; in other words, the inference unit 13 performs inference on the input image using a machine learning model. This inference result is superimposed on the endoscopic image in the image processing unit 12, and the resulting image is displayed on the display unit 12a. This inference is performed, for example, in order to display advice that assists diagnosis to the user of the endoscope apparatus 10, such as a doctor. When a specific object such as an affected area is contained in the endoscopic image, an indication to that effect may be displayed. Advice on endoscope operation for finding the specific object may also be inferred. The inference unit 13 may also infer at least one of classification, detection, and region extraction for the input image. That is, if an affected area or the like is present in the input image, the unit may detect the affected area and infer what kind of affected area it is, its detected position, its extent, and so on. Having performed these inferences, the inference unit 13 calculates the reliability of the inference.
 After imaging and inference, the similar image group determination is performed next (S3). Here, the image determination unit 15 compares the contents of the endoscopic images obtained by temporally consecutive imaging and determines a similar image group in which the same site is observed over a predetermined number of frames, even if dissimilar images are included. As described above (see FIG. 6), in this embodiment the images for relearning are selected from among the images in the similar image group Is. In this step it is judged whether the input image is a similar image. Specifically, when image data for one frame are output from the image processing unit 12, the image determination unit 15 compares the feature quantities of the immediately preceding frame and the current frame and judges whether they are similar images. Note that whether an image belongs to the similar image group may also be judged using parameters other than feature quantities.
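 A minimal sketch of the frame-to-frame comparison in S3 follows, using a color-histogram feature; the feature choice and the distance threshold are assumptions, since the description allows any known similarity method.

```python
import numpy as np

def frame_feature(img, bins=16):
    """Crude per-channel color histogram as an image feature (an assumed choice)."""
    hist = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    v = np.concatenate(hist).astype(float)
    return v / v.sum()

def is_similar(prev_img, cur_img, threshold=0.25):
    """Frames whose feature distance is within the threshold are treated as
    continuing the same similar image group; the threshold is an assumption."""
    d = np.abs(frame_feature(prev_img) - frame_feature(cur_img)).sum()
    return d <= threshold
```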
 As mentioned above, endoscopic images change easily even while the same site is being detected continuously, and simple pattern matching or a similarity judgment based on numerical representations of the images can judge them not to be similar images; that is, they can be judged to be dissimilar images with different content. However, when similar images are being acquired continuously in time, there is a high possibility that important images are included even among images that would otherwise simply be judged dissimilar. Here, therefore, dissimilar images that satisfy the conditions are treated as images constituting the similar image group. That is, in acquiring teacher data candidate images for obtaining endoscopic images for training an endoscope inference model, the contents of endoscopic images obtained by temporally consecutive imaging are compared, and a group of similar images in which the same site is observed over a predetermined number of frames is judged to be a similar image group including the dissimilar images. The content of an image means the things that characterize the image (image features): the shape, pattern, shading, color, size, and position of the imaged object (rotation, distortion, and the like may be present, or may have been corrected).
 As stated above, when the similar image group is determined, dissimilar images may also be included in it. This point will be explained with reference to FIG. 4. FIG. 4 shows examples of images acquired by the imaging unit 11. Images P21, P22, and P29 are transition images at insertion and withdrawal of the endoscope apparatus 10, as in FIGS. 6(a) and 6(b), and images P23 and P24 are intermediate images. Images P25 and P28 are images taken when the endoscope tip directly faces a specific target site such as an affected area. In these images P25 and P28 the specific object such as an affected area appears clearly, and the reliability of the inference is also high.
 Images P26 and P27, between image P25 and image P28, are images corresponding to saccades (involuntary eye movements), expressed here as image micro-movement. A saccade is an involuntary change of the visual field, and a saccade-equivalent image change means that the image changes abruptly because of an object at close range or because of the positional relationship between the light source and the object. As explained earlier with regard to FIG. 6, such saccade-equivalent image changes are connected with the special characteristics of examinations using an endoscope or the like. Because images P26 and P27 correspond to saccades, they are unintended, unexpected images, and they can moreover take various modes of change. For this reason, saccade-equivalent images have not necessarily been included in the teacher data at learning time, and inference on them tends to be difficult. Here this is expressed as a change in quality. Consequently, images P26 and P27 are likely to be judged dissimilar to images P25 and P28. However, even an image whose quality has changed may, unless its quality is extremely degraded, show a specific object such as an affected area, which, depending on the specifications of the inference model, may be an object that is better not missed. In this embodiment, therefore, dissimilar images within the similar image group are also treated as part of the similar image group. The detailed operation of the similar image group determination in step S3 will be described later with reference to FIG. 3.
 Once the similar image group determination has been made, it is next judged whether there is a difference in the inference results within the image group (S5). As described above, the images acquired by the imaging unit 11 and processed by the image processing unit 12 are subjected to inference in the inference unit 13. In this step, it is judged whether there is a difference in the inference results within the similar image group. That is, the difference determination unit 14 calculates, for each image included in the similar image group determined by the image determination unit 15, the difference in the inference results produced by the inference unit.
 In step S5, reliability may be used as the difference in the inference results for the judgment. Besides reliability, it may be judged that there is a difference in the inference results when, for example, the detection range or position of a specific object such as an affected area varies. It may also be judged that there is a difference in the inference results when the appearance state changes, such as when a specific object such as an affected area alternately appears and fails to appear. The judgment may further be based on how much the difference in reliability values or the like has changed. In addition, just as the visual image is corrected in the brain even though the human eye makes saccades, which are unconscious movements, an inference model whose results are not disturbed even when the image changes abruptly is easier to use and kinder to the person performing the examination with the endoscope apparatus. Because such sudden image changes have many causal parameters and can occur in combination, it is often difficult to prepare all the patterns at machine learning time.
 Factors that can change the reliability include the following cases (1) to (5): (1) when structure emphasis processing that affects image quality is performed; (2) when image processing such as TXI (Texture and Color Enhancement Imaging) is changed; (3) when the light source is changed, as in NBI (Narrow Band Imaging) or RDI (Red Dichromatic Imaging); (4) when a treatment tool such as forceps is inserted or withdrawn; and (5) when magnified observation is performed (pressing the hood against the tissue, or shooting with the immersion method in which the hood is filled with water). Besides simply judging the reliability as high (〇) or low (×), the reliability judgment method may also be changed according to the mode.
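 Beyond a simple confidence delta, the kinds of differences listed above can be checked jointly. The following sketch, assuming detection records with optional bounding boxes, reports a "difference" when the reliability flips across the 〇/× boundary, when the detection appears or disappears, or when the detected regions barely overlap; all threshold values are illustrative assumptions.

```python
def inference_results_differ(prev, cur, conf_thresh=0.5, min_iou=0.3):
    """prev, cur: {"confidence": float, "bbox": (x, y, w, h) or None}.
    Returns True when the inference result should count as 'different'."""
    # appearance state changed: object detected in one frame but not the other
    if (prev["bbox"] is None) != (cur["bbox"] is None):
        return True
    # reliability crossed the o/x boundary
    if (prev["confidence"] >= conf_thresh) != (cur["confidence"] >= conf_thresh):
        return True
    # detection position or range drifted: low overlap between the two boxes
    if prev["bbox"] and cur["bbox"] and iou(prev["bbox"], cur["bbox"]) < min_iou:
        return True
    return False

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```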
 If the determination in step S5 finds a difference in the inference results within the image group, learning images are selected (S7). In this step, the control unit 20 selects images to be used for learning (which may include relearning) on the basis of the differences in the inference results calculated by the difference determination unit 14; in other words, the control unit 20 selects teacher data candidate images to be used for learning on the basis of the amount of change calculated by the difference determination unit 14. As described above, when an image belongs to a similar image group but its inference result differs, that is, even when its reliability is judged to be low, a specific object such as an affected area may still appear in it. The control unit 20 therefore selects such images for relearning. Of course, the selected images may include images in which no specific object such as an affected area appears; such inappropriate images can simply be excluded during annotation when the teacher data are created. Rather than discarding images, keeping weak images and missed images selected for relearning makes it possible to generate a better inference model.
 The selected images themselves, or information that allows the selected images to be retrieved, are organized and recorded in the recording unit 18 so that the inference model can be improved immediately using these images and this information, and they are transmitted to the learning device 30 through the communication unit 19. Alternatively, each time an image is selected, it may be transmitted to the learning device 30 through the communication unit 19 and recorded in the recording unit 33, and relearning may be performed to generate the inference model once a predetermined number of relearning images have been collected. When selecting learning images in step S7, it is advisable to exclude images of extremely poor quality; images unsuitable for learning may be defined as poor-quality images. An image is unsuitable for learning when it is simply too bad, for example when visibility does not improve even after image processing, or when there is no difference in the levels of the two-dimensionally arranged pixel signals (for example, when the entire screen is pure white or pure black) so that it cannot be recognized as an image.
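 The screening for frames that "cannot be recognized as an image" (no difference in pixel signal levels, such as an entirely white or black screen) could be approximated by a variance check, as in the following sketch; the standard-deviation criterion and its threshold are assumptions.

```python
import numpy as np

def usable_for_learning(frame: np.ndarray, min_std: float = 2.0) -> bool:
    """Reject frames that cannot be recognized as an image.

    A frame whose pixel levels show almost no variation (e.g. an entirely
    white or entirely black screen) is treated as unsuitable for learning.
    The standard-deviation criterion and its threshold are assumptions.
    """
    return float(frame.std()) >= min_std
```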
 After a learning image has been selected in step S7, or if the determination in step S5 finds no difference in the inference results within the image group, it is next determined whether to end (S9). When a doctor or other user performs an operation to end the endoscopic examination, it is determined that the examination has ended. If the result of this determination is not "end", the process returns to step S1; if it is "end", end processing is performed and this flow is terminated.
 Next, the similar image group determination in step S3 (see FIG. 2) will be described using the flowchart shown in FIG. 3.
 When the similar image group determination flow starts, image features are first provisionally recorded (S11). Here, the image processing unit 12 or the image determination unit 15 calculates the feature amount of the image acquired by the imaging unit 11. As described above, the image feature amount may be calculated by a known method. The calculated image feature amount is provisionally recorded in a memory provided inside the recording unit 18, the control unit 20, or the like. This memory provisionally holds a history of image feature amounts, that is, the image feature amounts in chronological order.
 After the image feature amount has been provisionally recorded, it is next determined whether the image features are dissimilar to those of the immediately preceding frame (S13). Here, on the basis of the image feature amounts provisionally recorded in step S11, the difference determination unit 14 compares the image feature amount of the latest image frame with that of the immediately preceding image frame and determines whether the image features are similar. In this case, if the image feature amounts of the two images fall outside a predetermined range, the images are determined not to be similar.
 If the determination in step S13 finds that the image features are not similar to those of the immediately preceding frame, it is next determined whether the features are similar to those of images earlier than the immediately preceding frame (S15). Here, on the basis of the image feature amounts provisionally recorded in step S11, the difference determination unit 14 compares the image feature amount of the latest image frame with those of the images earlier than the immediately preceding frame and determines whether the features are similar. How many frames back to compare may be decided according to the design concept and may be changed as appropriate depending on the state of the images.
 In step S15, the number of frames earlier than the immediately preceding frame need only correspond to the length of time during which a specialist such as a doctor might miss a specific object during an examination such as an endoscopy, or during which weak images might occur. For example, it may be the number of frames corresponding to the time it takes for camera shake to subside, for slight vibration of the distal end to subside, for distal-end shake during endoscope operation to subside, or for the operator's hand shake to subside. In images acquired during a specific treatment, the images of that treatment alone may look like different images; such images are better treated as the same similar image group. If the type of treatment is known, it can be detected, and the "number of frames earlier than the immediately preceding frame" can be decided according to the time the treatment takes (its average time, for example), thereby identifying the images earlier than the immediately preceding frame. Similarly, if situations such as bleeding accompanying a treatment or the generation of water vapor can be determined, the number of earlier frames may be decided according to the situation.
 If the determination in step S13 finds that the features of the latest frame and the immediately preceding frame are similar, or if the determination in step S15 finds that the features of the latest frame and an image earlier than the immediately preceding frame are similar, the image is treated as a similar image (S17). As a result of the determinations in steps S13 and S15, the latest image is similar to the immediately preceding frame or to a frame before it, and is therefore treated as a similar image. Images can be judged similar on the basis of pattern matching or of a similarity determination applied to numerical representations of the images. However, an image that would be judged not similar (dissimilar) by pattern matching or the like is highly likely to contain important content when similar images are being acquired continuously in time, even though it would naively be treated as dissimilar. Therefore, as described above, dissimilar images that satisfy the conditions are treated as images constituting the similar image group.
 If the determination in step S15 finds that the features are not similar to those of the images earlier than the immediately preceding frame, or once the images have been treated as a similar image group in step S17, the similar image group determination flow ends and the process returns to the original flow. In this flow, even an image judged dissimilar from its feature amounts is determined to belong to the similar image group if its features are similar to those of a frame earlier than the immediately preceding one. That is, a similar image group may contain dissimilar images.
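 Putting steps S11 to S17 together, the grouping logic might look like the following sketch. The histogram feature extractor (the patent only requires "a known method"), the L1 distance, the look-back depth, and the similarity threshold are all illustrative assumptions.

```python
import numpy as np

LOOKBACK = 10          # frames to look back past the previous frame (assumed)
SIM_THRESHOLD = 0.15   # L1 distance threshold on histogram features (assumed)

def features(frame: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor (S11): a coarse intensity histogram."""
    hist, _ = np.histogram(frame, bins=32, range=(0, 256))
    return hist / max(hist.sum(), 1)

def is_similar(f1: np.ndarray, f2: np.ndarray) -> bool:
    return float(np.abs(f1 - f2).sum()) <= SIM_THRESHOLD

def belongs_to_group(history: list[np.ndarray], latest: np.ndarray) -> bool:
    """S13/S15: similar to the immediately preceding frame, or to any of the
    LOOKBACK frames before it (so saccade-like outliers stay in the group)."""
    if not history:
        return False
    if is_similar(history[-1], latest):              # S13
        return True
    return any(is_similar(f, latest)                 # S15
               for f in history[-1 - LOOKBACK:-1])
```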
 As described above, in one embodiment of the present invention, images are acquired by temporally continuous imaging, and inference is performed on the captured images using a machine learning model (S1). The content of the acquired images is converted into numerical values, the values are compared, and images whose change is equal to or less than (or less than) a predetermined value are determined to form a similar image group (S3). The difference between the inference results of the images included in the determined similar image group is calculated, and it is determined whether a difference exists (S5). Images to be used for learning are then selected on the basis of this difference in the inference results (S7). In this way, the present embodiment can efficiently acquire teacher data candidate images for inference model learning. That is, within a similar image group the inference results may differ, with some images scoring high and others scoring low. An image with a low inference result may be one the learning device is weak at, so such images are selected as teacher data candidate images, and the teacher data can then be chosen from among them, for example at annotation time.
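 The overall S1 to S7 loop could then be wired together as in the following sketch, which reuses the helper functions from the sketches above (features, belongs_to_group, has_result_difference, usable_for_learning). The camera, model, and recorder interfaces are assumptions for illustration, not part of the disclosure.

```python
def acquisition_loop(camera, model, recorder):
    """Illustrative S1-S7 loop: infer, group, compare, select.

    `camera` yields frames, `model(frame)` returns an InferenceResult, and
    `recorder(frames)` stores selected candidates; all three interfaces,
    like the helpers reused from the earlier sketches, are assumptions.
    """
    feature_history, group_frames, group_results = [], [], []

    def flush():
        # S5/S7: keep the group only if its inference results differ.
        if has_result_difference(group_results):
            recorder([f for f in group_frames if usable_for_learning(f)])

    for frame in camera:
        result = model(frame)                          # S1: imaging + inference
        feat = features(frame)
        if belongs_to_group(feature_history, feat):    # S3: similar image group
            group_frames.append(frame)
            group_results.append(result)
        else:
            flush()
            group_frames, group_results = [frame], [result]
        feature_history.append(feat)
    flush()                                            # handle the final group
```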
 Next, a modification of the imaging operation shown in FIG. 2 will be described using the flowchart shown in FIG. 5. This modification merely adds step S6 to the flowchart of FIG. 2 and replaces step S7 with step S8, so only these differences will be described.
 When the flow shown in FIG. 5 starts, imaging and inference are performed (S1), similar image group determination is performed (S3), and it is determined whether there is a difference in the inference results within the image group (S5), as in the flow shown in FIG. 2.
 If the determination in step S5 finds a difference in the inference results within the image group, a situation difference determination is performed (S6). Here, the situation determination unit 16 determines differences in the usage situation of the endoscope apparatus 10 and the like. The information acquired for the situation determination includes, for example, angle information (the angle between the wall surface inside the body cavity and the distal end of the endoscope) as shown in FIG. 4 and FIGS. 6(a) and 6(b). The angle information may be determined from changes in the image or detected by a built-in sensor; detection is also possible from the uniformity or distribution of the illumination, and sensor data or the detected brightness distribution of the illumination light may be recorded as they are. Other information used for the situation determination includes focus information and depth information from the imaging unit, light source information from the imaging unit 11, and treatment information such as water-feed and suction operations. When a specific object such as an affected area is inferred, having this situation information in addition to the image data improves the reliability of the inference. Therefore, when the teacher data are created, differences between these situations are determined, and the information association unit 17 associates the situation difference information with the image data so that relearning can be performed using the associated image data.
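 Associating situation-difference information with selected images might be structured as follows. The SituationInfo fields are only examples drawn from the kinds of information named above (angle, focus, light source, treatment), and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SituationInfo:
    """Situation metadata for a selected frame (fields are assumptions)."""
    tip_angle_deg: float | None = None   # endoscope tip vs. cavity wall
    focus: float | None = None
    light_source: str | None = None      # e.g. "white_light", "nbi"
    treatment: str | None = None         # e.g. "water_feed", "suction"

@dataclass
class CandidateRecord:
    frame_id: int
    situation: SituationInfo = field(default_factory=SituationInfo)

def annotate_candidates(frame_ids: list[int],
                        situations: dict[int, SituationInfo]
                        ) -> list[CandidateRecord]:
    """Associate situation-difference information with selected images (S8)."""
    return [CandidateRecord(fid, situations.get(fid, SituationInfo()))
            for fid in frame_ids]
```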
 After the situation difference determination, learning images are selected (S8). In this step, the control unit 20 selects images to be used for learning (which may include relearning) on the basis of the differences in the inference results calculated by the difference determination unit 14. That is, as in step S7, the images determined in step S5 to belong to the similar image group and to show a difference in the inference results are selected. In this case, the situation difference information acquired in step S6 is organized, that is, recorded in association with the selected images. On the basis of this recorded information, learning customized for each situation becomes possible.
 After a learning image has been selected in step S8, or if the determination in step S5 finds no difference in the inference results within the image group, it is next determined whether to end (S9). When a doctor or other user performs an operation to end the endoscopic examination, it is determined that the examination has ended. If the result of this determination is not "end", the process returns to step S1; if it is "end", end processing is performed and this flow is terminated.
 As described above, in this modification of the embodiment of the present invention, a situation difference determination is performed (see S6), and the situation difference information is acquired and associated with the image data (see S8). Relearning with this image data makes it possible to perform inference that responds to changes in the situation during an endoscopic examination, improving the reliability of the inference.
 Next, the flow of data in one embodiment of the present invention will be described with reference to FIG. 7. FIG. 7 shows the flow of data among the imaging unit 41, the developing unit 42, the image identification unit 43, the inference unit 44, the inference result change calculation unit 45, the learning image selection unit 46, and the recording unit 47; as described later, these units correspond to the units in FIG. 1. All or part of the functions of these units may be realized by a processor, or by hardware and software.
 The imaging unit 41 corresponds to the imaging unit 11 in FIG. 1; it acquires an image of a specific object such as an affected area as RAW data and outputs it to the developing unit 42. As described above, the image data output from the imaging unit 11 are temporally continuous: when the latest RAW data 2 is output, it forms an image pair with the immediately preceding RAW data 1, and when the next RAW data 3 is output after RAW data 2, RAW data 2 and RAW data 3 form an image pair. That is, temporally consecutive images are paired.
 The developing unit 42 develops the RAW data output from the imaging unit 41; in FIG. 1, the image processing unit 12 performs this function. The developed image data are output to the image identification unit 43. The image identification unit 43 determines whether the change between the images of a pair is large, and extracts one or more identical image pairs determined (identified) as having no large change. The extracted identical image pairs are output to the inference unit 44. In FIG. 1, the image determination unit 15 performs the function of the image identification unit 43.
 The image identification unit 43 functions as an image identification unit that determines that temporally consecutive input paired images are substantially unchanged (see, for example, S3 in FIGS. 2 and 5, FIG. 3, and the image determination unit 15 in FIG. 1). Here, when a plurality of images, for example a first image, a second image, a third image, and so on, are input sequentially to the image identification unit, the first and second images form a paired image, and the second and third images form a paired image. The image identification unit determines that the input paired images are substantially unchanged on the basis of at least one of the amount of movement of corresponding points and the amount of change in the brightness, saturation, and contrast of the images.
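 One possible reading of this identification criterion combines dense optical flow for corresponding-point movement with simple global HSV statistics for brightness, saturation, and contrast. The sketch below uses OpenCV under that assumption; both thresholds are illustrative, not values from the disclosure.

```python
import cv2
import numpy as np

def pair_is_identified(img_a: np.ndarray, img_b: np.ndarray,
                       max_motion: float = 2.0,
                       max_stat_change: float = 8.0) -> bool:
    """Judge a temporally consecutive pair as 'substantially unchanged'.

    The criteria follow the patent's wording (corresponding-point movement
    and brightness/saturation/contrast change); the dense-optical-flow
    choice and both thresholds are assumptions for illustration.
    """
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)

    # Mean displacement of corresponding points via dense optical flow.
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_motion = float(np.linalg.norm(flow, axis=2).mean())

    # Global brightness / saturation / contrast changes.
    hsv_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2HSV).astype(np.float32)
    d_brightness = abs(hsv_a[..., 2].mean() - hsv_b[..., 2].mean())
    d_saturation = abs(hsv_a[..., 1].mean() - hsv_b[..., 1].mean())
    d_contrast = abs(gray_a.std() - gray_b.std())

    return (mean_motion <= max_motion and
            max(d_brightness, d_saturation, d_contrast) <= max_stat_change)
```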
 The inference unit 44 receives the identical image pair from the image identification unit 43, performs inference on each image, and outputs the pair of inference results (the inference result pair) to the inference result change calculation unit 45. When outputting these inference results, it also outputs the reliability of each inference result to the inference result change calculation unit 45. In FIG. 1, the inference unit 13 performs the function of the inference unit 44. In FIG. 1, the inference unit 13 does not receive the identical image pair directly from the image identification unit 43, but it may receive information from the control unit 20 indicating that the images form an identical pair.
 The inference result change calculation unit 45 calculates, on the basis of the inference result pair output from the inference unit 44, the amount of change in the inference results of the identical image pair and in their reliability. That is, it compares the inference result of the latest image with that of the immediately preceding image and calculates the amount of change, and likewise compares the reliability of the two inference results and calculates the amount of change. The calculated amounts of change are output to the learning image selection unit 46. In FIG. 1, the difference determination unit 14 performs the function of the inference result change calculation unit 45.
 The inference result change calculation unit 45 functions as an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair (see, for example, S5 in FIGS. 2 and 5 and the difference determination unit 14 in FIG. 1). An identified image pair is, for example, a pair of temporally consecutive input images that are substantially unchanged. The inference result change calculation unit uses a differential value of at least one of the inference result and the reliability (see, for example, S5 in FIGS. 2 and 5 and the difference determination unit 14 in FIG. 1). The inference result change calculation unit 45 also functions as an inference result change calculation unit that calculates the amount of change in the inference results of temporally consecutive image pairs (see, for example, S5 in FIGS. 2 and 5 and the difference determination unit 14 in FIG. 1).
 In step S5 of FIG. 2 described above, it is determined whether there is a difference in the inference results within the image group, and this processing corresponds to the inference result change calculation unit 45 calculating the amount of change in the inference results of the identified image pair. As the difference between inference results, for example, it may be determined that the inference results differ when the reliability values of the inferences differ by a predetermined value or more. In other words, the inference result change calculation unit 45 calculates the amount of change in the inference results of temporally consecutive image pairs.
 When the amount of change calculated by the inference result change calculation unit 45 is large, the learning image selection unit 46 selects that identical image pair as images to be used for learning (learning image candidates) and outputs the selected learning image candidates to the recording unit 47. When the amount of change in the inference result (or the reliability) is large, the image is highly likely to be one that machine learning identifies poorly, so it is selected as a learning candidate image. Whether it is actually used for learning can be decided by an expert such as a doctor, for example when the teacher data are created. In FIG. 1, the control unit 20 performs the function of the learning image selection unit 46.
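 Units 45 and 46 might reduce, in the simplest case, to a frame-to-frame differential of the reliability and a threshold test, as sketched below. The patent (see claims 16 and 24) only states that selection occurs when the amount of change "exceeds a specified value"; the value 0.25 is an assumption.

```python
def confidence_change(conf_prev: float, conf_latest: float) -> float:
    """Unit 45: amount of change (a discrete differential) in reliability."""
    return abs(conf_latest - conf_prev)

def select_candidate_pair(pair, conf_prev: float, conf_latest: float,
                          specified_value: float = 0.25):
    """Unit 46: keep the identified pair when the change in reliability
    exceeds the specified value (0.25 is an assumed value)."""
    if confidence_change(conf_prev, conf_latest) > specified_value:
        return pair        # learning image candidate
    return None
```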
 The recording unit 47 is an electrically rewritable nonvolatile memory and sequentially records the learning image candidates input from the learning image selection unit 46. Teacher data for learning are created from the learning image candidates recorded in the recording unit 47, and an inference model is generated by machine learning or the like. As described above, an image that is inappropriate as a learning image can simply be excluded, and the teacher data can be created from appropriate images. In FIG. 1, the recording unit 18 and/or the recording unit 33 fulfills the function of the recording unit 47.
 As shown in FIG. 7, in this embodiment temporally consecutive images from the imaging unit 41 are paired, and for the image pairs determined by the image identification unit 43 to be identical image pairs, learning candidates are selected on the basis of the amounts of change calculated by the inference result change calculation unit 45. The description of FIG. 7 used two temporally consecutive images. However, for temporally consecutive images A, B, and C whose inference results satisfy A = C but A ≠ B and B ≠ C, either image B alone may be selected (extracted), or all three images A, B, and C may be selected. That is, when the amount of change in the inference results of an identical image pair is larger than a predetermined value, both images of the pair may be selected, or images may be selected on the basis of comparison with the pairs before and after that pair.
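 Selecting the outlier middle frame in the A = C, A ≠ B, B ≠ C case can be written as a three-frame window test, as in the sketch below; using label equality to stand for "the inference results are equal" is an assumption.

```python
def pick_outlier_middle(labels: list[str]) -> list[int]:
    """Return indices of middle frames whose inference result differs from
    both neighbours while the neighbours agree (A = C, A != B, B != C)."""
    return [i for i in range(1, len(labels) - 1)
            if labels[i - 1] == labels[i + 1] != labels[i]]

# Example: frames 0..4 with results A, A, B, A, A -> middle outlier at index 2.
assert pick_outlier_middle(["A", "A", "B", "A", "A"]) == [2]
```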
 As described above, the teacher data candidate image acquisition device according to the embodiment of the present invention and its modification acquires teacher data candidate images, in the form of endoscope-acquired images, for learning an endoscope inference model. This device has a similar image group determination unit that compares the contents of endoscopic images obtained by temporally continuous imaging and determines, including dissimilar images, a similar image group in which the same region is observed over a predetermined number of frames (see, for example, the image determination unit 15 in FIG. 1, S3 in FIGS. 2 and 5, FIG. 3, and the image identification unit 43 in FIG. 7). The device also has an inference unit that infers, using a machine learning model, a specific object image contained in the captured endoscopic images (see, for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, and the inference unit 44 in FIG. 7), and an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, the difference between the inference results produced by the inference unit (see, for example, the difference determination unit 14 in FIG. 1, S5 in FIGS. 2 and 5, and the inference result change calculation unit 45 in FIG. 7). The device further has a learning image selection unit that selects images to be used for relearning on the basis of the differences in the inference results calculated by the inference result difference calculation unit (see, for example, the control unit 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). The present embodiment can therefore accurately select medical images that machine learning identifies poorly. That is, from among the images determined to form a similar image group, images to be used for relearning are selected on the basis of the differences in the inference results produced by the inference unit, so the inference model can be improved by relearning with these weak images.
 The teacher data candidate image acquisition device according to the embodiment of the present invention and its modification also has an inference unit that infers an input image using a machine learning model (see, for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, and the inference unit 44 in FIG. 7), an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair (see, for example, the image determination unit 15 in FIG. 1, S3 in FIGS. 2 and 5, FIG. 3, and the image identification unit 43 in FIG. 7), and a learning image selection unit that selects teacher data candidate images to be used for learning on the basis of the amount of change calculated by the inference result change calculation unit (see, for example, the control unit 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). The present embodiment can therefore accurately select medical images that machine learning identifies poorly. That is, images to be used for learning are selected on the basis of the differences in the inference results produced by the inference unit between the images of an identified image pair, so the inference model can be improved by relearning with these weak images.
 The teacher data candidate image acquisition device according to the embodiment of the present invention and its modification also has an inference unit that infers an input image using a machine learning model (see, for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, and the inference unit 44 in FIG. 7), an inference result change calculation unit that calculates the amount of change in the inference results of temporally consecutive image pairs (see, for example, the image determination unit 15 in FIG. 1, S3 in FIGS. 2 and 5, FIG. 3, and the image identification unit 43 in FIG. 7), and a learning image selection unit that selects teacher data candidate images to be used for learning on the basis of the amount of change calculated by the inference result change calculation unit (see, for example, the control unit 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). The present embodiment can therefore accurately select medical images that machine learning identifies poorly. That is, images to be used for learning are selected on the basis of the differences in the inference results produced by the inference unit between images determined to be temporally consecutive image pairs, so the inference model can be improved by relearning with these weak images.
 The embodiment of the present invention and its modification have been described on the assumption that the endoscope apparatus 10 contains, in addition to the imaging unit 11, the image processing unit 12, the inference unit 13, the difference determination unit 14, the image determination unit 15, the situation determination unit 16, the information association unit 17, the recording unit 18, and so on. However, the endoscope apparatus 10 may include other components, such as a display unit. It is also unnecessary to place all the blocks inside the endoscope apparatus 10; they may be distributed as appropriate, in which case data communication between the devices need only be enabled by a communication unit or the like. External blocks, such as the learning unit 34 in the learning device 30, may also be arranged elsewhere.
 In the embodiment of the present invention and its modification, the control units 20 and 35 have been described as devices composed of a CPU, a memory, and the like. However, besides being configured in software by a CPU and programs, some or all of the units may be configured as hardware circuits; a hardware configuration such as gate circuits generated from a description in a hardware description language such as Verilog or VHDL (VHSIC Hardware Description Language) may be used, and a hardware configuration using software such as a DSP (Digital Signal Processor) may also be used. These may of course be combined as appropriate.
 The control units 20 and 35 are not limited to CPUs; any element that functions as a controller may be used, and the processing of each unit described above may be performed by one or more processors configured as hardware. For example, each unit may be a processor configured as an electronic circuit, or may be a circuit portion of a processor configured as an integrated circuit such as an FPGA (Field Programmable Gate Array). Alternatively, a processor composed of one or more CPUs may execute the function of each unit by reading and executing a computer program recorded on a recording medium.
 In the embodiment of the present invention and its modification, the endoscope apparatus 10 and the learning device 30 have been described as each having blocks that perform the respective functions. However, these blocks need not be provided in a single device; the units described above may be distributed as long as they are connected by a communication network such as the Internet.
 In describing the embodiment of the present invention and its modification, an endoscope was assumed because an endoscopic examination scene is easy to explain. However, the invention is widely applicable to any device that performs some kind of inference when observing an object using its image data. In recent years, even cameras built into mobile terminals and consumer cameras are sometimes used for judgments of the kind made by medical devices; in such cases, holding instability can produce the situation described above as saccade-like image micro-movement. Similar image micro-movement also readily occurs in cameras mounted on automobiles, drones, robots, and the like when the positional relationship with the object is unstable. The invention of the present application can be applied to these devices.
 In each embodiment of the present invention, logic-based determination has mainly been described, with inference-based determination using machine learning used in part. In this embodiment, either logic-based or inference-based determination may be selected and used as appropriate, and a hybrid determination that partially exploits the strengths of each may also be made in the course of the determination.
 In recent years, artificial intelligence capable of judging various criteria collectively has often been used, and it goes without saying that improvements such as performing the branches of the flowcharts shown here collectively also fall within the scope of the present invention. If the user can input approval or disapproval of such control, the embodiments shown in the present application can be customized in a direction suitable for the user by learning the user's preferences.
 Among the techniques described in this specification, the control described mainly with flowcharts can often be set by a program and may be stored in a recording medium or a recording unit. The recording onto the recording medium or recording unit may be performed at product shipment, a distributed recording medium may be used, or the program may be downloaded via the Internet.
 In the embodiment of the present invention, the operation of the embodiment has been described using flowcharts, but the order of the processing steps may be changed, any step may be omitted, steps may be added, and the specific processing within each step may be changed.
 Regarding the operation flows in the claims, the specification, and the drawings, even where they are described for convenience using words expressing order such as "first" and "next", this does not mean that they must be performed in that order at points where no specific explanation is given.
 The present invention is not limited to the above embodiment as it stands; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment; for example, some of the constituent elements shown in the embodiment may be deleted, and constituent elements across different embodiments may be combined as appropriate.
10: endoscope apparatus; 11: imaging unit; 12: image processing unit; 13: inference unit; 14: difference determination unit; 15: image determination unit; 16: situation determination unit; 17: information association unit; 18: recording unit; 19: communication unit; 20: control unit; 30: learning device; 31: communication unit; 32: image processing unit; 33: recording unit; 34: learning unit; 35: control unit

Claims (26)

  1.  An endoscope comprising:
     a similar image group determination unit that compares the contents of endoscopic images obtained by temporally continuous imaging and determines, including dissimilar images, a similar image group in which the same region is observed over a predetermined number of frames;
     an inference unit that infers, using machine learning, a specific object image contained in the captured endoscopic images;
     an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, a difference between inference results produced by the inference unit; and
     a learning image selection unit that selects an image to be used for learning on the basis of the difference between the inference results calculated by the inference result difference calculation unit.
  2.  The endoscope according to claim 1, wherein the similar image group includes a result of judging an image group corresponding to an abruptly changing viewpoint position according to the similarity of the preceding and following images among the endoscopic images obtained by the temporally continuous imaging.
  3.  The endoscope according to claim 1, further comprising an imaging information acquisition unit capable of acquiring imaging information corresponding to each image selected by the learning image selection unit.
  4.  The endoscope according to claim 1, wherein the similar image group determination unit converts the endoscopic images into numerical values with which a pattern of the same object within the images can be tracked, and determines whether the images are similar using the numerical values.
  5.  The endoscope according to claim 1, wherein the learning image selection unit excludes images captured under extremely poor conditions.
  6.  A teacher data candidate image acquisition device for acquiring images for inference model learning, comprising:
     a similar image group determination unit that compares the contents of images obtained by temporally continuous imaging and determines, including dissimilar images, a similar image group in which the same region is observed over a predetermined number of frames;
     an inference unit that infers, using machine learning, a specific object image contained in the captured images;
     an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, a difference between inference results produced by the inference unit; and
     a learning image selection unit that selects an image to be used for learning on the basis of the difference between the inference results calculated by the inference result difference calculation unit.
  7.  A teacher data candidate image acquisition method for acquiring images for inference model learning, the method comprising:
     comparing the contents of images obtained by temporally continuous imaging and determining a similar image group in which the same region is observed over a predetermined number of frames, even if the group includes dissimilar images;
     inferring, using a machine learning model, a specific object image contained in the captured images;
     calculating, for each image determined to be included in the similar image group, a difference between inference results obtained using the machine learning model; and
     selecting an image to be used for learning on the basis of the calculated difference between the inference results.
  8.  A program causing a computer that acquires teacher data candidate images for inference model learning to:
     compare the contents of images obtained by temporally continuous imaging and determine a similar image group in which the same region is observed over a predetermined number of frames, even if the group includes dissimilar images;
     infer, using a machine learning model, a specific object image contained in the captured images;
     calculate, for each image determined to be included in the similar image group, a difference between inference results obtained using the machine learning model; and
     select an image to be used for learning on the basis of the calculated difference between the inference results.
  9.  A teacher data candidate image acquisition device comprising:
     an inference unit that infers an input image using a machine learning model;
     an inference result change calculation unit that calculates an amount of change in inference results of an identified image pair; and
     a learning image selection unit that selects a teacher data candidate image to be used for learning on the basis of the amount of change calculated by the inference result change calculation unit.
  10.  The teacher data candidate image acquisition device according to claim 9, further comprising an image identification unit that determines that temporally consecutive input paired images are substantially unchanged.
  11.  The teacher data candidate image acquisition device according to claim 10, wherein the image identification unit determines that the input paired images are substantially unchanged on the basis of at least one of an amount of movement of corresponding points and an amount of change in brightness, saturation, and contrast of the images.
  12.  The teacher data candidate image acquisition device according to claim 9, wherein the input image is a medical image.
  13.  The teacher data candidate image acquisition device according to claim 9, wherein the inference unit infers at least one of classification, detection, and region extraction of the input image.
  14.  The teacher data candidate image acquisition device according to claim 9, wherein the inference unit outputs a reliability of the inference.
  15.  The teacher data candidate image acquisition device according to claim 9, wherein the inference result change calculation unit uses a differential value of at least one of the inference result and the reliability.
  16.  The teacher data candidate image acquisition device according to claim 9, wherein the learning image selection unit selects the teacher data candidate image to be used for the learning when the amount of change exceeds a specified value.
  17.  A teacher data candidate image acquisition method comprising:
     inferring an input image using a machine learning model;
     calculating an amount of change in inference results of an identified image pair; and
     selecting a teacher data candidate image to be used for learning on the basis of the calculated amount of change.
  18.  A program causing a computer that acquires teacher data candidate images for inference model learning to:
     infer an input image using a machine learning model;
     calculate an amount of change in inference results of an identified image pair; and
     select a teacher data candidate image to be used for learning on the basis of the calculated amount of change.
  19.  A teacher data candidate image acquisition device comprising:
     an inference unit that infers an input image using a machine learning model;
     an inference result change calculation unit that calculates an amount of change in inference results of temporally consecutive image pairs; and
     a learning image selection unit that selects a teacher data candidate image to be used for learning on the basis of the amount of change calculated by the inference result change calculation unit.
  20.  The teacher data candidate image acquisition device according to claim 19, wherein the input image is a medical image.
  21.  The teacher data candidate image acquisition device according to claim 19, wherein the inference unit infers at least one of classification, detection, and region extraction of the input image.
  22.  The teacher data candidate image acquisition device according to claim 19, wherein the inference unit outputs a reliability of the inference.
  23.  The teacher data candidate image acquisition device according to claim 19, wherein the inference result change calculation unit uses a differential value of at least one of the inference result and the reliability.
  24.  The teacher data candidate image acquisition device according to claim 19, wherein the learning image selection unit selects the teacher data candidate image to be used for the learning when the amount of change exceeds a specified value.
  25.  A teacher data candidate image acquisition method comprising:
     inferring an input image using a machine learning model;
     calculating an amount of change in inference results of temporally consecutive image pairs; and
     selecting a teacher data candidate image to be used for learning on the basis of the calculated amount of change.
  26.  A program causing a computer that acquires teacher data candidate images for inference model learning to:
     infer an input image using a machine learning model;
     calculate an amount of change in inference results of temporally consecutive image pairs; and
     select a teacher data candidate image to be used for learning on the basis of the calculated amount of change.
PCT/JP2022/016192 2022-03-30 2022-03-30 Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program WO2023188169A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/016192 WO2023188169A1 (en) 2022-03-30 2022-03-30 Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/016192 WO2023188169A1 (en) 2022-03-30 2022-03-30 Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program

Publications (1)

Publication Number Publication Date
WO2023188169A1 true WO2023188169A1 (en) 2023-10-05

Family

ID=88199745

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/016192 WO2023188169A1 (en) 2022-03-30 2022-03-30 Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program

Country Status (1)

Country Link
WO (1) WO2023188169A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021039748A (en) * 2019-08-30 2021-03-11 キヤノン株式会社 Information processor, information processing method, information processing system, and program

Similar Documents

Publication Publication Date Title
JP6927211B2 (en) Image diagnostic learning device, diagnostic imaging device, method and program
JP6843926B2 (en) Video endoscopy system
JP6371729B2 (en) Endoscopy support apparatus, operation method of endoscopy support apparatus, and endoscope support program
JP7127785B2 (en) Information processing system, endoscope system, trained model, information storage medium, and information processing method
JP6254053B2 (en) Endoscopic image diagnosis support apparatus, system and program, and operation method of endoscopic image diagnosis support apparatus
WO2014155778A1 (en) Image processing device, endoscopic device, program and image processing method
JP2006320650A (en) Image display device
JP5326064B2 (en) Image processing device
US20210406737A1 (en) System and methods for aggregating features in video frames to improve accuracy of ai detection algorithms
KR102531400B1 (en) Artificial intelligence-based colonoscopy diagnosis supporting system and method
JP2018153346A (en) Endoscope position specification device, method, and program
CN116723787A (en) Computer program, learning model generation method, and auxiliary device
WO2023188169A1 (en) Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program
JP7387859B2 (en) Medical image processing device, processor device, endoscope system, operating method and program for medical image processing device
US20220361739A1 (en) Image processing apparatus, image processing method, and endoscope apparatus
JPWO2019088008A1 (en) Image processing equipment, image processing methods, programs, and endoscopic systems
JP2022132180A (en) Artificial intelligence-based gastroscopy video diagnosis supporting system and method
WO2021044590A1 (en) Endoscope system, treatment system, endoscope system operation method and image processing program
WO2023013080A1 (en) Annotation assistance method, annotation assistance program, and annotation assistance device
US20240112450A1 (en) Information processing device and information processing method
WO2023218523A1 (en) Second endoscopic system, first endoscopic system, and endoscopic inspection method
WO2023195103A1 (en) Inspection assistance system and inspection assistance method
WO2023148857A1 (en) Information output method and information output device
US20220241015A1 (en) Methods and systems for planning a surgical procedure
WO2022044642A1 (en) Learning device, learning method, program, learned model, and endoscope system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22935314

Country of ref document: EP

Kind code of ref document: A1