WO2023188169A1 - Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program - Google Patents

Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program

Info

Publication number
WO2023188169A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
inference
unit
learning
Prior art date
Application number
PCT/JP2022/016192
Other languages
French (fr)
Japanese (ja)
Inventor
祐介 山本
大 伊藤
貴行 清水
浩一 新谷
修 野中
賢一 森島
優 齋藤
学 市川
Original Assignee
オリンパス株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by オリンパス株式会社
Priority to PCT/JP2022/016192
Publication of WO2023188169A1

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B1/00: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B1/04: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor combined with photographic or television appliances
    • A61B1/045: Control thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • The present invention relates to an endoscope used when generating, through learning, an inference model for the medical field; to a training data candidate image acquisition device and a training data candidate image acquisition method for acquiring training data candidate images; and to a program.
  • Patent Document 1 discloses a technique for efficiently transferring, from among a plurality of medical images, only those medical images in which a desired body part is photographed.
  • The technology described in Patent Document 1 efficiently transfers medical images in which a desired region is photographed. It therefore appears difficult to select "medical images that are difficult to identify by machine learning" using the technique disclosed in Patent Document 1. That is, with conventional technology, it is difficult to collect only "medical images that are difficult to identify using machine learning" as useful data.
  • The present invention has been made in view of the above circumstances, and its purpose is to provide an endoscope, a training data candidate image acquisition device, a training data candidate image acquisition method, and a program that make it possible to accurately select medical images that are difficult for previously prepared machine learning to identify.
  • The endoscope according to the first invention comprises: a similar image group determination unit that compares the contents of endoscopic images obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, the group possibly including dissimilar images; an inference unit that infers, using machine learning, a specific object image included in an endoscopic image obtained by imaging; an inference result difference calculation unit that calculates a difference between the inference results produced by the inference unit for each image included in the similar image group determined by the similar image group determination unit; and a learning image selection unit that selects an image to be used for learning based on the difference between the inference results.
  • In the endoscope according to a second invention, in the first invention, the similar image group includes results determined according to the similarity of the images before and after endoscopic images obtained by consecutively capturing a group of images corresponding to a sudden change in viewpoint position.
  • The endoscope according to a third aspect of the invention, in the first aspect, includes an imaging information acquisition section capable of acquiring imaging information corresponding to each image selected by the learning image selection section.
  • In the endoscope according to another aspect, the similar image group determination section converts the pattern of the same object in the endoscopic image into a numerical value that allows that pattern to be tracked within the image, and determines whether the images are similar using that numerical value.
  • In the endoscope according to another aspect, the learning image selection section excludes images captured under extremely poor conditions.
  • A training data candidate image acquisition device according to another invention is a device for acquiring images for inference model learning, comprising: a similar image group determination unit that compares the contents of images obtained by temporally continuous imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, the group possibly including dissimilar images; an inference unit that infers, using machine learning, a specific object image included in the captured images; an inference result difference calculation unit that calculates the difference between the inference results produced by the inference unit for each image included in the similar image group determined by the similar image group determination unit; and a learning image selection unit that selects an image to be used for learning based on the difference between the inference results calculated by the inference result difference calculation unit.
  • A training data candidate image acquisition method according to another invention is a method for acquiring images for inference model learning, in which: the contents of images obtained by temporally continuous imaging are compared to determine a group of similar images in which the same region is observed over a predetermined number of frames, the group possibly including dissimilar images; a specific object image included in the captured images is inferred using a machine learning model; for each image determined to be included in the similar image group, the difference between the inference results inferred using the machine learning model is calculated; and an image to be used for learning is selected based on the calculated difference between the inference results.
  • The program according to the eighth invention causes a computer that acquires training data candidate images for inference model learning to: compare the contents of images obtained by temporally consecutive imaging; determine a group of similar images in which the same region is observed over a predetermined number of frames, even if the group contains dissimilar images; infer a specific object image included in the captured images using a machine learning model; calculate, for each image determined to be included in the similar image group, the difference between the inference results inferred using the machine learning model; and select an image to be used for learning based on the calculated difference between the inference results. A sketch of this overall flow is given below.
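For illustration, the following is a minimal Python sketch of the flow described in the aspects above: similar image group determination, per-frame inference, inference result difference calculation, and learning image selection. All names here (select_learning_images, infer, group_similar_frames) are hypothetical, and the difference measure stands in for whichever concrete measure an implementation uses.

```python
from typing import Callable, List, Tuple
import numpy as np

def select_learning_images(
    frames: List[np.ndarray],
    infer: Callable[[np.ndarray], Tuple[int, float]],   # -> (label, confidence)
    group_similar_frames: Callable[[List[np.ndarray]], List[List[int]]],
    diff_threshold: float = 0.2,                          # hypothetical value
) -> List[int]:
    """Sketch of the claimed flow: determine similar image groups, infer each
    frame, compute differences between inference results within each group,
    and select frames whose results differ by more than a threshold."""
    results = [infer(f) for f in frames]              # inference unit
    selected: List[int] = []
    for group in group_similar_frames(frames):        # similar image group determination
        for a, b in zip(group, group[1:]):            # inference result difference
            (label_a, conf_a), (label_b, conf_b) = results[a], results[b]
            if label_a != label_b or abs(conf_a - conf_b) > diff_threshold:
                selected.append(b)                    # learning image selection
    return selected
```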
  • A training data candidate image acquisition device according to a ninth invention includes: an inference unit that infers an input image using a machine learning model; an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair; and a learning image selection unit that selects a training data candidate image to be used for learning based on the amount of change calculated by the inference result change calculation unit.
  • A training data candidate image acquisition device according to a tenth invention, in the ninth invention, includes an image identification section that determines that temporally continuous input pair images are substantially unchanged.
  • In the above device, the image identification unit determines that the input pair images are substantially unchanged based on at least one of the amount of movement of corresponding points and the amounts of change in brightness, saturation, and contrast of the images.
  • A training data candidate image acquisition device according to another invention is the training data candidate image acquisition device according to the ninth invention, wherein the input image is a medical image.
  • In the above device, the inference section infers at least one of classification, detection, and region extraction of the input image.
  • A training data candidate image acquisition device according to another invention is based on the ninth invention, wherein the inference section outputs the reliability of the inference.
  • In the above device, the inference result change calculation section uses a differential value of at least one of the inference result and the reliability.
  • In the above device, the learning image selection section selects the training data candidate image to be used for learning when the amount of change exceeds a specified value.
  • A training data candidate image acquisition method according to another invention infers an input image using a machine learning model, calculates the amount of change in the inference results of an identified image pair, and selects training data candidate images to be used for learning based on the calculated amount of change.
  • The program according to the eighteenth invention causes a computer that acquires training data candidate images for inference model learning to infer an input image using a machine learning model, calculate the amount of change in the inference results of an identified image pair, and select, based on the calculated amount of change, a training data candidate image to be used for learning.
  • A training data candidate image acquisition device according to a nineteenth invention includes: an inference unit that infers an input image using a machine learning model; an inference result change calculation unit that calculates the amount of change in the inference results of a pair of temporally consecutive images; and a learning image selection unit that selects training data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit.
  • A training data candidate image acquisition device according to another invention is the nineteenth invention, wherein the input image is a medical image.
  • In the above device, the inference section infers at least one of classification, detection, and region extraction of the input image.
  • A training data candidate image acquisition device according to another invention is the nineteenth invention, wherein the inference section outputs the reliability of the inference.
  • In the above device, the inference result change calculation section uses a differential value of at least one of the inference result and the reliability.
  • In the above device, the learning image selection unit selects the training data candidate image to be used for learning when the amount of change exceeds a specified value.
  • A training data candidate image acquisition method according to another invention infers an input image using a machine learning model, calculates the amount of change in the inference results of a temporally consecutive pair of images, and selects training data candidate images to be used for learning based on the calculated amount of change.
  • The program according to the twenty-sixth invention causes a computer that acquires training data candidate images for inference model learning to infer an input image using a machine learning model, calculate the amount of change in the inference results of temporally consecutive image pairs, and select, based on the calculated amount of change, a training data candidate image to be used for learning.
  • According to the present invention, it is possible to provide an endoscope, a training data candidate image acquisition device, a training data candidate image acquisition method, and a program that make it possible to accurately select medical images that are difficult for machine learning prepared in advance to identify.
  • FIG. 1 is a block diagram mainly showing the electrical configuration of an endoscope system according to an embodiment of the present invention.
  • FIG. 2 is a flowchart showing the imaging operation in the endoscope system according to an embodiment of the present invention.
  • FIG. 3 is a flowchart showing the operation of similar image group determination in the endoscope system according to an embodiment of the present invention.
  • FIG. 4 is a diagram illustrating the determination of a similar image group in the endoscope system according to an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating a modified example of the similar image group determination operation in the endoscope system according to an embodiment of the present invention.
  • FIG. 6 is a diagram illustrating the selection of a similar image group in the endoscope system according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating the flow of data between blocks in the endoscope system according to an embodiment of the present invention.
  • FIG. 8 is a diagram showing an example of the data structure in a recording unit of the endoscope system according to an embodiment of the present invention.
  • FIG. 9 is a diagram showing a modification of the data structure in the recording unit of the endoscope system according to an embodiment of the present invention.
  • The present invention can be applied to a device that acquires candidate images to serve as training data for learning, for example when creating an inference model, improving it, or maintaining its performance using image data based on an imaging signal from an endoscope or the like.
  • Equipment such as endoscopes is used by professionals such as doctors: the user brings the device close to an object, illuminates it as necessary, and displays the continuously captured image data as images on a display device. While viewing these images visually or with the aid of an inference model, the expert carefully observes areas of concern. Therefore, when the same object is observed, a plurality of similar image frames can be obtained from the sequentially acquired image frames.
  • The degree of image change can be determined by converting the content of images obtained by temporally continuous imaging into numerical values and comparing the values for each image, or by quantifying differences in the image data.
  • Embodiments according to the present invention include a similar image group determination unit that determines a similar image group in which the image change is less than or equal to a predetermined value, making it possible to determine whether the same object is being observed.
  • An inference unit inputs each sequentially obtained image frame into a machine learning model and performs inference, and a determination unit determines the difference between the inference results for each image determined by the similar image group determination unit to be included in a similar image group. A sketch of one possible frame-change metric and grouping rule follows below.
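As an illustration only, the following Python sketch quantifies frame-to-frame change with simple brightness and pixel-difference statistics and groups frames whose change stays below a threshold. The function names (frame_change, group_similar_frames) and the 0.05 threshold are hypothetical, and the real determination may tolerate dissimilar frames inside a group, which this simplified version does not.

```python
import numpy as np

def frame_change(prev: np.ndarray, curr: np.ndarray) -> float:
    """Quantify the change between two 8-bit grayscale frames: one possible
    'numerical value' combining mean-brightness change and mean absolute
    pixel difference."""
    brightness_delta = abs(float(curr.mean()) - float(prev.mean())) / 255.0
    pixel_delta = float(
        np.abs(curr.astype(np.float32) - prev.astype(np.float32)).mean()
    ) / 255.0
    return 0.5 * brightness_delta + 0.5 * pixel_delta

def group_similar_frames(frames, threshold=0.05):
    """Split a frame sequence into groups whose frame-to-frame change stays
    at or below `threshold` (a stand-in for the similar image group
    determination unit)."""
    groups, current = [], [0]
    for i in range(1, len(frames)):
        if frame_change(frames[i - 1], frames[i]) <= threshold:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups  # lists of frame indices forming similar image groups
```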
  • FIG. 1 is a block diagram mainly showing the electrical configuration of the endoscope system.
  • This endoscope system includes an endoscope device 10 and a learning device 30.
  • each block in the endoscope device 10 does not need to be provided in a single device, and may be divided into a plurality of devices.
  • the imaging unit 11 and each other block may be provided in separate devices. In this case, each unit other than the imaging unit 11 may be connectable to an intranet within a hospital or the like.
  • The endoscope system may also be linked with an in-hospital system, an electronic medical record system, and the like. If images and other data recorded in the recording unit 18, described later, are subject to confidentiality obligations, the endoscope device 10 may refrain from outputting them to the learning device 30. Furthermore, if the learning device 30 is located within a hospital, or if a contract allows data exchange, the learning device 30 may be an in-hospital system. The following explanation assumes a situation in which an organization outside the hospital develops an inference model using the learning device, but the present invention is not limited to this. Each block within the learning device 30 may also be divided among a plurality of devices, and the learning section 34 may be located within a server.
  • The endoscope apparatus 10 includes an imaging section 11, an image processing section 12, a display section 12a, an inference section 13, a difference determination section 14, an image determination section 15, a situation determination section 16, an information association section 17, a recording section 18, a communication section 19, and a control section 20.
  • Each block in the endoscope device 10 may be separated from the endoscope device 10 and provided as a different device; here, an example is shown in which the endoscope device 10 is equipped with all of this hardware. However, some functional blocks can be omitted if roles are shared, for example by the endoscope device 10 cooperating with the learning device 30.
  • The endoscope device 10 includes an insertion portion formed of a long cylindrical tube for insertion into a body cavity or a tubular object to observe the inside; some insertion portions are flexible and some are not. The imaging section 11 is often provided at the distal end of the insertion portion.
  • the imaging unit 11 is assumed to have an imaging unit, a light source unit, an operation unit, a treatment unit, and the like.
  • the imaging unit includes an optical lens for imaging, an imaging element, an imaging control circuit, and the like.
  • If the camera has an automatic focus adjustment function, it includes a focus detection circuit, an automatic focus adjustment device, and the like.
  • Optical lenses form optical images of objects.
  • the image sensor is arranged near a position where an optical image is formed, converts the optical image into an image signal, performs AD conversion on this image signal, and then outputs it to the image processing section 12.
  • When the imaging control circuit receives an instruction to start imaging from the control unit 20, it performs readout control of the image signals from the imaging device at a predetermined rate.
  • the light source section in the imaging section 11 provides illumination to brighten the walls of the digestive tract in the body cavity, etc., in order to facilitate observation.
  • the light source section includes a light source such as a laser light source, an LED light source, a xenon lamp, and a halogen lamp, and also has an optical lens for illumination. Since tissue detection characteristics change depending on the wavelength of the illumination light, the light source unit may have a function of changing the wavelength of the light source, and image processing may be changed using a known method in accordance with the wavelength change. Detection does not necessarily need to be performed visually by a doctor.
  • The operation unit in the imaging unit 11 includes operation members for instructing the capture of endoscopic still images and the start and end of endoscopic movie recording, as well as function execution sections such as operating sections that work in conjunction with these operations and treatment sections. It may further include an operation member for instructing switching of the focus of the endoscopic image. Changes caused by these operations can become parameters of image change.
  • The operating section in the imaging section 11 has an angle knob for bending the distal end of the insertion section of the endoscope. It also has function execution sections for supplying air and water into the body cavity through the flexible tube and for suctioning air and liquid.
  • The treatment section may have function execution sections such as treatment tools, for example biopsy forceps for collecting a tissue sample, or a snare or high-frequency scalpel for removing affected areas such as polyps.
  • The function execution units of these treatment tools and the like (which may be broadly classified as operation units) are operated via corresponding operation members.
  • the above operations may change the shape of the affected area or cause bleeding, and if heat or the like is used during the operation, steam, smoke, water spray, etc. may be generated.
  • the image processing section 12 has an image processing circuit, receives image signals from the imaging section 11, and performs various image processing such as development processing in accordance with instructions from the control section 20.
  • the image processing section 12 outputs the image data subjected to image processing to the display section 12a, the inference section 13, the image determination section 15, and the communication section 19.
  • Image processing in the image processing unit 12 includes adjustment of the color and brightness of the image, as well as enhancement processing such as contrast enhancement and edge enhancement to improve visibility, and gradation processing to create natural gradations. Furthermore, processing such as HDR (High Dynamic Range) processing and super resolution processing, which improve image quality using a plurality of image frames, may be performed.
  • the image processing unit 12 functions as an image processing unit (image processing circuit) that processes image frame information into visible information.
  • the image processing section 12 may be omitted from the endoscope apparatus 10 by entrusting the above-mentioned functions to the image processing section 32 in the learning device 30.
  • If the endoscope device 10 requires independence as an IoT device, providing the image processing unit 12 within the endoscope device 10 increases the degree of freedom, for example by making it possible to transmit images to the outside.
  • the display unit 12a has a display device such as a display monitor and a display control circuit.
  • the display section 12a receives image data processed by the image processing section 12 according to control signals from the control section 20, and displays endoscopic images and the like. Further, an endoscopic image or the like on which the inference result by the inference unit 13 is superimposed may be displayed.
  • the inference unit 13 includes an inference engine, receives image data of image frames from the imaging unit 11, and performs inference.
  • An inference model for inferring this advice is set in the inference engine.
  • the inference engine may be configured by hardware or software.
  • The inference unit 13 may include a forward propagation neural network or the like; this forward propagation neural network will be explained in connection with the learning unit 34.
  • the data input to the inference engine is not limited to the image data of the image frame, but situational information (related information, auxiliary information) at the time of obtaining the frame image may be input to perform inference. By making inferences using situational information, more reliable inference results can be obtained.
  • Inference is performed frame by frame, but it is not limited to every frame; it may be performed every few frames according to requirements such as the visibility of the inference results, or several consecutive frames may be input to the inference section together.
  • the inference unit 13 inputs image data and infers using an inference model generated by machine learning, thereby inferring advice for providing support to a doctor during diagnosis, for example.
  • the inference unit 13 also specifies the object (affected part, organ, etc.) shown in the image acquired by the imaging unit 11, specifies its position, and determines the range of the affected part, etc. It is also possible to perform inferences such as specifying the image, segmenting the image, classifying it, and determining whether it is good or bad.
  • The inference unit 13 performs inference according to instructions from the control unit 20 and outputs the inference result to the control unit 20. Furthermore, when inference is made, the inference unit 13 calculates the reliability of the inference and outputs the calculated reliability value to the difference determination unit 14.
  • The inference result changes over time, and various methods can be used to detect differences between inference results. For example, when inferring a pass/fail judgment, the pass/fail display changes over time; when inferring a position, the coordinates of the specified position on the screen change over time; and when inferring a range (segmentation), the shape and area of each segment within the screen change over time. Any of these can be used as the change in the inference result, as in the sketch below.
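The following hedged Python sketch shows one way each kind of difference mentioned above could be quantified; the helper names and the specific measures (confidence delta, box-center shift, mask overlap) are illustrative assumptions, not definitions from the patent.

```python
import numpy as np

def classification_diff(prev_label, curr_label, prev_conf, curr_conf):
    """Difference for pass/fail style inference: label flip plus confidence change."""
    label_changed = float(prev_label != curr_label)
    conf_delta = abs(curr_conf - prev_conf)
    return label_changed, conf_delta

def detection_diff(prev_box, curr_box):
    """Difference for position inference: shift of the box center and size change.
    Boxes are (x, y, w, h)."""
    (px, py, pw, ph), (cx, cy, cw, ch) = prev_box, curr_box
    center_shift = np.hypot((cx + cw / 2) - (px + pw / 2),
                            (cy + ch / 2) - (py + ph / 2))
    size_change = abs(cw * ch - pw * ph)
    return center_shift, size_change

def segmentation_diff(prev_mask, curr_mask):
    """Difference for range inference: change in segment area and region overlap
    (masks are boolean arrays)."""
    area_change = abs(int(curr_mask.sum()) - int(prev_mask.sum()))
    union = np.logical_or(prev_mask, curr_mask).sum()
    iou = np.logical_and(prev_mask, curr_mask).sum() / union if union else 1.0
    return area_change, 1.0 - iou  # larger values mean larger change
```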
  • the image input to the inference unit 13 is an image from the imaging unit 11, and here it is assumed that it is a medical image from an endoscope or the like.
  • Medical images are not limited to endoscopic images acquired by endoscope devices; they may also be images acquired by medical equipment such as X-ray machines, MRI machines, ultrasound diagnostic machines, dermatology cameras, and dental cameras.
  • this embodiment can deal with sudden changes in visual field that are unique to an endoscope.
  • the input image input to the inference unit 13 does not have to be limited to a medical image.
  • Sudden changes in field of view can also occur with industrial endoscopes and with cameras mounted on robots, vehicles, and drones.
  • an endoscope is a device that is inserted and used through a narrow insertion hole. For this reason, it has a tubular tip portion with a narrow outer diameter, and the tip portion tends to shake, and this shaking greatly affects the acquired image. When the object is close, the change becomes particularly large. Shaking the tip vertically and horizontally changes the composition, and shaking the tip in the direction of insertion leads to a change in the size of the object.
  • Changes in the vertical and horizontal directions may cause unnecessary objects to appear in the image, and changes in relative distance may cause the size of the object to change. Further, when the image is rotated by twisting, or when the imaging unit does not directly face the object and observes it from an oblique angle, distortion or the like may occur in the image. These situational changes are also parameters that can alter the image, and as they repeat, the image may change significantly in complex ways. Such changes can occur not only with endoscopes but with many inspection devices that use images.
  • The inference unit 13 functions as an inference unit (inference engine) that infers a specific object image included in a captured image (which may be an endoscopic image) using a machine learning model (for example, see S1 in FIGS. 2 and 5 and the inference unit 44 in FIG. 7). Further, the inference unit 13 functions as an inference unit (inference engine) that infers an input image using a machine learning model (for example, see S1 in FIGS. 2 and 5 and the inference unit 44 in FIG. 7).
  • the input image described above is a medical image.
  • the above-mentioned inference unit infers at least one of classification, detection, and region extraction of the input image (for example, see S1 in FIGS. 2 and 5).
  • the inference unit also outputs the reliability of the inference (for example, see S1 in FIGS. 2 and 5).
  • the difference determination unit 14 may include a difference determination circuit, or may be realized by a processor including a CPU or the like executing a program.
  • the difference determining unit 14 calculates the difference between the results of the inference by the inference unit 13 (this difference has already been explained using an example). That is, since the inference unit 13 performs inference for each frame and outputs the inference result, the difference determination unit 14 calculates the difference between the inference results. In calculating the difference (change), the differential value of the inference result may be used. Furthermore, in calculating the difference (change), a change in the value indicating reliability calculated by the inference unit 13 may be calculated.
  • Since the inference unit 13 performs inference every time image data is input and calculates the reliability of the inference result, the difference may be determined based on the change in this reliability value. Furthermore, as will be described later, since the image determination unit 15 determines whether an image belongs to a similar image group, the difference determination unit 14 determines the difference between the images included in the similar image group.
  • The difference determination unit 14 functions as an inference result difference calculation unit that calculates the difference between the inference results produced by the inference unit for each image included in the similar image group determined by the similar image group determination unit (for example, see S5 in FIGS. 2 and 5).
  • To calculate the difference between the inference results, for example, at least one of a change in the position of the specific object, a change in its size, and a change in the inferred range may be calculated.
  • the image determination unit 15 may include an image determination circuit, or may be realized by a processor including a CPU or the like executing a program.
  • the image determination unit 15 determines whether or not the image data inputted in time series from the image processing unit 12 belongs to a similar image group.
  • When a doctor or the like inserts the distal end of the endoscope device, in which the imaging device is arranged, into a body cavity, the imaging device acquires endoscopic images. When the distal end approaches a specific object such as an affected area, many images of the vicinity of the affected area are acquired so that the doctor or the like can carefully observe it.
  • the image determination unit 15 may determine whether the image group includes this specific target object.
  • the specific object is an object to be observed or inspected with an endoscope or the like
  • the specific object image refers to the portion of the image of the specific object within the screen.
  • Various objects appear in an endoscopic image, and among these objects, an object that has been determined as a detection target according to the specifications of the AI (inference unit) is called a specific object.
  • In the acquired images, the object often changes due to minute vibrations of the tip: the position of a specific object in the image may change suddenly, or unnecessary objects may enter the image. As a result, the position and size of the object can change drastically.
  • The image determination unit 15 converts the content of the images into numerical values and determines whether the images are similar and whether temporally continuous input pair images are substantially unchanged. The numerical value is chosen so that the pattern of the same object can be tracked within the image. It may be determined by detecting changes in results over time, as described above, or temporally adjacent images (or their inference results) may be compared. A known method may be used as appropriate to quantify the similarity of images. Here, numerical values such as composition, color, and brightness are assumed, and it is assumed that changes in the size of the target object, whether the target object is within the screen, and the like can also be determined using these values.
  • When the image determination unit 15 determines similar image groups, it converts the content of the images into numerical values; a similar image group may include dissimilar images that cannot strictly be called similar. As will be described later with reference to FIG. 6, the distal end of the endoscope makes small movements, which makes it difficult to stare at the object and makes it easy for the object to fall out of the image. Therefore, when the same region is observed over a predetermined number of frames, the similar image group may include dissimilar images captured during the observation. The determination of similar image groups will be described later using FIG. 3.
  • The image determination unit 15 may determine that two input images are substantially unchanged by, for example, calculating the amount of movement of corresponding points, or based on at least one of the amounts of change in brightness, saturation, and contrast of the images; a sketch of such a pair check follows below.
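A minimal sketch of such a pair check, assuming OpenCV is available: corresponding-point movement is approximated by dense optical flow, and brightness, saturation, and contrast changes are taken from HSV and grayscale statistics. The threshold values are hypothetical.

```python
import cv2
import numpy as np

def pair_substantially_unchanged(prev_bgr, curr_bgr,
                                 motion_thresh=1.0,   # px, hypothetical
                                 tone_thresh=0.05):   # hypothetical
    """Decide whether a temporally consecutive image pair is substantially
    unchanged, using corresponding-point movement (dense optical flow) and
    changes in brightness, saturation, and contrast."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)

    # Amount of movement of corresponding points (mean flow magnitude).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_motion = float(np.linalg.norm(flow, axis=2).mean())

    # Changes in brightness, saturation, and contrast (HSV/grayscale statistics).
    prev_hsv = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    curr_hsv = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    d_brightness = abs(curr_hsv[..., 2].mean() - prev_hsv[..., 2].mean()) / 255.0
    d_saturation = abs(curr_hsv[..., 1].mean() - prev_hsv[..., 1].mean()) / 255.0
    d_contrast = abs(float(curr_gray.std()) - float(prev_gray.std())) / 255.0

    return (mean_motion < motion_thresh and
            max(d_brightness, d_saturation, d_contrast) < tone_thresh)
```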
  • The image determination unit 15 functions as a similar image group determination unit that compares the contents of images (which may be endoscopic images) obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images (for example, see S3 in FIGS. 2 and 5, and the image identification unit 43 in FIGS. 3 and 7). Further, the similar image group determination unit described above converts the pattern of the same object in the endoscopic image into a numerical value that allows the pattern to be tracked, and uses that value to determine whether the images are similar (for example, see FIG. 3).
  • When performing an endoscopy, an observer may stare at the same object or change the observation method in order to sufficiently confirm a specific object with the naked eye. The number of frames corresponding to the time required for such observation is the predetermined number of frames. Furthermore, as described above, when observing with an endoscope, the distal end cannot be fixed in space and moves slightly, making it difficult to stare at the object and making it easy for the observed image to deviate from it. Similar image groups should include images that may contain training image candidates important for diagnosis (for example, images in which the same body part is observed), corresponding to the weak or missed images described above. Therefore, a group of images that may include such images is also determined to be a similar image group, even if it includes dissimilar images.
  • The similar image group described above may include results determined according to the similarity of the images before and after endoscopic images obtained by consecutively capturing a group of images corresponding to a sudden change in viewpoint position.
  • That is, the similar image group may include, before and after the endoscopic images obtained by consecutive imaging, results determined according to the similarity of the images, where the group of rapidly changing images corresponds to the blurring of vision caused by the unconscious eye movements called saccades (in humans, this blurring is corrected by the brain and goes unnoticed).
  • the image corresponding to the above-mentioned saccade will be described later using FIGS. 4 and 6, but it is an image corresponding to eyeball movement.
  • The image change equivalent to a saccade is caused by a change in the relative positional relationship between the tip of the endoscope and the object; it conceptually represents the sudden change in the image that also depends on the positional relationship with the light source.
  • the above-mentioned image change equivalent to a saccade is related to the special characteristics of an examination using an endoscope or the like.
  • the image will change significantly even though the relative change is relatively small.
  • Even when an object is captured within the screen, its size and position within the screen are likely to change suddenly, and its exposure and brightness distribution are likely to change suddenly.
  • Such images cannot be captured on purpose and may or may not occur depending on the situation, so it is difficult to prepare them intentionally as training data.
  • a situation in which a sudden image change similar to the image change equivalent to a saccade as described above occurs is likely to occur in endoscopic images.
  • the object may be immersed in body fluids or cleaning fluids, may be subjected to operations during endoscopic observation such as suction or air supply, special light observation, or staining.
  • various methods are used in combination to observe the object, such as moving the object and operating the treatment instruments in conjunction. In some cases, these things occur simultaneously, leading to complex changes, and unintended (or unconscious) sudden changes in the image are extremely likely to occur. In response to such situations, it is desirable to correctly confirm the target object.
  • dissimilar images that satisfy the above conditions are treated as images that form a group of similar images.
  • In this way, the contents of endoscopic images obtained by temporally consecutive imaging are compared, and a similar image group determination is performed on a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images.
  • Here, the content of an image refers to characteristics (image features) such as the shape, pattern, shadow, color, size, and position of the imaged object (including rotation and distortion, which may also be corrected).
  • the situation determination unit 16 may include a situation determination circuit, or may be realized by a processor including a CPU or the like executing a program.
  • The situation determination unit 16 determines information regarding the usage status of the endoscope apparatus 10 used by a doctor or the like. For this purpose, the situation determination unit 16 may determine, based on the image data acquired by the image processing unit 12, whether the distal end portion provided with the imaging device directly faces the wall surface in the body cavity or faces it obliquely. It may also determine whether the image is in focus, the depth of the scene, and so on. Further, the usage status of the light source section in the imaging section 11 may be determined.
  • For example, if the wavelength of the light source light is known, it can be determined whether narrow band imaging (NBI) is being performed. Furthermore, the usage status of the treatment section within the imaging section 11 may be determined. For example, if a doctor or the like performs a water injection operation, the effect of the water injection appears in the image, and if a suction operation is performed, the effect of the suction appears.
  • the determination result of the situation determination section 16 is output to the control section 20.
  • The image data is associated with the situation determination result at that time. If the distal end is not directly facing the affected area but is at an oblique angle, the affected area is difficult to see, making inference difficult. Likewise, when a water injection operation is performed, the screen is affected by the water, making the affected area difficult to see and inference difficult. If such situation information is available and images are inferred together with it, the reliability of inference for finding affected areas can be improved.
  • the situation determination unit 16 functions as an imaging information acquisition unit that can acquire imaging information corresponding to each image selected by the learning image selection unit (for example, see S6 in FIG. 5).
  • the information association unit 17 may include an information association circuit, or may be realized by a processor including a CPU or the like executing a program.
  • The information association unit 17 associates, with the image data processed by the image processing unit 12, at least one piece of information such as the situation information determined by the situation determination unit 16, the similar image information determined by the image determination unit 15, and information regarding the difference in inference confidence values determined by the difference determination unit 14.
  • The image data associated with information by the information association section 17 is recorded in the recording section 18. Further, when the difference determination unit 14 determines that the inference reliability values of image data differ by more than a predetermined value, the communication unit 19 is notified of this fact. As will be described later, the communication unit 19 transmits the notified image data to the learning device 30, where it is used as training data for relearning.
  • the recording unit 18 includes electrically rewritable nonvolatile memory and/or volatile memory. This recording unit 18 stores a program for operating the CPU and the like in the control unit 20. It also stores model information of the endoscope device 10, various characteristic values and adjustment values within the endoscope 10, and the like. Furthermore, image data processed by the image processing section 12 and associated with information by the information association section 17 may be recorded. Details of the data structure recorded in the recording unit 18 will be explained using FIGS. 8 and 9. Note that the recording unit 18 may not record all of the information as shown in FIGS. 8 and 9, and may be shared with the recording unit 33 in the learning device 30.
  • the recording unit 18 stores the image data of the original moving image acquired by the imaging unit 11 and processed by the image processing unit 12, and the image data with information associated with it by the information association unit 17. Record.
  • The reason the original moving image is recorded in the recording unit 18 is so that the inference model can later be improved based on the recorded content. Therefore, what is recorded need not be the selected image data itself; it is sufficient that the selected image data can be easily found by searching.
  • For example, the original examination video may be recorded as-is together with information that allows a specific frame to be selected from among the recorded images, such as information indicating the image at the start of an examination or the image at the time a specific object was detected. This information will be described later using FIGS. 8 and 9.
  • Such records are useful, for example, when improving an inference model by relearning, customizing it for a specific situation, increasing its versatility, or creating a new inference model.
  • Data representing the background at the time of acquisition, such as which hospital and which doctor performed the examination, the presence or absence of personal information, and the presence or absence of contracts such as informed consent and terms of use, is useful when actually using the data, and such information may also be recorded. One possible record structure is sketched below.
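As a purely illustrative assumption (the actual structures are those shown in FIGS. 8 and 9), a record for one candidate image might bundle frame-locating information with the associated metadata described above:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CandidateImageRecord:
    """Hypothetical record structure for one training data candidate image,
    illustrating the kinds of associated information described above."""
    video_id: str                              # identifies the original examination video
    frame_index: int                           # locates the selected frame in that video
    similar_group_id: Optional[int] = None     # similar image group membership
    confidence: Optional[float] = None         # inference reliability for this frame
    confidence_diff: Optional[float] = None    # difference vs. adjacent frames
    situation_info: dict = field(default_factory=dict)  # e.g. NBI, water injection
    hospital: Optional[str] = None             # acquisition background
    informed_consent: bool = False             # contract/consent flag
```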
  • the communication unit 19 has a communication circuit (transmission/reception circuit), and communicates with the communication unit 31 in the learning device 30.
  • The communication unit 19 can reduce the communication and recording load by selecting and transmitting only the necessary information from the information recorded in the recording unit 18. Security can also be increased by not sending unnecessary data. For example, among the images acquired by the imaging unit 11, those whose inference results differ are transmitted to the learning device 30 for learning. However, if there are no restrictions, or where a contract allows it, all information (including images) may be sent. At this time, the communication unit 19 may transmit, to the communication section 31 in the learning device 30, image data that has been associated with information by the information association unit 17 and for which the difference determination unit 14 found a difference between the inference results of the images included in a similar image group.
  • Communication can also be used when replacing an inference model, and the communication unit 19 receives the inference model generated by the learning unit 34 in the learning device 30 through the communication unit 31.
  • each request signal is sent and received, and information that satisfies the conditions is sent and received in accordance with the request signal.
  • the control unit 20 is a processor that includes a CPU (Central Processing Unit), its peripheral circuits, memory, and the like. This processor may be one, or may be composed of multiple chips.
  • The CPU controls each part within the endoscope apparatus 10 according to a program stored in the memory, thereby operating the endoscope apparatus 10 as a whole. Each part within the endoscope apparatus 10 is realized through software control by the CPU. All or part of the difference determination section 14, image determination section 15, situation determination section 16, and information association section 17 described above may be realized by a processor within the control section 20.
  • the above-mentioned processor may realize all or part of the similar image group determination section, the image identification section, the inference result difference calculation section, the inference result change calculation section, the learning image selection section, and the imaging information acquisition section.
  • The control section 20 may realize all or part of the functions of the imaging section 11, the image processing section 12, and the inference section 13. Further, the control unit 20 may operate in cooperation with the control unit 35 in the learning device 30, so that the endoscope device 10 and the learning device 30 operate as one.
  • the control unit 20 performs a difference determination on the inference result (reliability etc.) inferred by the inference unit 13 for the image determined by the image determination unit 15 to be an image forming part of a similar image group.
  • Based on this difference determination, an image to be used for relearning is selected (for example, see S7 in FIG. 2 and S8 in FIG. 5). For example, consider the case where the situation changes from high confidence to low confidence: an image with high confidence is generally likely to resemble the images used when the inference model was created, whereas images with low confidence are likely not to have been used when creating the inference model.
  • Images included in a similar image group may show the same object even when they are dissimilar. Such images are weak points for inference models, and it is desirable to be able to make accurate inferences even for such difficult images. Therefore, in this embodiment, when the inference result (reliability or the like) changes in this way, the image at that time is selected as a candidate image for learning.
  • An expert such as a doctor may then determine whether a candidate image can actually be adopted as training data. Without this selection, deciding what to adopt as training data would take considerable time and effort, whereas narrowing the images down to training data candidates makes it possible to use them immediately for improving the inference model.
  • The control unit 20 functions as a learning image selection unit that selects an image to be used for relearning based on the difference between the inference results calculated by the inference result difference calculation unit (for example, see S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). Images are selected based on differences in the inference results: for example, when the reliability changes by more than a predetermined value, when the size of a specific object changes suddenly, or when the position of the object in the image suddenly changes by more than a predetermined value, the changed image is selected for relearning.
  • The selection here may record the image frame itself, but is not limited to this; it may instead record information that allows an immediate search for the image frame. A sketch of such a selection rule follows below.
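A hedged sketch of such a threshold-based selection rule; the record keys (confidence, center_shift, size_change_ratio, video_id, frame_index) and the threshold values are hypothetical stand-ins for the measures discussed above. Note that only frame-locating information, not the frame itself, is stored in the output.

```python
def select_for_relearning(records, conf_thresh=0.2, shift_thresh=20.0,
                          size_thresh=0.3):
    """Select training data candidates from a similar image group: a frame is
    chosen when its inference result differs from the previous frame's by more
    than a specified value (all thresholds here are hypothetical)."""
    selected = []
    for prev, curr in zip(records, records[1:]):
        conf_jump = abs(curr["confidence"] - prev["confidence"]) > conf_thresh
        pos_jump = curr.get("center_shift", 0.0) > shift_thresh
        size_jump = curr.get("size_change_ratio", 0.0) > size_thresh
        if conf_jump or pos_jump or size_jump:
            # Record only what is needed to find the frame again later.
            selected.append({"video_id": curr["video_id"],
                             "frame_index": curr["frame_index"]})
    return selected
```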
  • The learning image selection unit described above may omit images that do not satisfy specific conditions or whose conditions are extremely poor, for example images whose visibility remains too low even after image correction or the like.
  • images to be used for relearning are selected from among the images included in the similar image group based on the difference in the inference results by the inference result calculation unit. Therefore, at first glance, images that are dissimilar to images observing the same region may also be selected as candidate images for learning.
  • However, in images with extremely poor conditions, such as images in which the entire screen has become completely white due to water injection, or completely black due to the position of the light source, there is very little possibility that the specific object is included.
  • Depending on the shooting conditions, the image may also become extremely poor. This is because, when the exposure time or the like becomes long, the relative positional relationship between the specific object and the imaging unit 11 changes during that time, resulting in a degraded image, for example one with blurring.
  • Therefore, the learning image selection unit excludes from selection images whose conditions are so extremely poor that they cannot possibly contain an image of the specific object. This may be determined based on the contrast of the image data acquired by the imaging unit 11, the amount of image blur, and so on. Alternatively, without directly evaluating image quality, the image may be judged to be in extremely poor condition from the shooting conditions or the like. In addition, when multiple image frames are combined, as in HDR (High Dynamic Range) shooting or depth compositing, an image with extremely poor conditions is easily produced, so in this case as well the image may be judged to be in extremely poor condition from the shooting conditions. A sketch of such a quality check follows below.
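One way such a check might look, as a hedged sketch using OpenCV on an 8-bit grayscale frame: nearly all-white or all-black screens, very low contrast, and heavy blur (estimated by the variance of the Laplacian) are excluded. All thresholds are hypothetical.

```python
import cv2
import numpy as np

def extremely_poor_condition(gray: np.ndarray,
                             sat_ratio=0.98,        # hypothetical thresholds
                             contrast_thresh=10.0,
                             blur_thresh=30.0) -> bool:
    """Return True for frames to exclude from selection: screens that are
    almost entirely white (e.g. water injection) or black (light source
    position), or frames with very low contrast or heavy blur."""
    if (gray > 240).mean() > sat_ratio or (gray < 15).mean() > sat_ratio:
        return True
    if gray.std() < contrast_thresh:                 # little usable structure
        return True
    blur = cv2.Laplacian(gray, cv2.CV_64F).var()     # common sharpness estimate
    return blur < blur_thresh                        # small variance means blurry
```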
  • The control unit 20 also functions as a learning image selection unit that selects training data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit (for example, see S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7).
  • This learning image selection unit selects a training data candidate image to be used for learning when the amount of change exceeds a specified value. In other words, if the amount of change is larger than the predetermined value, the image may differ even though the pair was identified as substantially the same, and such an image is likely not to have been used for learning before, so it is selected as a learning candidate image.
  • the learning device 30 is assumed to be a part that controls the creation and improvement of an inference model, and is assumed to be located outside a hospital (examination facility).
  • The learning device 30 includes a communication unit 31 for exchanging data through communication and, in order to efficiently perform highly specialized learning, also includes an image processing unit 32, a recording unit 33, a learning unit 34, and a control section 35.
  • If the recording unit 33 is located externally, communication may become a constraint during learning, but the learning device 30 may cooperate through communication with a recording unit located elsewhere.
  • the learning device 30 may be located on a server or the like, and in that case, it is connected to the endoscope device 10 through a communication network such as the Internet.
  • the learning device 30 is connected to a large number of devices, receives a large amount of teacher data from these devices, performs learning using this teacher data, and generates an inference model.
  • the learning device 30 may receive teacher data candidates such as image data, perform annotation to create teacher data, and use this teacher data to generate an inference model.
  • The communication unit 31 has a communication circuit (transmission/reception circuit) and communicates with the communication unit 19 in the endoscope device 10. As described above, the communication unit 31 receives image data for which the inference results of the images included in a similar image group differ. The communication unit 31 also transmits to the communication unit 19 the inference model generated by the learning unit 34, including an inference model generated by relearning using the relearning image data selected by the endoscope apparatus 10 based on differences between inference results. As mentioned above, if the recording section 33 is located externally, communication and the like may become a constraint during learning, but cooperation with a recording section located elsewhere through the communication section 31 is also possible.
  • the image processing unit 32 has an image processing circuit, receives image data from the endoscope device 10, and performs various image processing such as development processing in accordance with instructions from the control unit 35.
  • the image processing may be the same as the processing in the image processing section 12, or the processing contents may be changed as appropriate.
  • the processed image data may be recorded in the recording unit 33, or may be displayed on a display device or the like.
  • image processing may be performed during learning.
  • The learning images recorded in the recording unit 33 may be subjected to image processing that makes them easier to learn from, such as changing the image size or applying emphasis processing to make them easier to process.
  • Alternatively, the image processing section 32 may intentionally generate images that are difficult to judge, and the learning section 34 may use these images for testing during learning; a sketch of such difficult-image generation follows below.
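A hedged illustration of how intentionally difficult test images might be generated; the specific degradations (blur, darkening, washed-out highlights) are assumptions chosen to roughly mimic poor endoscopic conditions, not processing prescribed by the patent.

```python
import cv2

def make_difficult_variants(image_bgr):
    """Intentionally generate images that are difficult to judge, for testing
    an inference model during learning (a hypothetical illustration)."""
    variants = {}
    # Heavy blur, as from tip movement during a long exposure.
    variants["blurred"] = cv2.GaussianBlur(image_bgr, (15, 15), 0)
    # Strong darkening, as when the light source faces away from the wall.
    variants["dark"] = cv2.convertScaleAbs(image_bgr, alpha=0.3, beta=0)
    # Washed-out highlights, as from water injection or reflections.
    variants["overexposed"] = cv2.convertScaleAbs(image_bgr, alpha=1.0, beta=120)
    return variants
```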
  • The recording unit 33 includes electrically rewritable nonvolatile memory and/or volatile memory. This recording unit 33 stores a program for operating the CPU and the like in the control unit 35, as well as various characteristic values, adjustment values, and the like within the learning device 30. Furthermore, the recording unit 33 records the training data (including image data for relearning selected based on differences between inference results) transmitted from the endoscope apparatus 10. Details of the data structure recorded in the recording unit 33 will be explained using FIGS. 8 and 9. Note that the recording unit 33 need not record all of the information shown in FIGS. 8 and 9, and the recording may be shared with the recording unit 18 in the endoscope apparatus 10.
  • the learning unit 34 includes an inference engine, and performs machine learning such as deep learning using the learning teacher data recorded in the recording unit 33 to generate an inference model.
  • Relearning is also performed; by relearning, it is possible to generate an inference model that can perform highly reliable inference even on weak or missed images.
  • Deep learning is a multilayered version of the "machine learning" process that uses neural networks.
  • a typical example is a forward propagation neural network, which sends information from front to back to make decisions.
  • the above-mentioned inference section 13 also includes a forward propagation neural network.
  • the simplest forward propagation neural network (the inference engine 13A in FIG. 6 has a similar structure) need only have three layers: an input layer composed of N1 neurons, an intermediate layer composed of N2 neurons given by parameters, and an output layer composed of N3 neurons corresponding to the number of classes to be discriminated. The neurons of the input layer and the intermediate layer, and of the intermediate layer and the output layer, are connected by connection weights, and bias values are added to the intermediate layer and the output layer, thereby easily forming a logic gate.
  • a neural network may have three layers if it performs simple discrimination, but by having a large number of intermediate layers, it is also possible to learn how to combine multiple features in the process of machine learning. In recent years, systems with 9 to 152 layers have become practical in terms of learning time, judgment accuracy, and energy consumption.
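As a rough, non-normative illustration of the three-layer forward propagation network described above, the following sketch (plain NumPy; the dimensions N1, N2, N3 and the activation choice are assumptions, not part of the disclosure) passes an input through connection weights and biases to obtain class probabilities:

```python
import numpy as np

def forward_propagation(x, w1, b1, w2, b2):
    """Input layer (N1) -> intermediate layer (N2) -> output layer (N3).
    w1, w2 are the connection weights; b1, b2 are the bias values."""
    hidden = np.tanh(x @ w1 + b1)        # intermediate layer
    logits = hidden @ w2 + b2            # output layer, one score per class
    exp = np.exp(logits - logits.max())  # softmax over the N3 classes
    return exp / exp.sum()

# Illustrative dimensions: 16 input neurons, 8 intermediate, 3 classes.
rng = np.random.default_rng(0)
N1, N2, N3 = 16, 8, 3
probs = forward_propagation(rng.normal(size=N1),
                            rng.normal(size=(N1, N2)), np.zeros(N2),
                            rng.normal(size=(N2, N3)), np.zeros(N3))
print(probs)  # three class probabilities summing to 1
```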
  • a "convolutional neural network” that performs a process called “convolution” that compresses image features, operates with minimal processing, and is strong in pattern recognition may be used.
  • a "recurrent neural network" (fully connected recurrent neural network), in which information can flow in both directions and which can handle more complex information, may be used to support the analysis of information whose meaning changes depending on order and sequence.
  • processors dedicated to AI (artificial intelligence) processing, such as NPUs (neural network processing units), may be used for these calculations.
  • machine learning methods include, for example, support vector machines and support vector regression.
  • the learning here involves calculating the weights, filter coefficients, and offsets of the classifier, and there is also a method that uses logistic regression processing in addition to this.
  • when having a machine make a decision, humans need to teach the machine how the decision should be made.
  • in this embodiment, a method of deriving image judgments by machine learning is adopted, but a rule-based method that applies rules acquired by humans through empirical rules and heuristics may also be used.
  • the control unit 35 is a processor that includes a CPU (Central Processing Unit), its peripheral circuits, memory, and the like. There may be a single processor, or it may be composed of multiple chips.
  • the CPU controls each part within the learning device 30 according to a program stored in the memory, thereby running the learning device 30 as a whole; each part within the learning device 30 is realized by software control by the CPU. The control unit 35 may also operate in cooperation with the control unit 20 in the endoscope device 10, so that the endoscope device 10 and the learning device 30 operate as one.
  • the recording unit 50 can be applied to the recording unit 18 in the endoscope device 10 and/or the recording unit 33 in the learning device 30, and the recording unit 50 has an electrically rewritable nonvolatile memory. Although all of the data as shown in FIG. 8 may be recorded in the recording unit 18 or the recording unit 33, the data may be recorded in a divided manner in both recording units. That is, data required by the endoscope device 10 and the learning device 30 may be recorded respectively.
  • the recording section 50 has two recording areas: an inspection image recording section 51 and a utilization information recording section 52.
  • the test image recording section 51 is an area for recording test videos in the same way as recording medical records, and test videos A51a and B51b are recorded therein.
  • although FIG. 8 shows only two test videos, video A and video B, three or more test videos can of course be recorded. Recordings are sometimes important as evidence for patient diagnosis, and in some cases a still image corresponding to one frame of a video is recorded for a report, but the display of still images is omitted in FIG. 8.
  • the test image recording unit 51 may record not only images but also test results and the like.
  • the recording unit 50 is provided with a utilization information recording unit 52.
  • in order to use the videos recorded in the examination image recording unit 51 as teaching data (for evidence reports, for example), procedures such as informed consent are required, and depending on the video, doctors and patients may not agree to its use. Furthermore, when learning, it is desirable to take into account various related information from the time of the examination. It is therefore preferable to provide a usage information recording section 52 that records such usage conditions and to organize and record which videos can be used for what purposes. In this case, the video utilization information folder and the teacher data folder may be recorded separately. Note that the usage information recording units do not necessarily need to be located in the same recording device.
  • video usage information A53 and video usage information B56 are provided in the usage information recording unit 52.
  • although FIG. 8 shows only two sets of video usage information, usage information A and usage information B, three or more sets can of course be recorded depending on the number of inspection videos. Since the information recorded in the video usage information A53 and the video usage information B56 is similar, the information recorded in the video usage information A53 will mainly be explained here.
  • the original video information 53a is information indicating which video this information corresponds to.
  • the video usage information A53 corresponds to the test video A51a
  • the video usage information B56 corresponds to the test video B51b.
  • the acquired information 53b includes information such as the date and time of the test, the testing institution, the name of the doctor who conducted the test, and the model name of the device used.
  • the inference model type 53c indicates the type of inference model used by the endoscope apparatus 10 during the examination. If the inference model used is not known, it is impossible to know which inference model should be retrained.
  • the usage condition information 53d indicates under what conditions this video can be used. For example, if the scope of use is determined by informed consent, etc., that fact is recorded.
  • the profile information 53e is information such as the sex, age, ID, and medical examination results of the subject (patient) who underwent the test.
  • the inference result information 53f is information regarding the result of inference for each frame of the image.
  • a first teacher data folder 54 and a second teacher data folder 55 are provided in the usage information recording section 52.
  • the first and second teacher data folders 54 and 55 record, respectively, the first teacher data candidates 54a and the second teacher data candidates 55a, which are images selected as teacher data candidates by the control unit 20 based on differences in inference results using the test video A51a.
  • for each candidate, the candidate image data itself may be recorded, or information that can specify the candidate image from among the images recorded in the inspection image recording section 51 (for example, the frame number, or the time at which the image was taken) may be recorded instead.
  • related information (first related information 54b, second related information 55b) is also recorded in the teacher data folders; for example, situation information, similar image information, and information regarding differences in reliability values are recorded.
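Purely to make the record layout above concrete, here is a hypothetical sketch of one video usage information entry as a Python dictionary; every key name and value is invented for illustration and is not taken from the patent:

```python
# Hypothetical layout of video usage information A53 (all names illustrative).
video_usage_info_A = {
    "original_video": "test_video_A_51a",            # which examination video this describes (53a)
    "acquired_info": {                               # 53b
        "date": "2022-04-01", "institution": "Hospital X",
        "doctor": "Dr. Y", "device_model": "Scope Z",
    },
    "inference_model_type": "model-v1",              # 53c: needed to know what to retrain
    "usage_conditions": "informed consent obtained", # 53d
    "profile": {"sex": "F", "age": 63, "id": "P-0001"},  # 53e
    "inference_results_per_frame": [],               # 53f
    "teacher_data_folders": [                        # 54, 55
        {
            "candidates": [1203, 1207],              # frame numbers instead of raw image data
            "related_info": {"situation": "oblique tip angle",
                             "reliability_delta": 0.42},
        },
    ],
}
```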
  • the video usage information B56 similarly records original video information 56a, acquisition time information 56b, inference model type 56c, usage condition information 56d, profile information 56e, inference result information 56f, a first teacher data folder 57 (first teacher data candidates 57a, first related information 57b), and a second teacher data folder 58 (second teacher data candidates 58a, second related information 58b); since these items are similar to those in the video usage information A53, detailed explanation is omitted.
  • FIG. 9 shows a modification of the way data are organized and recorded in the recording section 50, in which the moving image recording section A61 and the moving image recording section B65 are recorded together with the moving image utilization information A and the moving image utilization information B. Data are managed centrally, and there is no need to search for video recording locations when using the information, but the file size may become too large.
  • the video recording unit A61 also records acquisition time information 62a, usage condition information 62b, inference model type 62c, profile information 62d, inference result information 62e, a first teacher data folder 63 (containing a first teacher data candidate 63a and first related information 63b), and a second teacher data folder 64.
  • similarly, the video recording unit B65 records acquisition time information 66a, usage condition information 66b, inference model type 66c, profile information 66d, inference result information 66e, a first teacher data folder 67 (first teacher data candidate 67a, first related information 67b), and a second teacher data folder 68 (second teacher data candidate 68a, second related information 68b).
  • in FIGS. 8 and 9, various arrangements are possible as to which recording unit is placed in which recording device; which folder holds which data, and which recording unit records it, may be determined as appropriate depending on the situation and environment. The teacher data folders may also be divided according to the characteristics of the teacher data.
  • FIGS. 6A and 6B show an example when the distal end of the endoscope device 10 is inserted into a patient's body cavity, and show examples of images acquired at this time.
  • the black-painted images (P1, P2, P11, P12) are the images at the time of insertion
  • the black-painted images (P8, P9, P18, P19) are the images at the time of removal
  • no specific object such as an affected area appears in these transition images P1, P2, P8, and P9.
  • the diagonally shaded images (P5, P6, P15, P16) are observation images when a doctor or the like finds a specific object such as an affected area and carefully observes it.
  • Intermediate images P3, P4, and P7 are acquired between the transition images P1, P2, P8, and P9 and the observation images P5 and P6.
  • intermediate images P13, P14, and P17 are acquired between transition images P11, P12, P18, and P19 and observation images P15 and P16.
  • images P5 to P7 and P15 to P17 within the range of the broken line frame Is are images that depict similar objects, are similar to each other, and belong to a similar image group.
  • a known method may be used to determine whether the images are similar.
  • the image determination unit 15 may calculate feature amounts from the images, and if the feature amounts of two images are within a predetermined range, it may be determined that the images are similar.
  • the image determination unit 15 may determine whether the images are similar based on the composition, color distribution, etc. of the images.
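A minimal sketch of such a similarity judgment follows, assuming a color-histogram feature amount and a Euclidean-distance "predetermined range" (both are assumptions; the patent leaves the concrete feature and threshold open):

```python
import numpy as np

def color_histogram(image, bins=8):
    """A simple feature amount reflecting the color distribution:
    a normalized per-channel histogram of an RGB image."""
    return np.concatenate([
        np.histogram(image[..., c], bins=bins, range=(0, 255), density=True)[0]
        for c in range(3)
    ])

def are_similar(feature_a, feature_b, threshold=0.01):
    """Two images are judged similar when their feature amounts
    fall within a predetermined range of each other."""
    return float(np.linalg.norm(feature_a - feature_b)) <= threshold
```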
  • the image data acquired by the image sensor of the imaging unit 11 is input to the input layer of the inference engine 13A of the inference unit 13 after being subjected to image processing.
  • the inference engine 13A performs inference using a neural network, and the inference result is output from the output layer.
  • the inference engine 13A infers, for example, whether or not an image of a specific object such as an affected area is present in the endoscopic image. Based on this inference result, images P5, P6, and P16 can be shown to be specific object images: in FIG. 6(a), images P5 and P6 are surrounded by a thick frame, and in FIG. 6(b), image P16 is surrounded by a thick frame. The display method may be changed as appropriate; the most suitable method may simply be chosen.
  • the inference engine 13A calculates the reliability of the inference result. In FIG. 6A, a reliability value higher than a predetermined value is indicated by a circle (○), and a reliability value lower than the predetermined value is indicated by a cross (×).
  • the situation determining unit 16 determines the usage situation of the endoscope device 10.
  • the situation determining unit 16 determines, for example, whether the distal end of the endoscope device 10 is directly facing the wall surface in the body cavity, facing it diagonally, and so on.
  • FIG. 6(a) shows the acquired angle information.
  • although FIG. 6 shows only angle information, the situation information is not limited to this; other information may also be acquired and used when selecting an image for relearning from a group of similar images.
  • the inference engine 13A infers that images P5 and P6 are images of a specific target such as an affected area; the reliability is high, and the endoscope tip is directly facing the specific target.
  • likewise, the inference engine 13A infers that image P16 is an image of a specific target such as an affected area; the reliability is high, and the endoscope tip is directly facing the specific target.
  • since the reliability values of the inferences for images P5, P6, and P16 are high, it can be said that there is a high possibility that these images include specific objects such as affected areas.
  • images determined to have low reliability may also include specific objects such as affected areas.
  • although image P15 includes a specific target such as an affected area, the reliability of the inference is low, and it is displayed as not including the specific target.
  • when the endoscope device 10 is directly facing a specific object such as an affected area, the reliability of the inference is high; when it is oblique to the specific object, the reliability is low. That is, the inference model set in the inference engine 13A is unlikely to provide highly reliable inference when the tip of the endoscope is oblique to the wall surface within the body cavity. In such a case, it is preferable to perform relearning using the images with differing inference results (weak images) to generate a more reliable inference model.
  • images that are candidates for teacher data are selected from a group of similar images that are similar to images of a specific target such as an affected area.
  • the similar image group may also include dissimilar images acquired while observing the same region.
  • imaging and inference are performed (S1).
  • the control unit 20 instructs the image sensor and the imaging control circuit in the imaging unit 11 to start an imaging operation.
  • imaging signals for one screen are sequentially read out at time intervals of a predetermined frame rate.
  • the imaging operation continues repeatedly until it is determined in step S9 that the imaging operation has ended.
  • the imaging operation is started, the imaging element in the imaging section 11 outputs an image signal, and the image processing section 12 processes the image signal so that it becomes visually recognizable information.
  • the image data subjected to this image processing is output to the display unit 12a and displayed as an endoscopic image.
  • the inference unit 13 receives the image data that has been subjected to image processing and performs inference; that is, it infers, using a machine learning model, the specific object image included in the captured endoscopic image. The inference result is superimposed on the endoscopic image in the image processing section 12, and this image is displayed on the display section 12a. This inference is performed, for example, to display advice assisting diagnosis to a user of the endoscope apparatus 10, such as a doctor. If a specific object such as an affected area is included in the endoscopic image, a display to that effect may be provided; advice regarding endoscope operation for finding a specific object may also be inferred.
  • the inference unit 13 may infer at least one of classification, detection, and region extraction of the input image. That is, if there is an affected area in the input image, the affected area may be detected, and the type of affected area, the detected position, the range of the affected area, etc. may be inferred. After performing these inferences, the inference unit 13 calculates the reliability of this inference.
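To fix ideas, here is a hypothetical container for what the inference unit is described as producing per frame; the exact field set is an assumption for illustration, not the disclosed data format:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class InferenceResult:
    """One frame's inference output: presence of a specific object
    (e.g. an affected area), its inferred type, the detected position
    and range, and the reliability of the inference."""
    has_target: bool
    label: str = ""
    box: Optional[Tuple[int, int, int, int]] = None  # (x, y, width, height)
    reliability: float = 0.0
```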
  • a similar image group is determined (S3).
  • the image determination unit 15 compares the contents of endoscopic images obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, even if dissimilar images are included.
  • an image for relearning is selected from among the images in the similar image group Is.
  • in other words, the contents of endoscopic images obtained by temporally consecutive imaging are compared, and a group of images in which the same part is observed over a predetermined number of frames is determined as a similar image group, including dissimilar images.
  • the content of an image refers to image features such as the shape, pattern, shadow, color, size, and position (including rotation and distortion, which may also be corrected) of the object being imaged.
  • FIG. 4 shows an example of images acquired by the imaging unit 11, and images P21, P22, and P29 are transition images when the endoscope device 10 is inserted and removed, similar to FIGS. 6(a) and 6(b).
  • P23 and P24 are intermediate images.
  • Images P25 and P28 are images when the distal end of the endoscope is directly facing a specific target area such as an affected area. These images P25 and P28 clearly show specific objects such as the affected area, and the reliability of the inference is high.
  • Images P26 and P27 between image P25 and image P28 are images corresponding to a saccade (involuntary eye movement), and are expressed as image micromovement.
  • a saccade is an involuntary change in visual field
  • an image change equivalent to a saccade refers to a sudden change in the image due to an object at close range or the positional relationship between the light source and the object.
  • this image change equivalent to a saccade is related to the special nature of examinations using an endoscope or the like, as described above. Since images P26 and P27 correspond to saccades, they are unintended, unexpected images and may take various forms of change.
  • images corresponding to saccades are not always included in the training data during learning, which tends to make inference difficult.
  • because the image quality changes in these frames, images P26 and P27 are likely to be determined to be dissimilar to images P25 and P28.
  • even if an image has such a quality change, as long as the image quality is not extremely degraded it may still show a specific object such as an affected area, and depending on the specifications of the inference model it may be an object that should not be overlooked. Therefore, in this embodiment, dissimilar images included in a similar image group are also treated as part of the similar image group. The detailed operation of the similar image group determination in step S3 will be described later using a flowchart.
  • the difference determination unit 14 calculates the difference between the inference results by the inference unit for each image included in the similar image group determined by the image determination unit 15.
  • in step S5, reliability may be used as the difference in the inference results for the determination. In addition to reliability, a difference in inference results may be determined when, for example, the detection range or position of a specific object such as an affected area varies, or when the appearance state changes, such as when a specific object appears in one frame and not in another. The determination may also be based on how much the difference in reliability values has changed. Just as the visual image is corrected in the brain even when the human eye makes a saccade, which is an unconscious movement, an inference model whose results are not disrupted by sudden image changes is easier to use and friendlier to those who perform examinations with endoscopic equipment. Many parameters cause such sudden image changes, and they can occur in combination, so it is often difficult to prepare all patterns during machine learning.
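A sketch of the step S5 criteria listed above, reusing the hypothetical InferenceResult fields from the earlier sketch (the thresholds are illustrative assumptions, not disclosed values):

```python
def box_iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x0, y0 = max(ax, bx), max(ay, by)
    x1, y1 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def inference_differs(r1, r2, reliability_delta=0.3, iou_floor=0.5):
    """Difference in inference results between two frames of a similar
    image group: appearance/disappearance of the target, a reliability
    gap, or a varying detection range or position."""
    if r1.has_target != r2.has_target:                 # appearance state changed
        return True
    if abs(r1.reliability - r2.reliability) >= reliability_delta:
        return True
    if r1.has_target and r1.box and r2.box and box_iou(r1.box, r2.box) < iou_floor:
        return True                                    # detection range varies
    return False
```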
  • a learning image is selected (S7).
  • the control unit 20 selects images to be used for learning (which may include relearning), that is, teacher data candidate images, based on the difference in inference results (the amount of change) calculated by the difference determination unit 14.
  • the control unit 20 selects such an image for relearning.
  • the selected images may include images that do not contain a specific object such as an affected area.
  • inappropriate images can be eliminated when performing annotation to create training data.
  • a better inference model can be generated by selecting weak images or missed images for relearning.
  • the selected images themselves, or information that allows them to be retrieved, are organized and recorded in the recording section 18 so that the inference model can be improved promptly using these images and information, and they are transmitted to the learning device 30 through the communication section 19. Each time an image is selected it may be transmitted through the communication unit 19 and recorded in the recording unit 33, and when a predetermined number of relearning images have been collected, relearning may be performed to generate the inference model. When selecting learning images in step S7, it is preferable to exclude images with extremely poor image quality; images not suitable for learning may be defined as those with poor image quality.
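For the exclusion of extremely poor-quality frames mentioned above, a minimal sketch of a quality gate; the brightness and contrast thresholds are invented for illustration:

```python
import numpy as np

def usable_for_learning(image, min_mean=10.0, max_mean=245.0, min_contrast=5.0):
    """Reject frames that are almost entirely dark, blown out, or flat,
    before recording them as teacher data candidates."""
    gray = np.asarray(image, dtype=np.float32).mean(axis=-1)
    return (min_mean < float(gray.mean()) < max_mean
            and float(gray.std()) > min_contrast)
```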
  • next, it is determined whether or not to end (S9).
  • when a doctor or the like performs an operation to end the endoscopy, it is determined that the examination has ended. If it has not ended, the process returns to step S1; if it has ended, the flow ends after end processing is performed.
  • next, the similar image group determination operation in step S3 (see FIG. 2) will be described using the flowchart.
  • image features are temporarily recorded (S11).
  • the image processing section 12 or the image determination section 15 calculates the feature amount of the image acquired by the imaging section 11.
  • the feature amount of the image may be calculated using a known method.
  • the calculated image feature amount is temporarily recorded in a memory provided inside the recording section 18, the control section 20, or the like; in this memory, a history of image feature amounts is temporarily recorded in chronological order.
  • the difference determination unit 14 compares the image feature amount of the latest image frame with that of the immediately preceding image frame, based on the image feature amounts temporarily recorded in step S11, and determines whether or not the image features are similar. If the image feature amounts of the two images are outside a predetermined range, the two images are determined not to be similar.
  • if it is determined in step S13 that the features of the image are not similar to those of the immediately preceding frame, it is then determined whether the features are similar to images before the immediately preceding frame (S15).
  • in this case, the difference determination unit 14 compares the image feature amount of the latest image frame with the image feature amounts of images before the immediately preceding frame, based on the image feature amounts temporarily recorded in step S11, and determines whether the features are similar to an image before the immediately preceding frame.
  • the number of previous frames to be compared as the immediately previous image may be determined according to the design concept, and may be changed as appropriate depending on the state of the image.
  • the number of frames to look back before the immediately preceding frame may be set to cover the time during which a specialist such as a doctor might miss a specific object during an examination such as an endoscopy, or during which weak images might occur.
  • for example, it may correspond to the time it takes for camera shake to subside, for slight vibration of the endoscope tip to subside during endoscope operation, or for the operator's hand blur to subside.
  • further, in images acquired during a specific procedure, only the images of that procedure may appear to be different images; it is better to treat such images as part of the similar image group. If the type of treatment is known, this can be detected and used to decide the number of "previous" frames and to identify the images before the immediately preceding frame. Similarly, if the situation can be determined, such as bleeding or water vapor generated by the treatment, the look-back frame count may be determined depending on the situation.
  • if the result of the determination in step S13 is that the features of the latest frame and the immediately preceding frame are similar, or if the result of step S15 is that the features of the latest frame are similar to images before the immediately preceding frame, the image is determined to be a similar image (S17). Similarity can be determined by pattern matching or by similarity determination on the digitized images.
  • an image within a run of temporally consecutive similar images may itself be determined to be not similar (dissimilar); however, such images are highly likely to include important content. Therefore, as described above, dissimilar images that satisfy the conditions are treated as images constituting the similar image group.
  • if, in step S15, the features are not similar to the images before the immediately preceding frame, or once the image is determined to be a similar image in step S17, the similar image group determination flow ends and the process returns to the original flow.
  • in this similar image group determination flow, even if an image is dissimilar as a result of the feature amount determination, it is determined to belong to the similar image group if its features are similar to an earlier frame image; that is, the similar image group may include dissimilar images.
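The S11-S17 flow above might be sketched as follows, assuming a numeric feature vector per frame and a fixed look-back depth (both are design choices the text explicitly leaves open):

```python
from collections import deque
import numpy as np

class SimilarImageGroupJudge:
    """Keeps a short chronological history of feature amounts (S11) and
    judges a frame part of the similar image group when it matches the
    immediately preceding frame (S13) or any frame within the look-back
    window before that (S15)."""

    def __init__(self, lookback=10, threshold=0.01):
        self.history = deque(maxlen=lookback)
        self.threshold = threshold

    def judge(self, feature):
        feature = np.asarray(feature, dtype=np.float32)
        similar = any(np.linalg.norm(feature - past) <= self.threshold
                      for past in self.history)
        self.history.append(feature)   # temporary record for later frames
        return similar                 # True -> similar image (S17)
```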
  • images are acquired by temporally continuous imaging, and the images obtained by this imaging are inferred using a machine learning model (S1). Further, the content of the acquired images is digitized, the numeric values are compared, and images whose image change is less than or equal to a predetermined value are determined as a group of similar images (S3). The difference between the inference results of each image included in the determined similar image group is calculated, and it is determined whether or not there is a difference (S5). Based on the difference in the inference results, images to be used for learning are selected (S7). In this manner, in this embodiment, teacher data candidate images for inference model learning can be efficiently acquired.
  • the flowchart shown in FIG. 5 differs from that in FIG. 2 only in that step S6 is added and step S7 is replaced with step S8, so only this difference will be explained.
  • a situation difference determination is performed (S6).
  • the situation determination unit 16 performs a difference determination in the usage status of the endoscope apparatus 10 and the like.
  • the information acquired as the situation determination includes, for example, angle information (angle information between the wall surface in the body cavity and the tip of the endoscope) as shown in FIGS. 4 and 6(a) and (b).
  • the angle information may be determined from changes in the image, or may be detected using a built-in sensor. Detection is also possible based on the uniformity and distribution of illumination. Sensor data, illumination light brightness distribution detection results, etc. may be recorded as they are.
  • information used for situation determination also includes focus information and depth information in the imaging unit, information on the light source in the imaging unit 11, and treatment information such as water injection and suction operations.
  • a learning image is selected (S8).
  • the control unit 20 selects images to be used for learning (which may include relearning) based on the difference in the inference results calculated by the difference determining unit 14; that is, similarly to step S7, images within the similar image group for which a difference in the inference results was determined in step S5 are selected.
  • further, the situation difference information acquired in step S6 is organized, that is, recorded in association with the selected images. Based on this recorded information, learning can be customized for each situation.
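As an illustration of associating situation-difference information with a selected image (S6/S8), a sketch whose key names are all invented for the example:

```python
def associate_situation(image_id, inference_reliability, situation):
    """Record a selected learning image together with the situation
    information determined at acquisition time (e.g. tip angle, focus,
    light source, treatment such as water injection or suction)."""
    return {
        "image": image_id,
        "reliability": inference_reliability,
        "situation": {
            "tip_angle_deg": situation.get("tip_angle_deg"),  # from image change or a built-in sensor
            "focus": situation.get("focus"),
            "light_source": situation.get("light_source"),
            "treatment": situation.get("treatment"),
        },
    }
```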
  • next, it is determined whether or not to end (S9).
  • when a doctor or the like performs an operation to end the endoscopy, it is determined that the examination has ended. If it has not ended, the process returns to step S1; if it has ended, the flow ends after end processing is performed.
  • a situation difference determination is performed (see S6), and this situation difference determination information is acquired and associated with the image data (see S8).
  • FIG. 7 shows the flow of data among the imaging section 41, the developing section 42, the image identification section 43, the inference section 44, the inference result change calculation section 45, the learning image selection section 46, and the recording section 47.
  • Each of these parts corresponds to each part of FIG. 1, as described later. All or part of the functions of these units may be realized by a processor, or may be realized by hardware and software.
  • the imaging unit 41 corresponds to the imaging unit 11 in FIG. 1, and acquires an image of a specific object such as an affected area as RAW data, and outputs it to the developing unit 42.
  • the image data output from the imaging unit 41 are temporally continuous: when the latest RAW data 2 is output, it forms an image pair with the immediately preceding RAW data 1, and when the next RAW data 3 is output after RAW data 2, RAW data 2 and RAW data 3 form an image pair. In other words, temporally consecutive images are paired.
  • the developing unit 42 develops the RAW data output from the imaging unit 41.
  • the image processing section 12 performs the function of the developing section 42.
  • the developed image data is output to the image identification section 43.
  • the image identification unit 43 determines whether or not there is a large change between image pairs, and extracts one or more identical image pairs that are determined (identified) as having no large change.
  • the extracted identical image pair is output to the inference section 44. Note that in FIG. 1, the image determination section 15 functions as the image identification section 43.
  • the image identification unit 43 functions as an image identification unit that determines that temporally continuous input pair images have hardly changed (see, for example, S3 in FIGS. 2 and 5, and the image determination unit 15 in FIG. 1).
  • input pair images are formed, for example, when a plurality of images (a first image, a second image, a third image, and so on) are sequentially input to the image identification unit: the first and second images form a pair, and the second and third images form a pair.
  • the image identification unit determines that the input pair images are substantially unchanged based on at least one of the amount of movement of the corresponding points and the amount of change in brightness, saturation, and contrast of the images.
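A hedged approximation of that identification step: the text names corresponding-point motion and brightness/saturation/contrast change as the criteria; in this sketch corresponding-point motion is stood in for by the mean absolute pixel difference, which also grows when points move (a real implementation would track features), and all limits are assumptions:

```python
import numpy as np

def pair_substantially_unchanged(img_a, img_b, brightness_limit=6.0,
                                 contrast_limit=6.0, motion_proxy_limit=10.0):
    """Judge a temporally consecutive image pair 'substantially unchanged'."""
    a = np.asarray(img_a, dtype=np.float32)
    b = np.asarray(img_b, dtype=np.float32)
    if abs(a.mean() - b.mean()) > brightness_limit:   # brightness change
        return False
    if abs(a.std() - b.std()) > contrast_limit:       # contrast change
        return False
    return float(np.abs(a - b).mean()) <= motion_proxy_limit  # motion proxy
```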
  • the inference unit 44 inputs the same image pair from the image identification unit 43, performs inference on each image, and outputs a pair of inference results (inference result pair) to the inference result change calculation unit 45. When outputting this inference result, the reliability of each inference result is also output to the inference result change calculation unit 45.
  • in FIG. 1, the inference section 13 fulfills the function of the inference section 44. The inference unit 13 does not directly receive the same image pair from the image identification unit 43, but it may receive information from the control unit 20 that the images form a same-image pair.
  • the inference result change calculation unit 45 calculates the inference result for the same image pair and the amount of change in its reliability. That is, the inference result for the latest image and the inference result for the immediately previous image are compared to calculate the amount of change, and the reliability of both inference results is compared to calculate the amount of change. This calculated amount of change is output to the learning image selection section 46. Note that in FIG. 1, the difference determination unit 14 functions as the inference result change calculation unit 45.
  • the inference result change calculation unit 45 functions as an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair (see, for example, S5 in FIGS. 2 and 5, and the difference determination unit 14 in FIG. 1).
  • the identified image pair is, for example, an image in which two temporally consecutive input images (image pair) are almost unchanged.
  • the inference result change calculation unit uses at least one differential value of the inference result and reliability (for example, see S5 in FIGS. 2 and 5, and the difference determination unit 14 in FIG. 1).
  • the inference result change calculation unit 45 also functions as an inference result change calculation unit that calculates the amount of change in the inference results of temporally consecutive image pairs (see, for example, S5 in FIGS. 2 and 5, and the difference determination unit 14 in FIG. 1).
  • in step S5 of FIG. 2 described above, it is determined whether or not there is a difference in the inference results within the image group; this processing corresponds to the inference result change calculation unit 45 calculating the amount of change in the inference results.
  • as the difference in the inference results, for example, if the reliability values of the inferences differ by a predetermined value or more, it may be determined that there is a difference. In other words, the inference result change calculation unit 45 calculates the amount of change in the inference results of temporally consecutive image pairs.
  • when the amount of change is large, the learning image selection unit 46 selects the same image pair as images to be used for learning (learning image candidates) and records them in the recording unit 47. If the amount of change in the inference result (or reliability) is large, the image is likely to be difficult for machine learning to identify and is therefore selected as a learning candidate image. Whether it is actually used for learning can be decided by experts such as doctors when creating the teacher data. Note that in FIG. 1, the control section 20 functions as the learning image selection section 46.
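Putting the last two units together, a sketch of the change-amount calculation and the selection rule, again reusing the hypothetical InferenceResult fields from the earlier sketch (the threshold is an assumption):

```python
def change_amount(prev_result, latest_result):
    """Amount of change for an identified image pair: differentials of
    the inference result itself and of its reliability."""
    result_delta = float(prev_result.has_target != latest_result.has_target)
    reliability_delta = abs(prev_result.reliability - latest_result.reliability)
    return result_delta, reliability_delta

def select_as_learning_candidate(deltas, reliability_threshold=0.3):
    """A pair whose inference result or reliability changed sharply is
    likely hard for the current model, so keep it as a candidate."""
    result_delta, reliability_delta = deltas
    return result_delta > 0 or reliability_delta >= reliability_threshold
```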
  • the recording unit 47 is an electrically rewritable nonvolatile memory, and sequentially records the learning image candidates input from the learning image selection unit 46.
  • Teacher data for learning is created from among the learning image candidates recorded in the recording unit 47, and an inference model is generated by machine learning or the like. As described above, if an image is inappropriate as a learning image, it is sufficient to exclude the image and create training data using an appropriate image. Note that in FIG. 1, the recording section 18 and/or the recording section 33 fulfills the function of the recording section 47.
  • both images of the same image pair may be selected, or an image may be selected based on comparison results with the pairs before and after the same image pair.
  • the teacher data candidate image acquisition device acquires a teacher data candidate image based on an endoscope-obtained image for learning an inference model for an endoscope.
  • This training data candidate image acquisition device compares the contents of endoscopic images obtained by temporally consecutive imaging, and selects a group of similar images in which the same region is observed over a predetermined number of frames. , has a similar image group determination unit (for example, see the image determination unit 15 in FIG. 1, S3 in FIGS. 2 and 5, and the image identification unit 43 in FIGS. 3 and 7) that performs determination including dissimilar images.
  • the teacher data candidate image acquisition device also includes an inference unit that infers, using machine learning, a specific object image included in the captured images (for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, the inference unit 44 in FIG. 7), and an inference result difference calculation unit that calculates, for each image included in the similar image group, the difference between the inference results by the inference unit (for example, the difference determination unit 14 in FIG. 1, S5 in FIGS. 2 and 5).
  • the teacher data candidate image acquisition device further includes a learning image selection unit that selects images to be used for learning based on the calculated difference in inference results (for example, the control unit 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, the learning image selection section 46 in FIG. 7). Therefore, in this embodiment, it is possible to accurately select medical images that are difficult to identify using machine learning: from among the image group determined to be a similar image group, images to be used for relearning are selected based on differences in the inference results by the inference unit, so the inference model can be improved by relearning using weak images.
  • from another viewpoint, the teacher data candidate image acquisition device includes an inference unit that infers an input image using a machine learning model (for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, the inference unit 44 in FIG. 7), an image identification unit that determines that temporally consecutive input pair images are substantially unchanged (for example, S3 in FIGS. 2 and 5, the image identification unit 43 in FIG. 7), and a learning image selection unit (for example, the control section 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, the learning image selection section 46 in FIG. 7). Therefore, in this embodiment, images to be used for learning are selected based on the difference in the inference results between the images of the identified image pair, and the inference model can be improved by relearning using weak images.
  • the teacher data candidate image acquisition device further includes an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair (for example, the difference determination unit 14 in FIG. 1, S5 in FIGS. 2 and 5) and a learning image selection section that selects teacher data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation section (for example, the control section 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, the learning image selection section 46 in FIG. 7). Therefore, in this embodiment, it is possible to accurately select medical images that are difficult to identify using machine learning.
  • in the embodiment described above, the explanation has assumed that the endoscope apparatus 10 includes, in addition to the imaging section 11, an image processing section 12, an inference section 13, a difference determination section 14, an image determination section 15, a situation determination section 16, an information association section 17, a recording section 18, and so on.
  • the endoscope device 10 may be provided with other configurations, such as a display unit.
  • conversely, some blocks may be arranged externally, as with the learning section 34 within the learning device 30.
  • also, in this embodiment, the control units 20 and 35 have been described as devices composed of a CPU, memory, and the like.
  • some or all of the functions of each part may instead be configured as hardware circuits, for example gate circuits generated based on a hardware description language such as Verilog or VHDL, or as a hardware configuration using software, such as a DSP (Digital Signal Processor). These may of course be combined as appropriate.
  • the control units 20 and 35 are not limited to CPUs; they may be any elements that function as controllers, and the processing of each unit described above may be performed by one or more processors configured as hardware.
  • each unit may be a processor configured as an electronic circuit, or each unit may be a circuit unit in a processor configured with an integrated circuit such as an FPGA (Field Programmable Gate Array).
  • a processor including one or more CPUs may execute the functions of each unit by reading and executing a computer program recorded on a recording medium.
  • the endoscope device 10 and the learning device 30 have been described as having blocks that each perform respective functions. However, these need not be provided in a single device; for example, the above-mentioned units may be distributed as long as they are connected via a communication network such as the Internet.
  • the explanation is based on an endoscope, since it is easy to explain an inspection scene using an endoscope.
  • however, as long as a device performs some kind of inference when observing an object using image data, the invention can be widely applied.
  • even cameras built into mobile terminals and consumer cameras are sometimes used for determination purposes in the manner of medical devices; in such cases, unstable holding may cause the saccade-like image movement described above.
  • similar image fluctuations are likely to occur in situations where the positional relationship with the object is unstable.
  • the invention of the present application can be applied to these devices.
  • in the embodiment described above, logic-based determination has mainly been described, with determination partially performed by inference using machine learning.
  • either logic-based determination or inference-based determination may be appropriately selected and used.
  • a hybrid type determination may be performed by partially utilizing the merits of each.
  • control mainly explained in the flowcharts can often be set by a program, and may be stored in a recording medium or a recording unit.
  • as for the method of recording on this recording medium or recording unit, the program may be recorded at the time of product shipment, distributed on a recording medium, or downloaded via the Internet.
  • the present invention is not limited to the above-mentioned embodiment as it is, and can be embodied by modifying the constituent elements within the scope of the invention at the implementation stage.
  • various inventions can be formed by appropriately combining the plurality of components disclosed in the above embodiments. For example, some components may be deleted from all the components shown in the embodiments. Furthermore, components of different embodiments may be combined as appropriate.

Abstract

Provided are an endoscope, a training data candidate-image acquisition device, a training data candidate-image acquisition method, and a program that make it possible to accurately select medical images that machine learning prepared in advance has difficulty identifying. Training data candidate-image acquisition is performed using endoscope-acquired images for endoscopic inference model learning, and comprises: comparing the content of endoscopic images obtained by imaging consecutively in time, and determining a similar image group in which the same site is observed over a predetermined number of frames, even though non-similar images may be included (S3); using machine learning to infer a specific target image included in the endoscopic images obtained by imaging (S1); calculating the difference in inference results from making inferences with regard to each of the images determined to be included in the similar image group (S5); and selecting images to use for learning on the basis of the calculated difference in the inference results (S7).

Description

Endoscope, training data candidate image acquisition device, training data candidate image acquisition method, and program

The present invention relates to an endoscope used when generating an inference model for the medical field through learning, a teacher data candidate image acquisition device and a teacher data candidate image acquisition method for acquiring teacher data candidate images, and a program.

In image diagnosis in the medical field, various systems have been developed to take medical images of various anatomical structures of individual patients in order to classify and evaluate medical conditions. Known examples of these imaging systems include endoscope systems, CT (computed tomography) systems, MRI (magnetic resonance imaging) systems, X-ray systems, ultrasound systems, and PET (positron emission tomography) systems. There is also a known method for realizing a lesion detection function using CADe/x (Computer-Aided Detection/Diagnosis), so-called computer detection/diagnosis support, by applying machine learning to data annotated by medical personnel such as doctors.

It is generally said that a large amount of data is required to improve the performance of a classifier based on machine learning as described above. Therefore, in a system whose classifier requires machine learning, the amount of data handled is expected to grow. However, a large amount of data requires enormous storage capacity, data transfer occupies network lines, and checking each piece of data takes a huge amount of time, so "efficient data collection" is expected to become necessary. As such efficient data collection, it is conceivable, for example, to collect only "medical images that are difficult to identify by machine learning" as useful data. As a technique for selecting useful medical images from a large number of medical images, Patent Document 1, for example, discloses a technique for efficiently transferring only medical images in which a desired body part is photographed from among a plurality of medical images.

Patent No. 5048286

The technology described in Patent Document 1 is, however, merely a technology for efficiently transferring medical images in which a desired region is photographed. It therefore seems difficult to select "medical images that are difficult to identify by machine learning" using the technique disclosed in Patent Document 1. That is, with conventional technology, it is difficult to collect only "medical images that are difficult to identify using machine learning" as useful data.

The present invention has been made in view of these circumstances, and its purpose is to provide an endoscope, a teacher data candidate image acquisition device, a teacher data candidate image acquisition method, and a program that make it possible to accurately select medical images that machine learning prepared in advance has difficulty identifying.

To achieve the above purpose, an endoscope according to a first invention comprises: a similar image group determination unit that compares the contents of endoscopic images obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images; an inference unit that infers, using machine learning, a specific object image included in the captured endoscopic images; an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, the difference between the inference results by the inference unit; and a learning image selection unit that selects images to be used for learning based on the difference in the inference results calculated by the inference result difference calculation unit.
In the endoscope according to a second invention, in the first invention, the similar image group includes the result of judging an image group corresponding to suddenly changing viewpoint movement according to the similarity of the preceding and following images among the endoscopic images obtained by temporally consecutive imaging.

The endoscope according to a third invention, in the first invention, includes an imaging information acquisition section capable of acquiring imaging information corresponding to each image selected by the learning image selection section.
In the endoscope according to a fourth invention, in the first invention, the similar image group determination section digitizes the endoscopic images into numerical values such that the pattern of the same object within the images can be tracked, and determines whether images are similar using these numerical values.

In the endoscope according to a fifth invention, in the first invention, the learning image selection section excludes images with extremely poor conditions.
A teacher data candidate image acquisition device according to a sixth invention is a device for acquiring images for inference model learning, comprising: a similar image group determination unit that compares the contents of images obtained by temporally consecutive imaging and determines a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images; an inference unit that infers, using machine learning, a specific object image included in the captured images; an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, the difference between the inference results by the inference unit; and a learning image selection unit that selects images to be used for learning based on the difference in the inference results calculated by the inference result difference calculation unit.
A teacher data candidate image acquisition method according to a seventh invention is a method for acquiring images for inference model learning, comprising: comparing the contents of images obtained by temporally consecutive imaging and determining a group of similar images in which the same region is observed even if dissimilar images are included over a predetermined number of frames; inferring, using a machine learning model, a specific object image included in the captured images; calculating, for each image determined to be included in the similar image group, the difference between the inference results inferred using the machine learning model; and selecting images to be used for learning based on the calculated difference in the inference results.

A program according to an eighth invention causes a computer that acquires teacher data candidate images for inference model learning to execute: comparing the contents of images obtained by temporally consecutive imaging and determining a group of similar images in which the same region is observed even if dissimilar images are included over a predetermined number of frames; inferring, using a machine learning model, a specific object image included in the captured images; calculating, for each image determined to be included in the similar image group, the difference between the inference results inferred using the machine learning model; and selecting images to be used for learning based on the calculated difference in the inference results.
The training data candidate image acquisition device according to the ninth invention comprises: an inference unit that performs inference on an input image using a machine learning model; an inference result change calculation unit that calculates the amount of change between the inference results of an identified image pair; and a learning image selection unit that selects training data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit.
The training data candidate image acquisition device according to the tenth invention, in the ninth invention, further comprises an image identification unit that determines that a temporally consecutive pair of input images is substantially unchanged.
In the training data candidate image acquisition device according to the eleventh invention, in the tenth invention, the image identification unit determines that the input image pair is substantially unchanged based on at least one of the amount of movement of corresponding points and the amounts of change in brightness, saturation, and contrast of the images.
In the training data candidate image acquisition device according to the twelfth invention, in the ninth invention, the input image is a medical image.
In the training data candidate image acquisition device according to the thirteenth invention, in the ninth invention, the inference unit infers at least one of classification, detection, and region extraction of the input image.
In the training data candidate image acquisition device according to the fourteenth invention, in the ninth invention, the inference unit outputs the reliability of the inference.
In the training data candidate image acquisition device according to the fifteenth invention, in the ninth invention, the inference result change calculation unit uses a differential value of at least one of the inference result and the reliability.
In the training data candidate image acquisition device according to the sixteenth invention, in the ninth invention, the learning image selection unit selects a training data candidate image to be used for learning when the amount of change exceeds a specified value.
The training data candidate image acquisition method according to the seventeenth invention comprises: performing inference on an input image using a machine learning model; calculating the amount of change between the inference results of an identified image pair; and selecting training data candidate images to be used for learning based on the calculated amount of change.
The program according to the eighteenth invention causes a computer that acquires training data candidate images for inference model learning to execute: performing inference on an input image using a machine learning model; calculating the amount of change between the inference results of an identified image pair; and selecting training data candidate images to be used for learning based on the calculated amount of change.
The training data candidate image acquisition device according to the nineteenth invention comprises: an inference unit that performs inference on an input image using a machine learning model; an inference result change calculation unit that calculates the amount of change between the inference results of a temporally consecutive image pair; and a learning image selection unit that selects training data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit.
In the training data candidate image acquisition device according to the twentieth invention, in the nineteenth invention, the input image is a medical image.
In the training data candidate image acquisition device according to the twenty-first invention, in the nineteenth invention, the inference unit infers at least one of classification, detection, and region extraction of the input image.
In the training data candidate image acquisition device according to the twenty-second invention, in the nineteenth invention, the inference unit outputs the reliability of the inference.
In the training data candidate image acquisition device according to the twenty-third invention, in the nineteenth invention, the inference result change calculation unit uses a differential value of at least one of the inference result and the reliability.
In the training data candidate image acquisition device according to the twenty-fourth invention, in the nineteenth invention, the learning image selection unit selects a training data candidate image to be used for learning when the amount of change exceeds a specified value.
The training data candidate image acquisition method according to the twenty-fifth invention comprises: performing inference on an input image using a machine learning model; calculating the amount of change between the inference results of a temporally consecutive image pair; and selecting training data candidate images to be used for learning based on the calculated amount of change.
The program according to the twenty-sixth invention causes a computer that acquires training data candidate images for inference model learning to execute: performing inference on an input image using a machine learning model; calculating the amount of change between the inference results of a temporally consecutive image pair; and selecting training data candidate images to be used for learning based on the calculated amount of change.
According to the present invention, it is possible to provide an endoscope, a training data candidate image acquisition device, a training data candidate image acquisition method, and a program that make it possible to accurately select the medical images that machine learning prepared in advance has difficulty identifying.
FIG. 1 is a block diagram mainly showing the electrical configuration of an endoscope system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the imaging operation in the endoscope system according to the embodiment.
FIG. 3 is a flowchart showing the similar image group determination operation in the endoscope system according to the embodiment.
FIG. 4 is a diagram illustrating the determination of a similar image group in the endoscope system according to the embodiment.
FIG. 5 is a flowchart showing a modified example of the similar image group determination operation in the endoscope system according to the embodiment.
FIG. 6 is a diagram illustrating the selection of a similar image group in the endoscope system according to the embodiment.
FIG. 7 is a diagram illustrating the flow of data between blocks in the endoscope system according to the embodiment.
FIG. 8 is a diagram showing an example of the data structure in the recording unit of the endoscope system according to the embodiment.
FIG. 9 is a diagram showing a modified example of the data structure in the recording unit of the endoscope system according to the embodiment.
The present invention will be described assuming that it is applied to a device that acquires candidate images to serve as training data for learning when creating, improving, or maintaining the performance of an inference model that uses image data based on imaging signals from, for example, an endoscope. Equipment such as an endoscope is used by a specialist such as a physician; it approaches an object, illuminates it as necessary, and can display continuously captured image data as images on a display device. While viewing these images, either directly or with the assistance of an inference model, the specialist carefully observes areas of concern. Accordingly, when the same object is being observed, a plurality of similar image frames can be obtained from the sequentially acquired frames. The degree of image change can be determined by converting the content of temporally consecutive images into numerical values and comparing the value of each image, or by quantifying the difference between the image data. An embodiment of the present invention has a similar image group determination unit that determines a group of similar images whose image change is at or below a predetermined value, and can thereby determine whether the same object is being observed.
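As a concrete illustration of this idea, the following is a minimal sketch that quantifies frame-to-frame change as a mean pixel difference and groups consecutive frames whose change stays below a threshold. It assumes grayscale frames supplied as NumPy arrays in a non-empty sequence; the function names and the threshold value are illustrative and not taken from the specification.

```python
import numpy as np

def frame_change(prev: np.ndarray, curr: np.ndarray) -> float:
    """Quantify the change between two consecutive frames as the
    mean absolute pixel difference, normalized to [0, 1]."""
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    return float(diff.mean() / 255.0)

def group_similar_frames(frames, threshold=0.05):
    """Split a frame sequence into runs whose frame-to-frame change
    stays at or below the threshold (candidate similar image groups)."""
    groups, current = [], [0]
    for i in range(1, len(frames)):
        if frame_change(frames[i - 1], frames[i]) <= threshold:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups  # lists of frame indices
```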
Further, an embodiment of the present invention has an inference unit that inputs each sequentially obtained image frame into machine learning and performs inference, and a determination unit that determines the differences between the inference results of the respective images determined by the similar image determination unit to be included in a similar image group. With this configuration, when differences arise in the inference results even though the same object is being observed, analyzing the cause of those differences makes it possible to analyze the situations that the inference unit handles poorly. In other words, if images to be used for learning are selected based on the differences in the inference results, the selected images become clues for generating an effective improved inference model. For this reason, a learning image selection unit is provided in one embodiment of the present invention.
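The selection step could look like the following sketch, which flags frames whose inference confidence falls well below the best result in the same similar image group. Here `infer` is a placeholder for the trained model's per-frame prediction returning a (label, confidence) pair, and `min_delta` is an illustrative threshold, not a value from the specification.

```python
def select_learning_candidates(frames, groups, infer, min_delta=0.3):
    """Within each similar group, run inference per frame and keep
    frames whose confidence deviates strongly from the group's best:
    the same scene was shown, yet the model responded differently."""
    candidates = []
    for group in groups:
        results = [infer(frames[i]) for i in group]  # (label, confidence)
        best = max(conf for _, conf in results)
        for idx, (_, conf) in zip(group, results):
            if best - conf >= min_delta:
                candidates.append(idx)
    return candidates  # indices of training data candidate frames
```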
Next, an embodiment in which the present invention is applied to an endoscope system will be described with reference to FIGS. 1 to 9. This endoscope system functions as a training data candidate image acquisition device (endoscope) for acquiring images for inference model learning. FIG. 1 is a block diagram mainly showing the electrical configuration of the endoscope system. This endoscope system includes an endoscope device 10 and a learning device 30. However, the blocks in the endoscope device 10 need not be provided within a single device and may be divided among a plurality of devices. For example, the imaging unit 11 and the other blocks may be provided in separate devices. In this case, the units other than the imaging unit 11 may be connectable to an intranet within a hospital or the like.
The endoscope system may also be linked with an in-hospital system, an electronic medical record system, or the like. If the images and other data recorded in the recording unit 18 (described later) are subject to confidentiality obligations, they may be withheld from output from the endoscope device 10 to, for example, the learning device 30. If the learning device 30 is located within the hospital, or if a contract permits the exchange of data, the learning device 30 may be operated as an in-hospital system. The following description assumes a situation in which an organization outside the hospital develops the inference model using the learning device, but the invention is not limited to this. Each block within the learning device 30 may also be divided among a plurality of devices, and the learning unit 34 may be located in a server.
The endoscope device 10 has an imaging unit 11, an image processing unit 12, a display unit 12a, an inference unit 13, a difference determination unit 14, an image determination unit 15, a situation determination unit 16, an information association unit 17, a recording unit 18, a communication unit 19, and a control unit 20. As mentioned above, each block in the endoscope device 10 may be separated from the endoscope device 10 and used as another device; the example shown here is equipped with hardware similar to a terminal device, anticipating its development as a so-called IoT device. However, some functional blocks can be omitted if roles are shared, for example by having the endoscope device 10 cooperate with the learning device 30. The endoscope device 10 has an insertion portion consisting of a long cylindrical tube for insertion into a cavity or tubular object to observe its interior; this insertion portion may or may not be flexible. The imaging unit 11 is often provided at the distal end of the insertion portion.
The imaging unit 11 is assumed to be a section having an imaging section, a light source section, an operation section, a treatment section, and the like. The imaging section has an optical lens for imaging, an image sensor, an imaging control circuit, and so on. If it has an autofocus function, it also has a focus detection circuit, an automatic focus adjustment device, and the like. The optical lens forms an optical image of the object. The image sensor is arranged near the position where the optical image is formed, converts the optical image into an image signal, AD-converts this image signal, and then outputs it to the image processing unit 12. On receiving an imaging start instruction from the control unit 20, the imaging control circuit controls the readout of image signals from the image sensor at a predetermined rate.
The light source section in the imaging unit 11 provides illumination to brighten the walls of the digestive tract and the like in the body cavity in order to facilitate observation. The light source section includes a light source such as a laser light source, an LED light source, a xenon lamp, or a halogen lamp, and also has an optical lens for illumination. Since the detection characteristics of tissue change depending on the wavelength of the illumination light, the light source section may have a function for changing the wavelength of the light source, and the image processing may be changed by a known method in accordance with the wavelength change. Detection does not necessarily have to be performed visually by a physician. The operation section in the imaging unit 11 has operation members for instructing the capture of endoscopic still images and the start and end of endoscopic video recording, as well as action sections that operate in connection with these operations, a treatment section, a function execution section, and so on. It may also have an operation member for instructing switching of the focus of the endoscopic image. Changes caused by these operations can become parameters of image change.
Furthermore, the operation section in the imaging unit 11 has an angle knob for bending the distal end of the insertion portion of the endoscope. It also has function execution sections for supplying air and water into the body cavity through the flexible tube and for suctioning air and liquid. The treatment section has function execution sections such as treatment tools, for example biopsy forceps for performing a biopsy to collect a piece of tissue, and may also have treatment tools such as a snare or a high-frequency scalpel for removing an affected area such as a polyp. The function execution sections of these treatment tools and the like (which may broadly be classified as the operation section) can be operated by operation members that actuate them. The above operations may change the shape of the affected area or cause bleeding, and when heat or the like is used during an operation, water vapor, smoke, water spray, and the like may be generated. There are also jigs that are left in the body during treatment. Changes occurring when these operations are switched, and changes in the state of the object accompanying the operations, can become parameters of image change. When an operation is changed, changes in the brightness, saturation, and contrast of the image may occur.
The image processing unit 12 has an image processing circuit, receives the image signal from the imaging unit 11, and performs various kinds of image processing, such as development processing, in accordance with instructions from the control unit 20. The image processing unit 12 outputs the processed image data to the display unit 12a, the inference unit 13, the image determination unit 15, and the communication unit 19. The image processing performed by the image processing unit 12 may include, in addition to adjusting the color and brightness of the image, enhancement processing such as contrast enhancement and edge enhancement to improve visibility, gradation processing to produce natural gradations, and multi-frame composition processing such as HDR (High Dynamic Range) processing and super-resolution processing that improve image quality using a plurality of image frames. The image processing unit 12 functions as an image processing unit (image processing circuit) that processes image frame information into visually recognizable information. Changes at the time of operation changes become parameters of image change. Note that the image processing unit 12 may be omitted from the endoscope device 10 by entrusting the above-described functions to the image processing unit 32 in the learning device 30. However, if the endoscope device 10 is to be independent as an IoT device, providing the image processing unit 12 within the endoscope device 10 increases the degree of freedom, for example by making it possible to transmit images to the outside.
The display unit 12a has a display device such as a display monitor and a display control circuit. In accordance with control signals from the control unit 20, the display unit 12a receives the image data processed by the image processing unit 12 and displays endoscopic images and the like. An endoscopic image on which the inference result from the inference unit 13 is superimposed may also be displayed.
The inference unit 13 has an inference engine, receives the image data of image frames from the imaging unit 11, and performs inference. An inference model for inferring the advice described below is set in the inference engine. The inference engine may be implemented in hardware or in software. The inference unit 13 may include a feed-forward neural network or the like; this feed-forward neural network will be explained in connection with the learning unit 34. The data input to the inference engine is not limited to the image data of image frames; situation information (related information, auxiliary information) at the time the frame image was acquired may also be input for inference. By performing inference using situation information, more reliable inference results can be obtained. Inference is performed frame by frame, but it need not be performed for every frame; it may be performed every few frames according to requirements such as the visibility of the inference results, or several consecutive frames may be input to the inference unit for a single inference.
The inference unit 13 receives image data and performs inference using an inference model generated by machine learning, thereby inferring, for example, advice to support a physician during diagnosis. The inference unit 13 may also identify what the object (affected area, organ, etc.) shown in the image acquired by the imaging unit 11 is, identify its position, identify the extent of the affected area or the like, segment or classify the image, and perform inferences such as pass/fail determination. The inference unit 13 performs inference in accordance with instructions from the control unit 20 and outputs the inference results to the control unit 20. When performing inference, the inference unit 13 also calculates the reliability of that inference and outputs the calculated reliability value to the difference determination unit 14.
Depending on the inference specifications of the inference unit 13, how the inference results are used (how they are displayed, etc.) varies, and various methods can be used to detect differences in the inference results. For example, when pass/fail determination is inferred, the pass/fail indication may change over time; when position identification is inferred, the coordinates of the identified position on the screen may change over time; and when segmentation (the extent occupied by the object) is inferred, the shape and area of each segment on the screen change over time, so any of these can be used as changes in the inference results.
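One way to turn these task-dependent changes into a single numeric difference is sketched below. The dictionary keys ('label', 'box', 'mask_area') are illustrative conventions for the three inference types named above, not a format defined by the specification.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def result_difference(prev, curr):
    """Task-dependent difference between two per-frame inference results.
    Each result is a dict with a 'label', 'box', or 'mask_area' key."""
    if 'label' in prev:                      # pass/fail or class flip
        return 0.0 if prev['label'] == curr['label'] else 1.0
    if 'box' in prev:                        # detection: 1 - IoU of boxes
        return 1.0 - iou(prev['box'], curr['box'])
    if 'mask_area' in prev:                  # segmentation: relative area change
        denom = max(prev['mask_area'], 1)
        return abs(curr['mask_area'] - prev['mask_area']) / denom
    return 0.0
```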
Note that the images input to the inference unit 13 are images from the imaging unit 11, and here medical images from an endoscope or the like are assumed. However, medical images are not limited to endoscopic images acquired by an endoscope device; they may be images acquired by medical equipment such as X-ray machines, MRI machines, ultrasound diagnostic machines, dermatology cameras, and dental cameras. When endoscopic images are used as input images, this embodiment can deal with the sudden changes in field of view that are peculiar to endoscopes.
Furthermore, the input images fed to the inference unit 13 need not be limited to medical images. Sudden changes in field of view can also occur, for example, with industrial endoscopes or with cameras mounted on robots, vehicles, or drones. An endoscope is also a device that is inserted through a narrow insertion opening. It therefore has a tubular distal end with a small outer diameter, which shakes easily, and this shaking greatly affects the acquired images. When the object is at close range, the resulting change is especially large. Vertical and horizontal movement of the distal end changes the composition, while movement along the insertion direction changes the apparent size of the object. Vertical and horizontal shifts can cause extraneous objects to appear in the image, and changes in relative distance can change the size of the object. In addition, when the image rotates due to twisting, or when the imaging section does not face the observation target squarely and observes it obliquely, distortion and the like can occur in the image. These changes in circumstances also become parameters that can change the image and, as noted above, the image can change greatly in a compound manner. Such changes can occur not only with endoscopes but with many inspection devices that use images.
The inference unit 13 functions as an inference unit (inference engine) that infers a specific object image included in a captured image (which may be an endoscopic image) using a machine learning model (see, for example, S1 in FIGS. 2 and 5 and the inference unit 44 in FIG. 7). The inference unit 13 also functions as an inference unit (inference engine) that performs inference on an input image using a machine learning model (see, for example, S1 in FIGS. 2 and 5 and the inference unit 44 in FIG. 7). The above input images are medical images. The inference unit infers at least one of classification, detection, and region extraction of the input image (see, for example, S1 in FIGS. 2 and 5). The inference unit also outputs the reliability of the inference (see, for example, S1 in FIGS. 2 and 5).
The difference determination unit 14 may include a difference determination circuit, or may be realized by a processor including a CPU or the like executing a program. The difference determination unit 14 calculates the differences between the inference results produced by the inference unit 13 (examples of such differences have already been given). That is, since the inference unit 13 performs inference for each frame and outputs an inference result, the difference determination unit 14 calculates the difference between those inference results. In calculating the difference (change), a differential value of the inference results may be used. The change in the reliability value calculated by the inference unit 13 may also be used to calculate the difference (change). That is, since the inference unit 13 performs inference each time image data is input and calculates the reliability of the inference result, the difference may be determined based on the change in this reliability value. As described later, the image determination unit 15 determines whether an image belongs to a similar image group, so the difference determination unit 14 determines the differences between the individual images included in the similar image group.
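The differential-value approach could be realized as follows: a minimal sketch that takes the per-frame reliability series, differentiates it frame to frame, and flags frames where the reliability jumps by more than a specified value. The threshold of 0.25 is an illustrative assumption.

```python
import numpy as np

def confidence_deltas(confidences):
    """Frame-to-frame derivative of the inference reliability series;
    large magnitudes mark frames where the model's certainty jumps."""
    c = np.asarray(confidences, dtype=np.float32)
    return np.diff(c)  # delta[i] = c[i+1] - c[i]

def flag_unstable_frames(confidences, limit=0.25):
    """Indices of the later frame of each pair whose reliability
    changed by more than the specified value."""
    return [i + 1 for i, d in enumerate(confidence_deltas(confidences))
            if abs(d) > limit]
```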
The difference determination unit 14 functions as an inference result difference calculation unit that calculates the differences between the inference results produced by the inference unit for the respective images included in the similar image group determined by the similar image group determination unit (see, for example, S5 in FIGS. 2 and 5). As the difference between inference results, it is sufficient to calculate, for example, at least one of a change in the position of the specific object, a change in its size, a change in the inferred region, and so on.
The image determination unit 15 may include an image determination circuit, or may be realized by a processor including a CPU or the like executing a program. The image determination unit 15 determines whether the image data input in time series from the image processing unit 12 belongs to a similar image group. When a physician or the like inserts the distal end of the endoscope device, where the image sensor is arranged, into a body cavity, the image sensor acquires endoscopic images. When the distal end approaches a specific object such as an affected area, many images of the vicinity of the affected area are acquired because the physician or the like observes it carefully. The image determination unit 15 may determine whether the image group is one that includes this specific object. The specific object is the target observed or examined with the endoscope or the like, and the specific object image means the portion of the on-screen image occupied by the specific object. Various objects appear in an endoscopic image; among them, the object determined as the detection target in the specifications of the AI (inference unit) is called the specific object. For example, as a characteristic of endoscopic images, minute shaking of the distal end often changes the objects in the acquired image, causes the position of the specific object in the image to change suddenly, or causes extraneous objects to enter the image. As a result, the position and size of the object change drastically.
The image determination unit 15 converts the content of the images into numerical values and determines whether images are similar and whether a consecutive pair of input images is substantially unchanged. These numerical values are such that the pattern of the same object in the images can be tracked. As described above, changes in the results and the like may be detected in time series from these values. Temporally adjacent images (or their inference results) may also be compared. A known method may be adopted as appropriate for quantifying image similarity. Here, numerical values for composition, color, brightness, and the like are assumed, and it is assumed that changes in the size of the object, whether the object is within the screen, and so on can also be determined from these values.
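One possible quantification, sketched below under the assumption of RGB frames as NumPy arrays, reduces each frame to a mean brightness value and a coarse per-channel histogram, then compares those signatures. The tolerances are illustrative; the specification leaves the concrete metric to known methods.

```python
import numpy as np

def image_signature(img: np.ndarray):
    """Reduce a frame (H x W x 3, RGB) to a small numeric signature:
    mean brightness plus a normalized 8-bin histogram per channel."""
    brightness = float(img.mean())
    hists = []
    for ch in range(3):
        h = np.histogram(img[..., ch], bins=8, range=(0, 256))[0]
        hists.append(h / h.sum())
    return brightness, np.concatenate(hists)

def is_similar(img_a, img_b, b_tol=20.0, h_tol=0.05):
    """Similarity test on the numeric signatures rather than raw pixels."""
    b_a, h_a = image_signature(img_a)
    b_b, h_b = image_signature(img_b)
    return abs(b_a - b_b) <= b_tol and np.abs(h_a - h_b).mean() <= h_tol
```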
When the image determination unit 15 determines a similar image group, it does so by converting the image content into numerical values; however, the temporally consecutive images, that is, a series of images, may include dissimilar images whose numerical values cannot be said to be similar. As described later with reference to FIG. 6, the distal end of the endoscope makes small movements, which makes steady fixation difficult and means the object easily moves out of the image. Therefore, when the same region is observed over a predetermined number of frames, the similar image group may include dissimilar images partway through the observation. The determination of similar image groups will be described later with reference to FIG. 3. The image determination unit 15 may determine that two input images are substantially unchanged by, for example, calculating the amount of movement of corresponding points, or it may make this determination based on at least one of the amounts of change in the brightness, saturation, and contrast of the images.
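A sketch of the corresponding-point approach follows, assuming OpenCV is available: sparse optical flow measures how far tracked feature points move between the pair, and a mean HSV difference stands in for the brightness/saturation change check. The tolerance values are assumptions for illustration.

```python
import cv2
import numpy as np

def substantially_unchanged(prev_bgr, curr_bgr, move_tol=3.0, hsv_tol=10.0):
    """Judge an input image pair as substantially unchanged from
    (a) corresponding-point movement via sparse optical flow and
    (b) mean per-channel change in hue/saturation/value."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return False
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    if not ok.any():
        return False
    movement = np.linalg.norm((nxt - pts)[ok], axis=2).mean()
    hsv_delta = np.abs(
        cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2HSV).astype(np.float32) -
        cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    ).mean(axis=(0, 1))
    return movement <= move_tol and (hsv_delta <= hsv_tol).all()
```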
The image determination unit 15 functions as a similar image group determination unit that compares the contents of images (which may be endoscopic images) obtained by temporally consecutive imaging and determines, including dissimilar images, a group of similar images in which the same region is observed over a predetermined number of frames (see, for example, S3 in FIGS. 2 and 5, FIG. 3, and the image identification unit 43 in FIG. 7). The similar image group determination unit converts the endoscopic images into numerical values such that the pattern of the same object in the image can be tracked, and uses these values to determine whether images are similar (see, for example, FIG. 3). When performing an endoscopy, the observer stares at the same object, or confirms the specific object while changing the observation method, in order to sufficiently confirm the object with the naked eye. The predetermined number of frames therefore corresponds to the time required for such observation. As described above, when observing with an endoscope, the distal end cannot be fixed in space and moves slightly, so steady fixation is difficult and the observed image easily deviates from the object. The similar image group is defined to include images that may contain training image candidates important for diagnosis, corresponding to the difficult or missed images described above (for example, images in which the same region is observed). For this reason, dissimilar images found within a group of images that may contain similar images are also included in the similar image group determination; a gap-tolerant grouping sketch is shown below.
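The following sketch shows one way such a gap-tolerant grouping could behave: a run of frames observing the same region stays open across up to `max_gap` dissimilar frames in a row, so the brief drop-outs caused by distal-end shake remain inside the group. `same_scene` is a placeholder predicate (for example, the `is_similar` function above); `max_gap` is an assumed parameter.

```python
def group_with_gap_tolerance(frames, same_scene, max_gap=3):
    """Group temporally consecutive frames observing the same region;
    up to max_gap dissimilar frames in a row are kept inside the group
    (brief drop-outs caused by shake of the distal end)."""
    groups, current, anchor, gap = [], [], None, 0
    for i, frame in enumerate(frames):
        if anchor is None or same_scene(frames[anchor], frame):
            current.append(i)       # similar to the group's anchor frame
            anchor, gap = i, 0
        elif gap < max_gap:
            current.append(i)       # dissimilar, but tolerated
            gap += 1
        else:                       # too many dissimilar frames: new group
            groups.append(current)
            current, anchor, gap = [i], i, 0
    if current:
        groups.append(current)
    return groups
```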
The similar image group described above may include a result of determining a group of images corresponding to abrupt viewpoint movement according to the similarity of the preceding and following images among the endoscopic images obtained by temporally consecutive imaging. The similar image group may also include abruptly changing images corresponding to the blur of the visual image caused by the unconscious eye movements called saccades (in humans, the brain corrects for this, so it is not consciously perceived), determined according to the similarity of the images before and after the temporally consecutive endoscopic images. The images corresponding to such saccades, described later with reference to FIGS. 4 and 6, are images corresponding to eye movement. The saccade-equivalent image change referred to here is a conceptual expression of the way the image changes abruptly due to changes in the relative positional relationship between the distal end of the endoscope and the object, that is, due to the positional relationship between an object at close range and the light source mounted alongside the distal end of the endoscope.
The saccade-equivalent image change described above is related to the peculiarities of examinations using an endoscope or the like. That is, even if one tries to observe in the same way, movement of the object or the imaging section produces a large image change despite a relatively small relative change. In other words, even if the object is captured within the screen, its size and position on the screen, and the exposure and brightness distribution, are liable to change suddenly. Moreover, such images cannot be captured deliberately; they may or may not occur depending on the situation, and have therefore been images that are difficult to turn into training data intentionally.
Situations in which sudden image changes similar to the saccade-equivalent image changes described above occur are common in endoscopic images. In addition to the object's tendency to move, the object may be immersed in body fluids or cleaning fluids; operations associated with endoscopic observation, such as suction or air supply, may be performed; special light observation or staining may be performed; and treatment tools may be operated in conjunction. Various techniques are thus often used together to observe the object. These events may occur simultaneously and change in a compound manner, making unintended (or unnoticed) sudden image changes extremely likely. It is desirable to confirm the object correctly in response to such situations.
As described above, saccade-equivalent image changes (sudden changes) have many causal parameters and can occur in a compound manner, so it is often difficult to prepare patterns for all compound causes at the time of machine learning. Images at the time of such sudden changes may be determined not to be similar images when similarity determination using pattern matching, or similarity determination based on quantified images, is performed. In other words, they may be determined to be dissimilar images with different content. Note that pattern matching handles numerical data two-dimensionally, per pixel or per on-screen coordinate, so it can be said to compare data numerically.
When similar images are acquired consecutively in time, there is a high possibility that important images are included even among images that would naively be judged dissimilar; in this embodiment, therefore, dissimilar images satisfying the above conditions are treated as images constituting a group of similar images. That is, in acquiring training data candidate images for obtaining endoscopic images to train an endoscopic inference model, the contents of endoscopic images obtained by temporally consecutive imaging are compared, and the similar image group determination is performed on a group of similar images in which the same region is observed over a predetermined number of frames, including dissimilar images. The content of an image refers to what characterizes the image (image features), such as the shape, pattern, shading, color, size, and position of the imaged object (rotation, distortion, and the like also occur, but corrected versions may be used).
The situation determination unit 16 may include a situation determination circuit, or may be realized by a processor including a CPU or the like executing a program. The situation determination unit 16 determines information about the usage status of the endoscope device 10 by the physician or other user. To this end, based on the image data acquired by the image processing unit 12, the situation determination unit 16 may determine, for example, whether the positional relationship between the distal end, where the image sensor is provided, and the wall surface in the body cavity is straight-on or oblique. It may also determine whether the image is in focus, how much depth there is, and so on. It may also determine the usage status of the light source section in the imaging unit 11. For example, if the wavelength of the light source is known, it can be determined whether narrow band imaging (NBI) is being performed. Furthermore, it may determine the usage status of the treatment section in the imaging unit 11. For example, if the physician performs a water supply operation, the effect of the water appears in the image; if a suction operation is performed, the effect of the suction appears.
The determination result of the situation determination unit 16 is output to the control unit 20. As described later, the situation determination result at the time is associated with the image data. If the distal end is not squarely facing the affected area or the like but is at an oblique position, the affected area is difficult to see and inference tends to become difficult. Likewise, when a water supply operation has been performed, the screen is affected by the water, the affected area is difficult to see, and inference tends to become difficult. If such situation information is available and images are inferred together with it, the reliability of inference for finding the affected area can be improved. The situation determination unit 16 functions as an imaging information acquisition unit capable of acquiring imaging information corresponding to each image selected by the learning image selection unit (see, for example, S6 in FIG. 5).
The information association unit 17 may include an information association circuit, or may be realized by a processor including a CPU or the like executing a program. The information association unit 17 associates, with the image data processed by the image processing unit 12, at least one of the situation information determined by the situation determination unit 16, the similar image information determined by the image determination unit 15, and information about differences in the inference reliability values determined by the difference determination unit 14. The image data with which information has been associated by the information association unit 17 is recorded in the recording unit 18. For image data for which the difference determination unit 14 has determined that the inference reliability values differ by a predetermined amount or more, the communication unit 19 is notified to that effect. As described later, the communication unit 19 transmits the notified image data to the learning device 30, where it is used as training data for relearning.
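A per-frame record produced by this association step might look like the following sketch. The field names and the contents of the situation dictionary are assumptions for illustration; the actual data structure is the subject of FIGS. 8 and 9.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FrameRecord:
    """One frame of the examination video plus the metadata that the
    information association step attaches to it."""
    frame_index: int                           # position in the original video
    group_id: Optional[int] = None             # similar image group membership
    confidence: Optional[float] = None         # inference reliability for the frame
    confidence_delta: Optional[float] = None   # change vs. the neighbouring frame
    situation: dict = field(default_factory=dict)  # e.g. {'nbi': True, 'water': False}
    relearn_candidate: bool = False            # flagged when the delta is large
```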
The recording unit 18 has electrically rewritable nonvolatile memory and/or volatile memory. The recording unit 18 stores the programs for operating the CPU and other elements in the control unit 20. It also stores the model information of the endoscope device 10 and various characteristic values and adjustment values within the endoscope device 10. Furthermore, it may record the image data processed by the image processing unit 12 and associated with information by the information association unit 17. Details of the data structure recorded in the recording unit 18 will be explained with reference to FIGS. 8 and 9. Note that the recording unit 18 need not record all of the information shown in FIGS. 8 and 9; the recording may be shared with the recording unit 33 in the learning device 30.
As described above, the recording unit 18 records the image data of the original moving image acquired by the imaging unit 11 and processed by the image processing unit 12, together with the image data associated with information by the information association unit 17. The original moving image is recorded in the recording unit 18 so that the inference model can later be improved based on the recorded content. Therefore, the recorded data need not be the selected image data itself; it is sufficient to make the image data easy to find, for example by search. In other words, the original examination video may be recorded as-is, together with information that allows a specific frame to be selected from the recorded images. For example, the information may indicate a specific frame, such as the image at the start of the examination or the image at the time a specific object was detected, expressed for instance as a frame count from a specific timing or as a time position in minutes and seconds. This information will be described later with reference to FIGS. 8 and 9.
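In other words, a lightweight pointer into the as-recorded video can stand in for the frame itself, along the lines of the sketch below. `open_video` is a hypothetical placeholder for whatever decoder the recording unit provides; the field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class FrameReference:
    """Pointer into the as-recorded examination video, so a selected
    candidate frame can be re-extracted later instead of being stored."""
    video_id: str      # which examination recording
    frame_index: int   # frame count from the start of the recording
    reason: str        # e.g. 'large reliability change within a similar group'

def load_candidate(open_video, ref: FrameReference):
    """Recover the candidate frame from the original recording."""
    video = open_video(ref.video_id)   # assumed to return an indexable frame sequence
    return video[ref.frame_index]
```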
There are various specifications and directions for inference model creation, such as improving an inference model by relearning, customizing it for a specific situation, increasing its versatility, or creating a new inference model. For this reason, it is preferable to record the necessary related data together, in accordance with the specifications and direction of the inference model. For example, when creating or improving an inference model tailored to a specific profile (sex, age group, ethnicity, medical history, etc.), organizing and recording which profile the examination results belong to makes it easier to select the optimal training data candidates. Data representing the various circumstances at the time of data acquisition is also useful when the data is actually used, and it may be made possible to record information such as which physician performed the examination at which hospital, the presence or absence of personal information, and the presence or absence of various agreements and conditions of use such as informed consent.
As described above, when recording an image in the recording unit 18, it is advisable to record related information in association with it. For example, in the example of FIG. 4, when an image marked with a cross (×) as the reliability determination result is to be used as learning data, it may be possible to annotate it with reference to the information of the circles (〇), so the coordinates and/or the determination results may be recorded as reference values. In the example of FIG. 4, the reliability determinations are 〇××〇; if the two 〇 frames received the same determination, the × frames between them can potentially be turned into training data with reference to the 〇 images. After the video has been acquired, an annotator or a robot can easily and appropriately annotate the × images with reference to the related information (the reliability determination results), as in the sketch below.
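The following sketch illustrates this 〇××〇 propagation: for each run of low-reliability frames enclosed by two agreeing high-reliability frames, the neighbours' result is copied as a provisional annotation for the annotator to confirm. The record format ('ok' flag plus a 'result' such as a lesion class label) is an assumed convention.

```python
def propagate_annotations(records):
    """For each low-reliability (×) frame lying between two agreeing
    high-reliability (〇) frames, propose the neighbours' result as a
    provisional annotation. Each record is a dict with 'ok' (bool)
    and 'result' (e.g. a lesion class label)."""
    proposals = {}
    ok_idx = [i for i, r in enumerate(records) if r['ok']]
    for a, b in zip(ok_idx, ok_idx[1:]):
        if records[a]['result'] == records[b]['result']:
            for i in range(a + 1, b):        # the × frames in between
                proposals[i] = records[a]['result']
    return proposals  # frame index -> suggested annotation
```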
 The communication unit 19 has a communication circuit (transmission/reception circuit) and communicates with the communication unit 31 in the learning device 30. By selecting and transmitting only the necessary information from the information recorded in the recording unit 18, the communication unit 19 can reduce the communication and recording load. Security can also be improved by not transmitting unnecessary data. For example, among the images acquired by the imaging unit 11, those whose inference results show a difference are transmitted to the learning device 30 for learning. However, when there are no restrictions, or when a contract permits it, all of the information (including the images) may be transmitted. In doing so, the communication unit 19 may transmit to the communication unit 31 in the learning device 30 the image data to which the information association unit 17 has associated information and for which the difference determination unit 14 has found a difference in the inference results of the images included in the similar image group. Communication can also be used when replacing an inference model: the communication unit 19 receives, through the communication unit 31, the inference model generated by the learning unit 34 in the learning device 30. In addition, the two units exchange request signals and transmit and receive information that satisfies the conditions in response to those request signals.
 The control unit 20 is a processor that includes a CPU (Central Processing Unit), its peripheral circuits, a memory, and the like. There may be a single processor, or it may be composed of a plurality of chips. The CPU runs the endoscope apparatus 10 as a whole by controlling each part within the endoscope apparatus 10 in accordance with a program stored in the memory. Each part within the endoscope apparatus 10 is realized through software control by the CPU. All or part of the difference determination unit 14, the image determination unit 15, the situation determination unit 16, and the information association unit 17 described above may be realized by the processor in the control unit 20. This processor may also realize all or part of a similar image group determination unit, an image identification unit, an inference result difference calculation unit, an inference result change calculation unit, a learning image selection unit, and an imaging information acquisition unit. Likewise, the processor in the control unit 20 may realize all or part of the functions of the imaging unit 11, the image processing unit 12, and the inference unit 13. The control unit 20 may also operate in cooperation with the control unit 35 in the learning device 30 so that the endoscope apparatus 10 and the learning device 30 operate as one.
 For the images that the image determination unit 15 has judged to form part of a similar image group, the control unit 20 uses the differences calculated by the difference determination unit 14 in the inference results (which may be reliability values or the like) produced by the inference unit 13 to select the images to be used for relearning (see, for example, S7 in FIG. 2 and S8 in FIG. 5). Consider, for example, a case where the situation changes from high reliability to low reliability. An image with high reliability is generally likely to resemble the images used when the inference model was created, whereas an image with low reliability is likely not to have been used in creating the model. Yet images contained in a similar image group may show the same object even when they are dissimilar. Such images are "weak" images for the inference model, and it is desirable that accurate inference be possible even for them. In this embodiment, therefore, when the inference result (or its reliability, or the like) changes in this way, the image at that time is selected and kept as a candidate image for learning. When teacher data are created, an expert such as a doctor can judge whether a candidate image can be adopted as teacher data. In other words, if such an image is left unselected, adopting it as teacher data later takes effort and improving the model takes time, whereas if it is selected it becomes a teacher data candidate and, in some cases, can be used immediately to improve the inference model.
 The control unit 20 functions as a learning image selection unit that selects images to be used for relearning based on the differences in the inference results calculated by the inference result difference calculation unit (see, for example, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). Selecting images based on differences in the inference results means, for example, that when a difference arises in the inference results, such as when the reliability changes by a predetermined value or more, when the size of the specific object changes suddenly, or when the judged position of the object within the image changes suddenly by a predetermined value or more, the changed image is selected for relearning. Here, "selection" may mean selecting the image frame itself for recording, but it is not limited to this; it may also mean selecting information that allows the image frame to be retrieved immediately.
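 As one way to make these selection criteria concrete, the following minimal sketch flags a frame for relearning when, between adjacent frames in a similar image group, the reliability changes by more than a threshold, or the detected position or size of the object jumps. The threshold values and record layout are illustrative assumptions, not values from the embodiment.

```python
def select_learning_candidates(group, conf_delta=0.3, pos_delta=50, size_ratio=2.0):
    """group: list of per-frame inference records
       {"confidence": float, "center": (x, y), "area": float}.
    Returns indices of frames whose inference result differs sharply from
    the previous frame; all thresholds are assumed example values."""
    selected = []
    for i in range(1, len(group)):
        prev, cur = group[i - 1], group[i]
        dconf = abs(cur["confidence"] - prev["confidence"])
        dpos = max(abs(a - b) for a, b in zip(cur["center"], prev["center"]))
        ratio = max(cur["area"], prev["area"]) / max(min(cur["area"], prev["area"]), 1e-6)
        if dconf >= conf_delta or dpos >= pos_delta or ratio >= size_ratio:
            selected.append(i)  # keep the frame, or an index that lets us find it later
    return selected
```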
 The learning image selection unit described above may omit images that do not satisfy specific conditions, that is, images taken under extremely poor conditions. One such condition is, for example, that visibility remains far too low even after image correction or the like. As described above, the images to be used for relearning are selected from among the images included in the similar image group based on the differences in the inference results. For this reason, images that at first glance appear dissimilar to the images observing the same site may also be selected as candidate images for learning. However, images taken under extremely poor conditions, such as an image in which the entire screen has turned white because of a water injection operation or an image in which the entire screen has turned black because of the positional relationship of the light source, are quite unlikely to contain the specific object that is actually being sought.
 Furthermore, the longer the time spent imaging the object, the more likely the image is to be taken under poor conditions. That is, the acquired image can become extremely poor (1) when the exposure time is long, (2) when the emission time of the light source is long in order to obtain a sufficient amount of light inside the body cavity, and (3) when the gain in the image sensor or its circuits is high. This is because, as the exposure time or the like becomes longer, the relative positional relationship between the specific object and the imaging unit 11 changes in the meantime, resulting in a poor image, for example one degraded by image blur.
 The learning image selection unit therefore excludes from the selection targets images taken under such extremely poor conditions that they cannot possibly contain an image of the specific object. This may be judged based on the contrast of the image data acquired by the imaging unit 11, the amount of image blur, and the like. Alternatively, without judging the image quality directly, an image may be judged to be under extremely poor conditions based on the shooting conditions or the like. In addition, when multiple frames are combined, as in HDR (High Dynamic Range) shooting or focus stacking, the result tends to be an image under extremely poor conditions, so in this case as well the image may be judged to be under extremely poor conditions according to the shooting conditions.
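 A minimal sketch of the kind of exclusion test just described follows, using simple grayscale statistics; the numeric thresholds are assumptions, and in practice the judgment could instead be driven by the shooting conditions (exposure time, gain, HDR mode, and so on).

```python
import numpy as np

def is_extremely_poor(gray, bright_lo=10, bright_hi=245, min_contrast=8.0):
    """gray: 2-D uint8 array for one frame. Returns True for frames that
    are nearly all-black, nearly all-white, or almost contrast-free,
    which cannot usefully contain the specific object. Thresholds are
    illustrative assumptions."""
    mean = float(gray.mean())
    std = float(gray.std())
    if mean <= bright_lo or mean >= bright_hi:
        return True   # e.g. whited out by water injection, or blacked out by the light source
    return std < min_contrast  # virtually no level differences between pixels
```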
 The control unit 20 also functions as a learning image selection unit that selects teacher data candidate images to be used for learning based on the amount of change calculated by the inference result change calculation unit (see, for example, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). This learning image selection unit selects a teacher data candidate image to be used for learning when the amount of change exceeds a specified value. That is, when the amount of change becomes larger than a predetermined value, the image, although identified as almost the same, differs from the preceding images and may not have been used for learning so far, so it is selected as a learning candidate image.
 The learning device 30 is assumed to be the part that handles the creation and improvement of the inference model, and is assumed to be located outside the hospital (examination facility). The learning device 30 has a communication unit 31 because data are exchanged by communication, and, in order to carry out highly specialized learning efficiently, it also has an image processing unit 32, a recording unit 33, a learning unit 34, and a control unit 35. If, for example, the recording unit 33 were located elsewhere, communication could become a constraint during learning, but the learning device 30 may also cooperate by communication with a recording unit located at another site. The learning device 30 may be located on a server or the like, in which case it is connected to the endoscope apparatus 10 through a communication network such as the Internet. Furthermore, the learning device 30 is connected to a large number of devices, receives a large amount of teacher data from these devices, performs learning using these teacher data, and generates an inference model. Alternatively, the learning device 30 may receive teacher data candidates such as image data, perform annotation to create teacher data, and generate an inference model using these teacher data.
 The communication unit 31 has a communication circuit (transmission/reception circuit) and communicates with the communication unit 19 in the endoscope apparatus 10. As described above, the communication unit 31 receives the image data for which the inference results of the images included in the similar image group show a difference. The communication unit 31 also transmits the inference model generated by the learning unit 34 to the communication unit 19. Furthermore, the communication unit 31 transmits to the communication unit 19 an inference model generated by relearning using the relearning image data selected, based on the differences in the inference results, in the endoscope apparatus 10. As mentioned above, when the recording unit 33 is located externally, communication and the like can become a constraint during learning, but the learning device may cooperate through the communication unit 31 with a recording unit located at another site.
 The image processing unit 32 has an image processing circuit, receives image data from the endoscope apparatus 10, and applies various kinds of image processing, such as development processing, in accordance with instructions from the control unit 35. The image processing may be equivalent to the processing in the image processing unit 12, or the processing content may be changed as appropriate. The processed image data may be recorded in the recording unit 33 or displayed on a display device or the like. Image processing may also be performed at learning time. For example, the learning images recorded in the recording unit 33 may be made easier to learn from by applying image processing such as changing the image size or applying emphasis processing to make them easier to handle. The image processing unit 32 may also deliberately generate images that are difficult to judge, and the learning unit 34 may use these images for testing during learning.
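 As an illustration of the kind of preprocessing the image processing unit 32 might apply before learning, a minimal sketch follows; the resizing method, target size, and gamma value are assumptions chosen for brevity, not the processing actually prescribed by the embodiment.

```python
import numpy as np

def prepare_for_learning(img, size=(224, 224), gamma=0.8):
    """img: H x W x 3 uint8 frame. Resize (nearest-neighbor, for brevity)
    and apply a gamma-based emphasis so low-contrast structures stand out.
    A sketch only; the actual processing content may be changed as appropriate."""
    h, w = img.shape[:2]
    ys = np.linspace(0, h - 1, size[0]).astype(int)   # row indices to sample
    xs = np.linspace(0, w - 1, size[1]).astype(int)   # column indices to sample
    resized = img[ys][:, xs]
    emphasized = (255.0 * (resized / 255.0) ** gamma).astype(np.uint8)
    return emphasized
```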
 The recording unit 33 has an electrically rewritable nonvolatile memory and/or a volatile memory. The recording unit 33 stores the program for operating the CPU and the like in the control unit 35. The recording unit 33 also stores various characteristic values, adjustment values, and the like of the learning device 30. Furthermore, the recording unit 33 records the teacher data for learning transmitted from the endoscope apparatus 10 (including the relearning image data selected based on the differences in the inference results). The details of the data structure recorded in the recording unit 33 will be explained with reference to FIGS. 8 and 9. Note that the recording unit 33 need not record all of the information shown in FIGS. 8 and 9; the recording may be shared with the recording unit 18 in the endoscope apparatus 10.
 The learning unit 34 includes an inference engine, performs machine learning such as deep learning using the teacher data for learning recorded in the recording unit 33, and generates an inference model. When a predetermined number of relearning image data selected based on differences in the inference results have accumulated from the endoscope apparatus 10, relearning is performed. By performing relearning, it is possible to generate an inference model capable of highly reliable inference even for weak images, missed images, and the like.
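 This accumulate-then-retrain behavior can be sketched as follows; the class name, the `train_model` callback, and the threshold of 1000 samples are all placeholders for whatever learning routine and count the operator chooses, not values from the embodiment.

```python
class RelearnBuffer:
    """Collects relearning images selected for inference-result differences
    and triggers retraining once a predetermined number has accumulated."""

    def __init__(self, train_model, threshold=1000):
        self.train_model = train_model  # e.g. a deep-learning training routine
        self.threshold = threshold
        self.samples = []

    def add(self, image, related_info):
        self.samples.append((image, related_info))
        if len(self.samples) >= self.threshold:
            model = self.train_model(self.samples)  # relearn on the accumulated data
            self.samples.clear()
            return model  # new inference model to send back to the endoscope
        return None
```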
 Deep learning will now be explained. "Deep learning" is a multilayered version of the "machine learning" process that uses neural networks. A typical example is the forward propagation neural network, which sends information from front to back to make a judgment. The inference unit 13 described above also includes a forward propagation neural network. In its simplest form, a forward propagation neural network (the inference engine 13A in FIG. 6, for example, has a similar structure) needs only three layers: an input layer composed of N1 neurons, an intermediate layer composed of N2 neurons given by parameters, and an output layer composed of N3 neurons corresponding to the number of classes to be discriminated. The neurons of the input and intermediate layers, and of the intermediate and output layers, are connected by connection weights, and bias values are added to the intermediate and output layers, so that logic gates can easily be formed.
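 The simplest three-layer forward propagation network just described can be written down directly. The following NumPy sketch uses illustrative sizes (N1 = 16 inputs, N2 = 8 hidden neurons, N3 = 3 classes) and random weights purely for demonstration; learning would adjust the connection weights and biases.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Three-layer feedforward pass: input (N1) -> hidden (N2) -> output (N3).
    The connection weights W1, W2 and biases b1, b2 are what learning adjusts."""
    h = np.maximum(0.0, W1 @ x + b1)              # hidden layer with ReLU activation
    logits = W2 @ h + b2                          # output layer, one score per class
    return np.exp(logits) / np.exp(logits).sum()  # softmax over the N3 classes

# illustrative sizes: N1 = 16 input features, N2 = 8 hidden, N3 = 3 classes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 16)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
probs = forward(rng.normal(size=16), W1, b1, W2, b2)  # class probabilities summing to 1
```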
 A neural network may have only three layers if it performs simple discrimination, but by providing many intermediate layers it becomes possible to learn, in the course of machine learning, how to combine multiple feature quantities. In recent years, networks with 9 to 152 layers have become practical from the standpoint of learning time, judgment accuracy, and energy consumption. A "convolutional neural network," which performs a process called "convolution" that compresses image features, operates with minimal processing, and is strong at pattern recognition, may also be used. Alternatively, a "recurrent neural network" (a fully connected recurrent neural network), in which information flows bidirectionally, may be used to handle more complex information and to support analysis of information whose meaning changes depending on order and sequence.
 To realize these techniques, conventional general-purpose arithmetic processing circuits such as a CPU or an FPGA (Field Programmable Gate Array) may be used. However, since much of the processing in a neural network is matrix multiplication, processors specialized for matrix calculation, known as GPUs (Graphic Processing Units) or TPUs (Tensor Processing Units), may be used instead. In recent years, "neural network processing units (NPUs)," hardware dedicated to such artificial intelligence (AI), have been designed so that they can be integrated and embedded together with the CPU and other circuits, and they are sometimes part of the processing circuitry.
 Other machine learning methods include, for example, support vector machines and support vector regression. The learning here involves calculating the weights, filter coefficients, and offsets of a classifier; besides this, there are also methods that use logistic regression processing. When a machine is to make some judgment, a human must teach the machine how to judge. In this embodiment, a method of deriving the image judgment by machine learning is adopted, but a rule-based method that applies rules humans have acquired through empirical rules and heuristics may also be used.
 The control unit 35 is a processor that includes a CPU (Central Processing Unit), its peripheral circuits, a memory, and the like. There may be a single processor, or it may be composed of a plurality of chips. The CPU runs the learning device 30 as a whole by controlling each part within the learning device 30 in accordance with a program stored in the memory. Each part within the learning device 30 is realized through software control by the CPU. The control unit 35 may also operate in cooperation with the control unit 20 in the endoscope apparatus 10 so that the endoscope apparatus 10 and the learning device 30 operate as one.
 Next, the structure of the data recorded in the recording unit 50 will be explained with reference to FIG. 8. The recording unit 50 can be applied to the recording unit 18 in the endoscope apparatus 10 and/or the recording unit 33 in the learning device 30, and has an electrically rewritable nonvolatile memory. All of the data shown in FIG. 8 may be recorded in the recording unit 18 or the recording unit 33, or the data may be divided between the two recording units. That is, the data required by the endoscope apparatus 10 and by the learning device 30 may each be recorded accordingly. The recording unit 50 has two recording areas: an examination image recording unit 51 and a utilization information recording unit 52.
 The examination image recording unit 51 is an area that records examination videos in the same way that medical charts are kept, and examination video A 51a and examination video B 51b are recorded there. Although FIG. 8 shows only two examination videos, video A and video B, three or more examination videos can of course be recorded. Such records are sometimes important as evidence for patient diagnosis, and a still image corresponding to one frame of a video is sometimes recorded as a report, but the still images are omitted from FIG. 8. The examination image recording unit 51 may record not only images but also examination results and the like.
 The recording unit 50 is also provided with a utilization information recording unit 52. To use the videos recorded in the examination image recording unit 51 for evidence reports as, for example, teacher data, the processing procedures required for that purpose (for example, informed consent) are necessary, and for some videos the doctor or the patient has not consented to such use. Furthermore, when learning, it is desirable to take various related information from the time of the examination into account. It is therefore advisable to provide a utilization information recording unit 52 that records such conditions of use, and to organize and record information on which videos can be used for which purposes. In this case, the video utilization information folders and the teacher data folders may be recorded separately. Note that the utilization information recording units do not necessarily have to reside in the same recording device.
 In the example shown in FIG. 8, video utilization information A 53 and video utilization information B 56 are provided in the utilization information recording unit 52. Although FIG. 8 shows only two sets of video utilization information, utilization information A and utilization information B, three or more sets can of course be recorded according to the number of examination videos. Since the information recorded in video utilization information A 53 and in video utilization information B 56 is of the same kind, the information recorded in video utilization information A 53 will mainly be described here.
 The original video information 53a indicates which video this information corresponds to. For example, video utilization information A 53 corresponds to examination video A 51a, and video utilization information B 56 corresponds to examination video B 51b. The acquisition information 53b includes information such as the date and time of the examination, the examining institution, the name of the doctor in charge of the examination, and the model name of the apparatus used. The inference model type 53c indicates the type of inference model used by the endoscope apparatus 10 in the examination. If the inference model that was used is unknown, it becomes impossible to know which inference model should be used for relearning and the like.
 The utilization condition information 53d indicates under what conditions this video can be used. For example, when the scope of use has been determined by informed consent or the like, that fact is recorded. The profile information 53e includes information such as the sex, age, ID, and examination results of the subject (patient) who underwent the examination. The inference result information 53f is information on the inference results for each frame of the images.
 A first teacher data folder 54 and a second teacher data folder 55 are also provided in the utilization information recording unit 52. The first and second teacher data folders 54 and 55 record the images that the control unit 20 has selected as teacher data candidates based on differences in the inference results and the like obtained using examination video A 51a, namely the first teacher data candidate 54a and the second teacher data candidate 55a. For each candidate image, the candidate image data itself may be recorded, or information that can designate the candidate image data from among the images recorded in the examination image recording unit 51 (for example, which frame it is, or at what time it was captured) may be recorded. The related information associated by the information association unit 17 (first related information 54b, second related information 55b) is also recorded. As the first related information 54b and the second related information 55b, for example, situation information, similar image information, and information on differences in reliability values are recorded.
 The video utilization information B 56 likewise records original video information 56a, acquisition information 56b, an inference model type 56c, utilization condition information 56d, profile information 56e, inference result information 56f, a first teacher data folder 57, a first teacher data candidate 57a, first related information 57b, a second teacher data folder 58, a second teacher data candidate 58a, and second related information 58b; since these items of information are the same as those in video utilization information A 53, a detailed description is omitted.
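 Purely for orientation, the folder layout of FIG. 8 can be summarized as a nested structure, as in the following sketch. The key names mirror the reference numerals in the description; they are not a prescribed schema, and the values are placeholders.

```python
# A sketch of the FIG. 8 layout as nested dictionaries; keys mirror the
# reference numerals of the description and values are placeholders.
recording_unit_50 = {
    "examination_images_51": {
        "video_A_51a": "examination_A.mp4",   # placeholder path to the original video
        "video_B_51b": "examination_B.mp4",
    },
    "utilization_info_52": {
        "video_info_A_53": {
            "original_video_53a": "video_A_51a",      # which video this record refers to
            "acquisition_53b": {"date": "...", "facility": "...", "device": "..."},
            "inference_model_type_53c": "model-v1",   # needed to know what to retrain
            "usage_conditions_53d": {"informed_consent": True},
            "profile_53e": {"sex": "...", "age": "...", "id": "..."},
            "inference_results_53f": [],              # per-frame inference results
            "teacher_folder_1_54": {
                "candidate_54a": {"frame": 125},      # frame reference instead of pixels
                "related_info_54b": {"situation": "...", "confidence_delta": 0.4},
            },
        },
    },
}
```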
 Next, a modification of the structure of the data recorded in the recording unit 50 will be described with reference to FIG. 9. FIG. 9 shows a modification of the way the data organized in the recording unit 50 are recorded, in which video utilization information A and video utilization information B are recorded together with the videos in a video recording unit A 61 and a video recording unit B 65, respectively. The data are managed in one place, and there is no need to search for where a video is recorded when using the information, but the file size may become too large.
 That is, in this modification, an examination image 61a is recorded in the video recording unit A 61, and an examination image 65a is recorded in the video recording unit B 65. These examination videos are the same as those recorded collectively in the examination image recording unit 51 in FIG. 8. In this modification, the video recording unit A 61 also records acquisition information 62a, utilization condition information 62b, an inference model type 62c, profile information 62d, inference result information 62e, a first teacher data folder 63 (including a first teacher data candidate 63a and first related information 63b), and a second teacher data folder 64 (including a second teacher data candidate 64a and second related information 64b). Similarly, the video recording unit B 65 records acquisition information 66a, utilization condition information 66b, an inference model type 66c, profile information 66d, inference result information 66e, a first teacher data folder 67 (including a first teacher data candidate 67a and first related information 67b), and a second teacher data folder 68 (including a second teacher data candidate 68a and second related information 68b). Although only two video recording units, the video recording unit A 61 and the video recording unit B 65, are depicted in the recording unit of FIG. 9, the recording areas can be increased as appropriate according to the number of examination videos.
 As shown in FIGS. 8 and 9, there are various possibilities for which recording section is placed in which recording device. Likewise, in which folder these data are placed and which recording unit records them may be decided as appropriate according to the situation and environment at the time. The teacher data folders may also be divided according to the characteristics of the teacher data.
 Next, in explaining the operation of an embodiment of the present invention, the method of selecting, from a similar image group, the images to be used for relearning will be described first. FIGS. 6(a) and 6(b) show an example in which the distal end of the endoscope apparatus 10 is inserted into a patient's body cavity, together with examples of the images acquired at that time. The black-filled images (P1, P2, P11, P12) are images from the beginning of insertion, and the black-filled images (P8, P9, P18, P19) are images from the time of withdrawal; no specific object such as an affected area appears in these transition images P1, P2, P8, and P9. The hatched images (P5, P6, P15, P16) are observation images taken while a doctor or the like has found a specific object such as an affected area and is observing it carefully. Between the transition images P1, P2, P8, P9 and the observation images P5, P6, intermediate images P3, P4, and P7 are acquired. Similarly, between the transition images P11, P12, P18, P19 and the observation images P15, P16, intermediate images P13, P14, and P17 are acquired.
 The images P5 to P7 and P15 to P17 within the broken-line frame Is show similar objects, are similar to one another, and belong to a similar image group. Whether images are similar may be determined using a known method. For example, the image determination unit 15 may calculate feature quantities from the images and judge two images to be similar if their feature quantities are within a predetermined range. The image determination unit 15 may also judge whether images are similar based on image composition, color distribution, and the like.
 The image data acquired by the image sensor of the imaging unit 11 are, after image processing, input to the input layer of the inference engine 13A of the inference unit 13. The inference engine 13A performs inference using a neural network, and the inference result is output from the output layer. The inference engine 13A infers, for example, whether an image of a specific object such as an affected area is present in the endoscopic image. Based on this inference result, it can be indicated that images P5, P6, and P16 are specific object images. In FIG. 6(a), images P5 and P6 are surrounded by bold frames, and in FIG. 6(b) image P16 is surrounded by a bold frame, indicating that they are specific object images; an optimal display method may be chosen as appropriate. The inference engine 13A also calculates the reliability of the inference result. In FIG. 6(a), a circle (〇) indicates that the reliability value is higher than a predetermined value, and a cross (×) indicates that the reliability value is lower than the predetermined value.
 As described above, the situation determination unit 16 determines the usage situation of the endoscope apparatus 10. In the example shown in FIG. 6, the situation determination unit 16 acquires angle information, such as whether the distal end of the endoscope apparatus 10 squarely faces the wall surface in the body cavity, that is, whether it is directly opposed to it or is at an angle. FIG. 6(a) shows the acquired angle information. Although only angle information is shown in FIG. 6, the information is not limited to this; other information may also be acquired and used when selecting, from the similar image group, the images to be used for relearning.
 In the example shown in FIG. 6(a), for images P5 and P6, the inference engine 13A infers that they are images of a specific object such as an affected area, the reliability is high, and the endoscope is directly facing the specific object. Likewise, in the example shown in FIG. 6(b), for image P16 the inference engine 13A infers that it is an image of a specific object such as an affected area, the reliability is high, and the endoscope is directly facing the specific object.
 Since the inference results for images P5, P6, and P16, which contain the specific object, are highly reliable, these images can be said to be very likely to contain a specific object such as an affected area. However, within the similar image group Is, images judged to have low reliability (for example, images P7, P15, and P17) may also contain a specific object such as an affected area. In the example shown in FIG. 6(b), image P15 contains a specific object such as an affected area, yet the reliability of the inference is low and the image is displayed as not containing a specific object. In the examples shown in FIGS. 6(a) and 6(b), the reliability of the inference is high when the endoscope apparatus 10 directly faces the specific object such as an affected area, whereas the reliability is low when the endoscope apparatus 10 is at an angle to the specific object. That is, the inference model set in the inference engine 13A appears unable to make highly reliable inferences when the endoscope tip is oblique to the wall surface in the body cavity. In such a case, it is advisable to relearn using the images for which the inference results differed (the weak images) and generate a more reliable inference model.
 Thus, when images from the endoscope apparatus 10 are input to an inference model and inference is performed, there is a problem peculiar to endoscope apparatuses: the following special characteristics of endoscopes and the like make it difficult to create a highly reliable inference model. Namely: (1) because the distal end of the endoscope apparatus is not spatially fixed, the direction of the distal end wavers and the direction of the obtainable image changes abruptly (image changes corresponding to saccades); the viewing direction of the image can thus jump. (2) Because the distance between the endoscope tip and the wall surface of the body cavity is inherently short (they are at close range), even a slight movement of the tip greatly changes what is observed. (3) Because the light source that illuminates the body cavity and the image sensor move together, the brightness can change abruptly and the image can change abruptly; changes in the brightness, saturation, and contrast of the image can also occur.
 As described above, the tip of the endoscope makes fine movements, and therefore fixation is difficult and the tip easily strays from the object. At such times there can be important images (teacher image candidates) corresponding to weak images or missed images. By learning from such teacher-candidate images as teacher data, images that contain a specific object such as an affected area can also be extracted as teacher candidates. For this reason, in this embodiment, images that are candidates for teacher data are selected from the similar image group that resembles the specific object image such as an affected area. Note that the similar image group may also include dissimilar images acquired while the same site is being observed. Furthermore, in selecting teacher data candidates from the similar image group, the differences between the inference results of adjacent images are used.
 Next, the imaging operation in this embodiment will be described with reference to the flowchart shown in FIG. 2. This flow is realized by the CPU of the control unit 20 in the endoscope apparatus 10 controlling each part of the endoscope apparatus 10 based on a program stored in a memory such as the recording unit 18.
 When the operation of the flow shown in FIG. 2 starts, imaging and inference are first performed (S1). Here, the control unit 20 instructs the image sensor and the imaging control circuit in the imaging unit 11 to start the imaging operation. In the imaging operation, imaging signals for one screen (one frame) are read out sequentially at time intervals determined by a predetermined frame rate. The imaging operation continues repeatedly until it is judged in step S9 that the imaging operation has ended. When the imaging operation starts, the image sensor in the imaging unit 11 outputs an image signal, and the image processing unit 12 processes the image signal into visually recognizable information. The image data subjected to this image processing are output to the display unit 12a and displayed as an endoscopic image.
 Also in step S1, the inference unit 13 receives the image data that have undergone image processing and performs inference. That is, the inference unit 13 uses a machine learning model to infer the specific object image contained in the endoscopic image obtained by imaging; in other words, the inference unit 13 performs inference on the input image using a machine learning model. This inference result is superimposed on the endoscopic image in the image processing unit 12, and the resulting image is displayed on the display unit 12a. This inference is performed, for example, in order to display advice that assists diagnosis to the user of the endoscope apparatus 10, such as a doctor. When a specific object such as an affected area is contained in the endoscopic image, an indication to that effect may be displayed. Advice on endoscope operation for finding the specific object may also be inferred. The inference unit 13 may also infer at least one of classification, detection, and region extraction for the input image. That is, if an affected area or the like is present in the input image, the unit may detect the affected area and infer what kind of affected area it is, its detected position, its extent, and so on. Having performed these inferences, the inference unit 13 calculates the reliability of the inference.
 After imaging and inference, the similar image group determination is performed next (S3). Here, the image determination unit 15 compares the contents of the endoscopic images obtained by temporally consecutive imaging and determines a similar image group in which the same site is observed over a predetermined number of frames, even if dissimilar images are included. As described above (see FIG. 6), in this embodiment the images for relearning are selected from among the images in the similar image group Is. In this step it is judged whether the input image is a similar image. Specifically, when image data for one frame are output from the image processing unit 12, the image determination unit 15 compares the feature quantities of the immediately preceding frame and the current frame and judges whether they are similar images. Note that whether an image belongs to the similar image group may also be judged using parameters other than feature quantities.
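 A minimal sketch of the frame-to-frame comparison in S3 follows, using a color-histogram feature; the feature choice and the distance threshold are assumptions, since the description allows any known similarity method.

```python
import numpy as np

def frame_feature(img, bins=16):
    """Crude per-channel color histogram as an image feature (an assumed choice)."""
    hist = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    v = np.concatenate(hist).astype(float)
    return v / v.sum()

def is_similar(prev_img, cur_img, threshold=0.25):
    """Frames whose feature distance is within the threshold are treated as
    continuing the same similar image group; the threshold is an assumption."""
    d = np.abs(frame_feature(prev_img) - frame_feature(cur_img)).sum()
    return d <= threshold
```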
 As mentioned above, endoscopic images change easily even while the same site is being detected continuously, and simple pattern matching or a similarity judgment based on numerical representations of the images can judge them not to be similar images; that is, they can be judged to be dissimilar images with different content. However, when similar images are being acquired continuously in time, there is a high possibility that important images are included even among images that would otherwise simply be judged dissimilar. Here, therefore, dissimilar images that satisfy the conditions are treated as images constituting the similar image group. That is, in acquiring teacher data candidate images for obtaining endoscopic images for training an endoscope inference model, the contents of endoscopic images obtained by temporally consecutive imaging are compared, and a group of similar images in which the same site is observed over a predetermined number of frames is judged to be a similar image group including the dissimilar images. The content of an image means the things that characterize the image (image features): the shape, pattern, shading, color, size, and position of the imaged object (rotation, distortion, and the like may be present, or may have been corrected).
 As stated above, when the similar image group is determined, dissimilar images may also be included in it. This point will be explained with reference to FIG. 4. FIG. 4 shows examples of images acquired by the imaging unit 11. Images P21, P22, and P29 are transition images at insertion and withdrawal of the endoscope apparatus 10, as in FIGS. 6(a) and 6(b), and images P23 and P24 are intermediate images. Images P25 and P28 are images taken when the endoscope tip directly faces a specific target site such as an affected area. In these images P25 and P28 the specific object such as an affected area appears clearly, and the reliability of the inference is also high.
 Images P26 and P27, between image P25 and image P28, are images corresponding to saccades (involuntary eye movements), expressed here as image micro-movement. A saccade is an involuntary change of the visual field, and a saccade-equivalent image change means that the image changes abruptly because of an object at close range or because of the positional relationship between the light source and the object. As explained earlier with regard to FIG. 6, such saccade-equivalent image changes are connected with the special characteristics of examinations using an endoscope or the like. Because images P26 and P27 correspond to saccades, they are unintended, unexpected images, and they can moreover take various modes of change. For this reason, saccade-equivalent images have not necessarily been included in the teacher data at learning time, and inference on them tends to be difficult. Here this is expressed as a change in quality. Consequently, images P26 and P27 are likely to be judged dissimilar to images P25 and P28. However, even an image whose quality has changed may, unless its quality is extremely degraded, show a specific object such as an affected area, which, depending on the specifications of the inference model, may be an object that is better not missed. In this embodiment, therefore, dissimilar images within the similar image group are also treated as part of the similar image group. The detailed operation of the similar image group determination in step S3 will be described later with reference to FIG. 3.
 Once the similar image group determination has been made, it is next judged whether there is a difference in the inference results within the image group (S5). As described above, the images acquired by the imaging unit 11 and processed by the image processing unit 12 are subjected to inference in the inference unit 13. In this step, it is judged whether there is a difference in the inference results within the similar image group. That is, the difference determination unit 14 calculates, for each image included in the similar image group determined by the image determination unit 15, the difference in the inference results produced by the inference unit.
 In step S5, reliability may be used as the difference in the inference results for the judgment. Besides reliability, it may be judged that there is a difference in the inference results when, for example, the detection range or position of a specific object such as an affected area varies. It may also be judged that there is a difference in the inference results when the appearance state changes, such as when a specific object such as an affected area alternately appears and fails to appear. The judgment may further be based on how much the difference in reliability values or the like has changed. In addition, just as the visual image is corrected in the brain even though the human eye makes saccades, which are unconscious movements, an inference model whose results are not disturbed even when the image changes abruptly is easier to use and kinder to the person performing the examination with the endoscope apparatus. Because such sudden image changes have many causal parameters and can occur in combination, it is often difficult to prepare all the patterns at machine learning time.
 Factors that can change the reliability include the following cases (1) to (5): (1) when structure emphasis processing that affects image quality is performed; (2) when image processing such as TXI (Texture and Color Enhancement Imaging) is changed; (3) when the light source is changed, as in NBI (Narrow Band Imaging) or RDI (Red Dichromatic Imaging); (4) when a treatment tool such as forceps is inserted or withdrawn; and (5) when magnified observation is performed (pressing the hood against the tissue, or shooting with the immersion method in which the hood is filled with water). Besides simply judging the reliability as high (〇) or low (×), the reliability judgment method may also be changed according to the mode.
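 Beyond a simple confidence delta, the kinds of differences listed above can be checked jointly. The following sketch, assuming detection records with optional bounding boxes, reports a "difference" when the reliability flips across the 〇/× boundary, when the detection appears or disappears, or when the detected regions barely overlap; all threshold values are illustrative assumptions.

```python
def inference_results_differ(prev, cur, conf_thresh=0.5, min_iou=0.3):
    """prev, cur: {"confidence": float, "bbox": (x, y, w, h) or None}.
    Returns True when the inference result should count as 'different'."""
    # appearance state changed: object detected in one frame but not the other
    if (prev["bbox"] is None) != (cur["bbox"] is None):
        return True
    # reliability crossed the o/x boundary
    if (prev["confidence"] >= conf_thresh) != (cur["confidence"] >= conf_thresh):
        return True
    # detection position or range drifted: low overlap between the two boxes
    if prev["bbox"] and cur["bbox"] and iou(prev["bbox"], cur["bbox"]) < min_iou:
        return True
    return False

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```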
 If the determination in step S5 finds a difference in the inference results within the image group, learning images are selected (S7). In this step, the control unit 20 selects images to be used for learning (which may include relearning) on the basis of the differences in the inference results calculated by the difference determination unit 14; in other words, the control unit 20 selects teacher data candidate images to be used for learning on the basis of the amount of change calculated by the difference determination unit 14. As described above, when an image belongs to a similar image group but its inference result differs, that is, even when its reliability is judged to be low, a specific object such as an affected area may still appear in it. The control unit 20 therefore selects such images for relearning. Of course, the selected images may include images in which no specific object such as an affected area appears; such inappropriate images can simply be excluded during annotation when the teacher data are created. Rather than discarding images, keeping weak images and missed images selected for relearning makes it possible to generate a better inference model.
 The selected images themselves, or information that allows the selected images to be retrieved, are organized and recorded in the recording unit 18 so that the inference model can be improved immediately using these images and this information, and they are transmitted to the learning device 30 through the communication unit 19. Alternatively, each time an image is selected, it may be transmitted to the learning device 30 through the communication unit 19 and recorded in the recording unit 33, and relearning may be performed to generate the inference model once a predetermined number of relearning images have been collected. When selecting learning images in step S7, it is advisable to exclude images of extremely poor quality; images unsuitable for learning may be defined as poor-quality images. An image is unsuitable for learning when it is simply too bad, for example when visibility does not improve even after image processing, or when there is no difference in the levels of the two-dimensionally arranged pixel signals (for example, when the entire screen is pure white or pure black) so that it cannot be recognized as an image.
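 The screening for frames that "cannot be recognized as an image" (no difference in pixel signal levels, such as an entirely white or black screen) could be approximated by a variance check, as in the following sketch; the standard-deviation criterion and its threshold are assumptions.

```python
import numpy as np

def usable_for_learning(frame: np.ndarray, min_std: float = 2.0) -> bool:
    """Reject frames that cannot be recognized as an image.

    A frame whose pixel levels show almost no variation (e.g. an entirely
    white or entirely black screen) is treated as unsuitable for learning.
    The standard-deviation criterion and its threshold are assumptions.
    """
    return float(frame.std()) >= min_std
```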
 After a learning image has been selected in step S7, or if the determination in step S5 finds no difference in the inference results within the image group, it is next determined whether to end (S9). When a doctor or other user performs an operation to end the endoscopic examination, it is determined that the examination has ended. If the result of this determination is not "end", the process returns to step S1; if it is "end", end processing is performed and this flow is terminated.
 Next, the similar image group determination in step S3 (see FIG. 2) will be described using the flowchart shown in FIG. 3.
 When the similar image group determination flow starts, image features are first provisionally recorded (S11). Here, the image processing unit 12 or the image determination unit 15 calculates the feature amount of the image acquired by the imaging unit 11. As described above, the image feature amount may be calculated by a known method. The calculated image feature amount is provisionally recorded in a memory provided inside the recording unit 18, the control unit 20, or the like. This memory provisionally holds a history of image feature amounts, that is, the image feature amounts in chronological order.
 After the image feature amount has been provisionally recorded, it is next determined whether the image features are dissimilar to those of the immediately preceding frame (S13). Here, on the basis of the image feature amounts provisionally recorded in step S11, the difference determination unit 14 compares the image feature amount of the latest image frame with that of the immediately preceding image frame and determines whether the image features are similar. In this case, if the image feature amounts of the two images fall outside a predetermined range, the images are determined not to be similar.
 If the determination in step S13 finds that the image features are not similar to those of the immediately preceding frame, it is next determined whether the features are similar to those of images earlier than the immediately preceding frame (S15). Here, on the basis of the image feature amounts provisionally recorded in step S11, the difference determination unit 14 compares the image feature amount of the latest image frame with those of the images earlier than the immediately preceding frame and determines whether the features are similar. How many frames back to compare may be decided according to the design concept and may be changed as appropriate depending on the state of the images.
 In step S15, the number of frames earlier than the immediately preceding frame need only correspond to the length of time during which a specialist such as a doctor might miss a specific object during an examination such as an endoscopy, or during which weak images might occur. For example, it may be the number of frames corresponding to the time it takes for camera shake to subside, for slight vibration of the distal end to subside, for distal-end shake during endoscope operation to subside, or for the operator's hand shake to subside. In images acquired during a specific treatment, the images of that treatment alone may look like different images; such images are better treated as the same similar image group. If the type of treatment is known, it can be detected, and the "number of frames earlier than the immediately preceding frame" can be decided according to the time the treatment takes (its average time, for example), thereby identifying the images earlier than the immediately preceding frame. Similarly, if situations such as bleeding accompanying a treatment or the generation of water vapor can be determined, the number of earlier frames may be decided according to the situation.
 If the determination in step S13 finds that the features of the latest frame and the immediately preceding frame are similar, or if the determination in step S15 finds that the features of the latest frame and an image earlier than the immediately preceding frame are similar, the image is treated as a similar image (S17). As a result of the determinations in steps S13 and S15, the latest image is similar to the immediately preceding frame or to a frame before it, and is therefore treated as a similar image. Images can be judged similar on the basis of pattern matching or of a similarity determination applied to numerical representations of the images. However, an image that would be judged not similar (dissimilar) by pattern matching or the like is highly likely to contain important content when similar images are being acquired continuously in time, even though it would naively be treated as dissimilar. Therefore, as described above, dissimilar images that satisfy the conditions are treated as images constituting the similar image group.
 If the determination in step S15 finds that the features are not similar to those of the images earlier than the immediately preceding frame, or once the images have been treated as a similar image group in step S17, the similar image group determination flow ends and the process returns to the original flow. In this flow, even an image judged dissimilar from its feature amounts is determined to belong to the similar image group if its features are similar to those of a frame earlier than the immediately preceding one. That is, a similar image group may contain dissimilar images.
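 Putting steps S11 to S17 together, the grouping logic might look like the following sketch. The histogram feature extractor (the patent only requires "a known method"), the L1 distance, the look-back depth, and the similarity threshold are all illustrative assumptions.

```python
import numpy as np

LOOKBACK = 10          # frames to look back past the previous frame (assumed)
SIM_THRESHOLD = 0.15   # L1 distance threshold on histogram features (assumed)

def features(frame: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor (S11): a coarse intensity histogram."""
    hist, _ = np.histogram(frame, bins=32, range=(0, 256))
    return hist / max(hist.sum(), 1)

def is_similar(f1: np.ndarray, f2: np.ndarray) -> bool:
    return float(np.abs(f1 - f2).sum()) <= SIM_THRESHOLD

def belongs_to_group(history: list[np.ndarray], latest: np.ndarray) -> bool:
    """S13/S15: similar to the immediately preceding frame, or to any of the
    LOOKBACK frames before it (so saccade-like outliers stay in the group)."""
    if not history:
        return False
    if is_similar(history[-1], latest):              # S13
        return True
    return any(is_similar(f, latest)                 # S15
               for f in history[-1 - LOOKBACK:-1])
```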
 As described above, in one embodiment of the present invention, images are acquired by temporally continuous imaging, and inference is performed on the captured images using a machine learning model (S1). The content of the acquired images is converted into numerical values, the values are compared, and images whose change is equal to or less than (or less than) a predetermined value are determined to form a similar image group (S3). The difference between the inference results of the images included in the determined similar image group is calculated, and it is determined whether a difference exists (S5). Images to be used for learning are then selected on the basis of this difference in the inference results (S7). In this way, the present embodiment can efficiently acquire teacher data candidate images for inference model learning. That is, within a similar image group the inference results may differ, with some images scoring high and others scoring low. An image with a low inference result may be one the learning device is weak at, so such images are selected as teacher data candidate images, and the teacher data can then be chosen from among them, for example at annotation time.
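 The overall S1 to S7 loop could then be wired together as in the following sketch, which reuses the helper functions from the sketches above (features, belongs_to_group, has_result_difference, usable_for_learning). The camera, model, and recorder interfaces are assumptions for illustration, not part of the disclosure.

```python
def acquisition_loop(camera, model, recorder):
    """Illustrative S1-S7 loop: infer, group, compare, select.

    `camera` yields frames, `model(frame)` returns an InferenceResult, and
    `recorder(frames)` stores selected candidates; all three interfaces,
    like the helpers reused from the earlier sketches, are assumptions.
    """
    feature_history, group_frames, group_results = [], [], []

    def flush():
        # S5/S7: keep the group only if its inference results differ.
        if has_result_difference(group_results):
            recorder([f for f in group_frames if usable_for_learning(f)])

    for frame in camera:
        result = model(frame)                          # S1: imaging + inference
        feat = features(frame)
        if belongs_to_group(feature_history, feat):    # S3: similar image group
            group_frames.append(frame)
            group_results.append(result)
        else:
            flush()
            group_frames, group_results = [frame], [result]
        feature_history.append(feat)
    flush()                                            # handle the final group
```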
 Next, a modification of the imaging operation shown in FIG. 2 will be described using the flowchart shown in FIG. 5. This modification merely adds step S6 to the flowchart of FIG. 2 and replaces step S7 with step S8, so only these differences will be described.
 When the flow shown in FIG. 5 starts, imaging and inference are performed (S1), similar image group determination is performed (S3), and it is determined whether there is a difference in the inference results within the image group (S5), as in the flow shown in FIG. 2.
 If the determination in step S5 finds a difference in the inference results within the image group, a situation difference determination is performed (S6). Here, the situation determination unit 16 determines differences in the usage situation of the endoscope apparatus 10 and the like. The information acquired for the situation determination includes, for example, angle information (the angle between the wall surface inside the body cavity and the distal end of the endoscope) as shown in FIG. 4 and FIGS. 6(a) and 6(b). The angle information may be determined from changes in the image or detected by a built-in sensor; detection is also possible from the uniformity or distribution of the illumination, and sensor data or the detected brightness distribution of the illumination light may be recorded as they are. Other information used for the situation determination includes focus information and depth information from the imaging unit, light source information from the imaging unit 11, and treatment information such as water-feed and suction operations. When a specific object such as an affected area is inferred, having this situation information in addition to the image data improves the reliability of the inference. Therefore, when the teacher data are created, differences between these situations are determined, and the information association unit 17 associates the situation difference information with the image data so that relearning can be performed using the associated image data.
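 Associating situation-difference information with selected images might be structured as follows. The SituationInfo fields are only examples drawn from the kinds of information named above (angle, focus, light source, treatment), and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SituationInfo:
    """Situation metadata for a selected frame (fields are assumptions)."""
    tip_angle_deg: float | None = None   # endoscope tip vs. cavity wall
    focus: float | None = None
    light_source: str | None = None      # e.g. "white_light", "nbi"
    treatment: str | None = None         # e.g. "water_feed", "suction"

@dataclass
class CandidateRecord:
    frame_id: int
    situation: SituationInfo = field(default_factory=SituationInfo)

def annotate_candidates(frame_ids: list[int],
                        situations: dict[int, SituationInfo]
                        ) -> list[CandidateRecord]:
    """Associate situation-difference information with selected images (S8)."""
    return [CandidateRecord(fid, situations.get(fid, SituationInfo()))
            for fid in frame_ids]
```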
 After the situation difference determination, learning images are selected (S8). In this step, the control unit 20 selects images to be used for learning (which may include relearning) on the basis of the differences in the inference results calculated by the difference determination unit 14. That is, as in step S7, the images determined in step S5 to belong to the similar image group and to show a difference in the inference results are selected. In this case, the situation difference information acquired in step S6 is organized, that is, recorded in association with the selected images. On the basis of this recorded information, learning customized for each situation becomes possible.
 After a learning image has been selected in step S8, or if the determination in step S5 finds no difference in the inference results within the image group, it is next determined whether to end (S9). When a doctor or other user performs an operation to end the endoscopic examination, it is determined that the examination has ended. If the result of this determination is not "end", the process returns to step S1; if it is "end", end processing is performed and this flow is terminated.
 As described above, in this modification of the embodiment of the present invention, a situation difference determination is performed (see S6), and the situation difference information is acquired and associated with the image data (see S8). Relearning with this image data makes it possible to perform inference that responds to changes in the situation during an endoscopic examination, improving the reliability of the inference.
 Next, the flow of data in one embodiment of the present invention will be described with reference to FIG. 7. FIG. 7 shows the flow of data among the imaging unit 41, the developing unit 42, the image identification unit 43, the inference unit 44, the inference result change calculation unit 45, the learning image selection unit 46, and the recording unit 47; as described later, these units correspond to the units in FIG. 1. All or part of the functions of these units may be realized by a processor, or by hardware and software.
 The imaging unit 41 corresponds to the imaging unit 11 in FIG. 1; it acquires an image of a specific object such as an affected area as RAW data and outputs it to the developing unit 42. As described above, the image data output from the imaging unit 11 are temporally continuous: when the latest RAW data 2 is output, it forms an image pair with the immediately preceding RAW data 1, and when the next RAW data 3 is output after RAW data 2, RAW data 2 and RAW data 3 form an image pair. That is, temporally consecutive images are paired.
 The developing unit 42 develops the RAW data output from the imaging unit 41; in FIG. 1, the image processing unit 12 performs this function. The developed image data are output to the image identification unit 43. The image identification unit 43 determines whether the change between the images of a pair is large, and extracts one or more identical image pairs determined (identified) as having no large change. The extracted identical image pairs are output to the inference unit 44. In FIG. 1, the image determination unit 15 performs the function of the image identification unit 43.
 The image identification unit 43 functions as an image identification unit that determines that temporally consecutive input paired images are substantially unchanged (see, for example, S3 in FIGS. 2 and 5, FIG. 3, and the image determination unit 15 in FIG. 1). Here, when a plurality of images, for example a first image, a second image, a third image, and so on, are input sequentially to the image identification unit, the first and second images form a paired image, and the second and third images form a paired image. The image identification unit determines that the input paired images are substantially unchanged on the basis of at least one of the amount of movement of corresponding points and the amount of change in the brightness, saturation, and contrast of the images.
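 One possible reading of this identification criterion combines dense optical flow for corresponding-point movement with simple global HSV statistics for brightness, saturation, and contrast. The sketch below uses OpenCV under that assumption; both thresholds are illustrative, not values from the disclosure.

```python
import cv2
import numpy as np

def pair_is_identified(img_a: np.ndarray, img_b: np.ndarray,
                       max_motion: float = 2.0,
                       max_stat_change: float = 8.0) -> bool:
    """Judge a temporally consecutive pair as 'substantially unchanged'.

    The criteria follow the patent's wording (corresponding-point movement
    and brightness/saturation/contrast change); the dense-optical-flow
    choice and both thresholds are assumptions for illustration.
    """
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)

    # Mean displacement of corresponding points via dense optical flow.
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_motion = float(np.linalg.norm(flow, axis=2).mean())

    # Global brightness / saturation / contrast changes.
    hsv_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2HSV).astype(np.float32)
    d_brightness = abs(hsv_a[..., 2].mean() - hsv_b[..., 2].mean())
    d_saturation = abs(hsv_a[..., 1].mean() - hsv_b[..., 1].mean())
    d_contrast = abs(gray_a.std() - gray_b.std())

    return (mean_motion <= max_motion and
            max(d_brightness, d_saturation, d_contrast) <= max_stat_change)
```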
 The inference unit 44 receives the identical image pair from the image identification unit 43, performs inference on each image, and outputs the pair of inference results (the inference result pair) to the inference result change calculation unit 45. When outputting these inference results, it also outputs the reliability of each inference result to the inference result change calculation unit 45. In FIG. 1, the inference unit 13 performs the function of the inference unit 44. In FIG. 1, the inference unit 13 does not receive the identical image pair directly from the image identification unit 43, but it may receive information from the control unit 20 indicating that the images form an identical pair.
 The inference result change calculation unit 45 calculates, on the basis of the inference result pair output from the inference unit 44, the amount of change in the inference results of the identical image pair and in their reliability. That is, it compares the inference result of the latest image with that of the immediately preceding image and calculates the amount of change, and likewise compares the reliability of the two inference results and calculates the amount of change. The calculated amounts of change are output to the learning image selection unit 46. In FIG. 1, the difference determination unit 14 performs the function of the inference result change calculation unit 45.
 The inference result change calculation unit 45 functions as an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair (see, for example, S5 in FIGS. 2 and 5 and the difference determination unit 14 in FIG. 1). An identified image pair is, for example, a pair of temporally consecutive input images that are substantially unchanged. The inference result change calculation unit uses a differential value of at least one of the inference result and the reliability (see, for example, S5 in FIGS. 2 and 5 and the difference determination unit 14 in FIG. 1). The inference result change calculation unit 45 also functions as an inference result change calculation unit that calculates the amount of change in the inference results of temporally consecutive image pairs (see, for example, S5 in FIGS. 2 and 5 and the difference determination unit 14 in FIG. 1).
 In step S5 of FIG. 2 described above, it is determined whether there is a difference in the inference results within the image group, and this processing corresponds to the inference result change calculation unit 45 calculating the amount of change in the inference results of the identified image pair. As the difference between inference results, for example, it may be determined that the inference results differ when the reliability values of the inferences differ by a predetermined value or more. In other words, the inference result change calculation unit 45 calculates the amount of change in the inference results of temporally consecutive image pairs.
 When the amount of change calculated by the inference result change calculation unit 45 is large, the learning image selection unit 46 selects that identical image pair as images to be used for learning (learning image candidates) and outputs the selected learning image candidates to the recording unit 47. When the amount of change in the inference result (or the reliability) is large, the image is highly likely to be one that machine learning identifies poorly, so it is selected as a learning candidate image. Whether it is actually used for learning can be decided by an expert such as a doctor, for example when the teacher data are created. In FIG. 1, the control unit 20 performs the function of the learning image selection unit 46.
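 Units 45 and 46 might reduce, in the simplest case, to a frame-to-frame differential of the reliability and a threshold test, as sketched below. The patent (see claims 16 and 24) only states that selection occurs when the amount of change "exceeds a specified value"; the value 0.25 is an assumption.

```python
def confidence_change(conf_prev: float, conf_latest: float) -> float:
    """Unit 45: amount of change (a discrete differential) in reliability."""
    return abs(conf_latest - conf_prev)

def select_candidate_pair(pair, conf_prev: float, conf_latest: float,
                          specified_value: float = 0.25):
    """Unit 46: keep the identified pair when the change in reliability
    exceeds the specified value (0.25 is an assumed value)."""
    if confidence_change(conf_prev, conf_latest) > specified_value:
        return pair        # learning image candidate
    return None
```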
 The recording unit 47 is an electrically rewritable nonvolatile memory and sequentially records the learning image candidates input from the learning image selection unit 46. Teacher data for learning are created from the learning image candidates recorded in the recording unit 47, and an inference model is generated by machine learning or the like. As described above, an image that is inappropriate as a learning image can simply be excluded, and the teacher data can be created from appropriate images. In FIG. 1, the recording unit 18 and/or the recording unit 33 fulfills the function of the recording unit 47.
 As shown in FIG. 7, in this embodiment temporally consecutive images from the imaging unit 41 are paired, and for the image pairs determined by the image identification unit 43 to be identical image pairs, learning candidates are selected on the basis of the amounts of change calculated by the inference result change calculation unit 45. The description of FIG. 7 used two temporally consecutive images. However, for temporally consecutive images A, B, and C whose inference results satisfy A = C but A ≠ B and B ≠ C, either image B alone may be selected (extracted), or all three images A, B, and C may be selected. That is, when the amount of change in the inference results of an identical image pair is larger than a predetermined value, both images of the pair may be selected, or images may be selected on the basis of comparison with the pairs before and after that pair.
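 Selecting the outlier middle frame in the A = C, A ≠ B, B ≠ C case can be written as a three-frame window test, as in the sketch below; using label equality to stand for "the inference results are equal" is an assumption.

```python
def pick_outlier_middle(labels: list[str]) -> list[int]:
    """Return indices of middle frames whose inference result differs from
    both neighbours while the neighbours agree (A = C, A != B, B != C)."""
    return [i for i in range(1, len(labels) - 1)
            if labels[i - 1] == labels[i + 1] != labels[i]]

# Example: frames 0..4 with results A, A, B, A, A -> middle outlier at index 2.
assert pick_outlier_middle(["A", "A", "B", "A", "A"]) == [2]
```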
 As described above, the teacher data candidate image acquisition device according to the embodiment of the present invention and its modification acquires teacher data candidate images, in the form of endoscope-acquired images, for learning an endoscope inference model. This device has a similar image group determination unit that compares the contents of endoscopic images obtained by temporally continuous imaging and determines, including dissimilar images, a similar image group in which the same region is observed over a predetermined number of frames (see, for example, the image determination unit 15 in FIG. 1, S3 in FIGS. 2 and 5, FIG. 3, and the image identification unit 43 in FIG. 7). The device also has an inference unit that infers, using a machine learning model, a specific object image contained in the captured endoscopic images (see, for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, and the inference unit 44 in FIG. 7), and an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, the difference between the inference results produced by the inference unit (see, for example, the difference determination unit 14 in FIG. 1, S5 in FIGS. 2 and 5, and the inference result change calculation unit 45 in FIG. 7). The device further has a learning image selection unit that selects images to be used for relearning on the basis of the differences in the inference results calculated by the inference result difference calculation unit (see, for example, the control unit 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). The present embodiment can therefore accurately select medical images that machine learning identifies poorly. That is, from among the images determined to form a similar image group, images to be used for relearning are selected on the basis of the differences in the inference results produced by the inference unit, so the inference model can be improved by relearning with these weak images.
 The teacher data candidate image acquisition device according to the embodiment of the present invention and its modification also has an inference unit that infers an input image using a machine learning model (see, for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, and the inference unit 44 in FIG. 7), an inference result change calculation unit that calculates the amount of change in the inference results of an identified image pair (see, for example, the image determination unit 15 in FIG. 1, S3 in FIGS. 2 and 5, FIG. 3, and the image identification unit 43 in FIG. 7), and a learning image selection unit that selects teacher data candidate images to be used for learning on the basis of the amount of change calculated by the inference result change calculation unit (see, for example, the control unit 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). The present embodiment can therefore accurately select medical images that machine learning identifies poorly. That is, images to be used for learning are selected on the basis of the differences in the inference results produced by the inference unit between the images of an identified image pair, so the inference model can be improved by relearning with these weak images.
 The teacher data candidate image acquisition device according to the embodiment of the present invention and its modification also has an inference unit that infers an input image using a machine learning model (see, for example, the inference unit 13 in FIG. 1, S1 in FIGS. 2 and 5, and the inference unit 44 in FIG. 7), an inference result change calculation unit that calculates the amount of change in the inference results of temporally consecutive image pairs (see, for example, the image determination unit 15 in FIG. 1, S3 in FIGS. 2 and 5, FIG. 3, and the image identification unit 43 in FIG. 7), and a learning image selection unit that selects teacher data candidate images to be used for learning on the basis of the amount of change calculated by the inference result change calculation unit (see, for example, the control unit 20 in FIG. 1, S7 in FIG. 2, S8 in FIG. 5, and the learning image selection unit 46 in FIG. 7). The present embodiment can therefore accurately select medical images that machine learning identifies poorly. That is, images to be used for learning are selected on the basis of the differences in the inference results produced by the inference unit between images determined to be temporally consecutive image pairs, so the inference model can be improved by relearning with these weak images.
 The embodiment of the present invention and its modification have been described on the assumption that the endoscope apparatus 10 contains, in addition to the imaging unit 11, the image processing unit 12, the inference unit 13, the difference determination unit 14, the image determination unit 15, the situation determination unit 16, the information association unit 17, the recording unit 18, and so on. However, the endoscope apparatus 10 may include other components, such as a display unit. It is also unnecessary to place all the blocks inside the endoscope apparatus 10; they may be distributed as appropriate, in which case data communication between the devices need only be enabled by a communication unit or the like. External blocks, such as the learning unit 34 in the learning device 30, may also be arranged elsewhere.
 In the embodiment of the present invention and its modification, the control units 20 and 35 have been described as devices composed of a CPU, a memory, and the like. However, besides being configured in software by a CPU and programs, some or all of the units may be configured as hardware circuits; a hardware configuration such as gate circuits generated from a description in a hardware description language such as Verilog or VHDL (VHSIC Hardware Description Language) may be used, and a hardware configuration using software such as a DSP (Digital Signal Processor) may also be used. These may of course be combined as appropriate.
 The control units 20 and 35 are not limited to CPUs; any element that functions as a controller may be used, and the processing of each unit described above may be performed by one or more processors configured as hardware. For example, each unit may be a processor configured as an electronic circuit, or may be a circuit portion of a processor configured as an integrated circuit such as an FPGA (Field Programmable Gate Array). Alternatively, a processor composed of one or more CPUs may execute the function of each unit by reading and executing a computer program recorded on a recording medium.
 In the embodiment of the present invention and its modification, the endoscope apparatus 10 and the learning device 30 have been described as each having blocks that perform the respective functions. However, these blocks need not be provided in a single device; the units described above may be distributed as long as they are connected by a communication network such as the Internet.
 In describing the embodiment of the present invention and its modification, an endoscope was assumed because an endoscopic examination scene is easy to explain. However, the invention is widely applicable to any device that performs some kind of inference when observing an object using its image data. In recent years, even cameras built into mobile terminals and consumer cameras are sometimes used for judgments of the kind made by medical devices; in such cases, holding instability can produce the situation described above as saccade-like image micro-movement. Similar image micro-movement also readily occurs in cameras mounted on automobiles, drones, robots, and the like when the positional relationship with the object is unstable. The invention of the present application can be applied to these devices.
 In each embodiment of the present invention, logic-based determination has mainly been described, with inference-based determination using machine learning used in part. In this embodiment, either logic-based or inference-based determination may be selected and used as appropriate, and a hybrid determination that partially exploits the strengths of each may also be made in the course of the determination.
 In recent years, artificial intelligence capable of judging various criteria collectively has often been used, and it goes without saying that improvements such as performing the branches of the flowcharts shown here collectively also fall within the scope of the present invention. If the user can input approval or disapproval of such control, the embodiments shown in the present application can be customized in a direction suitable for the user by learning the user's preferences.
 Among the techniques described in this specification, the control described mainly with flowcharts can often be set by a program and may be stored in a recording medium or a recording unit. The recording onto the recording medium or recording unit may be performed at product shipment, a distributed recording medium may be used, or the program may be downloaded via the Internet.
 In the embodiment of the present invention, the operation of the embodiment has been described using flowcharts, but the order of the processing steps may be changed, any step may be omitted, steps may be added, and the specific processing within each step may be changed.
 Regarding the operation flows in the claims, the specification, and the drawings, even where they are described for convenience using words expressing order such as "first" and "next", this does not mean that they must be performed in that order at points where no specific explanation is given.
 The present invention is not limited to the above embodiment as it stands; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment; for example, some of the constituent elements shown in the embodiment may be deleted, and constituent elements across different embodiments may be combined as appropriate.
10: endoscope apparatus; 11: imaging unit; 12: image processing unit; 13: inference unit; 14: difference determination unit; 15: image determination unit; 16: situation determination unit; 17: information association unit; 18: recording unit; 19: communication unit; 20: control unit; 30: learning device; 31: communication unit; 32: image processing unit; 33: recording unit; 34: learning unit; 35: control unit

Claims (26)

  1.  An endoscope comprising:
     a similar image group determination unit that compares the contents of endoscopic images obtained by temporally continuous imaging and determines, including dissimilar images, a similar image group in which the same region is observed over a predetermined number of frames;
     an inference unit that infers, using machine learning, a specific object image contained in the captured endoscopic images;
     an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, a difference between inference results produced by the inference unit; and
     a learning image selection unit that selects an image to be used for learning on the basis of the difference between the inference results calculated by the inference result difference calculation unit.
  2.  The endoscope according to claim 1, wherein the similar image group includes a result of judging an image group corresponding to an abruptly changing viewpoint position according to the similarity of the preceding and following images among the endoscopic images obtained by the temporally continuous imaging.
  3.  The endoscope according to claim 1, further comprising an imaging information acquisition unit capable of acquiring imaging information corresponding to each image selected by the learning image selection unit.
  4.  The endoscope according to claim 1, wherein the similar image group determination unit converts the endoscopic images into numerical values with which a pattern of the same object within the images can be tracked, and determines whether the images are similar using the numerical values.
  5.  The endoscope according to claim 1, wherein the learning image selection unit excludes images captured under extremely poor conditions.
  6.  A teacher data candidate image acquisition device for acquiring images for inference model learning, comprising:
     a similar image group determination unit that compares the contents of images obtained by temporally continuous imaging and determines, including dissimilar images, a similar image group in which the same region is observed over a predetermined number of frames;
     an inference unit that infers, using machine learning, a specific object image contained in the captured images;
     an inference result difference calculation unit that calculates, for each image included in the similar image group determined by the similar image group determination unit, a difference between inference results produced by the inference unit; and
     a learning image selection unit that selects an image to be used for learning on the basis of the difference between the inference results calculated by the inference result difference calculation unit.
  7.  A teacher data candidate image acquisition method for acquiring images for inference model learning, the method comprising:
     comparing the contents of images obtained by temporally continuous imaging and determining a similar image group in which the same region is observed over a predetermined number of frames, even if the group includes dissimilar images;
     inferring, using a machine learning model, a specific object image contained in the captured images;
     calculating, for each image determined to be included in the similar image group, a difference between inference results obtained using the machine learning model; and
     selecting an image to be used for learning on the basis of the calculated difference between the inference results.
  8.  A program causing a computer that acquires teacher data candidate images for inference model learning to:
     compare the contents of images obtained by temporally continuous imaging and determine a similar image group in which the same region is observed over a predetermined number of frames, even if the group includes dissimilar images;
     infer, using a machine learning model, a specific object image contained in the captured images;
     calculate, for each image determined to be included in the similar image group, a difference between inference results obtained using the machine learning model; and
     select an image to be used for learning on the basis of the calculated difference between the inference results.
  9.  A teacher data candidate image acquisition device comprising:
     an inference unit that infers an input image using a machine learning model;
     an inference result change calculation unit that calculates an amount of change in inference results of an identified image pair; and
     a learning image selection unit that selects a teacher data candidate image to be used for learning on the basis of the amount of change calculated by the inference result change calculation unit.
  10.  The teacher data candidate image acquisition device according to claim 9, further comprising an image identification unit that determines that temporally consecutive input paired images are substantially unchanged.
  11.  The teacher data candidate image acquisition device according to claim 10, wherein the image identification unit determines that the input paired images are substantially unchanged on the basis of at least one of an amount of movement of corresponding points and an amount of change in brightness, saturation, and contrast of the images.
  12.  The teacher data candidate image acquisition device according to claim 9, wherein the input image is a medical image.
  13.  The teacher data candidate image acquisition device according to claim 9, wherein the inference unit infers at least one of classification, detection, and region extraction of the input image.
  14.  The teacher data candidate image acquisition device according to claim 9, wherein the inference unit outputs a reliability of the inference.
  15.  The teacher data candidate image acquisition device according to claim 9, wherein the inference result change calculation unit uses a differential value of at least one of the inference result and the reliability.
  16.  The teacher data candidate image acquisition device according to claim 9, wherein the learning image selection unit selects the teacher data candidate image to be used for the learning when the amount of change exceeds a specified value.
  17.  A teacher data candidate image acquisition method comprising:
     inferring an input image using a machine learning model;
     calculating an amount of change in inference results of an identified image pair; and
     selecting a teacher data candidate image to be used for learning on the basis of the calculated amount of change.
  18.  A program causing a computer that acquires teacher data candidate images for inference model learning to:
     infer an input image using a machine learning model;
     calculate an amount of change in inference results of an identified image pair; and
     select a teacher data candidate image to be used for learning on the basis of the calculated amount of change.
  19.  A teacher data candidate image acquisition device comprising:
     an inference unit that infers an input image using a machine learning model;
     an inference result change calculation unit that calculates an amount of change in inference results of temporally consecutive image pairs; and
     a learning image selection unit that selects a teacher data candidate image to be used for learning on the basis of the amount of change calculated by the inference result change calculation unit.
  20.  The teacher data candidate image acquisition device according to claim 19, wherein the input image is a medical image.
  21.  The teacher data candidate image acquisition device according to claim 19, wherein the inference unit infers at least one of classification, detection, and region extraction of the input image.
  22.  The teacher data candidate image acquisition device according to claim 19, wherein the inference unit outputs a reliability of the inference.
  23.  The teacher data candidate image acquisition device according to claim 19, wherein the inference result change calculation unit uses a differential value of at least one of the inference result and the reliability.
  24.  The teacher data candidate image acquisition device according to claim 19, wherein the learning image selection unit selects the teacher data candidate image to be used for the learning when the amount of change exceeds a specified value.
  25.  A teacher data candidate image acquisition method comprising:
     inferring an input image using a machine learning model;
     calculating an amount of change in inference results of temporally consecutive image pairs; and
     selecting a teacher data candidate image to be used for learning on the basis of the calculated amount of change.
  26.  A program causing a computer that acquires teacher data candidate images for inference model learning to:
     infer an input image using a machine learning model;
     calculate an amount of change in inference results of temporally consecutive image pairs; and
     select a teacher data candidate image to be used for learning on the basis of the calculated amount of change.
PCT/JP2022/016192 2022-03-30 2022-03-30 Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program WO2023188169A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/016192 WO2023188169A1 (en) 2022-03-30 2022-03-30 Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/016192 WO2023188169A1 (en) 2022-03-30 2022-03-30 Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program

Publications (1)

Publication Number Publication Date
WO2023188169A1 true WO2023188169A1 (en) 2023-10-05

Family

ID=88199745

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/016192 WO2023188169A1 (en) 2022-03-30 2022-03-30 Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program

Country Status (1)

Country Link
WO (1) WO2023188169A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021039748A (en) * 2019-08-30 2021-03-11 キヤノン株式会社 Information processor, information processing method, information processing system, and program

Similar Documents

Publication Publication Date Title
JP6927211B2 (en) Image diagnostic learning device, diagnostic imaging device, method and program
JP6843926B2 (en) Video endoscopy system
JP6371729B2 (en) Endoscopy support apparatus, operation method of endoscopy support apparatus, and endoscope support program
JP7127785B2 (en) Information processing system, endoscope system, trained model, information storage medium, and information processing method
JP6254053B2 (en) Endoscopic image diagnosis support apparatus, system and program, and operation method of endoscopic image diagnosis support apparatus
WO2014155778A1 (en) Image processing device, endoscopic device, program and image processing method
JP2006320650A (en) Image display device
JP5326064B2 (en) Image processing device
US20210406737A1 (en) System and methods for aggregating features in video frames to improve accuracy of ai detection algorithms
KR102531400B1 (en) Artificial intelligence-based colonoscopy diagnosis supporting system and method
JP2018153346A (en) Endoscope position specification device, method, and program
CN116723787A (en) Computer program, learning model generation method, and auxiliary device
WO2023188169A1 (en) Endoscope, training data candidate-image acquisition device, training data candidate-image acquisition method, and program
JP7387859B2 (en) Medical image processing device, processor device, endoscope system, operating method and program for medical image processing device
US20220361739A1 (en) Image processing apparatus, image processing method, and endoscope apparatus
JPWO2019088008A1 (en) Image processing equipment, image processing methods, programs, and endoscopic systems
JP2022132180A (en) Artificial intelligence-based gastroscopy video diagnosis supporting system and method
WO2021044590A1 (en) Endoscope system, treatment system, endoscope system operation method and image processing program
WO2023013080A1 (en) Annotation assistance method, annotation assistance program, and annotation assistance device
US20240112450A1 (en) Information processing device and information processing method
WO2023218523A1 (en) Second endoscopic system, first endoscopic system, and endoscopic inspection method
WO2023195103A1 (en) Inspection assistance system and inspection assistance method
WO2023148857A1 (en) Information output method and information output device
US20220241015A1 (en) Methods and systems for planning a surgical procedure
WO2022044642A1 (en) Learning device, learning method, program, learned model, and endoscope system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22935314

Country of ref document: EP

Kind code of ref document: A1