CN115428011A - Estimation learning device and estimation learning method

Info

Publication number: CN115428011A
Application number: CN202180003949.5A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventors: 新谷浩一, 谷宪, 市川学, 伊藤健世, 后町智子, 野中修
Assignee: Olympus Corp

Classifications

    • G06T7/00 Image analysis (G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING; G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL)


Abstract

Provided are an estimation learning device and an estimation learning method that can perform appropriate estimation, not only on data of a type assumed in advance but also on data of an unknown type, even when the characteristics of the data differ from those of the data accumulated up to that point. Image data from a 1st image acquisition device is input (S1a, S5a); when an estimation model is relearned for a 2nd image acquisition device having image input characteristics different from those of the 1st image acquisition device, the image data obtained from the 1st image acquisition device is processed according to the difference in the image input characteristics to become teaching data (S3a, S7a), and the estimation model is obtained by learning using the teaching data obtained by annotating the image data (S9).

Description

Estimation learning device and estimation learning method
Technical Field
The present invention relates to an estimation learning device and an estimation learning method that collect data from a user and generate an estimation model using the data.
Background
In machine learning such as deep learning, teaching data is generated and deep learning is performed using the teaching data. Generating teaching data requires labor and is therefore costly, so methods of collecting high-quality teaching data at low cost have been proposed. For example, patent document 1 generates, using a 1st feature vector, a search condition for collecting data relating to a specific field from reference data relating to that field. Data is then collected using the search condition, a 2nd feature vector is calculated for the collected data, and if the similarity between the 1st feature vector and the 2nd feature vector is within a predetermined range, the data collected using the search condition is extracted as teaching data.
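The following is a minimal sketch of this kind of similarity-gated collection, not the actual method of patent document 1: the cosine similarity measure, the threshold values, and the extract_features() function are all illustrative assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def collect_teaching_data(reference_data, candidates, extract_features,
                          lo=0.8, hi=1.0):
    """Extract, as teaching data, collected candidates whose features are
    similar to those of the reference data of the specific field."""
    v1 = extract_features(reference_data)          # 1st feature vector
    teaching = []
    for item in candidates:                        # data gathered by the search condition
        v2 = extract_features(item)                # 2nd feature vector
        if lo <= cosine_similarity(v1, v2) <= hi:  # similarity within predetermined range
            teaching.append(item)
    return teaching
```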
Documents of the prior art
Patent document
Patent document 1: Japanese Patent Laid-Open Publication No. 2018-124617
Disclosure of Invention
Problems to be solved by the invention
According to the data collection method described in patent document 1, teaching data can be collected at low cost. However, the method of patent document 1 assumes that data in a specific field is collected. On the other hand, an estimation model generated using such teaching data is not limited to data in the specific field (specific class) assumed in advance; its range of application widens to unknown classes, for which estimation may also become necessary.
Even for an unknown class, if an estimation model is generated using data whose characteristics differ from those of the conventional data, data of the unknown class can be estimated. However, when generating an estimation model that can handle an unknown class, data matching its characteristics must be collected, which takes time and cost.
The present invention has been made in view of the above circumstances, and an object thereof is to provide an estimation learning device and an estimation learning method that can perform appropriate estimation, not limited to data of a type assumed in advance, even for an unknown type in which the characteristics of the data have changed from those of the previously accumulated data.
Means for solving the problems
In order to achieve the above object, the estimation learning device of the 1st invention includes: an input unit that inputs image data from a 1st image acquisition device; and a learning unit that performs learning using teaching data obtained by annotating the image data to obtain an estimation model, wherein the estimation learning device includes an image processing unit that, when the estimation model is relearned for a 2nd image acquisition device having image input characteristics different from those of the 1st image acquisition device, performs processing corresponding to the difference in the image input characteristics on the image data obtained from the 1st image acquisition device to obtain the teaching data.
The estimation learning device of the 2nd invention is the estimation learning device of the 1st invention, wherein the image processing unit processes 1st object image data included in the image data obtained from the 1st image acquisition device so as to be suitable for 2nd object image data included in the image data obtained from the 2nd image acquisition device.
The estimation learning device of the 3rd invention is the estimation learning device of the 1st invention, wherein the difference in the image input characteristics is caused by at least one difference among the specification and performance of the image sensor, the optical characteristics for image capturing, the image processing specification and performance, and the type of the illumination light.
The estimation learning device of the 4th invention is the estimation learning device of the 1st invention, wherein the image processing unit includes a function of changing the annotation given to the same image so that the image data obtained from the 1st image acquisition device among the teaching data becomes teaching data corresponding to the difference in the image input characteristics.
The estimation learning device of the 5th invention is the estimation learning device of the 1st invention, wherein the image data obtained from the 1st image acquisition device is conventional teaching data, and the image processing unit performs image processing on the conventional teaching data based on the characteristics of the image data from the 2nd image acquisition device.
The estimation learning device of the 6th invention is the estimation learning device of the 1st invention, wherein the image data obtained from the 1st image acquisition device is conventional teaching data, and the image processing unit selects the conventional teaching data based on the characteristics of the image data from the 2nd image acquisition device.
The estimation learning device of the 7th invention is the estimation learning device of the 5th invention, wherein the image processing unit processes the image data obtained from the 1st image acquisition device among the teaching data so as to be suitable for the image data from the 2nd image acquisition device.
The estimation learning device of the 8th invention is the estimation learning device of the 1st invention, wherein the image data from the 2nd image acquisition device belongs to an unknown class.
The estimation learning device of the 9th invention is the estimation learning device of the 8th invention, wherein the determination as to whether the data belongs to the unknown class is made automatically by artificial intelligence, or is set manually by a user of the 2nd image acquisition device.
The estimation learning device of the 10th invention is the estimation learning device of the 8th invention, wherein the determination as to whether the data belongs to the unknown class is made based on model information of the 2nd image acquisition device and/or an image, estimated to serve as a reference, from among the image data from the 2nd image acquisition device.
The estimation learning device of the 11th invention is the estimation learning device of the 1st invention, wherein the image data obtained from the 1st image acquisition device is conventional teaching data, and when the use application of the estimation model differs, the image processing unit performs image processing on the conventional teaching data or selects the conventional teaching data according to the use application.
The estimation learning device of the 12th invention is the estimation learning device of any one of the above 1st to 11th inventions, wherein the image data from the 1st image acquisition device and the image data from the 2nd image acquisition device are endoscopic image data.
In the estimation learning method of the 13th invention, image data from a 1st image acquisition device is input,
and when an estimation model is relearned for a 2nd image acquisition device having characteristics different from those of the 1st image acquisition device, the image data obtained from the 1st image acquisition device among the teaching data is processed into teaching data, and the estimation model is obtained by learning using the teaching data obtained by annotating the image data.
The estimation learning device of the 14th invention includes: an input unit that inputs image data from a 1st image acquisition device; and a learning unit that performs learning using teaching data obtained by annotating the image data to obtain an estimation model, wherein the estimation learning device includes an image processing unit that, when the estimation model is customized for a 2nd image acquisition device used under conditions different from those of the 1st image acquisition device, performs processing, including selection or annotation according to the difference in image acquisition characteristics, on the image data obtained from the 1st image acquisition device, and uses the processed image data as the teaching data.
In the estimation learning method of the 15th invention, image data from a 1st image acquisition device is input,
and when an estimation model is customized for a 2nd image acquisition device used under conditions different from those of the 1st image acquisition device, the image data obtained from the 1st image acquisition device is processed, including selection or annotation according to the difference in image acquisition characteristics, into the teaching data, and the estimation model is obtained by learning using the teaching data obtained by annotating the image data.
Advantageous Effects of Invention
According to the present invention, it is possible to provide an estimation learning device and an estimation learning method that can perform appropriate estimation, not limited to data of a type assumed in advance, even for an unknown type in which the characteristics of the data have changed from those of the previously accumulated data.
Drawings
Fig. 1 is a block diagram showing the main electrical configuration of an estimation learning device according to an embodiment of the present invention.
Fig. 2 is a diagram showing an example of guidance display using an estimation model in the estimation learning device according to the embodiment of the present invention.
Fig. 3 is a flowchart showing the operation of generating an estimation model in the estimation learning device according to the embodiment of the present invention.
Fig. 4 is a flowchart showing the operation of the imaging device cooperating with the estimation learning device according to the embodiment of the present invention.
Fig. 5 is a flowchart showing the operation of generating a corrected estimation model in the estimation learning device according to the embodiment of the present invention.
Fig. 6 is a diagram for explaining a case where image data different from the conventional data is input to the estimation learning device according to the embodiment of the present invention.
Fig. 7 is a flowchart showing the operation of determining whether or not AI correction is necessary in the estimation learning device according to the embodiment of the present invention.
Detailed Description
An estimation learning device according to an embodiment of the present invention collects image data and generates teaching data by annotating the image data. An estimation model is generated using a mother set composed of the teaching data. If, for example, high-quality image data is used for the mother set of teaching data that serves as the basis when generating the estimation model, highly reliable estimation may not be possible when low-quality image data is input to the estimation model. Suppose that an estimation model for guidance display is generated based on image data acquired by a highly skilled person (expert) using an image acquisition device. In this case, when a person with low skill (novice) uses the image acquisition device and wants to obtain operation guidance by estimation based on that estimation model, an appropriate guidance display may not be obtained.
In this way, when the characteristics of the data that formed the basis for generating the estimation model differ from the characteristics of the data input at the time of actual estimation, highly reliable estimation may be impossible. In such a case, data with characteristics similar to those of the data input at actual estimation time could be collected, as was done for the proven estimation model, but this takes time and cost. In the present embodiment, when data of an unknown class is input that differs from what the proven estimation model was generated for, for example because of differences in the equipment used or differences in the skill of the user, the estimation model is generated by processing the previously accumulated existing teaching data so that its characteristics match those of the data of the unknown class.
Here, to give the most easily understood example of what is described as a class: suppose the estimation model acquires image data and estimates it in order to detect a specific object in an image. Even if similar images are acquired, image data acquired from image acquisition devices of different specifications is handled as an unknown class because of differences in image quality and the like. In addition, the content reflected in the acquired images may differ, the object to be found may differ, and the pattern of change of the images may differ owing to differences in the operator or robot at the time of image acquisition, differences in the person or equipment handling the object reflected in the image, and so on.
The data processing described above may be performed by image processing, by annotation correction processing, or by influencing the specification of the model to be selected or estimated. The estimation model specification is mentioned because, in consideration of skill level and the like, the estimation result expected by an expert may differ from that expected by a non-expert. Even in such cases, by using the technique of the present embodiment, the valuable teaching data used in generating the conventional, proven estimation model can easily be retained.
Here, when the data is image data, the processing of the data includes image processing of the conventional teaching data. The image processing includes various operations such as increasing or decreasing the number of pixels, changing the brightness (luminance values), changing the wavelength (color signals), and changing the angle of view. Processing of data also includes selecting, from the existing teaching data, the teaching data to be included in the mother set; that is, inappropriate image data can be excluded, and image data can be newly selected and added. For example, the teaching data also includes test data for determining the completion of the estimation model. Even test data that was effective when generating the conventional estimation model may be used for purposes other than testing when generating an estimation model corresponding to an unknown class; including such measures is an option. Further, if some property, such as the manner of work or handling, can be detected from the image information, teaching data acquired through the operation of an expert may, for example, be removed and teaching data acquired through the operation of a low-skilled person added.
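As an illustration of the image-processing side of this, the following sketch converts an existing teaching image toward the characteristics of a different target device. It assumes OpenCV and NumPy are available; the target resolution, gains, and crop ratio are placeholder values, not values from the embodiment.

```python
import cv2
import numpy as np

def adapt_image(img, target_size=(640, 480), brightness_gain=0.8,
                color_gains=(1.0, 0.95, 1.1), crop_ratio=0.9):
    """Convert an existing teaching image toward a target device's
    characteristics: angle of view, pixel count, brightness, color."""
    h, w = img.shape[:2]
    # Narrow the angle of view by center-cropping.
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    img = img[y0:y0 + ch, x0:x0 + cw]
    # Increase/decrease the number of pixels to the target (width, height).
    img = cv2.resize(img, target_size, interpolation=cv2.INTER_AREA)
    # Change brightness (luminance values) and per-channel color balance,
    # a crude stand-in for a different light source / wavelength response.
    img = img.astype(np.float32) * brightness_gain
    img *= np.array(color_gains, dtype=np.float32)  # B, G, R gains
    return np.clip(img, 0, 255).astype(np.uint8)
```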
Further, in the case of collecting images with a surveillance camera, images from the same camera are sometimes used to identify a criminal (where facial features matter) and sometimes to survey congestion conditions (where facial features do not matter; conversely, from the viewpoint of personal information, it may be preferable that faces not be recognizable). As this shows, depending on the use application, even for the same image, the quality, specification, or processing required of the teaching data may change with the required estimation. Similarly, even for medical images, the image processing for preventing a lesion from being overlooked may differ from that for strict diagnosis. That is, even if the original estimation model can properly estimate the input data, if the application or target of the estimation model differs, the input data is of an unknown class, and different teaching data processing or different learning is required.
As an unknown class, the intended use of the estimation model may also be considered. For example, in a medical apparatus for diagnosing cancer, the affected part where cancer occurs and the type of cancer differ depending on data such as region, race, sex, and age. Regional differences also appear in hospital systems and medical devices, in doctors' skills and affiliations, in trends in patient data, and so on; differences in such factors cause the assumed class to differ and sometimes to become an unknown class. Therefore, the teaching data for generating an estimation model may be produced by appropriately modifying the existing teaching data in consideration of the use application of the estimation model.
Hereinafter, an example in which the present invention is applied to a learning system for image estimation will be described as an embodiment of the present invention with reference to the drawings. The learning system for image estimation shown in fig. 1 is composed of the image estimation learning device 1 and the imaging device 6.
The image estimation learning device 1 may be a stand-alone computer or the like, or may be disposed in a server or the like. When the image estimation learning device 1 is a stand-alone computer or the like, the imaging device 6 may be connected by wire or wirelessly. When the image estimation learning device 1 is disposed in a server or the like, the imaging device 6 may be connected via an information communication network such as the internet.
The imaging device 6 may be a device installed in a medical apparatus such as an endoscope to capture an image of an object such as an affected part, in a scientific apparatus such as a microscope to capture an image of an object such as a cell, or in a digital camera to capture an image of a subject. In any of these cases, the imaging device 6 in the present embodiment may be a device whose main function is imaging, or a device that has an imaging function in order to perform some other main function. Hereinafter, the case where the imaging device 6 is an endoscope and the image acquisition device outputs endoscopic image data will mainly be described.
The imaging device 6 includes the image estimation device 2, the image acquisition device 3, the guide unit 5, and the control unit 7. An example in which these devices of the imaging device 6 shown in fig. 1 are integrally configured will be described; however, they may of course be provided separately in different apparatuses and connected to each other through an information communication network such as the internet or a dedicated communication network. For example, the image estimation device 2 may be configured separately from the imaging device 6 and connected via the internet or the like. Although not illustrated in fig. 1, an operation unit (input interface), a communication unit (communication circuit), a recording unit (for recording, for example, the image data acquired by the image acquisition device 3), an information acquisition device, and the like, as well as various members, circuits, and devices for causing the imaging device 6 to function, are also provided.
The image acquisition device 3 includes an optical lens, an image sensor, and various imaging circuits such as an imaging control circuit and an imaging signal processing circuit, and acquires and outputs image data of an object. It may further include an exposure control member (for example, a shutter or a diaphragm) and an exposure control circuit for exposure control during imaging, and may also include a lens driving device, a focus detection circuit, a focus adjustment circuit, and the like for focusing the optical lens. The optical lens may be a zoom lens.
Either one of the image acquisition devices 3a and 3b is disposed in the image acquisition device 3. In fig. 1, both the image acquisition device 3a and the image acquisition device 3b are shown; since they are substantially the same in function, it is assumed that the subsequent-stage portions can be shared, and, as described above, either one of them is mounted in the imaging device 6, which simplifies the description. The image acquisition device 3 thus stands for either one of the image acquisition device 3a (e.g., a reusable endoscope) and the image acquisition device 3b (e.g., a disposable endoscope). That is, the image acquisition devices 3 are used separately depending on the region, facility, or case (object), and the users are also assumed to differ. However, since the same guidance function and the like are expected to be used effectively, a system such as an estimation model may be shareable. Further, it is assumed that the estimation model itself is customized, including the user interface and the required guidance.
In the present embodiment, it has been described that either one of the image acquisition device 3a and the image acquisition device 3b is disposed in the image acquisition device 3, but this does not prevent the image acquisition device 3 from including both of them, since a plurality of devices may be used depending on the situation. The following explanation assumes such a case: the image data from the image acquisition device 3a belongs to a known class, and the image data from the image acquisition device 3b belongs to an unknown class. The characteristics of the image data output from the image acquisition device 3a and from the image acquisition device 3b differ; the characteristics include image quality, light source, and angle of view. For example, when the number of pixels of the image acquisition device 3b is smaller than that of the image acquisition device 3a, or the resolution of its optical lens is lower, the image quality differs. Further, even when the user, the object, or the usage environment differs, the data is of an unknown class. The estimation model itself is customized to match the class, including the user interface, the required guidance, and the like.
The information for discriminating such differences can be acquired by, for example, matching the model information of each image acquisition device (which may additionally include information on peripheral systems such as the light source, treatment instrument information, and the like, described later, used as appropriate) against a database. The information may be acquired by transmitting data recorded in a memory built into the device, or in a system in the device's usage environment, to the image estimation learning device 1, or may be input manually by the user. Instead of model information, a symbol or numerical value indicating the detection performance or processing performance of the image acquisition device may be used. Information on the usage environment and information (data) on the object, such as the patient, can likewise be obtained by communication from each device, or manually input information can be obtained by communication or the like and used. The selection and processing of teaching data may be performed based on differences in such acquired supplementary data. Further, the expected estimation model also varies depending on the skills, performance, and constraints of the tools or devices used, or of the persons or robots handling them; such information, too, can be acquired from information recorded in a memory, manual input, sensor information, or the like. Besides acquiring memory information, if the image data itself, or the situation (scene) transitions of a moving image, are analyzed, it can be determined that the image data is of an unknown class that cannot be handled by the assumed estimation model.
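A minimal sketch of the database-matching idea described above, with a hypothetical device database; the model names, fields, and values are invented for illustration only.

```python
# Hypothetical device-characteristics database keyed by model information.
DEVICE_DB = {
    "ENDO-A1": {"pixels": (1920, 1080), "light_source": "xenon", "nbi": True},
    "ENDO-B2": {"pixels": (640, 480),   "light_source": "led",   "nbi": False},
}

def is_unknown_class(model_info, known_model="ENDO-A1"):
    """Judge whether data from a device should be treated as an unknown class
    by comparing its registered characteristics against the known device."""
    spec = DEVICE_DB.get(model_info)
    if spec is None:
        return True  # model not registered in the database: treat as unknown
    base = DEVICE_DB[known_model]
    # Any mismatch in characteristics that affect the images means the
    # existing estimation model may not apply as-is.
    return any(spec[k] != base[k] for k in ("pixels", "light_source", "nbi"))
```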
The imaging device 6 includes a light source, and when an object irradiated by the light source is imaged, the obtained image also differs depending on the wavelength characteristics or light distribution characteristics of the light source. In addition, either of the image acquisition devices 3a and 3b may be capable of observation by the Narrow Band Imaging (NBI) method, and in that case as well the characteristics of the image acquisition device 3a and the image acquisition device 3b may differ.
Further, when the focal lengths of the optical systems of the image acquisition devices 3 differ, the angle of view differs. With a long-focus (telephoto) lens, an image in which the object is magnified, though the angle of view is narrow, is obtained. With a short-focus lens, on the other hand, an image in which the object appears small, though the angle of view is wide, is obtained. When the optical system is a zoom lens, the image differs greatly depending on the set focal length.
The image acquisition device 3a may have a distance (distribution) detection function 3D (3aa). If this 3D function is provided, the characteristics of the image acquisition device 3a and the image acquisition device 3b differ on this point as well. The 3D function 3aa images an object three-dimensionally to acquire three-dimensional image data and the like; depth information may also be acquired using reflected light, ultrasonic waves, or the like, in addition to the three-dimensional image. The three-dimensional image data can be used to detect the position of an object in space, such as its depth as seen from the image acquisition device 3. For example, when the imaging device 6 is an endoscope that a doctor inserts into the body and operates, if the imaging unit supports 3D, the positional relationship between a site in the body and the treatment instrument can be grasped, the three-dimensional shape of the site can be grasped, and three-dimensional display can be performed. Even without strictly acquired depth information, depth can be calculated from the relationship between the size of the background and the size of a nearby object.
Data such as the image data acquired by the image acquisition device 3 (the image acquisition device 3a or the image acquisition device 3b), serving as a candidate set of teaching data, is output to the recording unit 4 in the image estimation learning device 1 and recorded as the teaching data A group 4a. A memory may also be provided in the imaging device 6 to store the image data and the like acquired by the image acquisition device 3.
Further, an information acquisition device may be disposed in the image acquisition device 3. The information acquisition device is not limited to obtaining image data; it may obtain information related to the object, for example information related to the patient or information related to the device used for diagnosis or treatment, by connecting to an electronic medical record or the like. For example, when a doctor performs treatment using an endoscope, the information acquisition device obtains information such as the patient's name and sex and the site in the body into which the endoscope is inserted. The information acquisition device may acquire not only information from an electronic medical record but also sound data at the time of diagnosis or treatment, as well as medical data such as body temperature, blood pressure, and heartbeat data. These data may be output to the image estimation learning device 1.
When estimating the risk of a treatment or the like, the above-described elements can be added to improve reliability. The present embodiment mainly describes an example using images, but estimation can also be performed based on the numerical data described above. The point in common with estimation using images is that teaching data collected on a specific premise needs to be customized for a different environment; the method of the present embodiment is therefore not limited to images and is an effective solution for data in general. Since situation transitions based on moving images are important here, the method of the present embodiment can also be applied to the time transitions of these data by the same way of thinking. In the following embodiment, an estimation model that takes such time lapse into account is described as an example. Estimation using a still image or a single data item is simpler and is therefore not illustrated, but it should be broadly understood from the above description.
The image estimation device 2 receives the image data and the like acquired by the image acquisition device 3, estimates on the data using the estimation model generated by the image estimation learning device 1, and outputs a guidance display to the guide unit 5 based on the estimation result. The image estimation device 2 includes an image input unit 2IN, an estimation change unit 2SL, an estimation unit 2AI, and an estimation result output unit 2OUT. The "estimation model" may be described as including the guidance (display or sound) to be output to the user.
The image input unit 2IN inputs the image data output from the image acquisition device 3. These data are time-series data composed of a plurality of frames and are input to the image input unit 2IN moment by moment. If necessary, information other than images, such as voice and data obtained by other sensors, may also be referred to. The input is not limited to image input; the unit may be a data input unit. The image input to the input unit may be one frame of a continuously acquired image, or a plurality of frames may be processed together; such learning may be performed on the premise of an estimation engine that performs estimation using a plurality of frames.
The estimation unit 2AI has an estimation engine in which the estimation model generated by the image estimation learning device 1 is set. The estimation engine has a neural network, as does the learning unit 1c described later, and the estimation model is set in this neural network. The estimation unit 2AI feeds the image data input by the image input unit 2IN to the input layer of the estimation engine, and estimation is performed in the intermediate layers of the estimation engine. The estimation result is output to the guide unit 5 by the estimation result output unit 2OUT.
The estimation changing unit 2SL changes the estimation model used in the estimation unit 2AI. As described above, the characteristics of the image acquisition devices 3a and 3b differ. When an estimation model generated based on data from the image acquisition device 3a is set in the estimation unit 2AI and is used, for example, in another environment, then even if data from the image acquisition device 3b is input to the estimation unit 2AI expecting the model to function as described above, appropriate estimation may not be performed and guidance display may not be possible. When data having different characteristics is input to the image input unit 2IN in this manner, the control unit 7 requests generation of a corrected estimation model suited to the data output from the image acquisition device 3b. The estimation changing unit 2SL then switches the estimation unit 2AI to the corrected estimation model.
In other words, the corrected estimation model may be learned from corrected teaching data. That is, even if the data such as image data is the same, when teaching data based on that data is generated, estimation models of different specifications and performance (corrected estimation models) can be generated by changing the processing method (the processing and correction of the teaching data) or the manner of selecting it.
The guide unit 5 has a display and shows the image of the object acquired by the image acquisition device 3. It also performs guidance display based on the estimation result output by the estimation result output unit 2OUT.
The control unit 7 is a processor having a CPU (Central Processing Unit) 7a, a memory 7b, and peripheral circuits. The control unit 7 controls each device and each unit in the imaging device 6 according to a program stored in the memory 7b.
The image estimation learning device 1 performs machine learning (including deep learning) using the image data acquired by the image acquisition device 3 to generate an estimation model. The image estimation learning device 1 includes a control unit 1a, an image input unit 1b, a learning unit 1c, an image processing unit 1d, a learning result utilization unit 1e, a teaching data selection unit 1f, and a recording unit 4.
The recording unit 4 is an electrically rewritable nonvolatile memory that records the image data and various information data output from the image acquisition device 3 in the imaging device 6. The various data recorded by the recording unit 4 are output to the image input unit 1b. The recording unit 4 can store the teaching data A group 4a and the teaching data B group 4b. The recording unit 4 may also record test data for verifying the performance of the estimation model. Even if test data is not recorded in advance, part of the teaching data recorded in the recording unit 4 may be extracted and used as test data.
The teaching data A group 4a is a teaching data group based on the time-series data acquired by the image acquisition device 3a. As described later, the teaching data B group 4b is teaching data generated by processing the already recorded teaching data A group 4a when generating an estimation model for an unknown class. When the characteristics of the image acquisition devices differ, the recording unit 4 records separate teaching data groups. The recording unit 4 records both the candidate set of teaching data transmitted from the image acquisition device 3 and the annotated teaching data group described later. Furthermore, not only the teaching data adopted by the teaching data selection unit 1f but also the unadopted teaching data may be recorded in the recording unit 4, since the unadopted data may become useful when teaching data for an unknown class is generated.
The control unit 1a generates a teaching data group by annotating the time-series data acquired by the image acquisition device 3 (transmitted to the image estimation learning device 1 in S35 of fig. 4, described later). For example, fig. 2, described later, shows cases where bleeding occurs when the endoscope is inserted into the body; the bleeding enlarges in fig. 2 (a) and shrinks in fig. 2 (b). In such cases, teaching data can be generated by annotating, for the time-series data ID1 and ID2, how the bleeding changes. The annotation here may be performed automatically or manually as needed. The customization may also be performed automatically, taking into account or reflecting the results of manual annotation. In the case of automatic annotation, a manual check may be performed, and a step of prompting the user to redo the check according to the check result may be added.
Note that the "processing and correction of teaching data" and "selection of teaching data" may be changed when a comment is given. This is because, for example, even if the same bleeding as shown in fig. 2 is used, the degree of remedy for the condition after bleeding is changed depending on whether or not there is a treatment instrument or a person or skill (so that such information can be obtained) that can be immediately dealt with at the time of bleeding. That is, even if teaching data obtained in a system in which the system including the skill or the instrument class is a universal system and an image is judged to be annotated as "no bleeding", it is preferable to give the annotation by a more strict judgment when generating estimation guidance for guidance in a system in which the skill or the instrument class is inferior.
Information such as skill information may be classified in this way. The judgment may be made, as information for customization, based on manually input skill information, on records registered in advance, or on past history, or it may be made based on tendencies in the acquired images. As an easily understood example, apart from differences in equipment, differences in composition, exposure, focus, and the like between a photograph taken by a professional photographer and one taken by a beginner are recognizable, and skill can be judged from them. With moving images this tendency becomes even stronger, and habits in handling the equipment are reflected in the images. There are also methods of simultaneously acquiring sound and the like and using it as a reference. Further, the equipment used may be determined from distortion or blur in the image.
That is, the estimation learning device in the present embodiment includes: an input unit that inputs image data from a 1st image acquisition device; and a learning unit that learns based on teaching data obtained by annotating the image data to obtain an estimation model. The estimation learning device includes an image processing unit that, when the estimation model is custom-learned (or re-custom-learned) for a 2nd image acquisition device having image input characteristics different from those of the 1st image acquisition device, performs processing into teaching data (including annotation changes) on the image data obtained from the 1st image acquisition device among the teaching data, in accordance with the difference in image acquisition characteristics, and uses the processed data as the teaching data.
For example, since the outcome of treatment after bleeding differs between a scalpel with a hemostatic function and one without, it is preferable to incorporate that difference into the estimation model. The difference in instrument specifications or performance may be determined from information input in advance, or from image features of the treatment instrument visible in the captured result. For example, treatment images taken with an instrument lacking a hemostatic function are preferably distinguished from those taken with an instrument having one; if learning is performed with these differences combined, teaching data can be generated by processing and correcting one kind of image to stand in for the other. That is, an estimation model for guidance during treatment without a hemostatic function can be generated by processing and correcting images taken with a hemostatic function. Since surgery and the like differ with an individual's constitution, the affected part, and so on, ideal teaching data cannot always be collected easily, and such a method therefore makes it easy to produce a highly reliable estimation model.
The image input unit 1b inputs the teaching data A group 4a acquired by the image acquisition device 3a and recorded in the recording unit 4. The teaching data A group 4a input to the image input unit 1b has been annotated. The input teaching data A group 4a is output to the learning unit 1c and the image processing unit 1d. In learning, data other than the images acquired by the image acquisition device 3a may of course be used in place of the image data. When the learning device relearns and generates the corrected estimation model, the teaching data B group 4b obtained by processing the teaching data A group 4a is input to the image input unit 1b. The image input unit 1b functions as an input unit (input interface) for inputting image data from the 1st image acquisition device (see, for example, S1 and S5 in fig. 3 and S1a and S5a in fig. 5).
The image processing unit 1d processes the teaching data input from the image input unit 1b with an image processing circuit or the like, or by a program. As described above, the image acquisition characteristics of the image acquisition devices 3a and 3b differ. Therefore, when teaching data is generated based on image data acquired by the image acquisition device 3a and the learning unit 1c generates an estimation model from it, that estimation model cannot perform appropriate estimation on image data acquired by the image acquisition device 3b. The image processing unit 1d therefore performs image processing on the image data input by the image input unit 1b, converting it as if it had been acquired by the image acquisition device 3b. The image data processed by the image processing unit 1d is output to the learning unit 1c. Detailed processing of images is described later with reference to fig. 6, and generation of the corrected estimation model with reference to fig. 5.
The image processing unit 1d functions as an image processing unit (image processing processor) that, when the estimation model is relearned for a 2nd image acquisition device having characteristics different from those of the 1st image acquisition device, processes the image data obtained from the 1st image acquisition device and sets it as teaching data (see, for example, S1a to S7a in fig. 5 and (b) in fig. 6). The image input characteristics mentioned above result from at least one difference among the specification and performance of the image sensor, the optical characteristics for image capturing, the image processing specification and performance, and the type of illumination light.
The image processing unit processes the 1st object image data included in the image data obtained from the 1st image acquisition device among the teaching data so as to be suitable for the 2nd object image data included in the image data obtained from the 2nd image acquisition device (see, for example, S1a to S7a of fig. 5 and (b) of fig. 6). The image processing unit also includes a function of changing the annotation given to the same image so that the image data obtained from the 1st image acquisition device among the teaching data becomes teaching data corresponding to the difference in image input characteristics.
Further, an image input device is used in connection with some operation, and the content of the image, or the image obtained, may change with changes in the environment, changes in the object, and differences in the tools used at the same time. In such a case, it is considered that the image input characteristics have changed, and "processing" is performed in accordance with the difference in the image input characteristics. The "processing" according to the difference in image input characteristics includes not only the kind of image processing and the method of correction, but also the content of the annotations that form part of the teaching data, and corrections, relative to the standard method, of things such as the trigger timing for starting estimation using the learning result of the processed teaching data. This is because the "processing" is assumed to be performed in accordance with changes in the environment and conditions of the 2nd image acquisition device that will actually use the estimation model (including not only the performance, specification, environment, and peripheral systems of the apparatus, but also peripheral equipment such as the objects and accessories to be handled, treatment tools, and the operators).
Here, the image data obtained from the 1st image acquisition device is the conventional teaching data. That is, the teaching data adopted by the teaching data selection unit 1f is accumulated in the recording unit 4. The image processing unit performs image processing on the conventional teaching data based on the characteristics of the image data from the 2nd image acquisition device (that is, characteristics different from those of the 1st image acquisition device) (see, for example, S1a and S5a in fig. 5). The image processing unit also selects conventional teaching data based on the characteristics of the image data from the 2nd image acquisition device (see S13 in fig. 5). The image processing unit performs processing so that the image data obtained from the 1st image acquisition device among the teaching data suits the image data from the 2nd image acquisition device (see, for example, S1a, S5a, and S13 in fig. 5).
In addition, when the purpose of the estimation model differs, the image processing unit may perform image processing on the existing teaching data or select among the existing teaching data according to that purpose. When the estimation model is customized for a 2nd image acquisition device used under conditions different from those of the 1st image acquisition device, the image processing unit may perform processing, including selection or annotation according to the difference in image acquisition characteristics, on the image data obtained from the 1st image acquisition device and use the processed image data as teaching data. For example, when an estimation model has been generated using teaching data based on image data acquired by a skilled user operating the imaging device 6, appropriate guidance may be difficult to provide when an unskilled person uses that model for operation guidance. In such a case, the image processing unit 1d or the teaching data selection unit 1f may exclude inappropriate image data and newly select and add image data, or may correct images appropriately; since these actions may need to be coordinated, the image processing unit 1d and the teaching data selection unit 1f may communicate with each other. The image processing unit may also process and select the image data or teaching data in consideration of the use application of the estimation model; for example, for an estimation model for diagnosing cancer, image data may be selected and processed in consideration of race, sex, age, and the like.
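A minimal sketch of selecting existing teaching data according to the intended use; the record fields (operator_skill, body_site) and the selection criteria are illustrative assumptions, not part of the embodiment.

```python
def select_teaching_data(records, target):
    """Select existing teaching records whose acquisition conditions match
    the intended use of the customized estimation model. Each record is
    assumed to be a dict holding an 'image', its 'annotation', and metadata."""
    selected = []
    for r in records:
        if target.get("exclude_expert_only") and r.get("operator_skill") == "expert":
            continue  # drop data a low-skill user is unlikely to reproduce
        if "body_site" in target and r.get("body_site") != target["body_site"]:
            continue  # keep only the site relevant to the intended use
        selected.append(r)
    return selected
```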
When an estimation model is generated by learning, the learning unit 1c judges the reliability of the learning result. For this judgment, test data can be prepared, and the judgment can be based on whether the output obtained when the test data is input to the estimation model matches the correct answer known in advance (see, for example, S11 in fig. 3 and 5). The test data for this judgment may be selected from the recording unit 4; when it is acquired from outside the learning device, the teaching data selection unit 1f may select appropriate test data and the image processing unit 1d may process the selected test data as necessary. When selecting test data that can be processed appropriately, the image processing unit 1d and the teaching data selection unit 1f preferably operate in cooperation. Since the test data verifies performance on the actual device used, it is preferable to select images actually obtained on that device. Alternatively, the teaching data selection unit 1f may itself process the appropriate images it has selected.
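A minimal sketch of this test-data-based reliability judgment, assuming the estimation model exposes a hypothetical predict() method and that the test data is a list of (input, correct answer) pairs.

```python
def reliability(model, test_data):
    """Fraction of test inputs for which the model's output matches the
    correct answer known in advance."""
    correct = sum(1 for x, answer in test_data if model.predict(x) == answer)
    return correct / len(test_data)

# Example gate: adopt the relearned model only if it clears a threshold.
# if reliability(model, test_data) >= 0.9: deploy(model)
```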
The learning unit 1c includes an estimation engine, as does the estimation unit 2AI, and generates an estimation model by machine learning such as deep learning, using the image data input by the image input unit 1b or the image data processed by the image processing unit 1d. Deep learning is described later. The learning unit 1c functions as a learning unit (learning engine) that obtains an estimation model by learning using teaching data obtained by annotating image data (see, for example, S9 in fig. 3 and 5).
The teaching data selection unit 1f judges the reliability of the estimation model generated in the learning unit 1c and, based on the judgment result, determines whether to adopt the data as teaching data. That is, teaching data used to generate an estimation model of low reliability is not adopted; only teaching data yielding high reliability is adopted. The learning unit 1c finally generates the estimation model from the teaching data adopted by the teaching data selection unit 1f. The teaching data adopted by the teaching data selection unit 1f is recorded in advance as the teaching data A group 4a in the recording unit 4. The teaching data selection unit 1f may also have a memory, and the adopted teaching data may be recorded in that memory in advance.
The estimation model generated in the learning unit 1c is output to the learning result utilization unit 1e. The learning result utilization unit 1e transmits the generated estimation model to an estimation engine such as the estimation unit 2AI of the image estimation device 2.
Here, deep learning is explained. "Deep learning" structures the process of "machine learning" using a neural network into multiple layers. A "forward propagation type neural network", which transmits information from front to back and makes a judgment, is the representative example. The simplest forward propagation type neural network has three layers: an input layer composed of N1 neurons, an intermediate layer composed of N2 neurons given by parameters, and an output layer composed of N3 neurons corresponding to the number of classes to be discriminated. The input layer and the intermediate layer, and the intermediate layer and the output layer, are connected by connection weights, and a bias is added in the intermediate layer and the output layer, whereby a logic gate can easily be formed.
Three layers suffice for simple discrimination, but by providing many intermediate layers it becomes possible to learn combinations of multiple feature quantities in the course of machine learning. In recent years, networks of 9 to 152 layers have become practical from the viewpoints of training time, judgment accuracy, and energy consumption. A "convolutional neural network", which is strong in pattern recognition and keeps processing to a minimum by performing an operation called "convolution" that compresses image feature quantities, may also be used. A "recurrent neural network" (fully connected recurrent neural network), which handles more complex information and lets information flow bidirectionally to support analysis of information whose meaning changes with order or sequence, may also be utilized.
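A minimal sketch of the simplest 3-layer forward propagation type network described above, in NumPy; the layer sizes N1, N2, N3, the ReLU activation, and the random initialization are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N1, N2, N3 = 64, 32, 2  # input, intermediate, output neuron counts
W1, b1 = rng.normal(size=(N1, N2)) * 0.1, np.zeros(N2)  # connection weights, bias
W2, b2 = rng.normal(size=(N2, N3)) * 0.1, np.zeros(N3)

def forward(x):
    """Propagate information from front (input layer) to back (output layer)."""
    h = np.maximum(0.0, x @ W1 + b1)  # intermediate layer with ReLU
    z = h @ W2 + b2                   # output layer
    e = np.exp(z - z.max())
    return e / e.sum()                # probabilities over the N3 classes

print(forward(rng.normal(size=N1)))
```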
To implement these techniques, conventional general-purpose arithmetic processing circuits such as a CPU or an FPGA (Field Programmable Gate Array) may be used. However, since most neural network processing is matrix multiplication, a GPU (Graphics Processing Unit) or a TPU (Tensor Processing Unit) specialized for matrix calculation may be used. In recent years, such artificial intelligence (AI)-dedicated hardware as the "Neural Network Processing Unit (NPU)" has also been designed so that it can be integrated with other circuits, such as a CPU, as part of a processing circuit.
Other machine learning methods, such as the support vector machine and support vector regression, are also available. Learning here is a method of calculating the weights, filter coefficients, and offsets of a discriminator, and besides this there is a method using logistic regression processing. To have a machine make some judgment, a human must teach the machine how to judge. In the present embodiment, a method of deriving image judgments by machine learning is adopted, but a rule-based method that applies rules acquired by humans through empirical and heuristic knowledge may also be used, provided it is a method that derives annotation results for teaching data.
The control unit 1a is a processor having a CPU (Central Processing Unit) 1aa, a memory 1ab, and peripheral circuits. The control unit 1a controls each unit in the image estimation learning device 1 according to a program stored in the memory 1ab. For example, the control unit 1a gives annotations to the image data and the like output from the image acquisition device 3 (see S3 and S7 in fig. 3, and S3a and S7a in fig. 5).
Next, a case where treatment is performed using an endoscope will be described, with reference to fig. 2 (a) and (b), as an example of image collection and of guidance display based on the images. The endoscope includes the imaging device 6 shown in fig. 1, and therefore includes the image acquisition device 3, the image estimation device 2, and the guide unit 5.
Fig. 2 (a) shows an example in which, during treatment with the endoscope, bleeding BL occurs in the body and then enlarges to become the enlarged bleeding BLL. The image acquisition device 3 of the endoscope continuously collects image data at predetermined time intervals while the doctor performs the treatment, and the control unit 1a records the image data as a teaching data candidate group in the memory in the imaging device 6. In the example of fig. 2 (a), bleeding occurs at time T=0, and the bleeding can be recognized as enlarged at time T=T1a. In this case, the image data ID1 from 5 seconds before time T=0 onward is recorded as the bleeding-episode image. If the image estimation learning device 1 gives the collected image data ID1 an annotation indicating that it led to the enlarged bleeding BLL after time T=T1a, teaching data for bleeding enlargement can be generated. In the present embodiment, the image estimation learning device 1 performs the annotation (see S3 in fig. 3 and S3a in fig. 6), but the imaging device 6 may instead perform the annotation and transmit the annotated teaching data to the image estimation learning device 1.
Fig. 2 (b) shows an example in which bleeding occurs in the body during treatment with the endoscope, but the bleeding is reduced thereafter. As in the example of fig. 2 (a), the image acquisition device 3 of the endoscope continuously collects image data at predetermined time intervals while the treatment is being performed, and the control section 1a records the image data as a teaching-data candidate group in the memory in the imaging device 6. In the example of fig. 2 (b), bleeding occurs at time T=0, and bleeding reduction can be recognized at time T=T1b. In this case, the image data ID2 from a point 5 seconds before time T=0 onward is collected as the bleeding-onset image. The image estimation learning device 1 annotates the collected image data ID2 to indicate that reduced bleeding BLS occurred after time T=T1b, thereby obtaining teaching data for bleeding reduction. In the present embodiment the image estimation learning device 1 performs the annotation (see S7 in fig. 3 and S7a in fig. 5), but the imaging device 6 may perform the annotation and transmit the annotated teaching data to the image estimation learning device 1.
In fig. 2 (a) and (b), time T=0 is the timing at which the user notices the bleeding, but the behavior or phenomenon that causes the bleeding often occurs before time T=0. In the present embodiment, when an event (for example, bleeding enlargement or bleeding reduction) occurs, trigger information is generated, time is traced back from that specific timing, data is collected, and the causal relationship is thereby clarified. By analyzing image acquisition that continues in time series (a moving image) in this way, various kinds of useful information can be obtained.
By collecting and annotating a large number of examples like fig. 2 (a) and (b), a large amount of teaching data can be generated and handled as big data. The learning unit 1c generates an estimation model using this large amount of teaching data. When bleeding occurs at time T=0, this estimation model can estimate whether the bleeding will be enlarged or reduced after a predetermined time has elapsed (at T1a or T1b in the examples of fig. 2 (a) and (b)).
If such an estimation model is generated and set in the estimation unit 2AI of the imaging device 6, the future can be predicted based on the images acquired by the image acquisition device 3. That is, when the imaging device 6 recognizes bleeding at time T=0, before time T1 is reached, the image data from that timing back to a time traced back by a predetermined amount (T=-5 sec), as shown in fig. 2 (a) and (b), is input to the estimation model, whereby it can be predicted whether the bleeding will enlarge or shrink. When the prediction (estimation) result is bleeding enlargement, the guide unit 5 of the imaging device 6 shows the caution display Ga. On the other hand, if the prediction (estimation) result is bleeding reduction, the guidance Go indicating that the bleeding is not a concern is displayed.
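As a purely illustrative sketch of this guidance branch (not the embodiment's actual implementation), the following Python fragment assumes a hypothetical estimate_fn that returns a label and a confidence, and chooses between the caution display Ga and the guidance Go. The confidence threshold is an assumption.

```python
# Hedged sketch: choose guidance Ga or Go from the estimation result.
def guide_from_estimation(estimate_fn, recent_frames, min_confidence=0.6):
    """estimate_fn returns ('enlarge' or 'reduce', confidence in [0, 1])."""
    label, confidence = estimate_fn(recent_frames)
    if confidence < min_confidence:
        return None  # withhold guidance when the estimation is unreliable
    if label == "enlarge":
        return "Ga: caution, bleeding is predicted to enlarge"
    return "Go: bleeding is predicted to subside"
```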
Next, generation of the estimation model used in fig. 2 (a) and (b) will be described using the flowchart shown in fig. 3. The CPU 1aa of the control unit 1a in the image estimation learning device 1 realizes this flow in accordance with the program stored in the memory 1ab.
When the flow of estimation model generation shown in fig. 3 starts, first, procedure images at the time of bleeding enlargement are collected (S1). As described above, the imaging device 6 collects, from the continuous images acquired by the image acquisition device 3, images in which the area of the bleeding portion increases during the period from time T=-5 to T=T1a shown in fig. 2 (a). Specifically, in fig. 2 (a), the control unit 7 performs image analysis of the image data and, when it determines that the bleeding has enlarged, generates trigger information (see S27 in fig. 4) and retrospectively records the bleeding-enlargement images (see S29 in fig. 4). The retrospectively recorded images are temporarily recorded in a memory in the imaging device 6. In step S1, the control unit 1a of the image estimation learning device 1 collects the procedure images at the time of bleeding enlargement from the imaging device 6 or the like and temporarily stores them in the recording unit 4.
After the procedure images at the time of bleeding enlargement are collected in step S1, the image data are annotated with "bleeding enlargement" (S3). Here, the control unit 1a applies an annotation of "bleeding enlargement" to each item of the collected image data and records the annotated image data in the recording unit 4 as teaching data A group 4a.
Next, procedure images at the time of bleeding reduction are collected (S5). As described above, the imaging device 6 collects, from the continuous images acquired by the image acquisition device 3, images in which the area of the bleeding portion decreases during the period from time T=-5 to T=T1b shown in fig. 2 (b). Specifically, in fig. 2 (b) described above, the control unit 7 analyzes the image data and, when it determines that the bleeding has been reduced, generates trigger information (see S27 in fig. 4) and retrospectively records the bleeding-reduction images (see S29 in fig. 4). The retrospectively recorded images are temporarily recorded in the memory in the imaging device 6. In step S5, the control unit 1a of the image estimation learning device 1 collects the procedure images at the time of bleeding reduction from the imaging device 6 or the like and temporarily stores them in the recording unit 4.
After the procedure images at the time of bleeding reduction are collected in step S5, the image data are annotated with "bleeding reduction" (S7). Here, the control unit 1a applies an annotation of "bleeding reduction" to each item of the collected image data and records the annotated image data in the recording unit 4 as teaching data A group 4a.
In the flow shown in fig. 3, the bleeding-reduction images are collected after the bleeding-enlargement images. In practice, however, steps S1 to S7 are executed selectively as appropriate, depending on whether bleeding occurs in the images collected by the image acquisition device 3 and, when it does, whether its extent enlarges or shrinks.
Next, an estimation model is generated (S9). Here, the teaching data annotated in steps S3 and S7 on the basis of the images generated by the imaging device 6 are recorded as teaching data A group 4a, and these teaching data are input to the image input section 1b. The learning unit 1c in the image estimation learning device 1 generates an estimation model using the teaching data. When an image is input, this estimation model can output a prediction such as "the bleeding will enlarge after a certain number of seconds".
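A minimal training sketch for step S9 might look as follows in Python with PyTorch, assuming the annotated teaching data have already been converted to image tensors and labels (0 = bleeding reduction, 1 = bleeding enlargement). Batch size, learning rate, and epoch count are illustrative assumptions, and the model could be, for example, a small network like the earlier sketch.

```python
# Hedged sketch of step S9: train a classifier on annotated teaching data.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_estimation_model(model, images, labels, epochs=10):
    # images: (N, 3, H, W) float tensor; labels: (N,) long tensor (annotations)
    loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model
```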
When the estimation model has been generated, it is determined whether its reliability is acceptable (S11). Here, the learning unit 1c inputs image data for reliability confirmation, for which the answers are known in advance, to the estimation model and judges the reliability according to whether the outputs match the answers. When the reliability of the generated estimation model is low, the proportion of matching answers is low.
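The reliability check of step S11 could be sketched as below, assuming a held-out confirmation set with known answers. The acceptance threshold of 0.9 is an assumption, not a value given in the embodiment.

```python
# Sketch of step S11: accept the model only if the agreement ratio clears
# a threshold on confirmation data whose answers are known in advance.
import torch

def reliability_ok(model, check_images, check_answers, threshold=0.9):
    model.eval()
    with torch.no_grad():
        predictions = model(check_images).argmax(dim=1)
    agreement = (predictions == check_answers).float().mean().item()
    return agreement >= threshold  # threshold value is an assumption
```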
In estimating the outcome of such a treatment, it is desirable to reflect in the estimation the skill of the doctor or other person performing the treatment, differences in treatment instruments, and the like. However, since the image data collected when generating an estimation model are usually used as teaching data, what is easily collected is image data of procedures performed by skilled doctors using well-proven instruments. Yet guidance display is most meaningful precisely in situations where an unskilled person is operating, possibly with unexpected instruments, and it is desirable to be able to cope with such unassumed cases. Moreover, when a completely new treatment instrument comes onto the market, its users are often inexperienced with it. The degree of inexperience also varies widely, and unexpected situations must often be assumed. In other words, it is highly desirable to present reliable guidance for unskilled users handling unseen instruments, and the estimation learning system according to the present embodiment can cope with such situations.
As described above, by generating a highly reliable estimation model through learning, it is possible to provide an estimation learning device mainly including a learning unit that learns using teaching data obtained by annotating image data from the 1st image acquisition device (for example, the image acquisition device 3a) and obtains an estimation model by this learning. Furthermore, even when an estimation model is generated for a 2nd image acquisition device (for example, the image acquisition device 3b) having characteristics different from those of the 1st image acquisition device, the teaching data collected for the 1st image acquisition device can be used effectively.
That is, when teaching data are to be used effectively in learning for a 2nd image acquisition device whose image input characteristics differ from those of the 1st image acquisition device, the image processing unit may process the image data obtained from the 1st image acquisition device among the teaching data according to the difference in image acquisition characteristics and use the result as teaching data, thereby generating an estimation model for the 2nd image acquisition device with its different image input characteristics. The difference in image input characteristics may be caused by differences in the specification and performance of the image sensor, the optical characteristics for image pickup, the image processing specification and performance, and the type of illumination light.
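As one hypothetical way to process teaching images according to such differences in image input characteristics, the following Python/OpenCV sketch reduces the pixel count, softens the optics, and shifts the illumination tint. Every parameter value here is an assumption for illustration.

```python
# Sketch: reprocess teaching images collected for the 1st image acquisition
# device so they approximate the 2nd device's input characteristics.
import cv2
import numpy as np

def adapt_to_device2(img, target_size=(320, 240), blur_sigma=1.0,
                     tint=(1.0, 0.95, 1.05)):
    out = cv2.resize(img, target_size, interpolation=cv2.INTER_AREA)  # fewer pixels
    out = cv2.GaussianBlur(out, (0, 0), blur_sigma)                   # weaker optics
    out = np.clip(out.astype(np.float32) * np.array(tint), 0, 255)    # illumination tint
    return out.astype(np.uint8)
```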
In addition, when the image acquisition devices differ, other devices can be expected to differ as well. For example, an imaging object in such an environment may naturally look different. Therefore, the image processing unit may process the image data of the 1st object included in the image data obtained from the 1st image acquisition device among the teaching data so as to suit the image data of the 2nd object included in the image data obtained from the 2nd image acquisition device.
For example, it is conceivable to generate new teaching data by adding features of similar objects detected by the 2nd image acquisition device to the object appearing in an image obtained by the 1st image acquisition device. One illustrative example: when learning must proceed without any images of a zebra, stripes can be drawn on images of a horse and these can be substituted. That case involves only a change of color or pattern, but differences in other characteristics such as shape may also be corrected. For example, when teaching data of a treatment section having a rectangular distal-end shape are to be used for a treatment instrument having a circular distal end, images in which the distal end of the instrument is relatively round may be selected as teaching data, and learning may be performed using images whose shape has been changed by correcting the difference in distal-end shape.
However, processing images widens the applicable range of objects but does not always satisfy the desired specification (for example, a guidance function matched to the user's skill). In that case, not only image processing but also selection of teaching data and processing (adjustment or change) of the content or method of annotation can be performed. The display method of the estimation result may also be adapted. It can further be customized such that, even when the reliability is low, the operator is watched over on the side of safety and a warning is issued at the timing at which the reliability improves.
That is, the estimation learning device in the present embodiment includes: an input unit that inputs image data from the 1st image acquisition device; and a learning unit that obtains an estimation model by learning using teaching data obtained by annotating the image data. It further includes an image processing unit that, when the estimation model is relearned for a 2nd image acquisition device whose image input characteristics differ from those of the 1st image acquisition device, performs processing such as changing the reliability determination level on the image data obtained from the 1st image acquisition device, according to the difference in image input characteristics produced by the user's skill, to obtain the teaching data. A user's skill shows itself in camera shake, slowness of movement, the speed of response to a particular scene change, and the like. Such differences in skill produce differences in the image input characteristics (that is, in how the image data change over time). Here, the difference in how the image data change over time is expressed at a higher level of abstraction as a difference in image input characteristics.
Further, since the estimation model generated using teaching data corrected or processed for the 2nd image acquisition device is a model intended for the 2nd image acquisition device, data from the 2nd image acquisition device may be used as the test data for determining its reliability.
In learning for unseen instruments, it suffices to combine a plurality of instruments of similar shape to raise the probability, to learn using instruments whose shapes resemble the unseen parts, or to learn after replacing, or partially changing the shape of, the treatment-instrument image captured in the existing teaching data. To learn on the safe side, methods such as shifting the bleeding timing in the existing treatment-instrument teaching data earlier in time, or setting the bleeding-enlargement criterion strictly, are conceivable.
For learning aimed at the skill of an unskilled person, a measure of changing the reliability level was described first, but besides this method, emphasizing the motion fluctuation of the tool obtained from the existing teaching data, or advancing the time shift, is also conceivable. The bleeding timing in the existing skill teaching data may likewise be shifted earlier in time so that the timing at which guidance is issued is advanced, as illustrated in the sketch below.
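The sketch below illustrates, under stated assumptions, the two measures just mentioned: advancing the event timing and emphasizing shake. The frame counts and jitter strength are hypothetical values, and the crude pixel-roll shake is a simplification.

```python
# Sketch: derive "unskilled-user" teaching sequences from skilled-user ones
# by emphasizing shake and advancing the event timing (values are assumptions).
import numpy as np

def simulate_unskilled(frames, advance_frames=15, jitter_px=4, seed=0):
    rng = np.random.default_rng(seed)
    # Dropping the newest frames shifts the event earlier relative to the
    # sequence end, so warnings are learned to fire sooner.
    shifted = frames[:-advance_frames] if advance_frames else frames
    jittered = []
    for f in shifted:
        dx, dy = rng.integers(-jitter_px, jitter_px + 1, size=2)
        jittered.append(np.roll(f, shift=(dy, dx), axis=(0, 1)))  # crude shake
    return jittered
```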
When the reliability is lower than a predetermined value as a result of the determination in step S11, the teaching data are selected (S13). When the reliability is low, selecting or rejecting teaching data may improve it. In this step the teaching data selection unit 1f removes image data that have no causal relationship, for example teaching data in which the cause and the effect of bleeding enlargement/reduction are unrelated. In this process, an estimation model for estimating causal relationships may be prepared in advance so that teaching data with weak causal relationships are excluded automatically, as in the sketch below. The overall conditions of the teaching data may also be changed. After the teaching data have been selected, the process returns to step S9 and the estimation model is generated again.
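A minimal sketch of such automatic selection, assuming a hypothetical causality_score_fn prepared in advance (for example, the causal-relationship estimation model mentioned above) and an assumed threshold:

```python
# Sketch of step S13: drop teaching-data candidates whose cause-effect
# link is weak (scoring function and threshold are assumptions).
def select_teaching_data(candidates, causality_score_fn, min_score=0.5):
    # candidates: list of (image_sequence, annotation) pairs
    return [(seq, ann) for seq, ann in candidates
            if causality_score_fn(seq, ann) >= min_score]
```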
On the other hand, if the determination result in step S11 is that the reliability is acceptable (the reliability being verified by preferentially using, for example, data obtained in the assumed system), the estimation model is transmitted (S15). Here, since the generated estimation model satisfies the reliability criterion, the teaching data selection unit 1f fixes the teaching data candidates used at this time as teaching data. Further, the learning result utilization unit 1e transmits the generated estimation model to the imaging device 6. Upon receiving the estimation model, the imaging device 6 sets it in the estimation unit 2AI. After the estimation model is transmitted, the flow of estimation model generation ends. If the estimation model is transmitted together with information such as its specification, control that reflects, at the time of estimation by the imaging device, whether estimation uses a single image or a determination over a plurality of images, the degree of the time difference (frame rate, etc.), and the like can be realized. Other information may be handled similarly.
In this way, in the present flow, the learning device inputs image data from the image acquisition device 3 (S1, S5), generates teaching data by annotating the image data (S3, S7), and obtains an estimation model by learning with the generated teaching data (S9). In particular, among the images obtained continuously in time series from the image acquisition device 3, the image data from a specific timing back to a traced-back timing are annotated (S3, S7, S13) to become teaching data (S11, S13). In this way, time-series image data are extracted, from within the continuously output image data, around a specific timing at which some event occurs (for example, bleeding enlargement or bleeding reduction), and the image data are annotated to become teaching data candidates. An estimation model is generated by learning using these teaching data candidates, and if the reliability of the generated estimation model is high, the teaching data candidates are fixed as teaching data.
That is, in the present flow an estimation model is generated using data traced back from a specific timing at which some event occurred. In other words, an estimation model capable of predicting the future can be generated based on the cause of the specific timing, that is, based on the causal relationship. By using this estimation model, the future can be predicted without omission even when there is a small behavior or phenomenon the user does not notice, and the user can be reminded of it or warned, for example, in a case that would lead to an accident. Even for something the user does notice and worries about, it can be reported that the matter will not progress to a serious stage.
The image estimation learning device 1 in the present flow can collect the teaching data A group 4a from a large number of imaging devices 6, and can therefore generate teaching data from an extremely large amount of data and generate an estimation model with high reliability. In addition, in the present embodiment, when an event occurs, the data collected are narrowed down to a range related to that event, so an estimation model can be generated efficiently.
In the present flow, the image estimation learning device 1 collects image data groups that can become teaching data candidates from the imaging device 6 and annotates them with labels such as bleeding enlargement (see S3 and S7). However, the imaging device 6 may instead make these annotations to generate a teaching data group, and the learning unit 1c may generate an estimation model using that teaching data group. In that case the image estimation learning device 1 can omit the annotation process, and the present flow is realized by the control unit 1a in the image estimation learning device 1 and the control unit 7 in the imaging device 6 operating in cooperation.
Next, the operation of the imaging device 6 will be described with reference to the flowchart shown in fig. 4. This operation is executed by the control unit 7 in the imaging device 6 controlling each device and each unit in the imaging device 6. An example in which the imaging device 6 is provided in an endoscope apparatus will be described. In this flow, normal operations such as turning the power supply on and off are omitted.
When the flow shown in fig. 4 starts, first, image capturing and display are performed (S21). Here, the image acquisition device 3 acquires image data at predetermined time intervals (determined by the frame rate), and the guidance unit 5 displays an image based on the image data. For example, if the imaging device 6 is provided in an endoscope apparatus, an image of the inside of the body acquired by an imaging element provided at the distal end portion of the endoscope is displayed on the guide unit 5. The display is updated at predetermined time intervals determined by the frame rate. The guidance mode may be classified into beginner, expert, and so on using the techniques described in this specification and changed according to the user. Changing it according to the object or the usage environment is also assumed.
Next, it is determined whether AI correction is necessary (S23). The estimation model mounted in the estimation unit 2AI may become inappropriate because the device in use is changed to another image acquisition device or its version is upgraded, changing the characteristics of the image data; it may also become inappropriate for other reasons. In such cases, it is preferable to correct the estimation model set in the estimation unit 2AI. In this step, therefore, the control unit 7 determines whether the estimation model needs to be corrected.
When the estimation model has become inappropriate for a reason such as a change of the device in use, it is preferable to generate an estimation model using image data from the new device. However, when only a small amount of data exists for that device, enough data cannot be collected to generate an estimation model. In the present embodiment, therefore, the corrected estimation model is generated by processing the image data collected so far. The detailed operation of determining whether AI correction is necessary will be described later with reference to fig. 7.
When the determination in step S23 is that the AI requires correction, generation of a corrected estimation model is requested and the model is acquired (S25). Here, the imaging device 6 requests the image estimation learning device 1 to generate the corrected estimation model and acquires it once generated. When requesting the corrected estimation model, information such as the portions that need correction may be transmitted. That is, as described above, in the present embodiment the teaching data already used are processed so as to apply to the new device or the like, and the corrected estimation model is generated using the processed teaching data. The detailed operation of corrected estimation model generation will be described later with reference to fig. 5.
When the corrected estimation model has been acquired, or when the determination in step S23 is that AI correction is unnecessary, it is next determined whether trigger information is present (S27). For example, trigger information is generated when an event as described with reference to fig. 2 (a) and (b) occurs, for example when bleeding occurs during a treatment and the bleeding enlarges. In this example, the control unit 7 may output the trigger information when it analyzes the image data acquired by the image acquisition device 3 and determines that the bleeding has enlarged. The image analysis may be performed by an AI using an estimation model, or the trigger information may be output by the doctor manually operating a specific button or the like.
When trigger information has been generated as a result of the determination in step S27, retrospective recording for a predetermined time is performed (S29). Here, the image data acquired by the image acquisition device 3 are traced back for a predetermined time and recorded in the image-data memory in the imaging device 6. In general, all image data acquired by the image acquisition device 3 are recorded in the memory in advance; predetermined metadata are attached to the image data within a predetermined time from the specific timing determined by the generation of the trigger information, and these image data are temporarily recorded in the teaching data candidate group. If there is no trigger information, the control section 7 may discard the image data candidate group as appropriate. In the example shown in fig. 2 (a) and (b), the specific timing is the point at which the bleeding enlarges, and the traced-back span runs from a predetermined time (for example, T=-1 sec) back to T=-5 sec. Further, if the image data of T=0 to T=T1a are added to the image data group, learning that includes the course of the bleeding enlargement becomes possible. The start point of the retrospective recording may be the point at which the trigger information is generated, or a point earlier than it. The traced-back time may be determined appropriately so as to include the range in which the cause of the causal relationship can be found. The timing chosen here affects reliability: the longer the traced-back time, the lower the reliability. These choices, too, appear as a form of image processing. A minimal sketch of such retrospective recording follows.
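This sketch keeps the newest frames in a ring buffer and, when trigger information is generated, freezes the traced-back span as teaching-data candidates. The buffer length (5 seconds at 30 frames per second) is an assumption.

```python
# Sketch of step S29: ring-buffer retrospective recording on trigger.
from collections import deque

class RetrospectiveRecorder:
    def __init__(self, fps=30, seconds=5):
        self.buffer = deque(maxlen=fps * seconds)

    def on_frame(self, frame):
        self.buffer.append(frame)  # oldest frames drop out automatically

    def on_trigger(self, metadata):
        # Freeze a copy of the last 5 seconds together with trigger metadata.
        return {"frames": list(self.buffer), "meta": metadata}
```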
After the retrospective recording in step S29, or when the determination in step S27 is that there is no trigger information, image estimation is performed next (S31). Here, the image data acquired by the image acquisition device 3 are input to the image input unit 2IN of the image estimation device 2, and the estimation unit 2AI performs estimation. When the estimation result output unit 2OUT outputs the estimation result, the guidance unit 5 gives guidance based on that output. For example, as shown in fig. 2 (a) and (b), estimation can be made at time T=-5 sec, the time at which bleeding is expected to start 5 seconds later (T=0) can be displayed, and display Ga or display Go can be shown based on the estimation of whether the bleeding will enlarge or shrink when it occurs at time T=0. In addition, when a plurality of image estimation devices such as the image estimation device 2a are provided in addition to the image estimation device 2, a plurality of estimations can be performed; for example, predictions other than the anticipation of bleeding can be made as well.
When estimating from images, the estimation can also be supplemented not only by image data but by the doctor's voice at the time of diagnosis or treatment. The reliability may also be estimated for the device used for diagnosis or treatment, and when it falls below a predetermined value, a device with high reliability may be recommended. Further, since the treatment instrument in use may be visually distracting on the screen, the image region of the treatment instrument may be processed using image estimation.
After the image estimation, it is next determined whether teaching data candidates are to be output (S33). Here, the control unit 7 determines whether retrospective recording was performed in step S29. When retrospective recording was performed, the image data of that time are stored as teaching data candidates in the memory in the imaging device 6. If the determination result is that no retrospective recording was performed, the process returns to step S21.
If the determination result in step S33 is yes, the teaching data candidates are output (S35). Here, the control section 7 outputs the teaching data candidate group stored in the memory in the imaging device 6 to the image estimation learning device 1. When the image estimation learning device 1 receives the teaching data candidate group, it records it in the recording unit 4. After the teaching data candidates are output in step S35, the process returns to step S21.
In the present embodiment, the imaging device 6 performs the determination of bleeding enlargement or bleeding reduction (see S27 in fig. 4). However, this determination may instead be performed in the control unit 1a of the image estimation learning device 1. That is, the enlargement/reduction of bleeding can be determined from the change in the shape or size of the blood-colored region occupying the screen, and can be detected by logic or by estimation. The enlargement/reduction determination may also be changed intentionally according to the customization of the teaching data; for a beginner, an image showing no enlargement may, for safety, be converted into teaching data annotated as enlargement. This too appears as a form of image processing. A possible implementation of the area-based determination is sketched below.
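The following Python/OpenCV sketch shows one hypothetical logic-based determination from the area of blood-colored pixels. The HSV thresholds and the 10% margin are assumptions for illustration, not values specified in the embodiment.

```python
# Sketch: judge bleeding enlargement/reduction from the change in the area
# of blood-colored pixels on screen (thresholds are assumptions).
import cv2
import numpy as np

def blood_area(img_bgr):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around hue 0 in HSV, so combine two ranges.
    m1 = cv2.inRange(hsv, (0, 80, 50), (10, 255, 255))
    m2 = cv2.inRange(hsv, (170, 80, 50), (180, 255, 255))
    return int(np.count_nonzero(m1 | m2))

def bleeding_trend(prev_img, curr_img, margin=1.1):
    prev_a, curr_a = blood_area(prev_img), blood_area(curr_img)
    if curr_a > prev_a * margin:
        return "enlarged"
    if curr_a * margin < prev_a:
        return "reduced"
    return "unchanged"
```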
The trigger information in step S27 was described using the example of bleeding occurring in the body during endoscope use. However, the present embodiment can also be applied to events other than bleeding. For example, when body temperature or body weight can be measured by a wearable sensor, trigger information may be generated when the body temperature rises rapidly, and the body temperature data, body weight data, or other data (including image data) may be recorded retrospectively. If these data are transmitted as teaching data to the estimation learning device, an estimation model can be generated.
In step S35, the teaching data candidate group generated based on the retrospective recording is transmitted to the estimation learning device. The causal relationships of the teaching data candidate group in this case may be examined by tracing back not only the image data group recorded in the same device (the imaging device 6) but also the detection data of other devices.
In step S23, whether AI correction is necessary is determined in the imaging device. However, the image estimation learning device 1 may make this determination instead. When the image processing unit 1d (or the control unit 1a) detects that the teaching data group input to the image input unit 1b of the image estimation learning device 1 differs in characteristics (including application) from the teaching data groups stored in the past, it may determine that AI correction is necessary.
Further, even for an image seen for the first time, if teaching data are generated by annotating the image in a normal state as "normal" and learning is performed with those teaching data, estimation that determines an abnormality can be realized. For example, a determination unit that determines an abnormality such as a lesion, a color, or a shape from an image of the stomach may be provided, or an AI that determines, when an abnormality is recognized, what kind of abnormality it is may be used.
Whether the AI needs correction may also be determined using the reliability of the "normal" determination (or of the "abnormal" determination) by the currently held AI: when that reliability falls to or below a certain level, it may be judged that the image is being seen for the first time and that the AI should be corrected.
Here, causal-relationship guidance estimation in the highly advanced medical field has been described as an example of learning for generating an estimation model. However, the present embodiment is not limited to the medical field and can be applied to estimation for guidance in general. In practice, many commonly used estimation models identify what is seen in an image, and such image-detection models are used for person detection and action detection by monitoring cameras, obstacle detection by vehicle-mounted cameras, and so on. The technique described in this embodiment for improving estimation performance by absorbing differences in the state of input image data is, of course, effective for such recognition-type detection as well.
That is, the image estimation learning device includes: an input unit that inputs image data from the 1st image acquisition device; and a learning unit that obtains an estimation model by learning using teaching data obtained by annotating the image data. This widens the application range of the image estimation learning device, allowing useful estimation models to be used efficiently in various fields beyond various restrictions. However, due to such restrictions it is sometimes difficult to collect useful teaching data immediately. Therefore, the estimation learning device in the present embodiment includes an image processing unit that, when the estimation model is customized for a 2nd image acquisition device used under conditions different from those of the 1st image acquisition device (not creating a completely different estimation model but one of the same specification expected to retain the proven results), applies processing, including selection of images or of annotations, according to the difference in image acquisition characteristics to the image data obtained from the 1st image acquisition device among the teaching data, and uses the result as teaching data. With this design, a useful estimation model can be generated even when useful teaching data cannot be collected immediately.
Furthermore, use is not limited to teaching data obtained from the 1st image acquisition device, nor to using such teaching data as they are. For example, abnormality information on the object published in papers or reported elsewhere may be used even if it was not obtained by the 1st image acquisition device. For example, when an abnormality such as a tumor exists in the target object, the teaching data may be corrected by deforming or enlarging/reducing the image of the tumor. If necessary, the teaching data may be modified further by correcting colors, deforming similar image portions, and the like. If such processing is performed with reference to situations that may arise, based on information obtained in the usage environment of the 2nd image acquisition device, the reliability improves further.
Next, before describing the flow of corrected estimation model generation shown in fig. 5, the processing of image data will be described with reference to fig. 6. Fig. 6 (a) shows, as in fig. 2 (a), a case where bleeding occurs during treatment with an endoscope and the bleeding enlarges.
Fig. 6 (b) shows the situation after the bleeding has enlarged, as in fig. 6 (a). In this example, an image acquisition device with characteristics different from those of the image acquisition device 3a (for example, the image acquisition device 3b) is used in the imaging device 6. Since the image pickup element of the image acquisition device 3b has a small number of pixels, the obtainable image data ID3 differ greatly from the image data ID1. Therefore, an estimation model generated by accumulating image data with characteristics equivalent to the image data ID1 can perform only low-reliability estimation even when image data as shown in fig. 6 (b) are input. Even if an estimation model is generated from a population in which the image data ID3 and the previously accumulated image data are mixed, only an estimation model with low reliability can be generated.
The characteristics referred to here are characteristics based on the specifications and performance of the image acquisition device, and also on the specifications and performance of the objects handled there, of peripheral devices such as accessories, and of associated cooperating devices; they may also vary with the usage environment. Differences in image input characteristics may arise from differences in the specification and performance of the image sensor, the optical characteristics for image pickup, the image processing specification and performance, and the type of illumination light. Of course, such elements may be changed by the user's mode settings, and in that case these settings may be taken into account as well.
In the present embodiment, the image processing unit 1d processes (corrects) the image data used up to that point on the basis of the difference in image input characteristics and adjusts them to the same level as in fig. 6 (b) (see S1a and S5a in fig. 5). The corrected image data are then annotated (see S3a and S7a in fig. 5), and an estimation model is generated (see S9 in fig. 5). When the image data of the teaching data used so far are processed (corrected) based on the difference in image input characteristics and the annotations of the teaching data need not be changed, only the image data are processed (corrected).
Next, the operation of corrected estimation model generation will be described with reference to the flowchart shown in fig. 5. This flow is executed when the imaging device 6 requests corrected estimation model generation from the image estimation learning device 1 in step S25 (see fig. 4). It is realized by the control unit 1a of the image estimation learning device 1 controlling each unit in the image estimation learning device 1. This flow is an example in which the image estimation learning device 1 generates a corrected estimation model based on images of bleeding enlargement or reduction.
When the flow of corrected estimation model generation shown in fig. 5 starts, first, procedure images at the time of bleeding enlargement are collected (S1a). As described with reference to fig. 2 (a), when bleeding enlarges during treatment, the image processing unit 1d collects the image data of that time from the recording unit in the image estimation learning device 1 or from the imaging device 6, and corrects them. In the example shown in fig. 2 (a), image data between T=-5 seconds and T=-1 second are collected. The image processing unit 1d then corrects the collected image data so that they reach the same level as the image data ID3 shown in fig. 6 (b) (for example, the image data output from the image acquisition device 3b).
Further, as described above, since a more reliable estimation model matched to the user's device, environment, object, and so on is desired, a process of grasping the desired specification (a customization request) may also be performed in step S1a. Image selection, image correction, and annotation correction are performed in accordance with the customization request, and appropriate teaching data are reconstructed (processed and edited).
For example, image data acquired from an image acquisition device of the 1st specification rarely match image data acquired from an image acquisition device of the 2nd specification, not only in the performance, specification, environment, and peripheral system of the device but also in the objects handled, peripheral equipment such as accessories, treatment instruments, operators, and the like. The following situation therefore easily arises: an estimation model obtained by annotating image data from the image acquisition device of the 1st specification and learning from the resulting teaching data is difficult to use directly with the image acquisition device of the 2nd specification. Therefore, the estimation learning device of the present embodiment includes an image processing unit that, when the estimation model is customized and learned for a 2nd(-specification) image acquisition device whose image input characteristics differ from those of the 1st(-specification) image acquisition device, processes the image data obtained from the 1st image acquisition device among the teaching data according to the difference in image acquisition characteristics (including not only the performance, specification, environment, and peripheral system of the device but also the objects handled, peripheral equipment such as accessories, treatment instruments, operators, and the like) and uses the result as teaching data. By optimizing the teaching data in the image processing section, an estimation model usable in the image acquisition device of the 2nd specification can be generated.
As an example of correction in the case of using a treatment instrument or the like, there is the following method: when it can be determined that the shape of the treatment instrument has changed, the teaching data set of the treatment instrument with the closest shape is used, geometric conversion of the image and nonlinear conversion such as partial expansion/contraction are performed, and the annotation information indicating the treatment-instrument region is converted in accordance with the same conversion. When the distal end becomes pointed as a result of the geometric transformation, the annotation is weighted toward the direction in which bleeding is more likely to occur, or annotated so that a weighted determination yields "bleeding". It is also possible to use another already learned AI (a shape-change effect prediction AI) trained on teaching data containing shape differences to determine what influence the change in shape has, and to reflect that result.
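A minimal sketch of applying one and the same geometric conversion to an image and to its annotation mask, using OpenCV, follows. The three corresponding points around the tool tip, and an affine (rather than nonlinear) warp, are simplifying assumptions.

```python
# Sketch: warp the teaching image toward the new instrument shape and apply
# the identical transform to the annotation mask so both stay consistent.
import cv2
import numpy as np

def warp_image_and_annotation(image, mask, src_pts, dst_pts):
    # src_pts/dst_pts: three corresponding points (e.g., around the tool tip)
    m = cv2.getAffineTransform(np.float32(src_pts), np.float32(dst_pts))
    h, w = image.shape[:2]
    warped_img = cv2.warpAffine(image, m, (w, h), flags=cv2.INTER_LINEAR)
    warped_mask = cv2.warpAffine(mask, m, (w, h), flags=cv2.INTER_NEAREST)
    return warped_img, warped_mask  # nearest-neighbor keeps label values intact
```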
In many cases the image processing unit described above is required to use the image data of the 1st object included in the image data obtained from the 1st image acquisition device among the teaching data (much of which has proven results) as efficiently as possible. It therefore processes them, based on image acquisition characteristics including not only the performance, specification, environment, and peripheral system of the device but also the objects handled, peripheral equipment such as accessories, treatment instruments, operators, and the like, so that they suit the image data of the 2nd object included in the image data obtained from the 2nd image acquisition device.
The image processing unit 1d (or the teaching data selection unit 1f) may be configured, in consideration of safety, to issue warnings as early as possible, and to preferentially use or collect teaching data of similar factors, depending on whether an image sensor with poor detection performance or a treatment instrument with poor operability is used, on the user's proficiency, and on the target patient or affected part.
After the images are collected and corrected in step S1a, they are next annotated with "bleeding enlargement" and its timing (S3a). Here, the control unit 1a uses the image data, together with the timing at which the image data were acquired, as teaching data carrying the label "bleeding enlargement". Specifically, the control unit 1a re-selects image data, changes weightings, and applies customization measures to the existing image data or to the image data obtained by processing them, and uses the image data annotated with "bleeding enlargement" and the acquisition timing as teaching data. A customization measure may itself be expressed as processing. Even an image that is actually "bleeding reduction" may be re-annotated as "bleeding enlargement" if the area of the bleeding portion does not decrease for a certain time or more, and modifying images obtained from a skilled person's treatment into teaching data for a beginner may also be called "processing". The teaching data obtained by correcting (processing) the image data and applying the annotations may be recorded in the recording unit 4 as teaching data B group 4b.
Then, procedure images at the time of bleeding reduction are collected (S5a). As described with reference to fig. 2 (b), when bleeding during treatment is reduced, the image processing unit 1d collects the images of that time from the recording unit in the image estimation learning device 1 or from the imaging device 6. In the example shown in fig. 2 (b), images between T=-5 seconds and T=-1 second are collected. The image processing unit 1d then corrects the collected image data so that they reach the same level as the image data ID3 shown in fig. 6 (b) (for example, the image data output from the image acquisition device 3b).
After the images are collected and corrected in step S5a, they are next annotated with "bleeding reduction" and its timing (S7a). Here, the control section 1a annotates the image data with "bleeding reduction" and with the timing at which the image data were acquired, so that they serve as teaching data candidates. The teaching data obtained by correcting (processing) the image data and applying the annotations may be recorded in the recording unit 4 as teaching data B group 4b.
After the image data are annotated and the teaching data are generated in steps S3a and S7a, an estimation model is generated in the same manner as in fig. 3 (S9). Here, the learning unit 1c generates an estimation model using the teaching data annotated in steps S3a and S7a. When the image data ID3S shown in fig. 6 (b) are input, this estimation model can output a prediction such as "the bleeding will enlarge after a certain number of seconds".
When the estimation model has been generated, it is determined whether its reliability is acceptable (S11). Here, as in fig. 3, the learning unit 1c inputs image data for reliability confirmation, for which the answers are known in advance, to the estimation model and judges the reliability according to whether the outputs match the answers. When the reliability of the generated estimation model is low, the proportion of matching answers is low.
In this step, test data are input and it is determined whether the expected estimation results are output. The test data are preferably matched to the standard environment and conditions of the 2nd-specification image acquisition device that will actually use the estimation model (including not only the performance, specification, environment, and peripheral system of the device but also the objects handled, peripheral equipment such as accessories, treatment instruments, operators, and the like), and data obtained under those standard environment and conditions are desirably used preferentially. However, since such data cannot always be prepared immediately, data obtained from the 1st image acquisition device are processed and used so as to suit the image data of the 2nd object included in the image data obtained from the 2nd image acquisition device, based on the difference in image acquisition characteristics described above. Of course, the determination the user wants to make may also be input manually and adopted.
When the reliability is lower than the predetermined value as a result of the determination in step S11, the teaching data are selected as in fig. 3 (S13). When the reliability is low, selecting or rejecting teaching data may improve it. In this step, image data having no causal relationship are removed. After the teaching data have been selected or rejected, the process returns to step S9 and the estimation model is generated again.
On the other hand, when the reliability is acceptable as a result of the determination in step S11, the estimation model is transmitted as in fig. 3 (S15). Here, since the generated estimation model satisfies the reliability criterion, the teaching data selection unit 1f fixes the teaching data candidates used in the estimation as teaching data. The learning result utilization unit 1e also transmits the generated estimation model to the imaging device 6. When the imaging device 6 receives the estimation model, the model is set in the estimation unit 2AI. After the estimation model is transmitted, the flow of corrected estimation model generation ends.
In this way, in the flow of corrected estimation model generation shown in fig. 5, image data to serve as teaching data are collected (S1a, S5a), and the image processing section 1d corrects (processes) the collected image data (S3a, S7a). Correction of the estimation model is requested because the characteristics of the data to be estimated, such as the image data, have changed. In this flow, in order to generate an estimation model matching the characteristics of the new data, the accumulated data are corrected so as to fit those characteristics. Therefore, the estimation model can be generated quickly and at low cost without recollecting data having the new characteristics.
Next, the operation of determining whether AI correction is necessary in step S23 (see fig. 4) will be described with reference to the flowchart shown in fig. 7. This flow is executed by the CPU 7a in the imaging device 6 controlling each unit in the imaging device 6 in accordance with the program stored in the memory 7b.
When the operation of fig. 7 (whether AI correction is necessary) starts, it is first determined whether model information of the image acquisition device is available (S41). Here, it is determined whether detailed model information of the image acquisition device 3 is held. The model information includes, for example, the number of pixels, the frame rate, the resolution, focal length information, and distance information to the object. When the image acquisition device 3 is configured integrally with the imaging device 6, the model information is easy to acquire; when it is configured separately, the model information can be acquired through an information communication network such as the internet, with a database or the like consulted as necessary.
Here, for simplicity, differences in the specifications and performance of the image acquisition device are illustrated. In practice, however, customization is performed in accordance with the user's situation, environment, object, and so on, as described above; to obtain an estimation model with higher reliability, a determination for grasping the desired specification (a customization request) may also be made. For example, even for the same model, processing equivalent to that based on the model information can be performed from manual input results, information recorded in the recording unit, and the like, taking into account the strengths and weaknesses of the equipment and user combined with it, differences in the object, and so on.
If the determination result in step S41 is that model information of the image acquisition device is available, a correction method is then acquired from the image quality information DB based on the model information, and the correction method is decided (S43). Here, the control unit 7 decides the correction performed in steps S1a and S5a. For example, when the number of pixels of the image pickup element differs, the pixel count of the stored image data may be scaled up or down (by thinning out, interpolating, or the like) according to the pixel ratio. Such processing is also expressed as correction, and a method of processing an image into teaching data may likewise be expressed as processing. A minimal sketch follows.
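This sketch scales a stored teaching image by the pixel ratio between the two models; choosing the interpolation by whether the image shrinks or grows is an implementation assumption.

```python
# Sketch of the correction chosen in step S43: match the pixel count of a
# stored teaching image to the target model (thinning out or interpolating).
import cv2

def match_pixel_count(stored_img, target_w, target_h):
    h, w = stored_img.shape[:2]
    shrinking = target_w * target_h < w * h
    interp = cv2.INTER_AREA if shrinking else cv2.INTER_CUBIC
    return cv2.resize(stored_img, (target_w, target_h), interpolation=interp)
```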
Here, a method of determining the image quality difference in detail will be described. If the determination in step S41 is that no model information is available, it is determined whether a reference scene image exists (S45). A reference scene image is an image obtained by photographing an object in order to determine whether the characteristics of the image data differ. In other words, when deciding whether to correct the AI, it is preferable to determine whether the image data used when the current estimation model was generated have the same characteristics as the image data being input now, and this is easy to judge if images of the same object can be compared. In practice it is difficult to photograph exactly the same object, so photographing similar objects suffices. In the case of an endoscope, even if the device or the subject differs, the images obtainable when the endoscope is inserted from the oral cavity into the esophagus are substantially the same, so the image at that time may be used as the reference scene. Outside endoscopy, a blue sky can serve as a reference for cameras, and reference charts for performance determination, as well as white charts and gray charts, may be used. Even without a special chart, if an image containing known characters or patterns, or a standardized image, is captured, changes in peripheral light amount, aberration information, and the like can be obtained from the difference from the original shape.
If the result of the determination in step S45 is that there is no reference scene image, an image of the reference scene is estimated (S47). Since no reference-scene image exists, a substitute image must be searched for among the images acquired by the image acquisition device 3. The substitute need not equal the reference scene, but should be similar enough that comparing the two images reveals whether the characteristics of the image data differ. For example, a treatment instrument is sometimes used in endoscopy, and treatment instruments often have similar shapes; in this case an acquired image in which the shape of the treatment instrument is known is estimated to be a reference scene. Not only the shape of the treatment instrument but also its appearance in the screen (appearance position and so on) may be used as a criterion when estimating the reference-scene image. Devices such as microscopes and video cameras, not only endoscopes, often have similar shapes and colors, so their reflections in images can likewise be identified and compared.
After the image of the reference scene is estimated in step S47, or when the determination in step S45 is that a reference-scene image exists, it is next determined whether the difference from the reference image is tolerable (S49). As described above, the two images are compared, and if the characteristics of the image data do not differ, the estimation model does not need correction. Here it is determined whether the characteristics of the image data acquired from the image acquisition device 3 differ to such an extent that the estimation model must be corrected. It is also determined whether the difference is large even for the same region.
If the result of the determination in step S49 is that the difference from the reference image is within the tolerance, the flow branches to "no" (S55). Since the difference between the image acquired by the image acquisition device 3 and the reference scene is not large in this case, the estimation model need not be corrected; the process therefore branches to "no" and proceeds to step S27 of fig. 4.
On the other hand, if the difference from the reference image is not within the tolerance as a result of the determination in step S49, a correction method is next decided based on the features of the images (S51). Since the correction method differs according to the degree of difference between the image acquired by the image acquisition device 3 and the reference scene, the control unit 1a may decide the correction method according to that degree of difference and the like. For example, when the numbers of pixels differ, a method of increasing or decreasing the pixel count of the stored images so that it roughly matches the pixel count of the images acquired by the image acquisition device 3 may be decided upon. Besides differences in the performance of the optical system, the imaging sensor, and the image processing, the same approach can be applied to differences in frame rate, angle of view, and illumination light. One way the comparison and decision could be organized is sketched below.
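One hypothetical way to organize the comparison of steps S49/S51, using an HSV histogram correlation as the difference metric. The metric choice, the 256x256 normalization, and the tolerance value are all assumptions.

```python
# Sketch of step S49: compare the current reference-scene image with the
# stored one; True means the difference is tolerable (branch "no" in S55).
import cv2

def difference_tolerable(ref_img, cur_img, tol=0.9):
    ref = cv2.resize(ref_img, (256, 256))
    cur = cv2.resize(cur_img, (256, 256))
    h1 = cv2.calcHist([cv2.cvtColor(ref, cv2.COLOR_BGR2HSV)], [0, 1],
                      None, [30, 32], [0, 180, 0, 256])
    h2 = cv2.calcHist([cv2.cvtColor(cur, cv2.COLOR_BGR2HSV)], [0, 1],
                      None, [30, 32], [0, 180, 0, 256])
    score = cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)
    return score >= tol  # below tolerance -> decide a correction method (S51)
```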
After the correction method is decided in step S43 or S51, the flow branches to "yes" (S53). Since the image acquired by the image acquisition device 3 differs greatly from the reference scene, the estimation model must be corrected; the process branches to "yes" and proceeds to step S25 in fig. 4.
In this way, in the flow of fig. 7 for deciding whether AI correction is necessary, if model information of the image acquisition device is available, a correction of the image data is decided according to the model (see S41 and S43). If no model information is available, whether correction of the estimation model is necessary is determined using the reference scene image, or an image estimated to be a reference scene (see S49), and if correction is judged necessary, a correction method is decided based on the features of the image (S51). In this flow, whether correction is necessary, and which correction method to use if it is, are decided from the model information of the image acquisition device, the reference scene image, and the like. However, since many elements bear on whether AI correction is necessary, further information may be added, and the determination itself may be performed by AI.
As described above, among the customizations made when the device use environment differs, what has been described here is mainly customization for differences in the performance, function, and specification of the imaging section, together with the processing of teaching data matching that customization. However, a system, apparatus, and method can also be provided that not only process image quality and characteristics (image correction) in accordance with the customization request but also reconstruct appropriate teaching data (processing and editing) while selecting images and correcting annotations.
In many cases, image data from an image acquisition apparatus of a 1 st specification does not match image data from an image acquisition apparatus of a 2 nd specification, owing to factors that include not only the performance, specification, environment, and peripheral system of the apparatus but also the object to be handled, accessories, peripheral equipment, treatment instruments, operators, and the like. In such cases it is often difficult to directly use an estimation model obtained by learning from teaching data made by annotating image data from the 1 st-specification image acquisition apparatus. The present embodiment can resolve such situations.
To solve the above problem, the present embodiment includes an image processing unit that, when customizing and learning an estimation model for a 2 nd(-specification) image acquisition device whose image input characteristics differ from those of the 1 st(-specification) image acquisition device, processes the image data obtained from the 1 st image acquisition device in accordance with the difference in image acquisition characteristics (which include not only the performance, specification, environment, and peripheral system of the device but also the object to be handled, accessories, peripheral equipment, treatment instruments, operators, and the like) and uses the processed image data as teaching data. By having the image processing unit optimize the teaching data, an estimation model suited to the 2 nd image acquisition device can be generated. The acquisition of the information on which this image processing is based, and an example of the processing, are described with reference to fig. 7.
That is, since it is usually desirable to reuse as much as possible of the 1 st object image data contained in the image data obtained from the 1 st image acquisition device (much of which has a proven track record) as teaching data, the image processing unit processes it to suit the image data of the 2 nd object contained in the image data obtained from the 2 nd image acquisition device, based on the image acquisition characteristics, which include not only the performance, specification, environment, and peripheral system of the device but also the object to be handled, accessories, peripheral equipment, treatment instruments, operators, and the like. With such a design, a specific object can, for example, be detected optimally from an image in accordance with the performance of the device. This is effective for detection and segmentation of objects in images, which are important classes of image estimation models.
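As a sketch of this kind of processing for a detection task, the following hedged example resamples a 1 st-device teaching image and scales its bounding-box annotations in step; the data layout (x, y, width, height in pixels) and the function name are assumptions made for illustration:

```python
import cv2
import numpy as np

def adapt_teaching_sample(img: np.ndarray, boxes: list, target_wh: tuple):
    """Process one 1 st-device teaching sample (image plus bounding boxes)
    so that it suits the pixel count of the 2 nd device."""
    h, w = img.shape[:2]
    tw, th = target_wh
    sx, sy = tw / w, th / h
    out_img = cv2.resize(img, (tw, th), interpolation=cv2.INTER_AREA)
    # The annotations must be transformed together with the image; otherwise
    # the object positions in the teaching data no longer match the pixels.
    out_boxes = [(x * sx, y * sy, bw * sx, bh * sy) for (x, y, bw, bh) in boxes]
    return out_img, out_boxes
```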
For estimating an incident such as "bleeding enlargement" or "bleeding reduction", it is preferable to process not only differences in image quality but also the timing-related (predictive) annotations. An imaging sensor with poor detection performance can be handled by the correction (processing) methods described above, but when a treatment instrument with poor operability is used, when the user's skill level is low, or depending on the target patient or affected part, the system may be designed, with safety in mind, to issue warnings as early as possible and to preferentially use or collect teaching data derived from similar factors wherever possible.
In the above case, the control unit 1a uses newly selected image data, image data with changed weighting, existing image data, or processed versions of these as teaching data, taking into account the customization measures, annotations such as "bleeding enlargement", and the timing at which the image data was acquired. Such customization measures can themselves be regarded as a form of processing. For example, newly annotating as "bleeding enlargement" an image that was labeled "bleeding reduction" but in which the area of the bleeding part has not shrunk for a certain time or longer, or converting teaching data made from images obtained by a skilled practitioner's treatment so as to generate an estimation model for beginners, may also be called "processing".
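The timing-based re-annotation described above can be pictured with the following minimal sketch; the 30-second hold time, the label strings, and the per-frame area measure are all assumptions made for this sketch, not values from the embodiment:

```python
def reannotate_bleeding(areas, dt, hold_seconds=30.0):
    """areas: bleeding-region area per frame (e.g. in pixels); dt: seconds
    between frames. Relabel the sequence "bleeding enlargement" when the area
    fails to shrink for hold_seconds or longer, even if it never grows."""
    run = 0.0
    for prev, cur in zip(areas, areas[1:]):
        run = run + dt if cur >= prev else 0.0  # reset whenever the area shrinks
        if run >= hold_seconds:
            return "bleeding enlargement"  # new annotation for the teaching data
    return "bleeding reduction"
```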
In fig. 7, whether the AI needs correction is determined in the imaging device 6. However, the determination is not limited to the imaging device 6 and may instead be performed in the image estimation learning device 1. In that case, information such as model information may be acquired and used when the image data is obtained from the imaging device 6. Alternatively, a database of reference scene images or the like may be prepared in advance and compared with the image data from the imaging device 6 to determine whether a reference scene image is included. Needless to say, in addition to processing image quality and characteristics (image correction), selecting images, and correcting annotations, the teaching data may be appropriately reconstructed (processed and edited) in response to manual input and other customization requests. Where "AI correction" is written, the correction covers not only the performance, specification, environment, and peripheral system of the device but also the object to be handled, accessories, peripheral equipment, treatment instruments, operators, and the like. Since the input to an image estimation engine is fundamentally an image, the image acquisition characteristics can be interpreted in this broad sense.
Note that a case where AI correction is required may be a case where the data from the image acquisition apparatus 3 belongs to an unknown category. Whether the data belongs to an unknown category may be determined automatically by artificial intelligence, or may be set manually by the user of the 2 nd image acquisition apparatus (for example, the image acquisition apparatus 3b). It may also be determined based on model information of the 2 nd image acquisition device (for example, the image acquisition apparatus 3b) and/or an image estimated, from the image data from the 2 nd image acquisition device, to be a reference image.
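One way to combine these three determination routes (manual setting, model information, reference-image similarity) is sketched below; the known-model set, the threshold, and the argument names are illustrative assumptions:

```python
KNOWN_MODELS = {"endo-01", "endo-02"}  # models with sufficient teaching data (assumed)

def belongs_to_unknown_category(model_info: str, scene_similarity: float,
                                user_flag=None, sim_threshold: float = 0.9) -> bool:
    """A manual user setting takes priority; otherwise fall back to model
    information, and finally to how closely the image estimated as a
    reference image matches the stored reference scene."""
    if user_flag is not None:        # manual setting by the 2 nd device's user
        return bool(user_flag)
    if model_info in KNOWN_MODELS:   # known model: not an unknown category
        return False
    return scene_similarity < sim_threshold
```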
As described above, in one embodiment of the present invention, image data from the 1 st image acquisition device is input (see, for example, S1a and S5a in fig. 5); when the estimation model is relearned for a 2 nd image acquisition device whose characteristics differ from those of the 1 st image acquisition device, the image data obtained from the 1 st image acquisition device within the teaching data is processed to serve as teaching data (see, for example, S3a and S7a in fig. 5); and the estimation model is obtained by learning with teaching data obtained by annotating that image data (see, for example, S9 in fig. 5). Appropriate estimation is therefore possible not only for data of previously assumed types but also for unknown types, even when the characteristics of the data differ from those of the data accumulated up to that point. That is, even when unanticipated data must be handled, an estimation model capable of estimating it can be generated by processing part of the data accumulated so far.
Likewise, in one embodiment of the present invention, image data from the 1 st image acquisition device is input (see, for example, S1a and S5a in fig. 5); when the estimation model is customized for a 2 nd image acquisition device used under conditions different from those of the 1 st image acquisition device, the image data obtained from the 1 st image acquisition device is subjected to processing that includes selection or annotation according to the difference in image acquisition characteristics and is used as teaching data (see, for example, S3a and S7a in fig. 5); and the estimation model is obtained by learning with the teaching data obtained by annotating that image data. Appropriate estimation is therefore possible not only for data of previously assumed types but also for unknown types, even when the characteristics of the data differ from those of the data accumulated up to that point. That is, even when unanticipated data must be handled, an estimation model capable of estimating it can be generated by selecting or processing the accumulated data from the 1 st image acquisition device.
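Tying this summary together, the whole customization path might be condensed into the following sketch, where `train_fn` stands in for any supervised learning routine; everything here is an assumed miniature of the flow, not the embodiment's actual pipeline:

```python
import cv2

def relearn_for_second_device(samples_1st, target_wh, train_fn):
    """samples_1st: (image, annotation) pairs from the 1 st device.
    Processing the images (S3a/S7a) precedes learning (S9)."""
    processed = []
    for img, annotation in samples_1st:
        img2 = cv2.resize(img, target_wh, interpolation=cv2.INTER_AREA)
        processed.append((img2, annotation))
    return train_fn(processed)  # returns the estimation model for the 2 nd device
```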
Here, "assumed-out data" is written, and the assumed-out data is data from an "assumed-out device" that cannot collect sufficient teaching data, or data from an "assumed-out environment". That is, the "provisional data" is a result of image acquisition characteristics including not only the performance, specification, environment, and peripheral system of the apparatus but also peripheral equipment such as an object and an accessory to be processed, a treatment instrument, an operator, and the like. Accordingly, by performing processing such as selection of teaching data or image processing in accordance with factors other than those assumed, it is possible to effectively utilize known data to the maximum extent, expand the expected area to be solved for AI, reduce restrictions on equipment and users, and open up a safe and secure world.
As described above, according to one embodiment of the present invention, valuable teaching data can be processed and reused according to the situation. A system that can respond immediately to whatever situation arises can therefore be constructed, and by putting advanced AI to work in situations throughout the world, a society committed to people's safety and peace of mind can be realized. Naturally, the system can also assist in producing trouble-free output in consumer and entertainment uses, and can be put to good use in supporting high-quality content and creative work. The diverse, high-quality data produced and published with such AI assistance in turn becomes effective teaching data, which supports the realization of such a world.
In the embodiment of the present invention, the imaging device 6 transmits only the image data acquired by the image acquisition device 3 to the image estimation learning device 1; however, teaching data may instead be generated by annotating within the imaging device 6 and then transmitted to the image estimation learning device 1. In that case, when the AI needs correction, the teaching data may be processed in the image estimation learning device 1 to correct the estimation model. Although it is the imaging device 6 that determines whether AI correction is necessary (see S23 in fig. 4), this is not limiting, and the image estimation learning device 1 may make the determination instead. For example, the image estimation learning device 1 may analyze the image data and the like transmitted from the various imaging devices 6, compare them with known data, and perform AI correction when it determines that the characteristics differ (in portions that vary with factors including not only the performance, specification, environment, and peripheral system of the device but also the object to be handled, accessories, peripheral equipment, treatment instruments, operators, and the like).
In one embodiment of the present invention, the estimation model is generated by learning with teaching data generated from image data. However, the teaching data is not limited to image data; it may also be generated from other data, for example time-series data such as body temperature and blood pressure.
In the embodiment of the present invention, determination based on explicit logic has mainly been described, but the present invention is not limited to this: the determination may instead be made by estimation using machine learning. Either approach may be used in the present embodiment, and the two may also be combined into a hybrid determination that draws on the advantages of each.
In the embodiment of the present invention, the control unit 7 and the control unit 1a have been described as devices including a CPU, a memory, and the like. However, besides being implemented in software by a CPU executing a program, part or all of each unit may be implemented as a hardware circuit, as a hardware configuration such as gate circuits generated from a hardware description language such as Verilog, or as a hardware configuration that uses software, such as a DSP (Digital Signal Processor). These may, of course, be combined as appropriate.
The control unit is not limited to the CPU, and may be an element that realizes the function as a controller, and the processing of each unit may be performed by 1 or more processors configured as hardware. For example, each unit may be a processor configured as an electronic circuit, or each circuit unit in a processor configured as an integrated circuit such as an FPGA (Field Programmable Gate Array). Alternatively, the processor including 1 or more CPUs may read and execute a computer program recorded in a recording medium to execute the functions of the respective units.
In one embodiment of the present invention, the image estimation learning device 1 has been described as including a control unit 1a, an image input unit 1b, a learning unit 1c, an image processing unit 1d, a learning result utilization unit 1e, a teaching data selection unit 1f, and a recording unit 4. However, these components do not need to be provided in an integrated device, and the above-described components may be distributed if they are connected via a communication network such as the internet, for example. Similarly, the imaging device 6 has been described as having the image estimating unit 2, the image acquiring device 3, and the guiding unit 5. However, these components do not need to be provided in an integrated device, and may be distributed if they are connected via a communication network such as the internet.
In recent years, artificial intelligence capable of weighing multiple judgment criteria at once has come into frequent use, and improvements such as handling the branches of the flowcharts shown here collectively naturally also fall within the scope of the present invention. If the user can give approval or disapproval of such control as input, the user's preferences can be learned, and the embodiment shown in the present application can be customized in a direction that suits that user.
The control described in this specification, mainly by means of flowcharts, can in many cases be implemented as a program, which may be recorded in a recording medium or a recording unit. The recording may be performed at product shipment, may use a distributed recording medium, or may be done by downloading via the Internet.
Although the operations in one embodiment of the present invention have been described using flowcharts, the order of the processing steps may be changed, arbitrary steps may be omitted, steps may be added, and the specific processing content of each step may be altered.
In the operational flows in the claims, the specification, and the drawings, even where words expressing order, such as "first" and "next", are used for convenience, this does not mean that the operations must be performed in that order wherever no particular statement is made.
The present invention is not limited to the above embodiments, and the structural elements may be changed and embodied without departing from the scope of the invention in the implementation stage. Further, various inventions can be formed by appropriate combinations of a plurality of constituent elements disclosed in the above embodiments. For example, some of all the components shown in the embodiments may be deleted. Further, the components in the different embodiments may be appropriately combined.
Description of the reference symbols
1 … image estimation learning device; 1a … control unit; 1aa … CPU; 1ab … memory; 1b … image input unit; 1c … learning unit; 1d … image processing unit; 1e … learning result utilization unit; 1f … teaching data selection unit; 2 … image estimation unit; 2IN … image input unit; 2SN … estimation change unit; 2AI … estimation unit; 2OUT … estimation result output unit; 3, 3a, 3b … image acquisition devices; 4 … recording unit; 4a … teaching data group; 5 … guide unit; 6 … imaging device; 7 … control unit

Claims (15)

1. A learning device for estimation, comprising:
an input unit which inputs image data from the 1 st image acquisition device; and
a learning unit that obtains an estimation model by learning using teaching data obtained by annotating the image data,
characterized in that,
the estimation learning device includes an image processing unit that performs processing corresponding to a difference in the image input characteristics on the image data obtained from the 1 st image acquisition device to obtain the teaching data when the estimation model is relearned for a 2 nd image acquisition device having different image input characteristics from the 1 st image acquisition device.
2. The learning device for estimation according to claim 1,
the image processing unit processes the 1 st object image data included in the image data obtained from the 1 st image obtaining device so as to be suitable for the 2 nd object image data included in the image data obtained from the 2 nd image obtaining device.
3. The learning device for estimation according to claim 1,
the differences in the image input characteristics arise from at least one of: the specification and performance of the imaging sensor, the imaging optical characteristics, the specification and performance of the image processing, and the kind of illumination light.
4. The learning device for estimation according to claim 1,
the image processing unit includes a function of changing the annotation given to the same image so that the image data obtained from the 1 st image acquisition device in the teaching data becomes teaching data corresponding to a difference in the image input characteristics.
5. The learning device for estimation according to claim 1,
the image data obtained from the 1 st image acquisition device is existing teaching data,
the image processing unit performs image processing on the existing teaching data based on characteristics of the image data from the 2 nd image acquisition device.
6. The learning device for estimation according to claim 1,
the image data obtained from the 1 st image acquisition device is existing teaching data,
the image processing unit selects from among the existing teaching data based on the characteristics of the image data from the 2 nd image acquisition device.
7. The learning device for estimation according to claim 1,
the image processing unit processes the image data obtained from the 1 st image acquisition device in the teaching data so as to be suitable for the image data obtained from the 2 nd image acquisition device.
8. The learning device for estimation according to claim 1,
the image data from the 2 nd image acquisition device belongs to an unknown category.
9. The learning device for estimation according to claim 8,
whether the image data belongs to the unknown category is determined automatically by artificial intelligence, or is set manually by a user of the 2 nd image acquisition apparatus.
10. The learning device for estimation according to claim 8,
wherein whether or not the image data belongs to the unknown category is determined based on model information of the 2 nd image acquisition device and/or an image estimated as a reference image from the image data from the 2 nd image acquisition device.
11. The learning device for estimation according to claim 1,
the image data obtained from the 1 st image acquisition device is existing teaching data,
when the purpose of the estimation model differs, the image processing unit performs image processing on the existing teaching data, or selects or deletes the existing teaching data, according to that purpose.
12. The learning device for estimation according to any one of claims 1 to 11,
the image data from the 1 st image acquisition device and the image data from the 2 nd image acquisition device are endoscopic image data.
13. A learning method for estimation is characterized in that,
image data from the 1 st image acquisition device is input,
when an estimation model is learned for a 2 nd image acquisition device having characteristics different from those of the 1 st image acquisition device, teaching data is obtained by processing the image data obtained from the 1 st image acquisition device among the teaching data,
an estimation model is obtained by learning using teaching data obtained by annotating the image data.
14. An estimation learning device includes:
an input unit that inputs image data from the 1 st image acquisition device; and
a learning unit that obtains an estimation model by learning using teaching data obtained by annotating the image data,
characterized in that,
the estimation learning device includes an image processing unit that, when the estimation model is customized for a 2 nd image acquisition device used under conditions different from those of the 1 st image acquisition device, performs processing including selection or annotation according to a difference in image acquisition characteristics on the image data obtained from the 1 st image acquisition device to obtain the teaching data.
15. A learning method for estimation is characterized in that,
image data from the 1 st image acquisition device is input,
when customizing an estimation model for a 2 nd image acquisition apparatus used under a condition different from the 1 st image acquisition apparatus, the teaching data is obtained by performing processing including selection or annotation according to a difference in image acquisition characteristics on image data obtained from the 1 st image acquisition apparatus,
learning is performed by using teaching data obtained by annotating the image data, thereby obtaining an estimation model.
CN202180003949.5A 2021-03-12 2021-03-12 Estimation learning device and estimation learning method Pending CN115428011A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/010204 WO2022190386A1 (en) 2021-03-12 2021-03-12 Inference learning device and inference learning method

Publications (1)

Publication Number Publication Date
CN115428011A true CN115428011A (en) 2022-12-02

Family

ID=83227688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180003949.5A Pending CN115428011A (en) 2021-03-12 2021-03-12 Estimation learning device and estimation learning method

Country Status (2)

Country Link
CN (1) CN115428011A (en)
WO (1) WO2022190386A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019144872A (en) * 2018-02-21 2019-08-29 株式会社Abeja System having computation model for machine learning, and machine learning method
WO2019193899A1 (en) * 2018-04-05 2019-10-10 コニカミノルタ株式会社 Learning method for calculation device, learning device, learning program, and learned model
JP7403995B2 (en) * 2019-08-22 2023-12-25 キヤノン株式会社 Information processing device, control method and program

Also Published As

Publication number Publication date
WO2022190386A1 (en) 2022-09-15

Similar Documents

Publication Publication Date Title
KR102270659B1 (en) Fundus image management device and method for determining suitability of fundus image
CN109190540B (en) Biopsy region prediction method, image recognition device, and storage medium
CN110007455B (en) Pathological microscope, display module, control method and device and storage medium
CN113015476A (en) System and method for generating and displaying studies of in vivo image flow
WO2021147429A1 (en) Endoscopic image display method, apparatus, computer device, and storage medium
CN109117890B (en) Image classification method and device and storage medium
US11298012B2 (en) Image processing device, endoscope system, image processing method, and program
JP5492729B2 (en) Endoscopic image recording apparatus, operation method of endoscopic image recording apparatus, and program
US11321618B2 (en) Learning device, image pickup apparatus, image processing device, learning method, non-transient computer-readable recording medium for recording learning program, display control method and inference model manufacturing method
TWI788620B (en) Methods and systems for recording and processing an image information of tissue based on voice
WO2021181520A1 (en) Image processing system, image processing device, endoscope system, interface, and image processing method
KR102531400B1 (en) Artificial intelligence-based colonoscopy diagnosis supporting system and method
US20220133215A1 (en) Method for evaluating skin lesions using artificial intelligence
KR100751160B1 (en) Medical image recording system
EP3945459A1 (en) Devices, systems, and methods for labeling objects of interest during a medical procedure
US20220361739A1 (en) Image processing apparatus, image processing method, and endoscope apparatus
JP2021182950A (en) Information processing device and method
CN115428011A (en) Estimation learning device and estimation learning method
CN115526842A (en) Nasopharyngeal laryngoscope monitoring method, device, system, computer equipment and storage medium
US20220202284A1 (en) Endoscope processor, training device, information processing method, training method and program
WO2022190170A1 (en) Machine learning device for inference and machine learning method for inference
WO2023281738A1 (en) Information processing device and information processing method
WO2023037413A1 (en) Data acquisition system and data acquisition method
JP2021101900A (en) Learning data creation device, method and program and medical image recognition device
JP2020065173A (en) Image processing device, learning device, image processing method, learning method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination