WO2022245191A1 - Method and apparatus for learning image for detecting lesions - Google Patents

Method and apparatus for learning image for detecting lesions

Info

Publication number
WO2022245191A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
training
data distribution
captured image
neural network
Prior art date
Application number
PCT/KR2022/007308
Other languages
French (fr)
Inventor
Sungwan Kim
Jooyoung Lee
Seung Ho Choi
Sun Young Yang
Seon Hee Lim
Soomin Park
Seunggi PARK
Dan YOON
Byeongsoo KIM
Woo Sang Cho
Jung Chan Lee
Jung Ho Bae
Hyoun-Joong Kong
Original Assignee
Endoai Co., Ltd.
Seoul National University R&Db Foundation
Seoul National Universitiy Hospital
Priority date
Filing date
Publication date
Priority claimed from KR1020210089799A external-priority patent/KR20220157833A/en
Application filed by Endoai Co., Ltd., Seoul National University R&Db Foundation, Seoul National Universitiy Hospital
Publication of WO2022245191A1 publication Critical patent/WO2022245191A1/en

Classifications

    • A HUMAN NECESSITIES
      • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
        • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
          • A61B 1/00 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
            • A61B 1/00002 Operational features of endoscopes
              • A61B 1/00004 Operational features of endoscopes characterised by electronic signal processing
                • A61B 1/00009 Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope
                  • A61B 1/000096 Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope using artificial intelligence
                  • A61B 1/000094 Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope extracting biological structures
            • A61B 1/31 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor for the rectum, e.g. proctoscopes, sigmoidoscopes, colonoscopes
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/0464 Convolutional networks [CNN, ConvNet]
                • G06N 3/0475 Generative networks
              • G06N 3/08 Learning methods
                • G06N 3/09 Supervised learning
      • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
        • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
          • G16H 30/00 ICT specially adapted for the handling or processing of medical images
            • G16H 30/40 ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
          • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
            • G16H 50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems


Abstract

In accordance with an aspect of the present disclosure, there is provided an image learning method performed by an image learning apparatus for detecting lesions, the image learning method comprising: inputting a captured image for a sessile serrated adenoma to a generative adversarial network (GAN) to obtain a synthesized image for the sessile serrated adenoma as an output of the generative adversarial network; extracting the captured image for the sessile serrated adenoma from a first training dataset that is used when training a convolutional neural network to detect lesions; waiting for a training command after visualizing and providing data distribution for the synthesized image and the captured image; generating a second training dataset including the synthesized image according to the training command; and training the convolutional neural network using the second training dataset.

Description

METHOD AND APPARATUS FOR LEARNING IMAGE FOR DETECTING LESIONS
The present disclosure relates to an apparatus for training a neural network on images for detecting lesions, and to an image learning method performed by such an apparatus.
Colorectal cancer is a cancer with a very high incidence worldwide, and colonoscopy has been widely used for early detection.
In addition, research is being actively conducted on systems that automatically detect colonic lesions using deep learning based on computer vision and image processing technology. When the detection result of such a system is provided to a clinician, it can increase the clinician's adenoma detection rate and help catch lesions that are easy to miss.
In view of the above, the present disclosure provides an image learning apparatus that acquires a synthesized image for a sessile serrated adenoma using a generative adversarial network (GAN) and uses the acquired synthesized image when training a convolutional neural network (CNN) to detect lesions, and an image learning method performed by the image learning apparatus.
The technical problems to be achieved by the present disclosure are not limited to the technical problems mentioned above, and other technical problems that are not mentioned may be clearly understood by those with ordinary knowledge in the technical field to which the present disclosure belongs from the following description.
In accordance with an aspect of the present disclosure, there is provided an image learning method performed by an image learning apparatus for detecting lesions, the image learning method comprising: inputting a captured image for a sessile serrated adenoma to a generative adversarial network (GAN) to obtain a synthesized image for the sessile serrated adenoma as an output of the generative adversarial network; extracting the captured image for the sessile serrated adenoma from a first training dataset that is used when training a convolutional neural network to detect lesions; waiting for a training command after visualizing and providing data distribution for the synthesized image and the captured image; generating a second training dataset including the synthesized image according to the training command; and training the convolutional neural network using the second training dataset.
In accordance with another aspect of the present disclosure, there is provided an image learning apparatus for detecting lesions, comprising: a user interface unit configured to provide a user interface for inputting various commands; a neural network model unit configured to include a generative adversarial network and a convolutional neural network; a visualization unit configured to provide visualization information; and a processor.
Here, the processor inputs a captured image for a sessile serrated adenoma to a generative adversarial network (GAN) to obtain a synthesized image for the sessile serrated adenoma as an output of the generative adversarial network, extracts the captured image for the sessile serrated adenoma from a first training dataset that is used when training a convolutional neural network to detect lesions, visualizes the data distribution for the synthesized image and the captured image, provides the visualized data distribution through the visualization unit, and then waits for a training command, generates a second training dataset including the synthesized image according to the training command received through the user interface, and trains the convolutional neural network using the second training dataset.
In accordance with another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform the image learning method of any one of claims 1 to 3.
In accordance with another aspect of the present disclosure, there is provided a computer program stored on a non-transitory computer-readable storage medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform the image learning method of any one of claims 1 to 3.
According to an embodiment, it is possible to acquire a synthesized image for a sessile serrated adenoma using a generative adversarial network, and to use the acquired synthesized image when training a convolutional neural network to detect lesions. In particular, by visually providing a data distribution for the acquired synthesized image and the pre-prepared captured image, it is possible to easily check whether the synthesized image is suitable for detecting lesions of the sessile serrated adenoma. In addition, it is possible to train the convolutional neural network using a training dataset that includes the synthesized image according to a training command input after the visualization information on the data distribution is provided. By allowing only synthesized images verified by the user to be included in the training dataset of the convolutional neural network, it is possible to improve the reliability of the training dataset and, ultimately, the reliability of lesion detection.
FIG. 1 is a configuration diagram of an image learning apparatus according to an embodiment of the present disclosure.
FIG. 2 is a flowchart for describing an image learning method performed by the image learning apparatus according to the embodiment of the present disclosure.
FIGS. 3 and 4 are examples of data distribution maps provided by a visualization unit illustrated in FIG. 1.
The advantages and features of the embodiments, and the methods of accomplishing them, will be clearly understood from the following description taken in conjunction with the accompanying drawings. However, the embodiments are not limited to those described herein, as they may be implemented in various forms. The present embodiments are provided to make the disclosure complete and to allow those skilled in the art to understand the full scope of the embodiments. Therefore, the embodiments are to be defined only by the scope of the appended claims.
Terms used in the present specification will be briefly described, and the present disclosure will be described in detail.
As for the terms used in the present disclosure, general terms that are currently as widely used as possible have been selected in consideration of their functions in the present disclosure. However, the terms may vary according to the intention or precedent of a technician working in the field, the emergence of new technologies, and the like. In addition, in certain cases, there are terms arbitrarily selected by the applicant, and in such cases the meaning of the terms will be described in detail in the description of the corresponding invention. Therefore, the terms used in the present disclosure should be defined based on the meaning of the terms and the overall contents of the present disclosure, not merely on the names of the terms.
When it is described throughout the specification that a part "includes" a certain component, this means that other components may be further included, rather than excluded, unless specifically stated to the contrary.
In addition, a term such as a "unit" or a "portion" used in the specification means a software component or a hardware component such as an FPGA or an ASIC, and the "unit" or the "portion" performs a certain role. However, the "unit" or the "portion" is not limited to software or hardware. The "portion" or the "unit" may be configured to reside in an addressable storage medium, or may be configured to execute on one or more processors. Thus, as an example, the "unit" or the "portion" includes components (such as software components, object-oriented software components, class components, and task components), processes, functions, properties, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functions provided in the components and "units" may be combined into a smaller number of components and "units" or may be further divided into additional components and "units."
Hereinafter, the embodiment of the present disclosure will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art may easily implement the present disclosure. In the drawings, portions not related to the description are omitted in order to clearly describe the present disclosure.
Hereinafter, in the embodiments of the present disclosure, a term such as "sessile serrated adenoma" may comprise a "sessile serrated polyp".
FIG. 1 is a configuration diagram of an image learning apparatus 100 according to an embodiment of the present disclosure.
Referring to FIG. 1, the image learning apparatus 100 according to the embodiment includes a user interface unit 110, a neural network model unit 120, a visualization unit 130, and a processor unit 140. In addition, the image learning apparatus 100 may further include an input unit 150 and/or a providing unit 160. Here, the neural network model unit 120 and/or the processor unit 140 may include computing means such as a microprocessor.
The user interface unit 110 provides a user interface through which a user may input various commands. For example, the user interface unit 110 may include at least one of a keyboard, a keypad, and a coordinate input device (e.g., a computer mouse) connected to the image learning apparatus 100 including the computing means such as a microprocessor.
The neural network model unit 120 includes a generative adversarial network and a convolutional neural network. As illustrated in FIG. 1, the neural network model unit 120 may be implemented physically separately from the processor unit 140, but may be implemented in the form of one module by combining the neural network model unit 120 and the processor unit 140.
The visualization unit 130 provides visualization information under the control of the processor unit 140. For example, the visualization unit 130 may include a display device or the like capable of outputting the processing result of the processor unit 140 to a screen.
A training dataset that may be used when training the convolutional neural network of the neural network model unit 120 to detect lesions is input to the input unit 150. Here, the training dataset may include a captured image for a sessile serrated adenoma. In addition, the captured image for the sessile serrated adenoma that is input to the generative adversarial network of the neural network model unit 120 and used to generate the synthesized image is also input to the input unit 150. For example, the input unit 150 may include a serial interface through which the training dataset and/or the captured image can be received directly from the outside, or a communication module capable of receiving them through a communication channel.
The processor unit 140 inputs the captured image for the sessile serrated adenoma to the generative adversarial network of the neural network model unit 120 to acquire the synthesized image for the sessile serrated adenoma as an output of the generative adversarial network. The processor unit 140 extracts the captured image for the sessile serrated adenoma from the first training dataset that may be used when training the convolutional neural network to detect lesions. The processor unit 140 visualizes a data distribution for the acquired synthesized image and the extracted captured image, provides the data distribution through the visualization unit 130, and then waits for a training command. In addition, the processor unit 140 generates a second training dataset including the previously acquired synthesized image according to the training command received through the user interface unit 110, and trains the convolutional neural network using the generated second training dataset. Here, when visualizing the data distribution, the processor unit 140 may use a t-distributed stochastic neighbor embedding (t-SNE) algorithm to reduce a data distribution in a high-dimensional space to a data distribution in a two-dimensional space. In addition, when visualizing the data distribution, the processor unit 140 may input the previously acquired synthesized image and the previously extracted captured image to the t-SNE algorithm with the same number and the same size. Meanwhile, a filtering command, rather than the training command, may be input through the user interface unit 110. In this case, the processor unit 140 performs clustering based on a comparison between the relative distances of the synthesized and captured images and a preset threshold, and deletes any synthesized image located outside the resulting cluster area, thereby filtering it out so that it is not included in the second training dataset.
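Purely as an illustration, the control flow that the processor unit 140 is described as performing can be sketched as follows. Every helper callable passed in here (generate_synthetic, extract_captured, and so on) is a hypothetical stand-in supplied by the caller, not a component defined by the disclosure.

```python
# A minimal sketch, not the patented implementation, of the processor unit 140 flow.
from typing import Callable, Sequence


def image_learning_flow(
    captured_ssa: Sequence,            # captured sessile serrated adenoma images
    first_training_dataset: Sequence,  # (image, label) pairs for various polyp types
    generate_synthetic: Callable,      # GAN wrapper: captured images -> synthesized images (S210)
    extract_captured: Callable,        # pulls SSA images out of the first dataset by label (S220)
    visualize_distribution: Callable,  # e.g. a t-SNE scatter plot shown to the user (S230)
    wait_for_command: Callable,        # returns "train" or "filter" from the user interface
    filter_outliers: Callable,         # drops synthesized images outside the captured cluster (S250)
    build_second_dataset: Callable,    # merges labeled synthesized images into a new dataset (S260)
    train_cnn: Callable,               # trains the lesion-detection CNN (S270)
):
    synthesized = generate_synthetic(captured_ssa)
    captured = extract_captured(first_training_dataset)
    while True:
        visualize_distribution(synthesized, captured)
        command = wait_for_command()
        if command == "filter":
            synthesized = filter_outliers(synthesized, captured)
        elif command == "train":
            second_dataset = build_second_dataset(synthesized, first_training_dataset)
            return train_cnn(second_dataset)
```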
The providing unit 160 may provide the learned convolutional neural network to the outside under the control of the processor unit 140. For example, the providing unit 160 may include an interface capable of transmitting various data to a peripheral device, and transmit the convolutional neural network to the peripheral device through the interface. Alternatively, the providing unit 160 may include a communication module, and transmit the convolutional neural network to the outside through the communication module.
FIG. 2 is a flowchart for describing an image learning method performed by an image learning apparatus according to an embodiment of the present disclosure, and FIGS. 3 and 4 are examples of a data distribution map provided by the visualization unit illustrated in FIG. 1.
Hereinafter, an image learning method performed by the image learning apparatus 100 according to the embodiment of the present disclosure will be described in detail with reference to FIGS. 1 to 4.
First, through the input unit 150 of the image learning apparatus 100, a user inputs a training dataset that may be used when training the convolutional neural network of the neural network model unit 120 to detect lesions, and also inputs the captured image for the sessile serrated adenoma that will be fed to the generative adversarial network of the neural network model unit 120 to generate the synthesized image.
The input unit 150 provides the received captured image for the sessile serrated adenoma to the processor unit 140 of the image learning apparatus 100, and provides the training dataset to the processor unit 140 and/or the neural network model unit 120.
Next, the processor unit 140 inputs the captured image for the sessile serrated adenoma to the generative adversarial network of the neural network model unit 120 to acquire the synthesized image for the sessile serrated adenoma as the output of the generative adversarial network. Here, when the captured image for the sessile serrated adenoma input through the input unit 150 is already in a form suitable for acquiring the synthesized image, the captured image may be input to the generative adversarial network as it is; otherwise, a crop process may be performed through the user interface unit 110. For example, the processor unit 140 may display the captured image for the sessile serrated adenoma through the visualization unit 130, and the user may use a coordinate input device (e.g., a computer mouse) included in the user interface unit 110 to command that the captured image be cropped to an appropriate size and that the cropped captured image be input to the generative adversarial network (S210).
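As an illustration only, step S210 might look like the sketch below, which assumes an image-conditioned GAN generator has already been trained and exported as a TorchScript module. The file name, crop size, and generator interface are assumptions for the example, not details given by the disclosure.

```python
# A minimal sketch of step S210: crop captured SSA images and feed them to a
# (hypothetical) trained GAN generator to obtain synthesized images.
import torch
from PIL import Image
from torchvision import transforms

GENERATOR_PATH = "ssa_generator.pt"   # hypothetical exported GAN generator
CROP_SIZE = 256                       # hypothetical input size the generator expects

preprocess = transforms.Compose([
    transforms.CenterCrop(CROP_SIZE),  # crop step described for the user interface
    transforms.ToTensor(),             # HWC uint8 -> CHW float in [0, 1]
])


def synthesize(captured_paths):
    """Feed captured SSA images to the generator and collect synthesized images."""
    generator = torch.jit.load(GENERATOR_PATH).eval()
    outputs = []
    with torch.no_grad():
        for path in captured_paths:
            img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)  # 1 x C x H x W
            outputs.append(generator(img).squeeze(0))                       # synthesized image tensor
    return outputs
```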
The processor unit 140 extracts the captured image for the sessile serrated adenoma from the first training dataset that may be used when training the convolutional neural network to detect lesions. Here, the first training dataset may contain not only the captured image for the sessile serrated adenoma but also captured images and labels for various other types of polyps, and the processor unit 140 may extract the captured image for the sessile serrated adenoma based on the identification information of the label (S220).
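A minimal sketch of step S220 follows, assuming the first training dataset is an iterable of (image, label) pairs. The label value "SSA" is a hypothetical identifier; the real dataset may encode lesion classes differently.

```python
# A minimal sketch of step S220: keep only the sessile serrated adenoma images,
# selected by the identification information carried in each label.
SSA_LABEL = "SSA"  # hypothetical label identifier


def extract_ssa_images(first_training_dataset):
    """first_training_dataset: iterable of (image, label) pairs for various polyp types."""
    return [image for image, label in first_training_dataset if label == SSA_LABEL]
```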
Subsequently, the processor unit 140 determines the data distribution for the synthesized image acquired in step S210 and the captured image extracted in step S220. For example, the processor unit 140 may reduce the data distribution in the high-dimensional space to a data distribution in the two-dimensional space by using the t-SNE algorithm. In addition, the processor unit 140 may input the synthesized image acquired in step S210 and the captured image extracted in step S220 to the t-SNE algorithm with the same number and the same size. For example, when the number of synthesized images acquired in step S210 is 200 in total, the processor unit 140 may extract the captured images for a total of 200 sessile serrated adenomas from the first training dataset. In addition, when the size of the captured image extracted in step S220 is larger than the size of the synthesized image acquired in step S210, the processor unit 140 may crop the captured image to the size of the synthesized image and input the cropped captured image to the t-SNE algorithm.
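One way this step could be realized, assuming the images are NumPy arrays, is sketched below: the captured images are sampled to match the number of synthesized images, center-cropped to the synthesized image size, flattened, and reduced to two dimensions with scikit-learn's t-SNE.

```python
# A minimal sketch of the t-SNE reduction with the same number and size of
# synthesized and captured images; details are illustrative assumptions.
import numpy as np
from sklearn.manifold import TSNE


def center_crop(img, height, width):
    h, w = img.shape[:2]
    top, left = (h - height) // 2, (w - width) // 2
    return img[top:top + height, left:left + width]


def embed_2d(synthesized, captured, random_state=0):
    n = len(synthesized)                           # same number of images from each source
    target_h, target_w = synthesized[0].shape[:2]  # same size as the synthesized images
    captured = [center_crop(c, target_h, target_w) for c in captured[:n]]
    data = np.stack([img.reshape(-1) for img in synthesized + captured]).astype(np.float32)
    coords = TSNE(n_components=2, random_state=random_state).fit_transform(data)
    return coords[:n], coords[n:]                  # 2-D points for synthesized, captured
```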
The processor unit 140 generates visualization data for the data distribution identified for the synthesized image and the captured image, provides the visualization information on the data distribution through the visualization unit 130, and then waits for the input command through the user interface unit 110 (S230).
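The visualization provided at step S230 could be as simple as the following scatter plot of the 2-D coordinates, so that the user can judge whether the synthesized and captured images fall in one cluster (as in FIG. 3) or in separate clusters (as in FIG. 4). This is an illustrative sketch using matplotlib; embed_2d refers to the sketch above.

```python
# A minimal sketch of a data distribution map shown to the user at step S230.
import matplotlib.pyplot as plt


def show_distribution_map(syn_coords, cap_coords):
    plt.scatter(cap_coords[:, 0], cap_coords[:, 1], label="captured", marker="o")
    plt.scatter(syn_coords[:, 0], syn_coords[:, 1], label="synthesized", marker="x")
    plt.legend()
    plt.title("Data distribution of captured vs. synthesized SSA images")
    plt.show()
```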
FIGS. 3 and 4 are examples of data distribution maps provided by a visualization unit illustrated in FIG. 1. It can be seen from the example of FIG. 3 that the captured image and the synthesized image are located within a single clustered area. In the example of FIG. 4, it can be seen that a part of the synthesized image is located within the same cluster area as the captured image, but the remaining part of the synthesized image is located within a separate cluster area unlike the captured image. FIG. 4 illustrates a case in which the captured image for the sessile serrated adenoma received through the input unit 150 before step S210 is not actually the captured image for the sessile serrated adenoma, or a case in which the generative adversarial network used in the process of acquiring the synthesized image through step S210 is trained in an incorrect direction.
Meanwhile, the user may check the visualization information on the data distribution provided through the visualization unit 130 in step S230, and input the training command or the filtering command through the user interface unit 110. For example, in step S230, the training command may be input when the data distribution map as illustrated in FIG. 3 is provided, and in step S230, the filtering command may be input when the data distribution map as illustrated in FIG. 4 is provided.
When the filtering command is input through the user interface unit 110, the processor unit 140 identifies, among the synthesized images, any image that falls outside the cluster area shared with the captured images as a result of the clustering, and deletes the identified image. Subsequently, the processor unit 140 may provide an updated data distribution map through the visualization unit 130 by performing step S230 again. Alternatively, the processor unit 140 may wait for the next command, or may terminate the image learning process after indicating the error situation through the visualization unit 130 (S250).
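One simple realization of this filtering branch, assuming the comparison is done in the 2-D t-SNE space, is sketched below: synthesized points whose distance to the centroid of the captured points exceeds a preset threshold are treated as lying outside the captured-image cluster and are dropped. The threshold value is purely illustrative.

```python
# A minimal sketch of the filtering branch (S250): delete synthesized images
# whose relative distance to the captured cluster exceeds a preset threshold.
import numpy as np


def filter_synthesized(synthesized, syn_coords, cap_coords, threshold=10.0):
    centroid = cap_coords.mean(axis=0)                         # center of the captured cluster
    distances = np.linalg.norm(syn_coords - centroid, axis=1)  # relative distance per synthesized image
    keep = distances <= threshold
    return [img for img, ok in zip(synthesized, keep) if ok]   # outliers are deleted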
When the training command is input through the user interface unit 110, the processor unit 140 generates the second training dataset including the synthesized image acquired in step S210. For example, the processor unit 140 may generate a dataset in which the synthesized image acquired in step S210 is the input and the identification information of the sessile serrated adenoma is the label, and may generate a new second training dataset by merging this dataset with the pre-prepared first training dataset to be used when training the convolutional neural network to detect lesions (S260).
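A minimal sketch of step S260 follows: each synthesized image is paired with the sessile serrated adenoma identification label and merged with the pre-prepared first training dataset. "SSA" is the same hypothetical label identifier used in the extraction sketch above.

```python
# A minimal sketch of step S260: build the second training dataset by labeling
# the synthesized images and merging them with the first training dataset.
SSA_LABEL = "SSA"  # hypothetical label identifier


def build_second_dataset(synthesized, first_training_dataset):
    synthesized_pairs = [(image, SSA_LABEL) for image in synthesized]
    return list(first_training_dataset) + synthesized_pairs
```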
The processor unit 140 trains the convolutional neural network of the neural network model unit 120 using the second training dataset generated in step S260 (S270).
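For illustration, a training step such as S270 could look like the sketch below. It assumes the second training dataset yields (image tensor, integer class index) pairs and uses a small stand-in CNN; the architecture and hyper-parameters are illustrative only, not the network of the disclosure.

```python
# A minimal sketch of step S270: train a CNN classifier on the second dataset.
import torch
from torch import nn
from torch.utils.data import DataLoader


def train_cnn(second_dataset, num_classes, epochs=10, lr=1e-4):
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(32, num_classes),
    )
    loader = DataLoader(second_dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```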
Thereafter, the processor unit 140 may control the providing unit 160 to externally provide the learned convolutional neural network through step S270. For example, the processor unit 140 may transmit the learned convolutional neural network to a peripheral device through the interface of the providing unit 160 or may transmit the learned convolutional neural network to the outside through the communication module of the providing unit 160.
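As a final illustrative sketch, handing the trained network to the providing unit 160 could amount to serializing the model and, optionally, transmitting the file over a network connection. The file name, host, and port are hypothetical.

```python
# A minimal sketch of exporting the trained CNN through a storage interface or
# a communication module; all names here are illustrative assumptions.
import socket

import torch


def provide_model(model, path="lesion_cnn.pt", host=None, port=9000):
    torch.save(model.state_dict(), path)   # export to a file for a peripheral/storage interface
    if host is not None:                    # or transmit through a communication module
        with socket.create_connection((host, port)) as conn, open(path, "rb") as f:
            conn.sendall(f.read())
```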
Meanwhile, each step included in the image learning method performed by the image learning apparatus 100 according to the above-described embodiment may be implemented in a computer-readable recording medium for recording a computer program including instructions for performing these steps.
According to the embodiment described above, it is possible to acquire the synthesized image for the sessile serrated adenoma using the generative adversarial network, and to use the acquired synthesized image when training the convolutional neural network to detect lesions. In particular, by visually providing the data distribution for the acquired synthesized image and the pre-prepared captured image, it is possible to easily check whether the synthesized image is suitable for detecting lesions of the sessile serrated adenoma. In addition, it is possible to train the convolutional neural network using the training dataset including the synthesized image according to the training command input after the visualization information on the data distribution is provided. By allowing only synthesized images verified by the user to be included in the training dataset of the convolutional neural network, it is possible to improve the reliability of the training dataset and, ultimately, the reliability of lesion detection.
Combinations of steps in each flowchart attached to the present disclosure may be executed by computer program instructions. Since the computer program instructions can be loaded onto a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed by the processor of the computer or other programmable data processing equipment create means for performing the functions described in each step of the flowchart. The computer program instructions can also be stored on a computer-usable or computer-readable storage medium that can direct a computer or other programmable data processing equipment to implement a function in a specific manner. Accordingly, the instructions stored on the computer-usable or computer-readable recording medium can also produce an article of manufacture containing instruction means that perform the functions described in each step of the flowchart. The computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are performed on the computer or other programmable data processing equipment to create a computer-executable process; it is thus also possible for the instructions executed on the computer or other programmable data processing equipment to provide steps for performing the functions described in each step of the flowchart.
In addition, each step may represent a module, a segment, or a portion of codes which contains one or more executable instructions for executing the specified logical function(s). It should also be noted that in some alternative embodiments, the functions mentioned in the steps may occur out of order. For example, two steps illustrated in succession may in fact be performed substantially simultaneously, or the steps may sometimes be performed in a reverse order depending on the corresponding function.
The above description is merely illustrative of the technical idea of the present disclosure, and various modifications and variations can be made by those skilled in the art to which the present disclosure pertains without departing from the essential quality of the present disclosure. Therefore, the embodiments disclosed herein are not intended to limit the technical spirit of the present disclosure, but to illustrate it, and the scope of the technical spirit of the present disclosure is not limited by these embodiments. The protection scope of the present disclosure should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be interpreted as being included in the scope of the present disclosure.

Claims (9)

  1. An image learning method performed by an image learning apparatus for detecting lesions, the image learning method comprising:
    inputting a captured image for a sessile serrated adenoma to a generative adversarial network (GAN) to obtain a synthesized image for the sessile serrated adenoma as an output of the generative adversarial network;
    extracting the captured image for the sessile serrated adenoma from a first training dataset that is used when training a convolutional neural network to detect lesions;
    waiting for a training command after visualizing and providing data distribution for the synthesized image and the captured image;
    generating a second training dataset including the synthesized image according to the training command; and
    training the convolutional neural network using the second training dataset.
  2. The image learning method of claim 1, wherein, when visualizing the data distribution, a t-distributed stochastic neighbor embedding (t-SNE) algorithm is used to reduce a data distribution in a high-dimensional space to a data distribution in a two-dimensional space.
  3. The image learning method of claim 2, wherein, when visualizing the data distribution, the synthesized image and the captured image are input to the t-SNE algorithm with the same number and the same size.
  4. An image learning apparatus for detecting lesions, comprising:
    a user interface unit configured to provide a user interface for inputting various commands;
    a neural network model unit configured to include a generative adversarial network and a convolutional neural network;
    a visualization unit configured to provide visualization information; and
    a processor,
    wherein the processor inputs a captured image for a sessile serrated adenoma to a generative adversarial network (GAN) to obtain a synthesized image for the sessile serrated adenoma as an output of the generative adversarial network,
    extracts the captured image for the sessile serrated adenoma from a first training dataset that is used when training a convolutional neural network to detect lesions,
    visualizes the data distribution for the synthesized image and the captured image, provides the visualized data distribution through the visualization unit, and then waits for a training command,
    generates a second training dataset including the synthesized image according to the training command through the user interface, and
    trains the convolutional neural network using the second training dataset.
  5. The image learning apparatus of claim 4, wherein, when visualizing the data distribution, a t-SNE algorithm is used to reduce a data distribution in a high-dimensional space to a data distribution in a two-dimensional space.
  6. The image learning apparatus of claim 5, wherein, when visualizing the data distribution, the synthesized image and the captured image are input to the t-SNE algorithm with the same number and the same size.
  7. A non-transitory computer-readable storage medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform an image learning method, the method comprising:
    inputting a captured image for a sessile serrated adenoma to a generative adversarial network (GAN) to obtain a synthesized image for the sessile serrated adenoma as an output of the generative adversarial network;
    extracting the captured image for the sessile serrated adenoma from a first training dataset that is used when training a convolutional neural network to detect lesions;
    waiting for a training command after visualizing and providing data distribution for the synthesized image and the captured image;
    generating a second training dataset including the synthesized image according to the training command; and
    training the convolutional neural network using the second training dataset.
  8. The non-transitory computer-readable storage medium of claim 7, wherein, when visualizing the data distribution, a t-distributed stochastic neighbor embedding (t-SNE) algorithm is used to reduce a data distribution in a high-dimensional space to a data distribution in a two-dimensional space.
  9. The non-transitory computer-readable storage medium of claim 8, wherein, when visualizing the data distribution, the synthesized image and the captured image are input to the t-SNE algorithm with the same number and the same size.
PCT/KR2022/007308 2021-05-21 2022-05-23 Method and apparatus for learning image for detecting lesions WO2022245191A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR20210065538 2021-05-21
KR10-2021-0065538 2021-05-21
KR10-2021-0089799 2021-07-08
KR1020210089799A KR20220157833A (en) 2021-05-21 2021-07-08 Method and apparatus for learning image for detecting lesions

Publications (1)

Publication Number Publication Date
WO2022245191A1 true WO2022245191A1 (en) 2022-11-24

Family

ID=84140714

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/007308 WO2022245191A1 (en) 2021-05-21 2022-05-23 Method and apparatus for learning image for detecting lesions

Country Status (1)

Country Link
WO (1) WO2022245191A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200085382A1 (en) * 2017-05-30 2020-03-19 Arterys Inc. Automated lesion detection, segmentation, and longitudinal identification
US20190385018A1 (en) * 2018-06-13 2019-12-19 Casmo Artificial Intelligence-Al Limited Systems and methods for training generative adversarial networks and use of trained generative adversarial networks
US20210125000A1 (en) * 2019-10-23 2021-04-29 Samsung Sds Co., Ltd. Method and apparatus for training model for object classification and detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONGHEON LEE: "Deep Learning Approaches for Clinical Performance Improvement: Applications to Colonoscopic Diagnosis and Robotic Surgical Skill Assessment", PH. D. DISSERTATION, 1 August 2020 (2020-08-01), XP093007996, [retrieved on 20221214] *
FRID-ADAR MAAYAN, DIAMANT IDIT, KLANG EYAL, AMITAI MICHAL, GOLDBERGER JACOB, GREENSPAN HAYIT: "GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification", NEUROCOMPUTING, ELSEVIER, AMSTERDAM, NL, vol. 321, 1 December 2018 (2018-12-01), AMSTERDAM, NL , pages 321 - 331, XP093007999, ISSN: 0925-2312, DOI: 10.1016/j.neucom.2018.09.013 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22805044

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE