WO2020153471A1 - Deduction device, learning model, learning model generation method, and computer program - Google Patents

Deduction device, learning model, learning model generation method, and computer program

Info

Publication number
WO2020153471A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
oral
learning model
oral cavity
estimation
Prior art date
Application number
PCT/JP2020/002491
Other languages
French (fr)
Japanese (ja)
Inventor
Shinichiro Hiraoka
Original Assignee
Osaka University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Osaka University
Priority to JP2020567715A priority Critical patent/JPWO2020153471A1/en
Publication of WO2020153471A1 publication Critical patent/WO2020153471A1/en

Links

Images

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B1/00 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B1/04 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor combined with photographic or television appliances
    • A61B1/045 Control thereof
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B1/00 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B1/24 Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor for the mouth, i.e. stomatoscopes, e.g. with tongue depressors; Instruments for opening or keeping open the mouth
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B10/00 Other methods or instruments for diagnosis, e.g. instruments for taking a cell sample, for biopsy, for vaccination diagnosis; Sex determination; Ovulation-period determination; Throat striking implements
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons

Definitions

  • the present invention relates to an estimation device, a learning model, a learning model generation method, and a computer program.
  • There are various types of oral mucosal diseases, such as oral malignant tumors and stomatitis, but their diagnosis is often difficult. In particular, early-stage oral malignant tumors often present with clinical findings similar to stomatitis and are frequently overlooked by non-specialist medical staff. Regarding oral malignant tumors, many institutions are currently studying causative genes and prognostic factors, and the inventors have also reported pathological prognostic factors for oral malignant tumors (see, for example, Non-Patent Document 1).
  • An object of the present invention is to provide an estimation device capable of estimating lesions in the oral mucosa, a learning model, a learning model generation method, and a computer program.
  • An estimation device includes: an acquisition unit that acquires an oral cavity image obtained by imaging the inside of the oral cavity of a subject; an estimation unit that, using a learning model configured to output information regarding a lesion in the oral mucosa in response to the input of the oral cavity image, estimates the presence or absence of a lesion in the oral mucosa of the subject from the oral cavity image acquired by the acquisition unit; and an output unit that outputs the estimation result of the estimation unit.
  • A learning model includes an input layer to which an oral cavity image obtained by imaging the inside of the oral cavity of a subject is input, an output layer that outputs information regarding lesions in the oral mucosa, and an intermediate layer that has learned, using the oral cavity image and an annotation for the oral cavity image as training data, the relationship between the oral cavity image input to the input layer and the information output by the output layer. The learning model causes a computer to function so that, when an oral cavity image is input to the input layer, information regarding lesions in the oral mucosa is output from the output layer.
  • A learning model generation method uses a computer to acquire training data including oral cavity images obtained by imaging the inside of the oral cavity of subjects and annotations for the oral cavity images, and generates, based on the acquired training data, a learning model that outputs information regarding lesions in the oral mucosa in response to the input of an oral cavity image.
  • A computer program causes a computer to execute processes of acquiring an oral cavity image obtained by imaging the inside of the oral cavity of a subject, estimating the presence or absence of a lesion in the oral mucosa from the acquired oral cavity image using a learning model configured to output information regarding a lesion in the oral mucosa in response to the input of the oral cavity image, and outputting the estimation result.
  • According to the present invention, lesions in the oral mucosa can be estimated.
  • FIG. 1 is a block diagram illustrating the configuration of the estimation device according to the first embodiment.
  • FIG. 2 is a schematic diagram showing an example of an oral cavity image.
  • FIG. 3 is a schematic diagram showing a configuration example of a learning model.
  • FIG. 4 is a flowchart illustrating a procedure of processing executed by the estimation device according to the first embodiment.
  • FIG. 5 is a schematic diagram showing an output example of the estimation device.
  • FIG. 6 is a flowchart illustrating a procedure of processing executed by the estimation device according to the second embodiment.
  • FIG. 7 is a block diagram illustrating the configuration of the estimation device according to the third embodiment.
  • FIG. 8 is a schematic diagram showing an example of extraction.
  • FIG. 9 is a flowchart illustrating a procedure of processing executed by the estimation device according to the third embodiment.
  • FIG. 13 is a block diagram illustrating the configuration of the estimation device according to the fifth embodiment.
  • FIG. 14 is a schematic diagram showing an example of an enlarged image.
  • FIG. 15 is a schematic diagram showing a configuration example of a learning model.
  • FIG. 19 is a schematic diagram showing an example of presentation of an observation site.
  • FIG. 1 is a block diagram illustrating the configuration of the estimation device 1 according to the first embodiment.
  • the estimation device 1 is a computer device installed in a facility such as a hospital, and estimates the presence/absence of a lesion in the oral mucosa of a subject from an oral image obtained by imaging the inside of the oral cavity of the subject.
  • the estimation device 1 provides diagnosis support by presenting an estimation result to a doctor or the like who is a diagnostician.
  • the estimation device 1 includes an input unit 11, a control unit 12, a storage unit 13, an output unit 14, a communication unit 15, and an operation unit 16.
  • the input unit 11 includes an input interface for inputting image data of an oral cavity image.
  • the input interface is, for example, an interface that connects an imaging device for imaging the inside of the oral cavity of the subject.
  • the imaging device is a digital camera or a digital video camera, and outputs, for example, image data in which each pixel is represented by RGB gradation values.
  • the input unit 11 acquires image data of an oral cavity image from an imaging device connected to the input interface.
  • the input interface may be an interface for accessing a recording medium in which captured image data is recorded.
  • the input unit 11 acquires the image data relating to the oral cavity image by reading the image data recorded in the recording medium.
  • the image data acquired by the input unit 11 is output to the control unit 12 and stored in the storage unit 13 via the control unit 12.
  • the control unit 12 includes, for example, a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory).
  • the ROM included in the control unit 12 stores a control program or the like for controlling the operation of each hardware unit included in the estimation device 1.
  • the CPU in the control unit 12 executes the control program stored in the ROM and various computer programs stored in the storage unit 13 described later, and controls the operation of each hardware unit, thereby realizing the function of estimating the presence or absence of a lesion in the oral mucosa from the oral cavity image.
  • the RAM used by the control unit 12 temporarily stores data used during execution of the calculation.
  • Although the control unit 12 is described as including a CPU, a ROM, and a RAM, it may be one or a plurality of arithmetic circuits including a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), a DSP (Digital Signal Processor), a quantum processor, a volatile or nonvolatile memory, and the like. Further, the control unit 12 may have functions such as a clock that outputs date and time information, a timer that measures the elapsed time from a measurement start instruction to a measurement end instruction, and a counter that counts events.
  • the storage unit 13 includes a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), and an EEPROM (Electronically Erasable Programmable Read Only Memory).
  • the storage unit 13 stores a computer program executed by the control unit 12, a learning model 130 used for the process of estimating the presence/absence of a lesion in the oral mucosa, and the like.
  • the computer program stored in the storage unit 13 includes an estimation processing program P1 for causing the estimation device 1 to perform processing for estimating the presence or absence of a lesion in the oral mucosa from the acquired oral image using the learning model 130.
  • the computer program stored in the storage unit 13 may be provided by a non-transitory recording medium M1 on which the computer program is recorded in a readable manner.
  • the recording medium M1 is, for example, a portable memory such as a CD-ROM, a USB memory, an SD (Secure Digital) card, a micro SD card, or a compact flash (registered trademark).
  • the control unit 12 reads various programs from the recording medium M1 through the input unit 11, and installs the read various programs in the storage unit 13, for example.
  • the learning model 130 is a learning model configured to output information regarding lesions in the oral mucosa in response to input of an oral image.
  • the learning model 130 is described by its definition information.
  • the definition information of the learning model 130 includes structural information of the learning model 130, various parameters such as weights and biases between nodes used in the learning model 130, and the like.
  • the learning model 130, trained in advance by a predetermined learning algorithm using oral cavity images and annotations for the oral cavity images as training data, is stored in the storage unit 13.
  • the control unit 12 executes the estimation processing program P1 stored in the storage unit 13 and supplies the image data of the oral cavity image to the learning model 130 to acquire the information regarding the lesion from the learning model 130.
  • the control unit 12 estimates the presence or absence of a lesion in the oral mucosa based on the information on the lesion acquired from the learning model 130.
  • the output unit 14 has an output interface for connecting an output device.
  • An example of the output device is a display device 140 including a liquid crystal panel, an organic EL (Electro-Luminescence) panel, or the like.
  • When outputting the estimation result, the control unit 12 generates display data to be displayed on the display device 140 and outputs the generated display data to the display device 140 through the output unit 14, thereby displaying the estimation result on the display device 140.
  • the communication unit 15 has a communication interface for transmitting and receiving various data.
  • the communication interface included in the communication unit 15 is, for example, a communication interface conforming to a LAN (Local Area Network) communication standard such as WiFi (registered trademark) or Ethernet (registered trademark).
  • the operation unit 16 includes input interfaces such as various operation buttons, switches, and touch panels, and receives various operation information and setting information.
  • the control unit 12 performs appropriate control based on the operation information input from the operation unit 16, and stores the setting information in the storage unit 13 as necessary.
  • FIG. 2 is a schematic diagram showing an example of an oral cavity image.
  • the oral cavity image in the present embodiment is an image obtained by capturing the inside of the oral cavity of the subject with an imaging device.
  • an oral image captured so as to include the left side surface of the subject's tongue is shown.
  • The oral mucosa includes at least a portion of the subject's tongue, upper lip, hard palate, soft palate, uvula, tonsils, buccal mucosa, floor of the mouth, gums, and lower lip.
  • the oral cavity image may also include structures other than the oral mucosa, such as the subject's teeth, the imager's fingers, and various instruments.
  • FIG. 3 is a schematic diagram showing a configuration example of the learning model 130.
  • the learning model 130 is, for example, a learning model based on CNN (Convolutional Neural Networks), and includes an input layer 131, an intermediate layer 132, and an output layer 133.
  • the learning model 130 is learned in advance so as to output information regarding lesions in the oral mucosa in response to the input of the oral cavity image.
  • Image data of an oral cavity image is input to the input layer 131.
  • the image data of the oral cavity image input to the input layer 131 is sent to the intermediate layer 132.
  • the intermediate layer 132 is composed of, for example, a convolutional layer 132a, a pooling layer 132b, and a fully connected layer 132c.
  • a plurality of convolutional layers 132a and pooling layers 132b may be alternately provided.
  • the convolutional layer 132a and the pooling layer 132b extract the features of the oral cavity image input through the input layer 131 by the calculation using the nodes of each layer.
  • the fully connected layer 132c combines the data whose features have been extracted by the convolutional layer 132a and the pooling layer 132b into its nodes, and outputs feature variables transformed by an activation function.
  • the feature variables are output to the output layer 133 through the fully connected layer 132c.
  • the output layer 133 includes one or a plurality of nodes.
  • the output layer 133 converts the feature variables input from the fully connected layer 132c of the intermediate layer 132 into probabilities using a softmax function, and outputs from each node the probability that the oral cavity image falls into the corresponding category. That is, in the present embodiment, the probability that the oral cavity image belongs to each category is output as the information regarding the lesion; a sketch of such a network follows.
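  • The following is a minimal sketch (PyTorch) of the CNN structure described above: convolutional and pooling layers extract features, a fully connected layer produces the feature variables, and a softmax output yields per-category probabilities. The input size, channel counts, and layer widths are illustrative assumptions; the publication does not specify a concrete architecture.

```python
import torch
import torch.nn as nn

class OralLesionCNN(nn.Module):
    def __init__(self, n_categories: int):
        super().__init__()
        # Convolutional and pooling layers (132a, 132b) extract image features.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Fully connected layer (132c) combines the extracted features.
        self.classifier = nn.Linear(32 * 56 * 56, n_categories)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)            # input layer 131 -> intermediate layer 132
        x = torch.flatten(x, 1)
        logits = self.classifier(x)
        # Output layer 133: softmax turns feature variables into category probabilities.
        return torch.softmax(logits, dim=1)

# One 224x224 RGB oral cavity image -> probabilities for 4 illustrative categories.
probs = OralLesionCNN(n_categories=4)(torch.randn(1, 3, 224, 224))
```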
  • The categories for classifying oral cavity images can be set arbitrarily to include at least one lesion belonging to oral malignant tumors, precancerous lesions, benign tumors, traumatic ulcers, inflammatory diseases, viral diseases, fungal infections, autoimmune diseases, stomatitis, cheilitis, decubitus ulcers, organic changes in the surface mucosa of the tongue, or graft-versus-host disease.
  • Specifically, the categories to be classified may include at least one of: oral cancer and oral sarcoma belonging to oral malignant tumors; leukoplakia, erythroplakia, and lichen planus belonging to precancerous lesions; gingivitis, periodontitis, inflammation of the jaw, mandibular osteomyelitis, and drug-induced osteonecrosis of the jaw belonging to inflammatory diseases; herpes, herpes zoster, herpangina, and hand-foot-and-mouth disease belonging to viral diseases; oral candidiasis belonging to fungal infections; pemphigus, pemphigus vulgaris, and Behcet's disease belonging to autoimmune diseases; and geographic tongue, fissured tongue, black hairy tongue, and median rhomboid glossitis belonging to organic changes of the surface mucosa of the tongue.
  • The categories to be classified may also include pigmentation, which is not a lesion, and a normal state belonging to neither a lesion nor pigmentation.
  • the example in FIG. 3 shows a learning model 130 in which n categories are set as categories for classifying oral cavity images.
  • the learning model 130 is configured to output, from the respective nodes of the output layer 133, the probability X1 of oral malignant tumor, the probability X2 of leukoplakia, the probability X3 of lichen planus, ..., and the probability Xn of normal.
  • the control unit 12 of the estimation device 1 acquires from the output layer 133 of the learning model 130 the probability of each lesion set as a classification category, and estimates the presence or absence of a lesion in the oral mucosa based on the acquired probabilities. For example, when only the probability X1 of oral malignancy exceeds a threshold value (for example, 80%), the control unit 12 can estimate that a malignant tumor has developed in the oral cavity of the subject. The same applies when any one of the probabilities X2, X3, ..., Xn-1 exceeds the threshold value. When none of the probabilities exceeds the threshold value, the control unit 12 can infer that no lesion exists in the oral cavity of the subject. A sketch of this decision rule follows.
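  • A minimal sketch of the threshold rule just described, in plain Python. The category names and the 80% threshold follow the example above; a lesion is reported only when exactly one lesion-category probability exceeds the threshold.

```python
# Sketch of the decision rule: report a lesion only when exactly one lesion
# category's probability exceeds the threshold (80% in the example above).
def estimate(probabilities: dict, threshold: float = 0.8) -> str:
    above = [name for name, p in probabilities.items()
             if name != "normal" and p > threshold]
    if len(above) == 1:
        return f"suspected {above[0]}"
    return "no lesion estimated"

print(estimate({"oral malignancy": 0.91, "leukoplakia": 0.05,
                "lichen planus": 0.02, "normal": 0.02}))
# -> suspected oral malignancy
```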
  • a machine learning model for constructing the learning model 130 can be set arbitrarily.
  • a learning model based on R-CNN (Region-based CNN), YOLO (You Only Look Once), SSD (Single Shot Detector), or the like may be set.
  • information on the region estimated to be a lesion may be output using an algorithm such as U-Net.
  • FIG. 4 is a flowchart illustrating a procedure of processing executed by the estimation device 1 according to the first embodiment.
  • the control unit 12 of the estimation device 1 executes the estimation process program P1 stored in the storage unit 13 to execute the following estimation process.
  • the control unit 12 acquires an oral cavity image through the input unit 11 (step S101), and gives the acquired oral cavity image to the input layer 131 of the learning model 130 to execute the calculation using the learning model 130 (step S102).
  • the image data of the oral cavity image given to the input layer 131 of the learning model 130 is sent to the intermediate layer 132.
  • an operation using an activation function including weights and biases between nodes is executed.
  • Image characteristics are extracted in the convolutional layer 132a and the pooling layer 132b of the intermediate layer 132.
  • the data of the feature portions extracted by the convolutional layer 132a and the pooling layer 132b is combined at the nodes of the fully connected layer 132c and converted into feature variables by an activation function.
  • the converted feature variable is output to the output layer 133 through the fully connected layer 132c.
  • the output layer 133 converts the feature variables input from the fully connected layer 132c of the intermediate layer 132 into probabilities using a softmax function, and outputs the probabilities belonging to each category from each node.
  • the control unit 12 acquires a calculation result from the learning model 130 and estimates the presence or absence of a lesion in the oral mucosa based on the acquired calculation result (step S103). As described above, the probability of each lesion set as a category to be classified is output from each node forming the output layer 133 of the learning model 130. The control unit 12 can estimate the presence or absence of a lesion based on the probability output from each node of the output layer 133.
  • the control unit 12 outputs the estimation result through the output unit 14 (step S104). Specifically, the control unit 12 generates display data for displaying the estimation result on the display device 140 and outputs the generated display data to the display device 140, thereby causing the display device 140 to display the estimation result.
  • the display mode of the estimation result can be set arbitrarily. For example, the control unit 12 may generate display data including characters or graphics representing the presence or absence of a specific lesion (for example, oral malignant tumor) and output the display data to the display device 140, so that the display device 140 displays the presence or absence of the specific lesion as characters or graphics. In addition, the control unit 12 may generate display data including a probability value corresponding to each lesion and output the display data to the display device 140, so that the probability value corresponding to each lesion is displayed on the display device 140 as numerical information.
  • FIG. 5 is a schematic diagram showing an output example of the estimation device 1.
  • FIG. 5 shows a state in which the subject ID for identifying the subject, the name of the subject, the oral cavity image used in the estimation process, the probability of each category, and character information indicating the estimation result are displayed on the display device 140.
  • the learning model 130 may be used to specify the position of the lesion in the oral image, and the information on the position of the identified lesion may be displayed together.
  • As described above, in the first embodiment, the presence or absence of a lesion in the oral mucosa is estimated using the learning model 130 based on machine learning including deep learning, and the estimation result is output. Therefore, by using the estimation result for diagnosis support, the possibility of overlooking a lesion can be reduced.
  • In the present embodiment, the estimation device 1 has been described as a computer device installed in a facility such as a hospital, but the estimation device 1 may be a server device that can be accessed by communication from a computer device installed in a facility such as a hospital.
  • the estimation device 1 acquires an oral cavity image obtained by imaging the inside of the oral cavity of the subject by communication from a computer device such as a hospital, and estimates the presence or absence of a lesion in the oral mucosa based on the acquired oral cavity image.
  • the estimation device 1 transmits the estimation result to a computer device such as a hospital by communication.
  • the computer device displays the estimation result received from the estimation device 1 on the display device to support diagnosis for a doctor or the like.
  • FIG. 6 is a flowchart illustrating a procedure of processing executed by the estimation device 1 according to the second embodiment.
  • the control unit 12 of the estimation device 1 executes the estimation process program P1 stored in the storage unit 13 to execute the following estimation process.
  • the control unit 12 acquires an oral cavity image through the input unit 11 (step S201) and performs preprocessing on the acquired oral cavity image (step S202).
  • When imaging the inside of the oral cavity, strong light such as strobe light may be emitted for a short emission time.
  • In this case, the entire oral cavity image may become unnecessarily bright, or the strobe light may be reflected by the mucous membrane in the oral cavity so that the gradation of bright portions is lost (whiteout), which may make it difficult to accurately identify the lesion.
  • To darken such an overexposed image, gamma correction is performed as preprocessing. However, if the gamma value is set to a small value such as less than 0.2, the entire image becomes too dark and it becomes difficult to find the lesion. Therefore, the gamma value is preferably set to a value of 0.2 or more and less than 1.0.
  • For example, the control unit 12 may statistically analyze the brightness value of each pixel of the oral cavity image before correction, and set the gamma value based on, for example, the median of the brightness values.
  • Alternatively, the control unit 12 may acquire model information of the imaging device, the strobe, and the like used when imaging the inside of the oral cavity, and set the gamma value according to the acquired model information. In this case, a table defining the correspondence between the model information of the imaging device and the gamma value may be prepared in the storage unit 13, and the gamma value may be determined by referring to this table. A sketch of such gamma correction follows.
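  • A sketch of the gamma correction preprocessing, assuming the mapping out = 255 * (in/255)^(1/gamma) so that gamma values below 1.0 darken an overexposed image; the median-based choice of gamma is an illustrative heuristic, not a rule given in the publication.

```python
# Sketch of gamma correction preprocessing. Assumes out = 255*(in/255)**(1/gamma),
# so gamma < 1.0 darkens an overexposed image; the median-based heuristic for
# choosing gamma is illustrative, not taken from the publication.
import cv2
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    lut = (255.0 * (np.arange(256) / 255.0) ** (1.0 / gamma)).astype(np.uint8)
    return cv2.LUT(image, lut)                 # apply the lookup table per channel

def choose_gamma(image: np.ndarray) -> float:
    median = float(np.median(image))           # statistic of pixel brightness values
    # Brighter images get a smaller gamma (stronger darkening), kept in [0.2, 1.0).
    return float(np.clip(1.0 - (median - 128.0) / 255.0, 0.2, 0.99))

image = cv2.imread("oral.jpg")                 # hypothetical file name
corrected = gamma_correct(image, choose_gamma(image))
```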
  • Further, level correction may be performed as preprocessing for the oral cavity image. If the oral cavity image to be corrected has, for example, a histogram whose peak lies in the range of 100 to 240 on an input level scale of 0 to 255, a correction that moves the shadow point into the peak of the histogram (for example, to 110) may be performed.
  • By such preprocessing, the gradation values of an image captured as a whole whitish under the influence of strobe light or the like can be corrected, and the corrected oral cavity image can be used to satisfactorily identify the lesion area.
  • The highlight point may be left uncorrected, as in the sketch below.
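  • A sketch of the level correction, assuming a linear remapping in which the shadow point is moved to 110 (the example value above) while the highlight point stays at 255.

```python
# Sketch of the level correction: the shadow point is moved to 110 while the
# highlight point stays at 255, linearly remapping the remaining range.
import numpy as np

def level_correct(image: np.ndarray, shadow: int = 110, highlight: int = 255) -> np.ndarray:
    scaled = (image.astype(np.float32) - shadow) * 255.0 / (highlight - shadow)
    return np.clip(scaled, 0, 255).astype(np.uint8)
```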
  • the control unit 12 executes the calculation by the learning model 130 by giving the preprocessed oral cavity image to the input layer 131 of the learning model 130 (step S203). That is, the control unit 12 executes the calculation using the weight and the bias in each node of the input layer 131, the intermediate layer 132, and the output layer 133 that configure the learning model 130, as in the first embodiment.
  • the control unit 12 acquires a calculation result from the learning model 130 and estimates the presence/absence of a lesion in the oral mucosa based on the acquired calculation result (step S204). From each node forming the output layer 133 of the learning model 130, the probability of each lesion set as a category to be classified is output. The control unit 12 estimates the presence or absence of a lesion based on the probability output from each node of the output layer 133.
  • the control unit 12 outputs the estimation result through the output unit 14 (step S205). Specifically, the control unit 12 generates display data for displaying the estimation result on the display device 140 and outputs the generated display data to the display device 140, thereby causing the display device 140 to display the estimation result.
  • the display mode of the estimation result is the same as that in the first embodiment.
  • the configuration in which gamma correction is performed as the preprocessing of the oral cavity image has been described, but the configuration is not limited to gamma correction, and the luminance correction may be performed according to a preset tone curve. Further, not only the brightness correction but also pre-processing such as contrast correction and saturation correction may be performed.
  • FIG. 7 is a block diagram illustrating the configuration of the estimation device 1 according to the third embodiment.
  • the estimation device 1 includes an input unit 11, a control unit 12, a storage unit 13, an output unit 14, a communication unit 15, and an operation unit 16. Since these configurations are similar to those in the first embodiment, detailed description thereof will be omitted.
  • the storage unit 13 stores a region extraction program P2 in addition to the learning model 130 and the estimation processing program P1 described above.
  • the region extraction program P2 is a computer program for causing the estimation device 1 to execute a process of extracting a region corresponding to the oral mucosa of the subject from the oral image.
  • a well-known area extraction algorithm is used for the area extraction program P2.
  • For example, the GrabCut algorithm learns the distributions of pixel values in a foreground region and a background region using Gaussian mixture models (GMM: Gaussian Mixture Model), evaluates pixels set as an unknown region against the statistics of these distributions, and thereby separates and extracts the foreground region from the background region.
  • In the present embodiment, the estimation device 1 extracts, from the entire image region of the oral cavity image, the region corresponding to the oral mucosa as the foreground region, and the remaining region as the background region. Although the subject's teeth, the photographer's fingers, various instruments, and the like may appear in the background region, the foreground region can be separated from the background region and extracted by using a region extraction algorithm such as the GrabCut algorithm.
  • In the present embodiment, the region corresponding to the oral mucosa (foreground region) is separated from the other region (background region), but a region corresponding to a specific part of the oral mucosa (for example, the tongue) may instead be set as the foreground region. A sketch of GrabCut-based extraction follows.
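  • A minimal sketch of GrabCut-based foreground extraction with OpenCV. The initial rectangle is an assumption standing in for a rough region of interest; in practice it would roughly enclose the mucosa region so that teeth, fingers, and instruments fall in the background.

```python
# Sketch of foreground (mucosa) extraction with OpenCV's GrabCut.
import cv2
import numpy as np

def extract_mucosa(image: np.ndarray, rect) -> np.ndarray:
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)   # background GMM parameters
    fgd_model = np.zeros((1, 65), np.float64)   # foreground GMM parameters
    cv2.grabCut(image, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
    # Keep pixels labelled definite or probable foreground (the mucosa region).
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    return image * fg[:, :, None]

# e.g. extract_mucosa(cv2.imread("oral.jpg"), rect=(50, 50, 400, 300))  # hypothetical values
```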
  • FIG. 8 is a schematic diagram showing an example of extraction.
  • the example of FIG. 8 shows the result of dividing the entire image region of the oral image into a region (foreground region) corresponding to the oral mucosa of the subject and a region (background region) other than the region.
  • the background area is shown as a hatched area. It can be seen that the background region includes a region corresponding to the tooth of the subject and a region outside the oral cavity.
  • the estimation device 1 estimates the presence or absence of a lesion in the oral mucosa by passing the oral image of the region (foreground region) corresponding to the oral mucosa of the subject to the learning model 130.
  • FIG. 9 is a flowchart illustrating a procedure of processing executed by the estimation device 1 according to the third embodiment.
  • the control unit 12 of the estimation device 1 executes the estimation process program P1 and the region extraction program P2 stored in the storage unit 13 to perform the following estimation process.
  • the control unit 12 acquires an oral cavity image through the input unit 11 (step S301), and extracts a region corresponding to the oral mucosa from the entire image region of the acquired oral cavity image (step S302).
  • In step S302, portions corresponding to the subject's teeth, the imager's fingers, and various instruments are removed from the oral cavity image.
  • the control unit 12 gives the oral cavity image from which the region corresponding to the oral mucosa has been extracted (an image from which portions such as those corresponding to the subject's teeth have been removed) to the input layer 131 of the learning model 130, thereby executing the calculation using the learning model 130 (step S303).
  • the data of the oral cavity image given to the input layer 131 of the learning model 130 is sent to the intermediate layer 132.
  • an operation using an activation function including weights and biases between nodes is executed.
  • Image features are extracted in the convolutional layer 132a and the pooling layer 132b of the intermediate layer 132.
  • the data of the feature portions extracted by the convolutional layer 132a and the pooling layer 132b is combined at the nodes of the fully connected layer 132c and converted into feature variables by an activation function.
  • the converted feature variable is output to the output layer 133 through the fully connected layer 132c.
  • the output layer 133 converts the feature variables input from the fully connected layer 132c of the intermediate layer 132 into probabilities using a softmax function, and outputs the probabilities belonging to each category from each node.
  • the control unit 12 acquires a calculation result from the learning model 130 and estimates the presence or absence of a lesion in the oral mucosa based on the acquired calculation result (step S304). As described above, the probability of each lesion set as a category to be classified is output from each node forming the output layer 133 of the learning model 130. The control unit 12 can estimate the presence or absence of a lesion based on the probability output from each node of the output layer 133.
  • the control unit 12 outputs the estimation result through the output unit 14 (step S305). Specifically, the control unit 12 generates display data for displaying the estimation result on the display device 140 and outputs the generated display data to the display device 140, thereby causing the display device 140 to display the estimation result.
  • the display mode of the estimation result can be set arbitrarily. For example, the control unit 12 may generate display data including characters or graphics representing the presence or absence of a specific lesion (for example, oral malignant tumor) and output the display data to the display device 140, so that the display device 140 displays the presence or absence of the specific lesion as characters or graphics. In addition, the control unit 12 may generate display data including a probability value corresponding to each lesion and output the display data to the display device 140, so that the probability value corresponding to each lesion is displayed on the display device 140 as numerical information.
  • the estimation process can be executed after removing the portion unnecessary for the lesion estimation, so that the estimation accuracy can be improved.
  • image processing such as gamma correction may be performed as preprocessing for inputting the oral cavity image to the learning model 130.
  • the learning model 130 used in the estimation device 1 is generated, for example, in the server device 2 communicatively connected to the estimation device 1.
  • FIG. 10 is a block diagram illustrating the configuration of the server device 2.
  • the server device 2 includes a control unit 21, a storage unit 22, an input unit 23, a communication unit 24, an operation unit 25, and a display unit 26.
  • the control unit 21 includes, for example, a CPU, ROM, RAM and the like.
  • the ROM included in the control unit 21 stores a control program or the like for controlling the operation of each hardware unit included in the server device 2.
  • the CPU in the control unit 21 executes the control program stored in the ROM and various programs stored in the storage unit 22 to control the operation of each unit of the hardware.
  • The control unit 21 is not limited to the configuration including the CPU, the ROM, and the RAM. The control unit 21 may be, for example, one or a plurality of control circuits or arithmetic circuits including a GPU, an FPGA, a DSP, a volatile or nonvolatile memory, and the like. Further, the control unit 21 may have functions such as a clock that outputs date and time information, a timer that measures the elapsed time from a measurement start instruction to a measurement end instruction, and a counter that counts events.
  • the storage unit 22 includes a storage device such as a hard disk drive.
  • the storage unit 22 stores various computer programs executed by the control unit 21, various data used by the computer programs, data acquired from the outside, and the like.
  • An example of the computer program stored in the storage unit 22 is a model generation program P3 for generating a learning model.
  • the storage unit 22 also includes an oral cavity image database (oral cavity image DB) 220 that stores the oral cavity image and the annotation of the oral cavity image in association with each other.
  • the input unit 23 includes an input interface for acquiring data and programs from a recording medium that records various data or programs. Various data and programs input through the input unit 23 are stored in the storage unit 22.
  • the communication unit 24 includes a communication interface connected to the communication network N.
  • the communication network N is an internet network, a LAN or WAN (Wide Area Network) for a specific purpose, or the like.
  • the communication unit 24 transmits the data to be transmitted to the estimation device 1 to the estimation device 1 via the communication network N.
  • the communication unit 24 also receives, via the communication network N, the data transmitted from the estimation device 1 with the server device 2 as the destination.
  • the operation unit 25 has an input interface such as a keyboard and a mouse, and receives various operation information and setting information.
  • the control unit 21 performs appropriate control based on the operation information input from the operation unit 25, and stores the setting information in the storage unit 22 as necessary.
  • the display unit 26 includes a display device such as a liquid crystal display panel or an organic EL display panel, and displays information to be notified to the administrator of the server device 2 based on a control signal output from the control unit 21.
  • In the present embodiment, the server device 2 includes the operation unit 25 and the display unit 26, but they are not essential; operations may be accepted through an externally connected computer, and information to be notified may be output to the external computer.
  • FIG. 11 is a conceptual diagram showing an example of the oral cavity image database 220.
  • the oral cavity image database 220 stores the oral cavity image and the annotation for the oral cavity image in association with each other.
  • the oral cavity images include, for example, images of an oral cavity in which a malignant tumor has developed and images of an oral cavity exhibiting a morphology specific to an oral mucosal disease (for example, an ulcer, erosion, or bulge).
  • the annotation includes the doctor's diagnosis result.
  • the diagnosis result includes a pathological diagnosis result or a definitive diagnosis result, and in the present embodiment it is used as label data indicating whether the oral cavity image stored in association with it is normal or, if not, which lesion it shows.
  • the annotation may include information such as the subject ID and the subject name.
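  • A minimal sketch of how such a database might be laid out, using sqlite3 for illustration; the actual storage format is not specified in the publication, and the column set simply mirrors the annotation items mentioned above.

```python
# Illustrative layout of the oral cavity image database 220 (sqlite3); the
# actual storage format is not specified in the publication.
import sqlite3

conn = sqlite3.connect("oral_images.db")       # hypothetical file name
conn.execute("""
    CREATE TABLE IF NOT EXISTS oral_image (
        image_path   TEXT PRIMARY KEY,         -- oral cavity image
        subject_id   TEXT,                     -- annotation: subject ID
        subject_name TEXT,                     -- annotation: subject name
        diagnosis    TEXT                      -- annotation: 'normal' or lesion name
    )""")
conn.commit()
```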
  • FIG. 12 is a flowchart illustrating a learning model generation procedure.
  • the control unit 21 of the server device 2 accesses the oral cavity image database 220 of the storage unit 22 and acquires the training data used for generating the learning model (step S401).
  • the training data includes, for example, an oral cavity image and an annotation for the oral cavity image.
  • In the initial stage of learning, training data prepared by the operator of the server device 2 or the like is used. As the learning progresses, the estimation results by the learning model 130 and the oral cavity images used for the estimation process may be acquired from the estimation device 1, and the acquired data may be set as training data.
  • the control unit 21 inputs the image data included in the training data into the learning model to be trained (step S402), and acquires the calculation result from the learning model (step S403). Before learning is started, initial values are assumed to be given to the definition information that describes the learning model. The calculation by this learning model is the same as the calculation of the learning model 130 in the estimation process.
  • the control unit 21 evaluates the calculation result obtained in step S403 (step S404) and determines whether learning is complete (step S405). Specifically, the control unit 21 can evaluate the calculation result using an error function (also called an objective function, loss function, or cost function) based on the calculation result obtained in step S403 and the training data.
  • The control unit 21 optimizes (minimizes or maximizes) the error function by a gradient descent method such as the steepest descent method, and determines that learning is complete when the error function becomes equal to or less than a threshold value (or equal to or greater than a threshold value). To avoid overfitting, methods such as cross-validation and early stopping may be introduced, and learning may be terminated at an appropriate timing.
  • control unit 21 updates the weights and biases between the nodes of the learning model (step S406) and returns the process to step S401.
  • the control unit 21 can update the weights and biases between the nodes by using the error backpropagation method that sequentially updates the weights and biases between the nodes from the output layer of the learning model to the input layer.
  • control unit 21 stores the learned model in the storage unit 22 (step S407), and ends the process according to this flowchart.
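  • A minimal sketch (PyTorch) of the generation procedure of FIG. 12: forward pass and evaluation (steps S402 to S404), completion check against a threshold (S405), and backpropagation updates (S406). The optimizer, loss function, and threshold value are illustrative assumptions; CrossEntropyLoss applies softmax internally, so the model here is assumed to output raw logits while training.

```python
# Sketch of the learning-model generation loop (FIG. 12). Optimizer, loss, and
# stopping threshold are illustrative assumptions.
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 100, loss_threshold: float = 0.05):
    criterion = nn.CrossEntropyLoss()                          # error (loss) function
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # gradient descent
    for _ in range(epochs):
        total, batches = 0.0, 0
        for images, labels in loader:                          # training data (S401/S402)
            loss = criterion(model(images), labels)            # calculation + evaluation (S403/S404)
            optimizer.zero_grad()
            loss.backward()                                    # error backpropagation
            optimizer.step()                                   # update weights and biases (S406)
            total, batches = total + loss.item(), batches + 1
        if total / max(batches, 1) <= loss_threshold:          # completion check (S405)
            break
    torch.save(model.state_dict(), "learned_model.pt")         # store learned model (S407); hypothetical path
```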
  • the learning model 130 used in the estimation device 1 can be generated in the server device 2.
  • the server device 2 transmits the generated learning model to the estimation device 1 in response to a request from the estimation device 1.
  • the estimation device 1 receives the learning model from the server device 2, stores it in the storage unit 13, and then executes the estimation processing program P1 to execute the lesion estimation process.
  • the server device 2 may be configured to newly collect an oral cavity image and an annotation for the oral cavity image and re-learn the learning model using these data at an appropriate timing after the learning is completed.
  • the oral cavity image used for re-learning may be an oral cavity image obtained by capturing at least a part of the oral mucosa (see FIG. 2), or an oral cavity image from which the region corresponding to the oral mucosa has been extracted (see FIG. 8).
  • the estimation device 1 may accept a selection (diagnosis result) as to whether the estimation result is correct and transmit the accepted diagnosis result to the server device 2 as an annotation.
  • the procedure for re-learning is the same as the procedure for generating the learning model: the oral cavity images included in the training data are input to the learning model, and re-learning is performed by evaluating the error between the calculation results obtained as the output of the learning model and the annotations included in the training data.
  • FIG. 13 is a block diagram illustrating the configuration of the estimation device 1 according to the fifth embodiment.
  • the estimation device 1 includes an input unit 11, a control unit 12, a storage unit 13, an output unit 14, a communication unit 15, and an operation unit 16. Since these configurations are similar to those in the first embodiment, detailed description thereof will be omitted.
  • a learning model is prepared for each part of the oral cavity such as the tongue, upper lip, hard palate, soft palate, uvula, tonsils, buccal mucosa, floor of the mouth, gums, and lower lip.
  • the learning model 130-k is configured to output information regarding a lesion in the oral mucosa in response to the input of an enlarged image described later.
  • the learning model 130-k is prepared for each part of the oral cavity, but a common learning model may be used for some parts.
  • a common learning model may be prepared for the hard palate and the soft palate, and a common learning model may be prepared for the upper lip and the lower lip. Further, different learning models may be used for each of the divided areas obtained by dividing a specific part. For example, different learning models may be prepared for the upper surface area and the lower surface area of the tongue.
  • FIG. 14 is a schematic diagram showing an example of an enlarged image.
  • the enlarged image in the present embodiment is an oral cavity image obtained by enlarging the inside of the oral cavity of the subject.
  • Such a magnified image can be obtained by imaging the inside of the oral cavity of the subject using a contact-type cell observation device such as a contact-type endoscope system.
  • In order to observe the shape and distribution of epithelial cell nuclei, the observation site is stained with a staining agent such as methylene blue, and is imaged at a magnification of about 500 times with the imaging unit of the contact-type cell observation device pressed against the observation site.
  • FIG. 14 shows an enlarged image of a part of the subject's tongue magnified 500 times. Although the enlarged image is shown in gray scale here for convenience, the actual enlarged image obtained from the contact-type cell observation device is a color image. In the enlarged image, epithelial cell nuclei stained with the staining agent are observed.
  • the example of FIG. 14 shows a state in which a large number of epithelial cell nuclei are observed as small regions having a circular shape or an oblong shape in which the circle is stretched in one direction.
  • the learning model 130-k is generated for each of the above-mentioned parts by learning such features. The method of generating the learning model 130-k will be described in detail later.
  • FIG. 15 is a schematic diagram showing a configuration example of the learning model 130-k.
  • the configuration of the learning model 130-k is the same as the configuration of the learning model 130 described in the first embodiment. That is, the learning model 130-k is a CNN learning model, and includes an input layer 131, an intermediate layer 132, and an output layer 133.
  • the learning model 130-k is configured such that, when the above-described enlarged image is input to the input layer 131, the intermediate layer 132 performs the calculation and information regarding a lesion in the oral mucosa is output from the output layer 133.
  • FIG. 15 shows a learning model 130-k in which n categories are set as categories for classifying enlarged images.
  • the learning model 130-k outputs, from the respective nodes of the output layer 133, the probability X1 of oral malignant tumor, the probability X2 of leukoplakia, the probability X3 of lichen planus, ..., and the probability Xn of normal.
  • The categories for classifying enlarged images may be set for each target part.
  • When the target part is the tongue, categories may be set to classify oral malignant tumors, leukoplakia, lichen planus, and lesions belonging to organic changes of the tongue surface mucosa, such as geographic tongue, fissured tongue, black hairy tongue, and median rhomboid glossitis.
  • When the target part is the gingiva, categories may be set to classify oral malignant tumors, leukoplakia, lichen planus, gingivitis, periodontitis, and the like.
  • In this manner, classification categories may be set for each target part.
  • Although a CNN-based learning model is shown in the example of FIG. 15, the machine learning model for constructing the learning model 130-k can be set arbitrarily.
  • a learning model based on R-CNN, YOLO, SSD, etc. may be set.
  • information on the region estimated to be a lesion may be output using an algorithm such as U-Net.
  • the learning model 130-k is generated, for example, in the server device 2 accessible from the estimation device 1. Prior to generating the learning model 130-k, the server device 2 collects, for each target part, data including the enlarged image, the subject ID, the subject name, and the doctor's diagnosis result, and registers these pieces of information in the oral cavity image database 220 in association with each other.
  • FIG. 16 is a conceptual diagram showing an example of the oral cavity image database 220 according to the fifth embodiment.
  • the server device 2 uses the data registered in the oral cavity image database 220 as the training data to generate the learning model 130-k.
  • FIG. 17 is a flowchart illustrating the procedure for generating the learning model 130-k.
  • the control unit 21 of the server device 2 receives the designation of the target part through the operation unit 25 (step S501), and selects, as training data, data such as the enlarged images stored in the oral cavity image database 220 in association with the designated target part and the annotations (diagnosis results) for those images (step S502).
  • In the initial stage of learning, training data prepared by the operator of the server device 2 or the like is used. As the learning progresses, the estimation results by the learning model 130-k and the enlarged images used for the estimation process may be acquired from the estimation device 1, and the acquired data may be set as training data.
  • the control unit 21 inputs the image data of the enlarged images selected as training data to the learning model 130-k to be trained (step S503), and acquires the calculation result from the learning model 130-k (step S504). Before learning is started, initial values are assumed to be given to the definition information describing the learning model 130-k. The calculation procedure using this learning model 130-k is the same as that in the fourth embodiment.
  • the control unit 21 evaluates the calculation result obtained in step S504 (step S505) and determines whether learning is complete (step S506). Specifically, the control unit 21 can evaluate the calculation result using an error function (also called an objective function, loss function, or cost function) based on the calculation result obtained in step S504 and the training data.
  • The control unit 21 optimizes (minimizes or maximizes) the error function by a gradient descent method such as the steepest descent method, and determines that learning is complete when the error function becomes equal to or less than a threshold value (or equal to or greater than a threshold value). To avoid overfitting, methods such as cross-validation and early stopping may be introduced, and learning may be terminated at an appropriate timing.
  • control unit 21 updates the weight and bias between the nodes of the learning model (step S507) and returns the process to step S502.
  • the control unit 21 can update the weights and biases between the nodes by using the error backpropagation method that sequentially updates the weights and biases between the nodes from the output layer of the learning model to the input layer.
  • control unit 21 stores the learned learning model 130-k in the storage unit 22 (step S508), and ends the process according to this flowchart.
  • the server device 2 transmits the generated learning model 130-k to the estimation device 1 together with the information on the target part.
  • the estimation device 1 stores the learning model 130-k transmitted from the server device 2 in the storage unit 13 in association with the information of the target part.
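  • A minimal sketch of the per-part model handling described above: learning models 130-k received from the server device are stored keyed by target part, and the part designated on the interface screen selects which model runs (steps S513 and S514). Part names and file paths are hypothetical.

```python
# Sketch of per-part model selection (steps S513-S514). File names are hypothetical.
import torch

def load_models(parts=("tongue", "gingiva", "hard_palate", "buccal_mucosa")):
    # One trained learning model 130-k per target part, keyed by part name.
    return {part: torch.load(f"model_{part}.pt") for part in parts}

def estimate_for_part(models, part: str, enlarged_image: torch.Tensor) -> torch.Tensor:
    model = models[part]              # step S513: select the model for the designated part
    model.eval()
    with torch.no_grad():
        return model(enlarged_image)  # step S514: per-category probabilities
```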
  • FIG. 18 is a flowchart showing an estimation procedure using the learning model 130-k.
  • the control unit 12 of the estimation device 1 executes the estimation process program P1 stored in the storage unit 13 to execute the following estimation process.
  • the control unit 12 acquires an enlarged image obtained by imaging the inside of the oral cavity using, for example, a contact-type cell observation device (step S511).
  • the control unit 12 may acquire the enlarged image via the input unit 11 or may acquire the enlarged image via the communication unit 15.
  • When observation is performed using the contact-type cell observation device, the control unit 12 may display the subject ID, the subject name, the estimated lesion name, and the lesion position on the display device 140 to present the observer with information on the site to be observed.
  • FIG. 19 is a schematic diagram showing an example of presentation of an observation site. The observer may adjust the observation position based on the information of the observation site presented on the display device 140 and image the inside of the oral cavity using the contact-type cell observation device.
  • FIG. 20 is a schematic diagram showing an example of an interface screen 1400 for accepting designation of a target part.
  • On the interface screen 1400, a pull-down menu 1401 for accepting selection of a target part and a start button 1402 for accepting an instruction to start the estimation process are arranged.
  • the pull-down menu 1401 displays the names of target regions that can be designated (tongue, upper lip, hard palate, soft palate, uvula, palatine tonsils, cheek mucous membrane, floor of the mouth, gums, lower lip, etc.) by operation using the operation unit 16.
  • the start button 1402 is configured to receive an instruction to start the estimation process by an operation using the operation unit 16. Note that the example of FIG. 20 shows an oral cavity image taken at a normal magnification, but an enlarged image taken by using a contact-type cell observation device may be displayed on the interface screen 1400.
  • When the designation of the target part is received in step S512 and an instruction to start the estimation process is received, the control unit 12 selects the learning model 130-k corresponding to the designated target part (step S513), and executes the calculation by the learning model 130-k by giving the enlarged image acquired in step S511 to the selected learning model 130-k (step S514).
  • the calculation procedure using the learning model 130-k is the same as that in the first embodiment.
  • the control unit 12 acquires a calculation result from the learning model 130-k and estimates the presence or absence of a lesion in the oral mucosa based on the acquired calculation result (step S515). As described above, the probability of each lesion set as a category to be classified is output from each node forming the output layer 133 of the learning model 130-k. The control unit 12 can estimate the presence or absence of a lesion based on the probability output from each node of the output layer 133.
  • the control unit 12 outputs the estimation result through the output unit 14 (step S516). Specifically, the control unit 12 generates display data for displaying the estimation result on the display device 140 and outputs the generated display data to the display device 140, thereby causing the display device 140 to display the estimation result.
  • the display mode of the estimation result can be set arbitrarily. For example, the control unit 12 may generate display data including characters or graphics representing the presence or absence of a specific lesion (for example, oral malignant tumor) and output the display data to the display device 140, so that the display device 140 displays the presence or absence of the specific lesion as characters or graphics. In addition, the control unit 12 may generate display data including a probability value corresponding to each lesion and output the display data to the display device 140, so that the probability value corresponding to each lesion is displayed on the display device 140 as numerical information.
  • An output example of the estimation device 1 is the same as that in FIG.
In this manner, the presence or absence of a lesion in the oral mucosa can be estimated.
The estimation results obtained by the learning model 130 may also be accumulated in the storage unit 13, which may help clarify the risk of developing lesions in the oral mucosa, prognostic factors, and the like.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Veterinary Medicine (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Engineering & Computer Science (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pathology (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Optics & Photonics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Dentistry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

Provided are a deduction device, a learning model, a learning model generation method, and a computer program. The present invention is provided with: an acquisition unit that acquires an oral cavity image obtained by taking an image of the oral cavity of a subject; a deduction unit that, by using a learning model configured to output information about lesions in oral cavity mucosa in response to an input of the oral cavity image, deduces the presence/absence of lesions in the oral cavity mucosa of the subject from the oral cavity image acquired by the acquisition unit; and an output unit that outputs the result of deduction by the deduction unit.

Description

Estimation device, learning model, learning model generation method, and computer program
The present invention relates to an estimation device, a learning model, a learning model generation method, and a computer program.
There are various types of oral mucosal diseases, such as oral malignant tumors and stomatitis, but their diagnosis is often difficult. In particular, early-stage oral malignant tumors often present clinical findings similar to stomatitis and are frequently overlooked by non-specialist medical staff. Regarding oral malignant tumors, many institutions are currently studying causative genes and prognostic factors, and the inventors have also reported pathological prognostic factors for oral malignant tumors (see, for example, Non-Patent Document 1).
However, research in the oral region lags behind that in other regions, and there is no prospect of establishing a simple diagnostic support system that can be put into practical use.
An object of the present invention is to provide an estimation device capable of estimating lesions in the oral mucosa, a learning model, a learning model generation method, and a computer program.
An estimation device according to one aspect of the present invention includes: an acquisition unit that acquires an oral image obtained by imaging the inside of the oral cavity of a subject; an estimation unit that estimates the presence or absence of a lesion in the oral mucosa of the subject from the oral image acquired by the acquisition unit, using a learning model configured to output information regarding lesions in the oral mucosa in response to the input of an oral image; and an output unit that outputs the estimation result of the estimation unit.
A learning model according to one aspect of the present invention includes an input layer to which an oral image obtained by imaging the inside of the oral cavity of a subject is input, an output layer that outputs information regarding lesions in the oral mucosa, and an intermediate layer that has learned the relationship between the oral image input to the input layer and the information output by the output layer, using oral images and annotations for those images as training data. The learning model causes a computer to function so that, when an oral image is input to the input layer, a calculation is performed in the intermediate layer and information regarding lesions in the oral mucosa is output from the output layer.
A learning model generation method according to one aspect of the present invention uses a computer to acquire training data including an oral image obtained by imaging the inside of the oral cavity of a subject and an annotation for the oral image, and generates, based on the acquired training data, a learning model that outputs information regarding lesions in the oral mucosa in response to the input of an oral image.
A computer program according to one aspect of the present invention causes a computer to execute processing of acquiring an oral image obtained by imaging the inside of the oral cavity of a subject, estimating the presence or absence of a lesion in the oral mucosa from the acquired oral image using a learning model configured to output information regarding lesions in the oral mucosa in response to the input of an oral image, and outputting the estimation result.
According to the present application, lesions in the oral mucosa can be estimated.
FIG. 1 is a block diagram illustrating the configuration of the estimation device according to the first embodiment.
FIG. 2 is a schematic diagram showing an example of an oral image.
FIG. 3 is a schematic diagram showing a configuration example of a learning model.
FIG. 4 is a flowchart illustrating the procedure of processing executed by the estimation device according to the first embodiment.
FIG. 5 is a schematic diagram showing an output example of the estimation device.
FIG. 6 is a flowchart illustrating the procedure of processing executed by the estimation device 1 according to the second embodiment.
FIG. 7 is a block diagram illustrating the configuration of the estimation device according to the third embodiment.
FIG. 8 is a schematic diagram showing an example of extraction.
FIG. 9 is a flowchart illustrating the procedure of processing executed by the estimation device according to the third embodiment.
FIG. 10 is a block diagram illustrating the configuration of the server device.
FIG. 11 is a conceptual diagram showing an example of the oral image database.
FIG. 12 is a flowchart illustrating the learning model generation procedure.
FIG. 13 is a block diagram illustrating the configuration of the estimation device according to the fifth embodiment.
FIG. 14 is a schematic diagram showing an example of an enlarged image.
FIG. 15 is a schematic diagram showing a configuration example of a learning model.
FIG. 16 is a conceptual diagram showing an example of the oral image database in the fifth embodiment.
FIG. 17 is a flowchart illustrating the learning model generation procedure.
FIG. 18 is a flowchart showing an estimation procedure using a learning model.
FIG. 19 is a schematic diagram showing an example of presentation of an observation site.
FIG. 20 is a schematic diagram showing an example of an interface screen for accepting designation of a target part.
Hereinafter, the present invention will be specifically described with reference to the drawings showing its embodiments.
(Embodiment 1)
FIG. 1 is a block diagram illustrating the configuration of the estimation device 1 according to the first embodiment. The estimation device 1 is a computer device installed in a facility such as a hospital, and estimates the presence or absence of a lesion in the oral mucosa of a subject from an oral image obtained by imaging the inside of the oral cavity of the subject. The estimation device 1 provides diagnosis support by presenting the estimation result to a doctor or other diagnostician.
The estimation device 1 includes an input unit 11, a control unit 12, a storage unit 13, an output unit 14, a communication unit 15, and an operation unit 16.
The input unit 11 includes an input interface to which image data of an oral image is input. The input interface is, for example, an interface for connecting an imaging device for imaging the inside of the oral cavity of the subject. The imaging device is a digital camera or a digital video camera and outputs, for example, image data in which each pixel is represented by RGB gradation values. The input unit 11 acquires the image data of the oral image from the imaging device connected to the input interface. The input interface may also be an interface for accessing a recording medium on which captured image data is recorded; in this case, the input unit 11 acquires the image data of the oral image by reading the image data recorded on the recording medium. The image data acquired by the input unit 11 is output to the control unit 12 and stored in the storage unit 13 via the control unit 12.
The control unit 12 includes, for example, a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). The ROM included in the control unit 12 stores a control program and the like for controlling the operation of each hardware unit included in the estimation device 1. The CPU in the control unit 12 executes the control program stored in the ROM and various computer programs stored in the storage unit 13 described later, and controls the operation of each hardware unit, thereby realizing the function of estimating the presence or absence of a lesion in the oral mucosa from an oral image. The RAM included in the control unit 12 temporarily stores data used during execution of calculations.
Although the control unit 12 is described as including a CPU, a ROM, and a RAM, it may instead be one or more arithmetic circuits including a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), a DSP (Digital Signal Processor), a quantum processor, volatile or nonvolatile memory, and the like. The control unit 12 may also have functions such as a clock that outputs date and time information, a timer that measures the elapsed time from a measurement start instruction to a measurement end instruction, and a counter that counts numbers.
The storage unit 13 includes a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an EEPROM (Electronically Erasable Programmable Read Only Memory). The storage unit 13 stores a computer program executed by the control unit 12, a learning model 130 used in the processing of estimating the presence or absence of a lesion in the oral mucosa, and the like.
The computer program stored in the storage unit 13 includes an estimation processing program P1 for causing the estimation device 1 to execute processing of estimating the presence or absence of a lesion in the oral mucosa from an acquired oral image using the learning model 130.
The computer program stored in the storage unit 13 may be provided by a non-transitory recording medium M1 on which the computer program is recorded in a readable manner. The recording medium M1 is, for example, a portable memory such as a CD-ROM, a USB memory, an SD (Secure Digital) card, a micro SD card, or CompactFlash (registered trademark). The control unit 12 reads the various programs from the recording medium M1, for example through the input unit 11, and installs the read programs in the storage unit 13.
The learning model 130 is a learning model configured to output information regarding lesions in the oral mucosa in response to the input of an oral image. The learning model 130 is described by its definition information, which includes structural information of the learning model 130 and various parameters such as the weights and biases between nodes used in the learning model 130. In the present embodiment, a learning model 130 trained in advance by a predetermined learning algorithm, using oral images and annotations for those images as training data, is stored in the storage unit 13.
The control unit 12 executes the estimation processing program P1 stored in the storage unit 13 and gives the image data of an oral image to the learning model 130, thereby acquiring information regarding lesions from the learning model 130. The control unit 12 estimates the presence or absence of a lesion in the oral mucosa based on the information regarding lesions acquired from the learning model 130.
The output unit 14 includes an output interface for connecting an output device. An example of the output device is a display device 140 including a liquid crystal panel, an organic EL (Electro-Luminescence) panel, or the like. When outputting the estimation result, the control unit 12 generates display data to be displayed on the display device 140 and outputs the generated display data to the display device 140 through the output unit 14, thereby causing the display device 140 to display the estimation result.
The communication unit 15 includes a communication interface for transmitting and receiving various data. The communication interface included in the communication unit 15 is, for example, a communication interface conforming to the LAN (Local Area Network) communication standard used in WiFi (registered trademark) or Ethernet (registered trademark). When data to be transmitted is input from the control unit 12, the communication unit 15 transmits the data to the designated destination. When the communication unit 15 receives data transmitted from an external device, it outputs the received data to the control unit 12.
The operation unit 16 includes input interfaces such as various operation buttons, switches, and a touch panel, and receives various operation information and setting information. The control unit 12 performs appropriate control based on the operation information input from the operation unit 16 and stores the setting information in the storage unit 13 as necessary.
Next, the oral image input to the estimation device 1 will be described.
FIG. 2 is a schematic diagram showing an example of an oral image. The oral image in the present embodiment is an image obtained by capturing the inside of the oral cavity of the subject with an imaging device. The example of FIG. 2 shows an oral image captured so as to include the left side surface of the subject's tongue.
In the present embodiment, it is sufficient that at least part of the oral mucosa is included in the oral image. The oral mucosa includes at least part of the subject's tongue, upper lip, hard palate, soft palate, uvula, palatine tonsils, buccal mucosa, floor of the mouth, gums, and lower lip. The oral image may also include objects other than the oral mucosa, such as the subject's teeth, the imager's fingers, and other structures.
Next, the learning model 130 used in the estimation device 1 will be described.
FIG. 3 is a schematic diagram showing a configuration example of the learning model 130. The learning model 130 is, for example, a learning model based on CNN (Convolutional Neural Networks), and includes an input layer 131, an intermediate layer 132, and an output layer 133. The learning model 130 is trained in advance so as to output information regarding lesions in the oral mucosa in response to the input of an oral image.
Image data of an oral image is input to the input layer 131. The image data of the oral image input to the input layer 131 is sent to the intermediate layer 132.
The intermediate layer 132 is composed of, for example, a convolution layer 132a, a pooling layer 132b, and a fully connected layer 132c. A plurality of convolution layers 132a and pooling layers 132b may be provided alternately. The convolution layer 132a and the pooling layer 132b extract the features of the oral image input through the input layer 131 by calculations using the nodes of each layer. The fully connected layer 132c combines the data whose feature portions have been extracted by the convolution layer 132a and the pooling layer 132b into its nodes and outputs feature variables transformed by an activation function. The feature variables are output to the output layer 133 through the fully connected layer 132c.
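As a rough illustration of this structure, the following is a minimal PyTorch sketch; the layer counts, channel sizes, and the 224x224 RGB input are assumptions for illustration, not values taken from the disclosure:

    import torch
    import torch.nn as nn

    class OralLesionCNN(nn.Module):
        """Input layer 131 -> conv/pool layers 132a/132b -> fully connected 132c -> output 133."""
        def __init__(self, n_categories: int):
            super().__init__()
            self.features = nn.Sequential(          # 132a/132b: feature extraction
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(        # 132c: fully connected layer
                nn.Flatten(),
                nn.Linear(32 * 56 * 56, n_categories),  # assumes 224x224 RGB input
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            logits = self.classifier(self.features(x))
            return torch.softmax(logits, dim=1)     # 133: per-category probabilities

    model = OralLesionCNN(n_categories=10)
    probs = model(torch.randn(1, 3, 224, 224))      # probabilities X1..Xn sum to 1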
The output layer 133 includes one or more nodes. Based on the feature variables input from the fully connected layer 132c of the intermediate layer 132, the output layer 133 converts them into probabilities using a softmax function and outputs, from each node, the probability that the oral image falls into the corresponding category. That is, in the present embodiment, the probability that the oral image belongs to each category is output as the information regarding lesions. The categories into which oral images are classified can be set arbitrarily so as to include at least one lesion belonging to oral malignant tumors, precancerous lesions, benign tumors, traumatic ulcers, inflammatory diseases, viral diseases, fungal infections, autoimmune diseases, stomatitis, angular cheilitis, decubitus ulcers, organic changes of the tongue surface mucosa, or graft-versus-host disease. For example, the categories may include at least one of: oral cancer and oral sarcoma, belonging to oral malignant tumors; leukoplakia, erythroplakia, and lichen planus, belonging to precancerous lesions; gingivitis, periodontitis, inflammation of the jaw, osteomyelitis of the jaw, and medication-related osteonecrosis of the jaw, belonging to inflammatory diseases; herpes, herpes zoster, herpangina, and hand-foot-and-mouth disease, belonging to viral diseases; oral candidiasis, belonging to fungal infections; pemphigus, pemphigoid, and Behcet's disease, belonging to autoimmune diseases; and geographic tongue, fissured tongue, black hairy tongue, and median rhomboid glossitis, belonging to organic changes of the tongue surface mucosa. Furthermore, the categories may include pigmentation, which does not belong to lesions, and a normal state, which belongs to neither lesions nor pigmentation.
The example of FIG. 3 shows a learning model 130 in which n categories are set as the categories for classifying oral images. This learning model 130 is configured to output, from the respective nodes of the output layer 133, a probability X1 of oral malignant tumor, a probability X2 of leukoplakia, a probability X3 of lichen planus, ..., and a probability Xn of being normal. The number of categories to be set (= n) may be one or plural.
The control unit 12 of the estimation device 1 acquires, from the output layer 133 of the learning model 130, the probability of each lesion set as a classification category and estimates the presence or absence of a lesion in the oral mucosa based on the acquired probabilities. For example, when only the probability X1 of oral malignant tumor exceeds a threshold (for example, 80%), the control unit 12 can estimate that a malignant tumor has developed in the oral cavity of the subject. The same applies when any one of the probabilities X2, X3, ..., Xn-1 exceeds the threshold. On the other hand, when none of the probabilities X1, X2, ..., Xn-1 exceeds the threshold, or when the probability Xn indicating a normal state exceeds the threshold, the control unit 12 can estimate that no lesion exists in the oral cavity of the subject. A sketch of this decision rule follows.
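A sketch of this decision rule in Python; the 80% threshold is the example value given above, and the "indeterminate" branch for the case where several probabilities exceed the threshold is an assumption of this sketch, not stated in the disclosure:

    THRESHOLD = 0.80  # example threshold from the text

    def interpret(probs, labels):
        # probs: [X1, ..., Xn] from the output layer; labels[-1] is "normal"
        over = [(labels[i], p) for i, p in enumerate(probs[:-1]) if p > THRESHOLD]
        if len(over) == 1:
            return f"suspected {over[0][0]}"     # only one lesion probability exceeds
        if not over or probs[-1] > THRESHOLD:
            return "no lesion estimated"         # none exceed, or normal exceeds
        return "indeterminate"                   # assumption: several exceed

    labels = ["oral malignant tumor", "leukoplakia", "lichen planus", "normal"]
    print(interpret([0.86, 0.05, 0.04, 0.05], labels))  # -> suspected oral malignant tumor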
Although the example of FIG. 3 shows a CNN-based learning model 130, the machine learning model used to construct the learning model 130 can be set arbitrarily. For example, instead of a CNN, a learning model based on R-CNN (Region-based CNN), YOLO (You Only Look Once), SSD (Single Shot Detector), or the like may be used. Information on the region estimated to be a lesion may also be output using an algorithm such as U-Net.
FIG. 4 is a flowchart illustrating the procedure of processing executed by the estimation device 1 according to the first embodiment. The control unit 12 of the estimation device 1 executes the following estimation processing by executing the estimation processing program P1 stored in the storage unit 13.
The control unit 12 acquires an oral image through the input unit 11 (step S101) and gives the acquired oral image to the input layer 131 of the learning model 130, thereby executing the calculation using the learning model 130 (step S102). The image data of the oral image given to the input layer 131 of the learning model 130 is sent to the intermediate layer 132. In the intermediate layer 132, calculations using activation functions including the weights and biases between nodes are executed. The convolution layer 132a and the pooling layer 132b of the intermediate layer 132 extract the features of the image. The data of the feature portions extracted by the convolution layer 132a and the pooling layer 132b is combined into the nodes constituting the fully connected layer 132c and converted into feature variables by an activation function. The converted feature variables are output to the output layer 133 through the fully connected layer 132c. The output layer 133 converts the feature variables input from the fully connected layer 132c of the intermediate layer 132 into probabilities using a softmax function and outputs, from each node, the probability of belonging to each category.
The control unit 12 acquires the calculation result from the learning model 130 and estimates the presence or absence of a lesion in the oral mucosa based on the acquired result (step S103). As described above, each node forming the output layer 133 of the learning model 130 outputs the probability of the corresponding lesion set as a classification category, and the control unit 12 can estimate the presence or absence of a lesion based on these probabilities.
The control unit 12 outputs the estimation result through the output unit 14 (step S104). Specifically, the control unit 12 generates display data for displaying the estimation result on the display device 140 and outputs the generated display data to the display device 140, thereby causing the display device 140 to display the estimation result. The display mode of the estimation result can be set arbitrarily. For example, the control unit 12 may generate display data including characters or graphics representing the presence or absence of a specific lesion (for example, an oral malignant tumor) and output it to the display device 140, so that the presence or absence of the specific lesion is displayed as characters or graphics. Alternatively, the control unit 12 may generate display data including the probability value corresponding to each lesion and output it to the display device 140, so that the probability values are displayed as numerical information.
FIG. 5 is a schematic diagram showing an output example of the estimation device 1. The example of FIG. 5 shows a state in which the subject ID identifying the subject, the subject's name, the oral image used in the estimation processing, the probability of falling into each category, and character information indicating the estimation result are displayed on the display device 140. The learning model 130 may also be used to identify the position of a lesion in the oral image, and information on the identified lesion position may be displayed together.
As described above, in the present embodiment, the presence or absence of a lesion in the oral mucosa is estimated using a machine learning model 130, including deep learning, and the estimation result is output. By using the estimation result for diagnosis support, the possibility of overlooking a lesion can be reduced.
Although the estimation device 1 has been described in the present embodiment as a computer device installed in a facility such as a hospital, the estimation device 1 may instead be a server device accessible by communication from a computer device installed in such a facility. In this case, the estimation device 1 acquires, by communication from the computer device in the hospital or the like, an oral image obtained by imaging the inside of the oral cavity of the subject, estimates the presence or absence of a lesion in the oral mucosa based on the acquired oral image, and transmits the estimation result by communication to the computer device. The computer device displays the estimation result received from the estimation device 1 on a display device, thereby supporting diagnosis by a doctor or the like.
(Embodiment 2)
In the second embodiment, a configuration will be described in which preprocessing is applied to the oral image before the calculation using the learning model 130 is executed.
FIG. 6 is a flowchart illustrating the procedure of processing executed by the estimation device 1 according to the second embodiment. The control unit 12 of the estimation device 1 executes the following estimation processing by executing the estimation processing program P1 stored in the storage unit 13.
The control unit 12 acquires an oral image through the input unit 11 (step S201) and applies preprocessing to the acquired oral image (step S202). As the preprocessing of the oral image, the control unit 12 can apply, for example, gamma correction. That is, with x denoting the luminance value of each pixel before correction, y the luminance value of each pixel after correction, and γ the gamma value, the control unit 12 computes the corrected luminance value of each pixel as y = x^(1/γ). When the inside of the oral cavity is imaged with an imaging device, strong light with a short emission time, such as strobe light, may be used. In this case, the entire oral image becomes unnecessarily bright (whitish), and the strobe light reflected by the mucosa in the oral cavity may cause the gradation of bright portions to be lost (blown-out highlights), which can make it difficult to identify lesions accurately. On the other hand, if the gamma value is set to a small value, for example less than 0.2, the entire image becomes too dark, making it difficult to find lesions. For this reason, the gamma value is preferably set to a value of 0.2 or more and less than 1.0. The control unit 12 may statistically analyze the luminance values of the pixels of the oral image before correction and set the gamma value based on, for example, the median luminance value. Alternatively, the control unit 12 may acquire model information of the imaging device, strobe, or the like used to image the inside of the oral cavity and set the gamma value according to the acquired model information. In the latter case, a table defining the correspondence between the model information of the imaging device and the gamma value may be prepared in the storage unit 13, and the gamma value may be determined by referring to this table. A sketch of this correction follows.
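A minimal sketch of this gamma correction with NumPy, assuming 8-bit input; the normalization to [0, 1] and back is an implementation assumption:

    import numpy as np

    def gamma_correct(img_u8: np.ndarray, gamma: float = 0.5) -> np.ndarray:
        # y = x^(1/gamma) on luminance normalized to [0, 1]; with gamma in
        # [0.2, 1.0) the exponent exceeds 1, darkening an over-bright image.
        x = img_u8.astype(np.float32) / 255.0
        y = np.power(x, 1.0 / gamma)
        return (y * 255.0).clip(0, 255).astype(np.uint8)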
Level correction may also be performed as preprocessing of the oral image. When the oral image to be corrected is, for example, an image whose histogram peak lies in the range of 100 to 240 within the input levels of 0 to 255, a correction that moves the shadow point to the inside of the histogram peak (for example, to 110) may be performed. Such preprocessing can correct the gradation values of an image captured as overall whitish due to the influence of strobe light or the like, and using the corrected oral image makes it possible to identify lesions well. The highlight point need not be corrected. A sketch under these assumptions follows.
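A corresponding sketch of the level correction, assuming 8-bit input and an input-levels remap that moves only the shadow point, as described above:

    import numpy as np

    def level_correct(img_u8: np.ndarray, shadow: int = 110, highlight: int = 255) -> np.ndarray:
        # Remap [shadow, highlight] to [0, 255]; pixels below the shadow
        # point clip to 0. The highlight point is left at 255 (uncorrected).
        x = img_u8.astype(np.float32)
        y = (x - shadow) / float(highlight - shadow) * 255.0
        return y.clip(0, 255).astype(np.uint8)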
The control unit 12 executes the calculation by the learning model 130 by giving the preprocessed oral image to the input layer 131 of the learning model 130 (step S203). That is, as in the first embodiment, the control unit 12 executes calculations using weights and biases at the nodes of the input layer 131, the intermediate layer 132, and the output layer 133 constituting the learning model 130.
The control unit 12 acquires the calculation result from the learning model 130 and estimates the presence or absence of a lesion in the oral mucosa based on the acquired result (step S204). Each node forming the output layer 133 of the learning model 130 outputs the probability of the corresponding lesion set as a classification category, and the control unit 12 estimates the presence or absence of a lesion based on these probabilities.
The control unit 12 outputs the estimation result through the output unit 14 (step S205). Specifically, the control unit 12 generates display data for displaying the estimation result on the display device 140 and outputs the generated display data to the display device 140, thereby causing the display device 140 to display the estimation result. The display mode of the estimation result is the same as in the first embodiment.
Although the present embodiment has described a configuration in which gamma correction is performed as preprocessing of the oral image, the preprocessing is not limited to gamma correction, and luminance correction may be performed according to a preset tone curve. Preprocessing such as contrast correction and saturation correction may also be performed in addition to luminance correction.
(Embodiment 3)
In the third embodiment, a configuration will be described in which the estimation device 1 extracts a region corresponding to the oral mucosa from the oral image and estimates the presence or absence of a lesion in the oral mucosa from the oral image of the extracted region.
FIG. 7 is a block diagram illustrating the configuration of the estimation device 1 according to the third embodiment. The estimation device 1 includes an input unit 11, a control unit 12, a storage unit 13, an output unit 14, a communication unit 15, and an operation unit 16. Since these components are the same as in the first embodiment, their detailed description is omitted.
In addition to the learning model 130 and the estimation processing program P1 described above, the storage unit 13 stores a region extraction program P2. The region extraction program P2 is a computer program for causing the estimation device 1 to execute processing of extracting a region corresponding to the oral mucosa of the subject from an oral image. A well-known region extraction algorithm is used for the region extraction program P2.
One example of a region extraction algorithm is the GrabCut algorithm. The GrabCut algorithm learns the distributions of the pixel values of the foreground region and the background region using Gaussian mixture models (GMM), and separates and extracts the foreground and background regions by computing, for pixels set as unknown regions, foreground likelihood and background likelihood from the relationship between the foreground and background regions based on pixel value statistics. A sketch using OpenCV follows.
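A sketch of such an extraction using OpenCV's GrabCut implementation; the input file name, initial rectangle, and iteration count are illustrative assumptions:

    import cv2
    import numpy as np

    img = cv2.imread("oral_image.jpg")               # assumed input image
    mask = np.zeros(img.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)              # GMM state for background
    fgd = np.zeros((1, 65), np.float64)              # GMM state for foreground
    rect = (10, 10, img.shape[1] - 20, img.shape[0] - 20)  # rough mucosa region

    cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    foreground = img * fg[:, :, None]                # teeth, fingers, etc. zeroed out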
The estimation device 1 according to the present embodiment extracts, as the foreground region, the region corresponding to the oral mucosa in the entire image region of the oral image, and extracts, as the background region, the remainder of the image region excluding the region corresponding to the oral mucosa. The background region may contain the subject's teeth, the photographer's fingers, various instruments, and the like, but by using a region extraction algorithm such as the GrabCut algorithm, the foreground region can be separated from the background region and extracted.
In the present embodiment, the region corresponding to the oral mucosa (foreground region) is separated from the other regions (background region), but a region corresponding to a specific part of the oral mucosa (for example, the tongue) may instead be extracted as the foreground region, with the remaining regions, including other oral mucosa, extracted as the background region.
FIG. 8 is a schematic diagram showing an example of extraction. The example of FIG. 8 shows the result of separating the entire image region of an oral image into a region corresponding to the oral mucosa of the subject (foreground region) and the other regions (background region). In FIG. 8, the background region is shown hatched. It can be seen that the background region includes a region corresponding to the subject's teeth and a region outside the oral cavity. The estimation device 1 estimates the presence or absence of a lesion in the oral mucosa by passing the oral image of the region corresponding to the oral mucosa of the subject (foreground region) to the learning model 130.
FIG. 9 is a flowchart illustrating the procedure of processing executed by the estimation device 1 according to the third embodiment. The control unit 12 of the estimation device 1 executes the following estimation processing by executing the estimation processing program P1 and the region extraction program P2 stored in the storage unit 13.
The control unit 12 acquires an oral image through the input unit 11 (step S301) and extracts the region corresponding to the oral mucosa from the entire image region of the acquired oral image (step S302). By the processing of step S302, portions corresponding to the subject's teeth, the imager's fingers, and various instruments are removed from the oral image.
Next, the control unit 12 gives the oral image from which the region corresponding to the oral mucosa has been extracted (an image from which portions corresponding to the subject's teeth and the like have been removed) to the input layer 131 of the learning model 130, thereby executing the calculation using the learning model 130 (step S303). The image data of the oral image given to the input layer 131 of the learning model 130 is sent to the intermediate layer 132. In the intermediate layer 132, calculations using activation functions including the weights and biases between nodes are executed, and the convolution layer 132a and the pooling layer 132b of the intermediate layer 132 extract the features of the image. The data of the feature portions extracted by the convolution layer 132a and the pooling layer 132b is combined into the nodes constituting the fully connected layer 132c and converted into feature variables by an activation function. The converted feature variables are output to the output layer 133 through the fully connected layer 132c. The output layer 133 converts the feature variables input from the fully connected layer 132c of the intermediate layer 132 into probabilities using a softmax function and outputs, from each node, the probability of belonging to each category.
The control unit 12 acquires the calculation result from the learning model 130 and estimates the presence or absence of a lesion in the oral mucosa based on the acquired result (step S304). As described above, each node forming the output layer 133 of the learning model 130 outputs the probability of the corresponding lesion set as a classification category, and the control unit 12 can estimate the presence or absence of a lesion based on these probabilities.
The control unit 12 outputs the estimation result through the output unit 14 (step S305). Specifically, the control unit 12 generates display data for displaying the estimation result on the display device 140 and outputs the generated display data to the display device 140, thereby causing the display device 140 to display the estimation result. The display mode of the estimation result can be set arbitrarily. For example, the control unit 12 may generate display data including characters or graphics representing the presence or absence of a specific lesion (for example, an oral malignant tumor) and output it to the display device 140, so that the presence or absence of the specific lesion is displayed as characters or graphics. Alternatively, the control unit 12 may generate display data including the probability value corresponding to each lesion and output it to the display device 140, so that the probability values are displayed as numerical information.
As described above, in the third embodiment, the estimation processing can be executed after removing portions unnecessary for lesion estimation, so that the estimation accuracy can be improved.
Also in the third embodiment, image processing such as gamma correction may be applied as preprocessing before the oral image is input to the learning model 130.
(Embodiment 4)
In the fourth embodiment, a method of generating the learning model 130 will be described.
The learning model 130 used in the estimation device 1 is generated, for example, in a server device 2 communicably connected to the estimation device 1.
FIG. 10 is a block diagram illustrating the configuration of the server device 2. The server device 2 includes a control unit 21, a storage unit 22, an input unit 23, a communication unit 24, an operation unit 25, and a display unit 26.
The control unit 21 includes, for example, a CPU, a ROM, and a RAM. The ROM included in the control unit 21 stores a control program and the like for controlling the operation of each hardware unit included in the server device 2. The CPU in the control unit 21 executes the control program stored in the ROM and the various programs stored in the storage unit 22, and controls the operation of each hardware unit.
The control unit 21 is not limited to the above configuration including a CPU, a ROM, and a RAM. The control unit 21 may be, for example, one or more control circuits or arithmetic circuits including a GPU, an FPGA, a DSP, volatile or nonvolatile memory, and the like. The control unit 21 may also have functions such as a clock that outputs date and time information, a timer that measures the elapsed time from a measurement start instruction to a measurement end instruction, and a counter that counts numbers.
The storage unit 22 includes a storage device such as a hard disk drive. The storage unit 22 stores various computer programs executed by the control unit 21, various data used by these computer programs, data acquired from the outside, and the like. One example of a computer program stored in the storage unit 22 is a model generation program P3 for generating a learning model. The storage unit 22 also includes an oral image database (oral image DB) 220 that stores oral images and their annotations in association with each other.
The input unit 23 includes an input interface for acquiring data and programs from recording media on which various data or programs are recorded. The various data and programs input through the input unit 23 are stored in the storage unit 22.
The communication unit 24 includes a communication interface connected to a communication network N. The communication network N is, for example, the Internet or an application-specific LAN or WAN (Wide Area Network). The communication unit 24 transmits data to be transmitted to the estimation device 1 via the communication network N, and receives, via the communication network N, data transmitted from the estimation device 1 with the server device 2 as the destination.
The operation unit 25 includes input interfaces such as a keyboard and a mouse and receives various operation information and setting information. The control unit 21 performs appropriate control based on the operation information input from the operation unit 25 and stores the setting information in the storage unit 22 as necessary.
The display unit 26 includes a display device such as a liquid crystal display panel or an organic EL display panel, and displays information to be notified to the administrator of the server device 2 or the like based on control signals output from the control unit 21.
Although the server device 2 in the present embodiment includes the operation unit 25 and the display unit 26, these are not essential; the server device 2 may instead accept operations through an externally connected computer and output information to be notified to the external computer.
FIG. 11 is a conceptual diagram showing an example of the oral image database 220. The oral image database 220 stores oral images and annotations for the oral images in association with each other. The oral images include, for example, images of oral cavities in which malignant tumors have developed and images of oral cavities exhibiting forms specific to oral mucosal diseases (for example, ulcers, erosions, and bulges). The annotation includes a doctor's diagnosis result. The diagnosis result includes a pathological diagnosis result or a definitive diagnosis result and is used in the present embodiment as label data indicating that the associated oral image is normal or indicating which lesion it shows. The annotation may also include information such as the subject ID and the subject's name. An illustrative record is sketched below.
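Purely for illustration, one record of the database might look as follows; the field names are assumptions, since the disclosure only specifies that an image is stored in association with an annotation containing a diagnosis label and, optionally, subject information:

    record = {
        "image_path": "images/0001.jpg",
        "annotation": {
            "label": "oral malignant tumor",  # pathological or definitive diagnosis
            "subject_id": "P0001",            # optional subject information
            "subject_name": "...",
        },
    }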
The procedure for generating a learning model in the server device 2 is described below.
FIG. 12 is a flowchart illustrating the learning model generation procedure. The control unit 21 of the server device 2 accesses the oral image database 220 of the storage unit 22 and acquires the training data used for generating the learning model (step S401). The training data includes, for example, oral images and annotations for the oral images. In the initial stage of generating the learning model, training data prepared by the operator of the server device 2 or the like is set. As learning progresses, the estimation results of the learning model 130 and the oral images used in the estimation processing may be acquired from the estimation device 1 and set as training data.
Next, the control unit 21 inputs the image data included in the training data into the learning model being trained (step S402) and acquires a computation result from the model (step S403). Before learning starts, the definition information describing the learning model is assumed to be given initial values. The computation performed by this learning model is the same as the computation performed by the learning model 130 in the estimation process.
Next, the control unit 21 evaluates the computation result obtained in step S403 (step S404) and determines whether learning is complete (step S405). Specifically, the control unit 21 can evaluate the computation result using an error function (also called an objective function, loss function, or cost function) based on the computation result obtained in step S403 and the training data. In the course of optimizing (minimizing or maximizing) the error function by a gradient descent method such as steepest descent, the control unit 21 determines that learning is complete when the error function falls to or below a threshold (or rises to or above a threshold). To avoid overfitting, techniques such as cross-validation and early stopping may be adopted so that learning ends at an appropriate point.
When determining that learning is not complete (S405: NO), the control unit 21 updates the weights and biases between the nodes of the learning model (step S406) and returns the process to step S401. The control unit 21 can update the inter-node weights and biases using error backpropagation, which updates them sequentially from the output layer of the learning model toward the input layer.
When determining that learning is complete (S405: YES), the control unit 21 stores the trained learning model in the storage unit 22 (step S407) and ends the processing of this flowchart.
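For illustration, steps S401 to S407 can be condensed into a short training loop. The following is a minimal, hypothetical PyTorch sketch: feed the training images, evaluate the error function, backpropagate, and stop once the loss drops below a threshold. The model, optimizer, learning rate, threshold, and file name are illustrative assumptions, not values taken from this disclosure.

```python
# Hypothetical sketch of the generation procedure (steps S401-S407),
# assuming a PyTorch classifier and a DataLoader of (image, label) pairs.
import torch
import torch.nn as nn

def train_until_threshold(model, loader, loss_threshold=0.05, max_epochs=100):
    criterion = nn.CrossEntropyLoss()           # error (loss) function, S404
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent
    for epoch in range(max_epochs):             # max_epochs acts as early stop
        epoch_loss = 0.0
        for images, labels in loader:           # S401-S402: feed training data
            outputs = model(images)             # S403: obtain model output
            loss = criterion(outputs, labels)   # S404: evaluate against labels
            optimizer.zero_grad()
            loss.backward()                     # S406: backpropagate errors
            optimizer.step()                    # update weights and biases
            epoch_loss += loss.item() * images.size(0)
        epoch_loss /= len(loader.dataset)
        if epoch_loss <= loss_threshold:        # S405: learning complete?
            break
    torch.save(model.state_dict(), "oral_model.pt")  # S407: store trained model
```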
As described above, in the present embodiment, the learning model 130 used in the estimation device 1 can be generated in the server device 2. The server device 2 transmits the generated learning model to the estimation device 1 in response to a request from the estimation device 1. After receiving the learning model from the server device 2 and storing it in the storage unit 13, the estimation device 1 can perform the lesion estimation process by executing the estimation processing program P1.
Furthermore, the server device 2 may be configured to newly collect oral cavity images and annotations for those images at an appropriate time after learning is complete and to retrain the learning model using these data. Each oral cavity image may be an image captured so as to include at least part of the oral mucosa (see FIG. 2) or an image obtained by extracting the region corresponding to the oral mucosa (see FIG. 8). When displaying an estimation result, the estimation device 1 may also accept a selection indicating whether the estimation result is correct (a diagnosis result) and transmit the accepted diagnosis result to the server device 2 as an annotation. The retraining procedure is exactly the same as the learning model generation procedure: the oral cavity images included in the training data are input to the learning model, and retraining is performed by evaluating the error between the computation result obtained as the model's output and the annotations included in the training data.
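As a rough illustration of how such feedback could be folded back into the training set, the sketch below appends the confirmed diagnosis as the annotation of a new record; the function and record layout are hypothetical and reuse the field names assumed earlier.

```python
# Hypothetical sketch: the estimation device's result plus the doctor's
# accepted diagnosis become one new annotated training record.
def add_feedback_record(database, image_path, estimated_label, confirmed_label):
    """Append a feedback-annotated image; the confirmed diagnosis wins."""
    database.append({
        "image_path": image_path,
        "diagnosis_label": confirmed_label,   # annotation from the doctor
        "estimated_label": estimated_label,   # kept for later auditing
    })

db = []
add_feedback_record(db, "img/0042.png",
                    estimated_label="leukoplakia",
                    confirmed_label="lichen planus")
```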
(Embodiment 5)
Embodiment 5 describes a configuration for estimating the presence or absence of a lesion in the oral mucosa using a magnified image of the interior of the oral cavity.
FIG. 13 is a block diagram illustrating the configuration of the estimation device 1 according to Embodiment 5. The estimation device 1 includes an input unit 11, a control unit 12, a storage unit 13, an output unit 14, a communication unit 15, and an operation unit 16. Since these components are the same as in Embodiment 1, their detailed description is omitted.
In Embodiment 5, a learning model is prepared for each oral site, such as the tongue, upper lip, hard palate, soft palate, uvula, palatine tonsils, buccal mucosa, floor of the mouth, gingiva, and lower lip. The learning model 130-k for each site (k = 1, 2, ..., m; m is an integer of 1 or more) is stored in the storage unit 13. The learning model 130-k is configured to output information regarding a lesion in the oral mucosa in response to the input of a magnified image, described later. Although the present embodiment prepares a learning model 130-k for each oral site, a common learning model may be used for several sites; for example, one common learning model may be prepared for the hard palate and soft palate, and another for the upper lip and lower lip. Different learning models may also be used for each divided region of a particular site; for example, separate learning models may be prepared for the upper-surface and lower-surface regions of the tongue.
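One simple way to realize such a per-site model collection, including models shared between sites, is a lookup table keyed by site name. The sketch below is a hypothetical arrangement only; the site names, file paths, and loader function are assumptions, not part of this disclosure.

```python
# Hypothetical per-site model registry; some sites share one trained model,
# as the text above allows.
import torch

SITE_MODEL_PATHS = {
    "tongue": "models/tongue.pt",
    "hard_palate": "models/palate.pt",   # shared by hard and soft palate
    "soft_palate": "models/palate.pt",
    "upper_lip": "models/lips.pt",       # shared by upper and lower lip
    "lower_lip": "models/lips.pt",
}

def load_site_model(site, model_factory):
    """Instantiate a model and load the weights trained for the given site."""
    model = model_factory()
    model.load_state_dict(torch.load(SITE_MODEL_PATHS[site]))
    model.eval()
    return model
```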
FIG. 14 is a schematic diagram showing an example of a magnified image. The magnified image in the present embodiment is an oral cavity image obtained by imaging the interior of the subject's oral cavity at high magnification. Such a magnified image is obtained, for example, by imaging the interior of the subject's oral cavity with a contact-type cell observation device such as a contact endoscope system. In the present embodiment, in order to observe the shape and distribution of epithelial cell nuclei, the observation site was stained with a stain such as methylene blue and imaged at a magnification of about 500 times with the imaging unit of the contact-type cell observation device pressed against the site.
The example of FIG. 14 shows a magnified image of part of the subject's tongue imaged at 500 times magnification. For reasons of space the magnified image is shown in grayscale, but the actual magnified image obtained from the contact-type cell observation device is a color image. In the magnified image, the epithelial cell nuclei stained by the stain are observed. The example of FIG. 14 shows many epithelial cell nuclei observed as small circular or elongated oval regions. In the present embodiment, the learning model 130-k is generated by learning, for each of the sites described above, features of normal epithelial cell nuclei, such as their shape, staining properties, distribution and arrangement, and nucleus-to-cytoplasm ratio, together with the corresponding features of epithelial cell nuclei in lesion regions. The method of generating the learning model 130-k is described in detail later.
FIG. 15 is a schematic diagram showing a configuration example of the learning model 130-k. The configuration of the learning model 130-k is the same as that of the learning model 130 described in Embodiment 1. That is, the learning model 130-k is a CNN-based learning model comprising an input layer 131, an intermediate layer 132, and an output layer 133. The learning model 130-k is configured so that, when the above-described magnified image is input to the input layer 131, a computation is performed in the intermediate layer 132 and information regarding a lesion in the oral mucosa is output from the output layer 133.
The example of FIG. 15 shows a learning model 130-k in which n categories are set for classifying the magnified image. This learning model 130-k is configured to output, from the respective nodes of the output layer 133, the probability X1 of an oral malignant tumor, the probability X2 of leukoplakia, the probability X3 of lichen planus, ..., and the probability Xn of being normal. The number of categories (= n) may be one or more than one. The categories for classifying the magnified image may be set for each target site. For example, when the target site is the tongue, lesions belonging to organic changes of the tongue surface mucosa, such as geographic tongue, fissured tongue, black hairy tongue, and median rhomboid glossitis, may be set as classification categories in addition to oral malignant tumor, leukoplakia, and lichen planus. When the target site is the gingiva, gingivitis, periodontitis, and the like may be set as classification categories in addition to oral malignant tumor, leukoplakia, and lichen planus. The same applies to other target sites: classification categories may be set for each target site.
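As a concrete illustration of such an n-category classifier, the following PyTorch sketch mirrors the input layer 131 / intermediate layer 132 / output layer 133 structure with one output node per category. The layer sizes and depth are illustrative assumptions, not the architecture used in this disclosure.

```python
# Minimal hypothetical CNN with n output categories (one node per X1..Xn).
import torch.nn as nn

class OralLesionCNN(nn.Module):
    def __init__(self, n_categories):
        super().__init__()
        self.features = nn.Sequential(        # intermediate layer 132
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(      # output layer 133
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_categories),      # one node per category X1..Xn
        )

    def forward(self, x):                     # x: magnified image, layer 131
        return self.classifier(self.features(x))  # logits; softmax gives probs
```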
Although the example of FIG. 15 shows a CNN-based learning model, the machine learning model used to construct the learning model 130-k can be set arbitrarily. For example, instead of a CNN, a learning model based on R-CNN, YOLO, SSD, or the like may be set. Information on the region estimated to be a lesion may also be output using an algorithm such as U-Net.
The learning model 130-k is generated, for example, in the server device 2 accessible from the estimation device 1. Prior to generating the learning model 130-k, the server device 2 collects, for each target site, data including magnified images, subject IDs, subject names, and doctors' diagnosis results, associates these pieces of information with one another, and registers them in the oral cavity image database 220. FIG. 16 is a conceptual diagram showing an example of the oral cavity image database 220 according to Embodiment 5.
The server device 2 generates the learning model 130-k using the data registered in the oral cavity image database 220 as training data. FIG. 17 is a flowchart illustrating the procedure for generating the learning model 130-k. The control unit 21 of the server device 2 accepts the designation of a target site, for example through the operation unit 25 (step S501), and selects as training data the magnified images stored in the oral cavity image database 220 in association with the designated target site, together with the annotations (diagnosis results) for those images (step S502). In the initial stage of generating the learning model 130-k, the training data is prepared by the operator of the server device 2 or the like. As learning progresses, estimation results produced by the learning model 130-k and the magnified images used in the estimation process may be acquired from the estimation device 1 and set as training data.
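Step S502 amounts to filtering the database down to the records for the designated site. The sketch below shows one hypothetical way to do this; the record fields and site names are assumptions carried over from the earlier sketches.

```python
# Hypothetical sketch of step S502: select training pairs for one oral site.
def select_training_data(records, target_site):
    """Return (image_path, diagnosis_label) pairs for one oral site."""
    return [
        (r["image_path"], r["diagnosis_label"])
        for r in records
        if r["site"] == target_site          # e.g. "tongue", "gingiva"
    ]

training_pairs = select_training_data(
    records=[{"site": "tongue", "image_path": "img/0001.png",
              "diagnosis_label": "leukoplakia"}],
    target_site="tongue",
)
```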
The control unit 21 inputs the image data of the magnified images selected as training data into the learning model 130-k being trained (step S503) and acquires a computation result from the learning model 130-k (step S504). Before learning starts, the definition information describing the learning model 130-k is assumed to be given initial values. The computation procedure using this learning model 130-k is the same as in Embodiment 4.
Next, the control unit 21 evaluates the computation result obtained in step S504 (step S505) and determines whether learning is complete (step S506). Specifically, the control unit 21 can evaluate the computation result using an error function (also called an objective function, loss function, or cost function) based on the computation result obtained in step S504 and the training data. In the course of optimizing (minimizing or maximizing) the error function by a gradient descent method such as steepest descent, the control unit 21 determines that learning is complete when the error function falls to or below a threshold (or rises to or above a threshold). To avoid overfitting, techniques such as cross-validation and early stopping may be adopted so that learning ends at an appropriate point.
When determining that learning is not complete (S506: NO), the control unit 21 updates the weights and biases between the nodes of the learning model (step S507) and returns the process to step S502. The control unit 21 can update the inter-node weights and biases using error backpropagation, which updates them sequentially from the output layer of the learning model toward the input layer.
When determining that learning is complete (S506: YES), the control unit 21 stores the trained learning model 130-k in the storage unit 22 (step S508) and ends the processing of this flowchart.
The server device 2 transmits the generated learning model 130-k to the estimation device 1 together with information on the target site. The estimation device 1 stores the learning model 130-k transmitted from the server device 2 in the storage unit 13 in association with the target site information.
FIG. 18 is a flowchart showing the estimation procedure using the learning model 130-k. The control unit 12 of the estimation device 1 performs the following estimation process by executing the estimation processing program P1 stored in the storage unit 13.
The control unit 12 acquires a magnified image obtained, for example, by imaging the interior of the oral cavity with a contact-type cell observation device (step S511). The control unit 12 may acquire the magnified image through the input unit 11 or through the communication unit 15. When an estimation result based on a normal-magnification oral cavity image (for example, an estimation result such as that shown in FIG. 5) has been obtained in advance, the control unit 12 may, at the time of observation with the contact-type cell observation device, display the subject ID, subject name, estimated lesion name, and lesion position on the display device 140 to present the observer with information on the observation site. For example, a primary care institution may perform a preliminary estimation by feeding a normal-magnification oral cavity image to the learning model 130, and the estimation result obtained at the primary care institution may be presented to a secondary care institution when the latter performs observation with a contact-type cell observation device. In this case, an estimation device 1 with the learning model 130 installed may be provided at the primary care institution, and an estimation device 1 with the learning model 130-k installed may be provided at the secondary care institution. Alternatively, the learning model 130 and the learning model 130-k may be installed on a server device accessible from both the primary and secondary care institutions. FIG. 19 is a schematic diagram showing an example of the presentation of an observation site. The observer may adjust the observation position based on the observation site information presented on the display device 140 and image the interior of the oral cavity using the contact-type cell observation device.
Next, the control unit 12 accepts the designation of a target site (step S512). The control unit 12 may display an interface screen on the display device 140 when accepting the designation of the target site. FIG. 20 is a schematic diagram showing an example of an interface screen 1400 for accepting the designation of a target site. The interface screen 1400 includes, for example, a pull-down menu 1401 for accepting the selection of a target site and a start button 1402 for accepting an instruction to start the estimation process. The pull-down menu 1401 is configured to display, in response to an operation of the operation unit 16, the names of the target sites that can be designated (tongue, upper lip, hard palate, soft palate, uvula, palatine tonsils, buccal mucosa, floor of the mouth, gingiva, lower lip, and the like) and to accept the designation of the estimation target site from the displayed names. The start button 1402 is configured to accept an instruction to start the estimation process in response to an operation of the operation unit 16. Although the example of FIG. 20 shows an oral cavity image captured at normal magnification, a magnified image captured with the contact-type cell observation device may instead be displayed on the interface screen 1400.
When the designation of a target site and an instruction to start the estimation process have been accepted in step S512, the control unit 12 selects the learning model 130-k corresponding to the designated target site (step S513) and executes the computation of the learning model 130-k by feeding it the magnified image acquired in step S511 (step S514). The computation procedure using the learning model 130-k is the same as in Embodiment 1.
The control unit 12 acquires the computation result from the learning model 130-k and estimates the presence or absence of a lesion in the oral mucosa based on the acquired result (step S515). As described above, each node constituting the output layer 133 of the learning model 130-k outputs the probability for one of the lesions set as classification categories. The control unit 12 can estimate the presence or absence of a lesion based on the probabilities output from the nodes of the output layer 133.
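Steps S513 to S515 can be illustrated as follows: select the site's model, run the magnified image through it, convert the output-layer values to probabilities, and decide presence or absence of a lesion. The category list and the argmax decision rule below are assumptions made for illustration, not the device's actual decision logic.

```python
# Hypothetical sketch of steps S513-S515: forward pass, then read lesion
# probabilities off the output nodes.
import torch

CATEGORIES = ["oral malignant tumor", "leukoplakia", "lichen planus", "normal"]

def estimate_lesion(model, image_tensor):
    """image_tensor: (1, 3, H, W) preprocessed magnified image."""
    with torch.no_grad():
        probs = torch.softmax(model(image_tensor), dim=1)[0]  # X1..Xn
    best = int(probs.argmax())
    has_lesion = CATEGORIES[best] != "normal"   # presence/absence estimate
    return has_lesion, dict(zip(CATEGORIES, probs.tolist()))
```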
The control unit 12 outputs the estimation result through the output unit 14 (step S516). Specifically, the control unit 12 generates display data for showing the estimation result on the display device 140 and outputs the generated display data to the display device 140, causing the display device 140 to display the estimation result. The display format of the estimation result can be set arbitrarily. For example, the control unit 12 may generate display data containing characters or graphics indicating the presence or absence of a specific lesion (for example, an oral malignant tumor), output it to the display device 140, and have the display device 140 indicate the presence or absence of that lesion with those characters or graphics. The control unit 12 may also generate display data containing the probability value for each lesion, output it to the display device 140, and have the display device 140 display the probability value for each lesion as numerical information. An output example of the estimation device 1 is the same as FIG. 5.
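As a small illustration of turning the per-category probabilities into display data, the sketch below formats them as sorted percentage lines; the wording and ordering are illustrative choices, not the device's actual display format.

```python
# Hypothetical sketch of step S516: format probabilities for display.
def format_result(prob_by_category):
    lines = [f"{name}: {p:.1%}" for name, p in
             sorted(prob_by_category.items(), key=lambda kv: -kv[1])]
    return "\n".join(lines)

print(format_result({"oral malignant tumor": 0.82, "leukoplakia": 0.09,
                     "normal": 0.05, "lichen planus": 0.04}))
```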
As described above, Embodiment 5 uses the learning model 130-k to estimate the presence or absence of a lesion in the oral mucosa based on features observed in the magnified image, such as the shape, staining properties, distribution and arrangement, and nucleus-to-cytoplasm ratio of the epithelial cell nuclei. Estimating the presence or absence of a lesion with such a learning model 130-k suggests that minimally invasive and accurate diagnosis is possible and that tissue diagnosis comparable to biopsy diagnosis may be achievable. The diagnosis results produced by the learning model may also be accumulated in the storage unit 13 and used to help elucidate the risk of lesion onset in the oral mucosa, prognostic factors, and the like.
The embodiments disclosed herein are to be considered in all respects illustrative and not restrictive. The scope of the present invention is indicated not by the foregoing description but by the claims, and is intended to include all modifications within the meaning and scope equivalent to the claims.
DESCRIPTION OF SYMBOLS
1 estimation device
2 server device
11 input unit
12 control unit
13 storage unit
14 output unit
15 communication unit
16 operation unit
21 control unit
22 storage unit
23 input unit
24 communication unit
25 operation unit
26 display unit
130 learning model
130-1, 130-2, ..., 130-m learning models
220 oral cavity image database
P1 estimation processing program
P2 region extraction program
P3 model generation program

Claims (12)

1. An estimation device comprising:
an acquisition unit that acquires an oral cavity image obtained by imaging the interior of a subject's oral cavity;
an estimation unit that estimates the presence or absence of a lesion in the subject's oral mucosa from the oral cavity image acquired by the acquisition unit, using a learning model configured to output information regarding a lesion in the oral mucosa in response to the input of an oral cavity image; and
an output unit that outputs the estimation result of the estimation unit.

2. The estimation device according to claim 1, further comprising a region extraction unit that extracts a region corresponding to the subject's oral mucosa from the oral cavity image, wherein the estimation unit estimates the presence or absence of a lesion in the subject's oral mucosa from the oral cavity image of the region extracted by the region extraction unit.

3. The estimation device according to claim 1 or 2, further comprising an image processing unit that preprocesses the oral cavity image, wherein the estimation unit inputs the oral cavity image preprocessed by the image processing unit to the learning model.

4. The estimation device according to claim 1, wherein the oral cavity image is a magnified image obtained by imaging an observation target site at a magnification sufficient to observe the shape of epithelial cell nuclei.

5. The estimation device according to claim 4, wherein a learning model individually trained for the observation target site is set.

6. The estimation device according to claim 4 or 5, wherein the estimation result includes information indicating the position of a lesion in the oral cavity, and, when the observation target site is imaged, the output unit outputs information indicating the position of the lesion estimated in advance.

7. The estimation device according to any one of claims 1 to 6, wherein the estimation unit estimates the presence or absence of at least one lesion belonging to oral malignant tumors, precancerous lesions, benign tumors, traumatic ulcers, inflammatory diseases, viral diseases, fungal infections, autoimmune diseases, stomatitis, angular cheilitis, pressure ulcers, organic changes of the tongue surface mucosa, or graft-versus-host disease.

8. The estimation device according to any one of claims 1 to 7, wherein the learning model is a learning model that has learned the relationship between oral cavity images and information regarding lesions, using oral cavity images and annotations for those images as teacher data.

9. The estimation device according to any one of claims 1 to 8, wherein the learning model is a learning model trained using a convolutional neural network.

10. A learning model comprising:
an input layer to which an oral cavity image obtained by imaging the interior of a subject's oral cavity is input;
an output layer that outputs information regarding a lesion in the oral mucosa; and
an intermediate layer that has learned the relationship between the oral cavity image input to the input layer and the information output by the output layer, using oral cavity images and annotations for those images as teacher data,
the learning model causing a computer to function so that, when an oral cavity image is input to the input layer, a computation is performed in the intermediate layer and information regarding a lesion in the oral mucosa is output from the output layer.

11. A learning model generation method comprising, using a computer:
acquiring teacher data including an oral cavity image obtained by imaging the interior of a subject's oral cavity and an annotation for the oral cavity image; and
generating, based on the acquired teacher data, a learning model that outputs information regarding a lesion in the oral mucosa in response to the input of an oral cavity image.

12. A computer program for causing a computer to execute processing of:
acquiring an oral cavity image obtained by imaging the interior of a subject's oral cavity;
estimating the presence or absence of a lesion in the oral mucosa from the acquired oral cavity image, using a learning model configured to output information regarding a lesion in the oral mucosa in response to the input of an oral cavity image; and
outputting the estimation result.