CN111191684A

CN111191684A - Visual otoscope with intelligent image classification diagnosis function based on deep learning

Info

Publication number: CN111191684A
Application number: CN201911278422.4A
Authority: CN
Inventors: 蔡跃新; 郑亿庆; 余晋刚; 李远清; 刘楚
Original assignee: South China University of Technology SCUT; Sun Yat Sen Memorial Hospital Sun Yat Sen University
Current assignee: South China University of Technology SCUT; Sun Yat Sen Memorial Hospital Sun Yat Sen University
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-05-22
Anticipated expiration: 2039-12-12
Also published as: CN111191684B

Abstract

The invention discloses a visual otoscope with an intelligent image classification diagnosis function based on deep learning. The visual otoscope comprises an endoscope body and a detection pen, wherein the endoscope body is provided with a display screen, and an image classification diagnostor is arranged in the endoscope body and connected with the detection pen through a data transmission line. The detection pen is used for capturing an otoscope image of a patient and transmitting the captured image to the image classification diagnostor. The image classification diagnostor comprises a construction module, a training module, a verification module and a diagnosis module. The visual otoscope can realize visual examination and intelligent diagnosis of the patient's ear.

Description

Visual otoscope with intelligent image classification diagnosis function based on deep learning

Technical Field

The invention relates to medical equipment, in particular to a visual ear endoscope with an intelligent image classification diagnosis function based on deep learning.

Background

Artificial intelligence (AI) has developed rapidly in the field of computer-aided medicine in recent years, and its accuracy in image processing is now competitive with that of doctors. Against the broad background of domestic medical informatization and hierarchical diagnosis and treatment, the market for artificial intelligence applied to medical images continues to grow.

Otitis media is the most common ear disease, and chronic suppurative otitis media and secretory otitis media are its most common forms. Chronic suppurative otitis media is a suppurative inflammation of the middle ear mucosa, periosteum or bone, characterized mainly by recurrent purulent discharge from the ear, hearing loss and perforation of the tympanic membrane. Early screening can prevent deterioration and avoid progression to sensorineural deafness or bone destruction. The latest classification divides chronic suppurative otitis media into an active stage and a resting stage, and the surgical plan differs slightly between them: in the resting stage surgery can be performed directly, whereas in the active stage anti-inflammatory treatment is required first and surgery can be performed only after the purulent discharge has stopped for 3 months. Secretory otitis media is particularly frequent in children. Statistical analysis of its prevalence shows that 90% of preschool children in China are affected by secretory otitis media; most recover spontaneously, but some do not. If children with secretory otitis media are not treated in time, their speech development may be impaired, so early screening is very important. In adults, secretory otitis media has a low incidence and is often accompanied by nasopharyngeal disease.

However, the traditional medical service model is limited in form and low in overall capacity, and cannot support the screening and diagnosis of ear diseases in a large population, so the contradiction between limited medical resources and the huge medical needs of patients with ear diseases in China is increasingly prominent. A change in the medical service model is therefore urgently needed to improve the overall otological service capacity in China and to break the vicious circle of unbalanced medical supply and demand.

Disclosure of Invention

The invention aims to provide a visual otoscope with an intelligent image classification diagnosis function based on deep learning, which can realize visual examination and intelligent diagnosis of the patient's ear.

The purpose of the invention is realized by the following technical scheme: a visual ear endoscope with an intelligent image classification diagnosis function based on deep learning comprises an endoscope body and a detection pen, wherein the endoscope body is provided with a display screen, and is characterized in that an image classification diagnostor is arranged in the endoscope body and connected with the detection pen through a data transmission line, the detection pen is used for detecting and obtaining an ear endoscope image of a patient and transmitting the detected ear endoscope image to the image classification diagnostor, the image classification diagnostor comprises a construction module, a training module, a verification module and a diagnosis module,

a construction module: used for selecting otoscope images from a hospital case database to construct an otoscope data set, and dividing the data set into a test set and a training set;

a training module: used for loading a pre-trained neural network model and fine-tuning it on the obtained training set to obtain a trained neural network model;

a verification module: used for verifying, on the test set, the performance of the neural network model trained by the training module, and screening out the optimal neural network model;

a diagnostic module: the optimal neural network model obtained through the verification module is used for carrying out intelligent classification diagnosis on the otoscope image of the patient detected by the detection pen, outputting the classification diagnosis result of the otoscope image, and displaying the otoscope image of the patient and the classification diagnosis result of the otoscope image through the display screen, so that the visual detection and the intelligent diagnosis of the otoscope are realized.

In the invention, the specific process of constructing the data set by the construction module is as follows:

(a1) acquiring otoscope images from the otoscope database of the Department of Otorhinolaryngology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, to form an otoscope data set;

(a2) dividing the otoscope images into four categories (normal, secretory otitis media, chronic suppurative otitis media in the active stage, and chronic suppurative otitis media in the resting stage) according to the lesion, and labeling them;

(a3) screening out, in every category, blurred images and images in which the lesion site was not captured;

(a4) randomly selecting cases from the otoscope data set and dividing them into multiple test sets and training sets by cross validation, wherein one case may contain several images, and the division ensures that images of the same case never appear in both the training set and the test set; that is, all images of a case are either all in the training set or all in the test set.
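The case-level division of step (a4) can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function name `case_level_folds`, the identifier format and the round-robin assignment of shuffled cases to folds are assumptions made only for illustration.

```python
import random
from collections import defaultdict


def case_level_folds(image_to_case, k=5, seed=0):
    """Split images into k (train, test) pairs without splitting any case.

    image_to_case maps an image identifier to its case identifier; both
    names are hypothetical and not taken from the patent.
    """
    images_by_case = defaultdict(list)
    for image_id, case_id in image_to_case.items():
        images_by_case[case_id].append(image_id)

    # Shuffle the cases (not the images) so that folds are built from whole cases.
    case_ids = sorted(images_by_case)
    random.Random(seed).shuffle(case_ids)

    folds = []
    for fold_index in range(k):
        # Every k-th case, starting at fold_index, forms the test set of this fold.
        test_cases = set(case_ids[fold_index::k])
        train, test = [], []
        for case_id in case_ids:
            target = test if case_id in test_cases else train
            target.extend(images_by_case[case_id])
        folds.append((train, test))
    return folds


# Toy data: 10 cases with 3 images each.
toy = {f"case{c}_img{i}": f"case{c}" for c in range(10) for i in range(3)}
folds = case_level_folds(toy, k=5)
```

Because whole cases are assigned to a fold, no case can contribute images to both the training set and the test set of the same fold.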

The specific process of training the neural network model by the training module is as follows:

(b1) loading a pre-trained neural network model, wherein the neural network model includes but is not limited to an InceptionV3 model, a ResNet50 model and the like;

(b2) fine-tuning the pre-trained neural network model: removing the last fully connected layer of the neural network model, replacing it with a fully connected layer whose number of outputs equals the number of categories, and randomly initializing the weights of this new layer, to obtain a new neural network model for classifying otoscope images;

(b3) training the new neural network model on each constructed training set to obtain the trained neural network model.

In the step (b3), the training method for training the obtained new neural network model generally adopts a stochastic gradient descent method.

The specific process by which the verification module verifies the performance of the neural network model trained by the training module is as follows:

(c1) evaluating, by accuracy, the performance of the neural network models trained in the multi-fold cross validation, wherein the neural network model with the highest accuracy is the optimal neural network model;

(c2) verifying the performance of the optimal neural network model: drawing ROC curves of the optimal neural network model for the two difficult distinctions, namely normal versus secretory otitis media, and the active stage versus the resting stage of chronic suppurative otitis media; plotting the true positive rate and false positive rate of each doctor participating in the verification at the corresponding position in the ROC graph; if the ROC curve of the optimal neural network model encloses the doctors' result points, the optimal neural network model reaches or exceeds the performance of a human expert and can be used for intelligent classification of otoscope images of actual patients.

The specific diagnosis process of the diagnosis module is as follows: inputting the otoscope picture of the patient into the obtained optimal neural network model, and outputting the classification diagnosis results of the otoscope picture of the patient corresponding to the secretory otitis media, the active period of chronic suppurative otitis media and the resting period of chronic suppurative otitis media through the optimal neural network model.

In the invention, evaluation means assessing, from the standpoint of a third party, whether the model is good or bad; that is, the model is tested in cross validation, and the resulting performance index is the average accuracy.

Multi-fold cross validation means that, for a given model, the data set is randomly divided several times in advance to generate several training set/test set pairs. The model is trained on each training set and tested on the corresponding test set, and the performance index of the model is the mean of the resulting accuracies, namely the average accuracy.
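As a minimal sketch of this performance index, the following Python code computes the accuracy on each fold and averages the results. The function names and the example predictions are made up for illustration; the category indices 0 to 3 follow the four categories used in this description.

```python
def accuracy(predicted, actual):
    """Fraction of images whose predicted category equals the label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_accuracy(fold_results):
    """Average the per-fold accuracies; fold_results holds (predicted, actual) pairs."""
    per_fold = [accuracy(p, a) for p, a in fold_results]
    return sum(per_fold) / len(per_fold)


# Made-up predictions for two folds, using category indices 0 to 3.
example = [
    ([0, 1, 2, 3], [0, 1, 2, 2]),  # 3 of 4 correct
    ([1, 1, 3, 0], [1, 1, 3, 0]),  # 4 of 4 correct
]
average = mean_accuracy(example)  # (0.75 + 1.0) / 2
```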

Verification means applying the model to the test set of each fold. As a preferred embodiment, the performance of the optimal neural network model is verified using AUC, which is the area under the ROC curve.
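For illustration, the ROC curve and its AUC for one binary distinction might be computed as in the following sketch. The trapezoidal rule, the handling of tied scores and the example scores are assumptions of this sketch; the patent does not state how the curve or the area is computed.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for a binary task.

    scores are the model's scores for the positive class; labels are 1 for
    positive and 0 for negative. Tied scores are handled as one threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for index, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = index == len(ranked) - 1
        if is_last or ranked[index + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores: 1 = secretory otitis media, 0 = normal.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

In this toy example 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9.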

The AI product developed in this project can be used in large hospitals, primary-care hospitals, ordinary households and other settings, and addresses the heavy patient flow in large hospitals, the shortage of equipment in primary-care hospitals, and delayed medical attention at home. It can also promote the implementation of graded diagnosis and treatment and opens up a new approach in which diagnosis is divided into three levels. At the first level, when the otitis media identification method is used at home, a patient who experiences ear discomfort can use the AI method for automatic preliminary screening and thereby obtain medical guidance. At the second level, in the primary-care hospital, AI identification is combined with the doctor's judgment to reach a preliminary diagnosis and treatment decision, and most diagnosis and treatment is completed at this level. At the third level, in the expert image-processing mode, an expert intervenes in the patient's treatment and further completes the image processing. Graded diagnosis and treatment can greatly improve the diagnosis rate and the efficiency of diagnosis and treatment, and save doctors' diagnostic resources. The intelligent image processing method for otoscopic images of otitis media is therefore of great significance to society.

Compared with the prior art, the visual otoscope of the present invention has the following beneficial effects: using existing hospital data, an image processing model whose accuracy reaches expert level can be trained in a data-driven manner; and, using a weakly supervised technique, a qualitative explanatory map of the image processing result can be output, which makes the result easier for humans to understand. The visual otoscope can thus realize visual examination and intelligent diagnosis of the patient's ear.

Drawings

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a schematic view of the overall structure of the visual otoscope according to the present invention;

FIG. 2 is a block diagram of the image classification diagnostor in the visual otoscope according to the present invention;

FIG. 3 is a block diagram of the steps by which the image classification diagnostor performs intelligent image classification in the visual otoscope according to the present invention;

FIG. 4 shows otoscope images of the various categories taken with the visual otoscope according to the present invention;

FIG. 5 is an example of the output of the intelligent image classification performed by the visual otoscope according to the present invention;

FIG. 6 is an ROC graph comparing the performance of the optimal InceptionV3 model used in the visual otoscope according to the present invention with that of human experts;

FIG. 7 is another ROC graph comparing the performance of the optimal InceptionV3 model used in the visual otoscope according to the present invention with that of human experts.

Detailed Description

As shown in fig. 1 to 2, the visual ear endoscope with intelligent image classification diagnosis function based on deep learning comprises an endoscope body 1 and a detection pen 2, wherein the endoscope body 1 is provided with a display screen 11, and is characterized in that an image classification diagnostor is arranged in the endoscope body 1 and is connected with the detection pen 2 through a data transmission line 3, the detection pen 2 is used for detecting and obtaining an ear endoscope image of a patient and transmitting the detected ear endoscope image to the image classification diagnostor, the image classification diagnostor comprises a construction module, a training module, a verification module and a diagnosis module, wherein,

a verification module: used for verifying, on the test set, the performance of the neural network model trained by the training module, and screening out the optimal neural network model;

a diagnostic module: the optimal neural network model obtained through the verification module is used for carrying out intelligent classification diagnosis on the otoscope image of the patient detected by the detection pen 2, outputting the classification diagnosis result of the otoscope image, and displaying the otoscope image of the patient and the classification diagnosis result of the otoscope image through the display screen 11, so that the visual detection and the intelligent diagnosis of the otoscope are realized.

The steps by which the image classification diagnostor intelligently classifies images in the visual otoscope are shown in the block diagram of FIG. 3, and are as follows:

Step A: selecting otoscope images from a hospital case database to construct an otoscope data set, and dividing the data set into a test set and a training set;

Step B: loading a pre-trained neural network model, and fine-tuning it on the obtained training set to obtain the trained neural network model;

Step C: verifying, on the test set, the performance of the neural network model trained in step B, and screening out the optimal neural network model;

Step D: intelligently classifying the patient's otoscope images using the optimal neural network model obtained in step C, and outputting the classification result.

The steps A to C are the construction method of the in-ear endoscope image neural network model based on deep learning.

In this embodiment, the method for constructing the data set in step A is as follows:

(a4) randomly selecting cases from the otoscope data set and dividing them into a test set and a training set, repeating the division by five-fold cross validation to generate five training sets and five test sets, wherein one case may contain several images, and the division ensures that images of the same case never appear in both the training set and the test set; that is, all images of a case are either all in the training set or all in the test set.

The otoscope otitis media image data set used in the present invention comes from Sun Yat-sen Memorial Hospital, Sun Yat-sen University, whose Department of Otorhinolaryngology has rich medical information resources. Each image was labeled with one of the 4 categories by several senior doctors, and the labels were confirmed with the help of other audiological examinations such as pure-tone audiometry and acoustic immittance testing. The lesion-type labels are therefore highly reliable, and the data set can be used for subsequent model training. The collected otoscope images of the various categories are shown in FIG. 4.

In the present invention, the four-classification evaluation criteria are as follows:

(1) the otoscope image shows no lesion: the normal tympanic membrane is an oval, gray, translucent membrane with a cone of light on the pars tensa; classified as category 0;

(2) secretory otitis media: negative-pressure retraction and effusion of the tympanic cavity; the tympanic membrane loses its normal luster and appears yellowish, orange-red or amber; air bubbles or an air-fluid level are visible when the tympanic cavity is not completely filled; the short process of the malleus is prominent and the cone of light is shortened; classified as category 1;

(3) chronic suppurative otitis media in the active stage: tympanic membrane perforation with purulent discharge and signs of inflammation such as redness and swelling; classified as category 2;

(4) chronic suppurative otitis media in the resting stage: tympanic membrane perforation is visible but the ear is dry, without purulent discharge or signs of inflammation; classified as category 3.

In this embodiment, the training method of the neural network model in step B is as follows:

(b1) loading a pre-trained neural network model, wherein the neural network model is an InceptionV3 model, and other models such as a ResNet50 model may also be used;

(b3) training the new neural network model on each constructed training set to obtain the trained neural network model; specifically, an Adam optimizer is used, the batch size is set to 72, the initial learning rate is set to 0.001, and the learning rate is reduced once every 8 epochs, each time to 0.1 times its previous value.
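The learning-rate schedule of step (b3) can be written out as a short sketch. It assumes that the interval of 8 is counted in zero-based training epochs, so that the first reduction takes effect at epoch index 8; the function name and this indexing convention are assumptions, not details given in the description.

```python
def learning_rate(epoch, initial=0.001, step=8, factor=0.1):
    """Learning rate for a zero-based epoch index under step decay."""
    return initial * factor ** (epoch // step)


BATCH_SIZE = 72  # batch size stated in the embodiment

# Epochs 0 to 7 use the initial rate; each later block of 8 epochs uses 0.1 times the previous rate.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 7, 8, 16)}
```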

In an embodiment of the invention, the input image is scaled to 299 x 299 pixels for InceptionV3, and an input resolution of 224 x 224 pixels is used for ResNet-50 and MobileNet-V2. All scaling operations preserve the aspect ratio by padding with black borders, and use bilinear interpolation to retain as much information as possible.
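The geometry of such an aspect-preserving resize can be sketched as follows. The sketch only computes the scaled size and the position of the image on the square canvas; centring the image on the canvas and the function name `letterbox_geometry` are assumptions, and the bilinear resampling itself is left to an image library.

```python
def letterbox_geometry(width, height, target):
    """Return (new_width, new_height, left, top) for an aspect-preserving resize.

    The image is scaled so that its longer side equals target, then centred
    on a target x target black canvas. Centring is an assumption.
    """
    scale = target / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    left = (target - new_width) // 2
    top = (target - new_height) // 2
    return new_width, new_height, left, top


# Input resolutions stated in the embodiment.
INPUT_SIZE = {"InceptionV3": 299, "ResNet-50": 224, "MobileNet-V2": 224}

# A 700 x 500 image becomes 299 x 214, with 42 black rows above it.
geometry = letterbox_geometry(700, 500, INPUT_SIZE["InceptionV3"])
```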

In this embodiment, the data augmentation operations applied to the input image during network training include translation, scaling, rotation, left-right flipping and left-right stretching. The specific parameters are: random translation and left-right stretching of at most 0.1 times the image width, random scaling by a factor of 0.9 to 1.1, random rotation between 0 and 30 degrees, and left-right flipping with a probability of 50%. This ensures that each training batch presents the network with training samples it has never seen before.
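One way to draw such augmentation parameters is sketched below. The ranges are those stated above; uniform sampling, applying the translation limit to both axes and in both directions, and all dictionary keys are assumptions of this sketch.

```python
import random


def sample_augmentation(image_width, rng):
    """Draw one random set of augmentation parameters within the stated ranges."""
    max_shift = 0.1 * image_width
    return {
        "shift_x": rng.uniform(-max_shift, max_shift),    # translation in pixels
        "shift_y": rng.uniform(-max_shift, max_shift),
        "stretch_x": rng.uniform(-max_shift, max_shift),  # left-right stretch in pixels
        "scale": rng.uniform(0.9, 1.1),
        "rotation_deg": rng.uniform(0.0, 30.0),
        "flip_left_right": rng.random() < 0.5,
    }


rng = random.Random(0)
params = [sample_augmentation(299, rng) for _ in range(100)]
```

Drawing fresh parameters for every image in every batch is what makes each augmented sample differ from those seen earlier.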

In this embodiment, the method used in step C to verify the performance of the neural network model trained in step B is as follows:

(c2) verifying the performance of the optimal neural network model using AUC: drawing ROC curves of the optimal neural network model for the two difficult distinctions, namely normal versus secretory otitis media, and the active stage versus the resting stage of chronic suppurative otitis media; plotting the true positive rate and false positive rate of each doctor participating in the verification at the corresponding position in the ROC graph; if the ROC curve of the optimal neural network model encloses the doctors' result points, the optimal neural network model reaches or exceeds the performance of a human expert and can be used for processing actual otoscope images.
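The comparison with the doctors' result points can be sketched as a simple geometric test. The sketch interprets "encloses" as meaning that each doctor's (false positive rate, true positive rate) point lies on or below the model's ROC curve; this interpretation, the linear interpolation between curve points and all numbers below are assumptions for illustration only.

```python
def tpr_on_curve(curve, fpr):
    """Linearly interpolate the ROC curve's true positive rate at a given fpr.

    curve is a list of (fpr, tpr) points sorted by increasing fpr, starting at
    (0, 0) and ending at (1, 1). At a vertical step the highest tpr is used.
    """
    best = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                best = max(best, y0, y1)
            else:
                best = max(best, y0 + (y1 - y0) * (fpr - x0) / (x1 - x0))
    return best


def curve_encloses(curve, doctor_points):
    """True if every (fpr, tpr) doctor point lies on or below the ROC curve."""
    return all(tpr <= tpr_on_curve(curve, fpr) for fpr, tpr in doctor_points)


# Made-up numbers for illustration only.
model_curve = [(0.0, 0.0), (0.0, 0.6), (0.1, 0.9), (0.3, 1.0), (1.0, 1.0)]
doctors = [(0.10, 0.85), (0.20, 0.90)]
result = curve_encloses(model_curve, doctors)
```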

In this embodiment, the specific process of step D is: inputting the otoscope picture of the patient into the obtained optimal neural network model, outputting the classification diagnosis results of the otoscope picture of the patient corresponding to the secretory otitis media, the chronic suppurative otitis media in the active period and the chronic suppurative otitis media in the resting period through the optimal neural network model, and providing an auxiliary explanatory diagram of the classification diagnosis results.

The auxiliary explanatory diagram is generated as follows: in the neural network, the gradient of the output node with the largest classification score is computed with respect to the input layer. The resulting matrix has the same size as the input image, and the magnitude of each element indicates the contribution of the corresponding pixel to the classification score of that category, so the matrix can be used to explain why the neural network produced its result.

Specifically, for each input picture, after the gradient matrix is obtained, the absolute value is taken at each pixel, and the absolute values of the gradients of the three corresponding RGB components are added to give the saliency value of the pixel. After the whole matrix is normalized, it is blurred with a 71 × 71 Gaussian kernel, and the blurred matrix is colored to obtain the auxiliary explanatory diagram of the picture. An example of a specific implementation is shown in FIG. 5.
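The post-processing of the gradient matrix can be sketched in plain Python as follows. The gradient itself would come from the neural network framework and is taken here as given. Several details are assumptions because the description does not specify them: normalization by the maximum value, the Gaussian standard deviation (kernel size divided by 6), border handling by clamping, and all function names. The final coloring step is omitted.

```python
import math


def gaussian_kernel(size, sigma):
    """One-dimensional Gaussian kernel of odd length, normalised to sum to 1."""
    half = size // 2
    weights = [math.exp(-(i - half) ** 2 / (2 * sigma ** 2)) for i in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(matrix, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    half = len(kernel) // 2
    width = len(matrix[0])
    return [
        [
            sum(k * row[min(max(x + j - half, 0), width - 1)]
                for j, k in enumerate(kernel))
            for x in range(width)
        ]
        for row in matrix
    ]


def transpose(matrix):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*matrix)]


def saliency_map(gradient, kernel_size=71, sigma=None):
    """gradient[y][x] is an (r, g, b) tuple of gradients for one pixel."""
    if sigma is None:
        sigma = kernel_size / 6  # assumption: the patent does not give sigma
    # Sum of absolute gradients over the three colour channels.
    saliency = [[sum(abs(c) for c in pixel) for pixel in row] for row in gradient]
    # Normalise to the range [0, 1] (assumed form of normalisation).
    peak = max(max(row) for row in saliency)
    if peak > 0:
        saliency = [[v / peak for v in row] for row in saliency]
    # Separable Gaussian blur: rows first, then columns.
    kernel = gaussian_kernel(kernel_size, sigma)
    blurred = blur_rows(saliency, kernel)
    return transpose(blur_rows(transpose(blurred), kernel))


# Toy 5 x 5 gradient with a single salient pixel; a 3 x 3 kernel keeps it quick.
toy_gradient = [[(0.0, 0.0, 0.0)] * 5 for _ in range(5)]
toy_gradient[2][2] = (0.5, -1.0, 0.5)
heat = saliency_map(toy_gradient, kernel_size=3)
```

The pure-Python blur is slow for full-size images with a 71 × 71 kernel; it is meant to show the order of operations rather than to be used in production.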

To further illustrate the advantages of the classification method of the present invention, the proposed method was applied to a real data set and compared with general deep convolutional neural networks. The real data set contains 6065 otoscope images with sizes ranging from 500 × 500 pixels to 700 × 700 pixels: 1040 images of normal tympanic membranes, 2613 images of secretory otitis media, 1661 images of chronic suppurative otitis media in the resting stage, and 751 images of chronic suppurative otitis media in the active stage.

Five-fold cross validation is used in the experiment, and when the data set is divided, different pictures of the same case never appear in both the training set and the test set. The experiment uses accuracy as the evaluation index, namely the proportion of correctly classified images in the test set relative to the total number of images in the test set.

The final evaluated performance is the average of the performance over all fold test sets. The results obtained are shown in table 1.

Table 1: performance comparison using different network architectures

The best InceptionV3 model was selected and compared with human experts on the binary classification of the key categories; the ROC graphs of FIGS. 6 and 7 show that the model performs significantly better than the human experts.

The above-described embodiments of the present invention are not intended to limit the scope of the present invention, and the embodiments of the present invention are not limited thereto, and various other modifications, substitutions and alterations can be made to the above-described structure of the present invention without departing from the basic technical concept of the present invention as described above, according to the common technical knowledge and conventional means in the field of the present invention.

Claims

1. A visual ear endoscope with intelligent image classification diagnosis function based on deep learning comprises an endoscope body (1) and a detection pen (2), wherein the endoscope body (1) is provided with a display screen (11), and is characterized in that an image classification diagnostor is arranged in the endoscope body (1) and connected with the detection pen (2) through a data transmission line (3), the detection pen (2) is used for detecting and obtaining an ear endoscope image of a patient and transmitting the detected ear endoscope image to the image classification diagnostor, the image classification diagnostor comprises a construction module, a training module, a verification module and a diagnosis module,

a diagnostic module: the optimal neural network model obtained through the verification module is used for carrying out intelligent classification diagnosis on the otoscope image of the patient detected by the detection pen (2), outputting the classification diagnosis result of the otoscope image, and displaying the otoscope image of the patient and the classification diagnosis result of the otoscope image through the display screen (11), so that the visual detection and the intelligent diagnosis of the otoscope are realized.

2. The deep learning based visual ear endoscope with intelligent image classification diagnosis function according to claim 1, characterized in that: the specific process of the construction module for constructing the data set is as follows:

(a1) acquiring images of an otoscope to form an otoscope data set by using an otoscope database of an otorhinolaryngology department of a hospital;

3. The deep learning based visual ear endoscope with intelligent image classification diagnosis function according to claim 2, characterized in that: the specific process of training the neural network model by the training module is as follows:

(b1) loading a pre-trained neural network model, wherein the neural network model is an InceptionV3 model or a ResNet50 model;

4. The deep learning based visual ear endoscope with intelligent image classification diagnosis function according to claim 3, characterized in that: the specific process by which the verification module verifies the performance of the neural network model trained by the training module is as follows:

5. The deep learning based visual ear endoscope with intelligent image classification diagnosis function according to claim 4, characterized in that: the specific diagnosis process of the diagnosis module is as follows: inputting the otoscope picture of the patient into the obtained optimal neural network model, and outputting the classification diagnosis results of the otoscope picture of the patient corresponding to the secretory otitis media, the active period of chronic suppurative otitis media and the resting period of chronic suppurative otitis media through the optimal neural network model.