WO2020088288A1 - Endoscopic image processing method and system, and computer device - Google Patents
Endoscopic image processing method and system, and computer device
- Publication number
- WO2020088288A1 · PCT/CN2019/112202 · CN2019112202W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- endoscopic image
- image
- training
- deep convolutional
- convolutional network
- Prior art date
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00002—Operational features of endoscopes
- A61B1/00043—Operational features of endoscopes provided with output arrangements
- A61B1/00045—Display arrangement
- A61B1/0005—Display arrangement combining images e.g. side-by-side, superimposed or tiled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00002—Operational features of endoscopes
- A61B1/00004—Operational features of endoscopes characterised by electronic signal processing
- A61B1/00009—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope
- A61B1/000094—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope extracting biological structures
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B1/00—Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
- A61B1/00002—Operational features of endoscopes
- A61B1/00004—Operational features of endoscopes characterised by electronic signal processing
- A61B1/00009—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope
- A61B1/000096—Operational features of endoscopes characterised by electronic signal processing of image signals during a use of endoscope using artificial intelligence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10068—Endoscopic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
Definitions
- the present invention relates to the field of image processing technology, and in particular, to an endoscopic image processing method, system, and computer equipment.
- gastric cancer and esophageal cancer rank among the top five malignant tumor types by incidence in China and worldwide.
- Gastric cancer and esophageal cancer are malignant tumors in the upper digestive tract.
- the doctor performs an electronic examination with an endoscope: the endoscope is inserted through the oral cavity into the subject's upper digestive tract, and the strong light emitted by the light source is bent through the light-guiding fiber, enabling the doctor to observe the health of the organs in the upper digestive tract.
- the embodiments of the present invention provide an endoscopic image processing method, system, and computer equipment, which make the prediction process more intelligent, more robust, and improve the resource utilization of the processing device.
- the invention provides an endoscope image processing method, including:
- the organ category corresponding to the current endoscopic image is determined.
- the invention also provides an endoscope image processing system, including: a human body detection device and an endoscope image processing device, wherein,
- the human body detection device is used to detect human body parts and send the detected at least one first endoscope image to the endoscope image processing device;
- the endoscopic image processing device is used to acquire the at least one first endoscopic image from the human body detection device; create a deep convolutional network for predicting endoscopic images, and determine the training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image; and acquire the current endoscopic image of the user to be examined,
- use the deep convolutional network to predict the current endoscopic image based on the training parameters, and determine the organ category corresponding to the current endoscopic image.
- the present invention also provides a computer-readable storage medium that stores computer-readable instructions.
- when the computer-readable instructions are executed by at least one processor, they cause the at least one processor to load and execute them to implement the following steps:
- the organ category corresponding to the current endoscopic image is determined.
- the present invention also provides a computer device including at least one memory and at least one processor, where the at least one memory stores at least one program code, and the at least one program code is loaded and executed by the at least one processor to implement the following steps:
- the organ category corresponding to the current endoscopic image is determined.
- the method provided by the embodiments of the present invention enables the feature extraction process to be learned entirely by the deep convolutional network model, without requiring researchers to deeply understand medical images, which reduces the dependence on doctors' professional expertise and makes the entire prediction process more intelligent; at the same time, it reduces the amount of labeled data used during training, speeds up training convergence, and provides clean and usable data for the next step of disease diagnosis;
- it also provides reusable integrated modules for disease diagnosis under different organs, improving the resource utilization of the processing device.
- FIG. 1 is a schematic structural diagram of an endoscope image processing system according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of a method for processing an endoscopic image in an embodiment of the present invention
- FIG. 3 is a schematic structural diagram of a deep convolution network in an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of a deep convolutional network in another embodiment of the present invention.
- FIG. 5 is a schematic structural diagram of a processing layer in another embodiment of the present invention.
- FIG. 6 is a schematic flowchart of a method for processing an endoscopic image in another embodiment of the present invention.
- FIG. 7 is a schematic diagram of a label image in an embodiment of the invention.
- FIG. 8 is a schematic flowchart of training a deep convolutional network in an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of an endoscope image processing device according to an embodiment of the present invention.
- FIG. 10 is a schematic structural diagram of an endoscope image processing device according to another embodiment of the present invention.
- FIG. 1 is a schematic structural diagram of an endoscope image processing system according to an embodiment of the present invention.
- an endoscope image processing system 100 includes a user 101 to be inspected, a human body detection device 102 including an endoscope 1021, an endoscope image processing device 103, and a doctor 104.
- the endoscope image processing device 103 may include a real-time prediction sub-device 1031, an offline training sub-device 1032, and an endoscope image database 1033.
- the human body detection device 102 detects a certain human body part of the user 101 to be inspected through the endoscope 1021.
- the human body detection device 102 sends the collected endoscopic image to the endoscope image processing device 103; specifically, it can be sent to the real-time prediction sub-device 1031 as the current endoscopic image to be predicted, or it can be sent to the endoscope image database 1033 for storage,
- and the images stored in the endoscope image database 1033 are used for offline training.
- when the doctor 104 wants to diagnose a disease from the current endoscopic image to be predicted, the real-time prediction sub-device 1031 first needs to obtain the training parameters from the offline training sub-device 1032, and then predicts the current endoscopic image based on the training parameters and the created deep convolutional network to determine the organ category corresponding to the current endoscopic image.
- the organ type may be the duodenum in the upper digestive tract.
- when generating the training parameters, the offline training sub-device 1032 uses the same deep convolutional network as the real-time prediction sub-device 1031; it obtains the images collected through the endoscope and the annotated label images from the endoscope image database 1033, performs offline training based on the images collected by the endoscope and each annotated label image, and outputs the training parameters of the deep convolutional network.
- the above-mentioned human detection device 102 refers to a medical terminal device equipped with an endoscope 1021 and an image acquisition function.
- the endoscope 1021 may include an image sensor, an optical lens, a light source illumination, a mechanical device, and the like.
- the endoscope image processing device 103 may be a server or a cloud server, and has image storage and processing functions. Operating systems are installed on these terminal devices, including but not limited to: Android operating system, Symbian operating system, Windows mobile operating system, and Apple iPhone OS operating system, etc.
- the human body detection device 102 and the endoscope image processing device 103 can communicate through a wired or wireless network.
- FIG. 2 is a schematic flowchart of an endoscope image processing method according to an embodiment of the present invention. The method is applied to a computer device; the following description takes a server as the computer device by way of example. The embodiment includes the following steps.
- Step 201 The server acquires at least one first endoscopic image for a human body part.
- the at least one first endoscopic image corresponds to a body part.
- Step 202 the server creates a deep convolutional network for predicting endoscopic images.
- Step 203 The server determines the training parameters of the deep convolution network according to at least one first endoscopic image and at least one second endoscopic image after transforming the at least one first endoscopic image.
- Step 204 The server obtains the current endoscopic image of the user to be checked, uses a deep convolutional network and predicts the current endoscopic image based on the training parameters, and determines the organ type corresponding to the current endoscopic image.
- the training parameters are determined based on at least one first endoscopic image and at least one second endoscopic image converted from the at least one first endoscopic image.
- At least one first endoscopic image can be obtained by detecting a body part using a detection device including an endoscope.
- the human body part includes one or more organs.
- the human body part is the upper digestive tract part, and the upper digestive tract part includes five organs, namely the pharynx, esophagus, stomach, cardia, and duodenum.
- the image captured by the detection device may be a picture or a video, and the acquired first endoscope image may be a white light RGB image.
- the deep convolutional network for classifying endoscopic images is a deep learning-based convolutional neural network.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer.
- FIG. 3 is a schematic structural diagram of a deep convolution network in an embodiment of the present invention.
- the input layer 301 determines at least one endoscopic image as an input;
- the processing layer 302 performs feature extraction on each input endoscopic image;
- the classification layer 303 outputs the organ category predicted for each input endoscopic image.
- the at least one endoscopic image may include each first endoscopic image captured by the detection device and, of course, may also include each second endoscopic image obtained by transforming each first endoscopic image, thereby enriching the sample size.
- the convolution layer 3021 performs feature extraction on the endoscopic image using a convolution matrix as a filter to obtain a feature image; the pooling layer 3022 is used to simplify the information output by the convolution layer, reduce the data dimension, lower the computational overhead, and control overfitting.
- the fully connected layer 3031 is used to detect to which organ category the acquired feature image is closest.
- the softmax layer 3032 outputs a 1 ⁇ M-dimensional classification vector.
- the softmax layer is used for exponential normalization.
- M is the number of candidate organ categories. For example, there are six candidate organ categories: non-organ image, pharynx, esophagus, stomach, cardia, and duodenum.
- each element of the classification vector takes a value in [0, 1], and the i-th element represents the probability that the input endoscopic image belongs to the i-th candidate organ category.
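- The classification layer described here can be sketched as a fully connected layer followed by a softmax, as below. This is only an illustrative sketch in PyTorch; the framework and the pooled-feature dimension (1024) are assumptions rather than details fixed by this document, and only M = 6 candidate categories follows the example above.

```python
import torch
import torch.nn as nn

# Minimal sketch of the classification layer: a fully connected layer (3031)
# followed by a softmax layer (3032) that outputs a 1 x M classification vector.
M = 6                                   # candidate organ categories in the example
head = nn.Sequential(
    nn.Linear(1024, M),                 # assumed pooled-feature dimension of 1024
    nn.Softmax(dim=1),                  # exponential normalization
)

features = torch.randn(1, 1024)         # pooled features for one endoscopic image
probs = head(features)                  # shape (1, M); each element lies in [0, 1]
predicted = int(probs.argmax(dim=1))    # index of the most likely organ category
```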
- when creating the deep convolutional network, the server may add at least one dense connection layer to the processing layer.
- the dense connection layer includes multiple connection sublayers; for each connection sublayer, the features output by the other connection sublayers preceding it are used as the input of that connection sublayer.
- FIG. 4 is a schematic structural diagram of a deep convolution network in another embodiment of the present invention.
- the processing layer 312 includes Y dense connection layers 3121-312Y.
- Each dense connection layer includes multiple connection sublayers, as shown by the solid circles in boxes 3121-312Y in the figure.
- the six classification probabilities shown in block 3131 are output in the output layer 313.
- FIG. 5 is a schematic structural diagram of a processing layer in another embodiment of the present invention. As shown in FIG. 5, in the structure of the processing layer 400, there are K dense connection layers 4021 to 402K between the convolution layer 401 and the pooling layer 404; within the same dense connection layer, the features output by each connection sublayer are fed into all subsequent connection sublayers.
- suppose a dense connection layer includes J connection sublayers and the processing function of the j-th connection sublayer is H_j, j = 1, ..., J; then the feature z_j output by the j-th connection sublayer can be computed as z_j = H_j([z_0, z_1, ..., z_{j-1}]) (1), where [z_0, z_1, ..., z_{j-1}] denotes the concatenation of the features output by the connection sublayers numbered 0 to j-1.
- H_j can be operations such as batch normalization (Batch Normalization, BN), ReLU activation, and 3×3 convolution.
- if the number of channels input to the dense connection layer is k_0, then the number of channels at the j-th sublayer is k_0 + (j-1)×k, where k is the growth rate; as the number of connection sublayers increases, the number of channels grows linearly with k.
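- A dense connection layer of this kind might be sketched as follows. This is a hedged illustration in PyTorch rather than the patent's own implementation; the choice of BN, ReLU and 3×3 convolution for H_j follows the operations listed above.

```python
import torch
import torch.nn as nn

class ConnectionSublayer(nn.Module):
    """One connection sublayer H_j: batch normalization, ReLU, then 3x3 convolution."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

class DenseConnectionLayer(nn.Module):
    """Dense connection layer: sublayer j receives the concatenation
    [z_0, z_1, ..., z_{j-1}] of all earlier outputs, so its input has
    k_0 + (j-1)*k channels, where k is the growth rate."""
    def __init__(self, in_channels, growth_rate, num_sublayers):
        super().__init__()
        self.sublayers = nn.ModuleList(
            ConnectionSublayer(in_channels + j * growth_rate, growth_rate)
            for j in range(num_sublayers)
        )

    def forward(self, x):
        features = [x]
        for sublayer in self.sublayers:
            features.append(sublayer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```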
- when at least one dense connection layer is added to the processing layer, a transition layer may be added between two adjacent dense connection layers in order to further compress the parameters. As shown in FIG. 5, a transition layer 403 is added between the dense connection layer 4021 and the dense connection layer 4022; if there are K dense connection layers, the number of transition layers is K-1. The feature compression ratio of the transition layer can be set according to the preset prediction accuracy: since the compression ratio affects both the number of parameters and the prediction accuracy, its value is chosen based on the prediction accuracy preset for endoscopic images, for example 0.5.
- the server determines the specific parameters of the processing layer and the classification layer in the deep convolution network according to the number of endoscope images to be predicted, prediction accuracy, and adjustment of hyperparameters during training.
- Table 1 is an example of the structure and parameters of a deep convolutional network, including a total of 4 dense connection layers and 3 transition layers. The growth rate of each dense connection layer can be set to 24; a 1×1 convolution operation can also be performed before the 3×3 convolution operation, which reduces the number of input feature maps and fuses the features of the various channels. The 1×1 convolution operation in the transition layer can halve the number of input channels.
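- A transition layer and a stack of dense connection layers following the stated parameters (4 dense connection layers, 3 transition layers, growth rate 24, compression ratio 0.5) might look like the sketch below. Since Table 1 itself is not reproduced in this text, the per-layer sublayer counts, the initial convolution, and the reuse of the DenseConnectionLayer class from the previous sketch are assumptions for illustration only.

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    """Transition layer between two adjacent dense connection layers: a 1x1
    convolution that compresses the channel count (a compression ratio of 0.5
    halves it), followed by 2x2 average pooling."""
    def __init__(self, in_channels, compression=0.5):
        super().__init__()
        out_channels = int(in_channels * compression)
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.bn(x)))

def build_processing_layer(sublayers_per_block=(6, 12, 24, 16), growth_rate=24, init_channels=64):
    """Assemble 4 dense connection layers separated by 3 transition layers
    (K dense layers give K-1 transitions). The sublayer counts and the stem are
    illustrative assumptions; DenseConnectionLayer is from the earlier sketch."""
    layers = [nn.Conv2d(3, init_channels, kernel_size=7, stride=2, padding=3, bias=False),
              nn.MaxPool2d(kernel_size=3, stride=2, padding=1)]
    channels = init_channels
    for i, n in enumerate(sublayers_per_block):
        layers.append(DenseConnectionLayer(channels, growth_rate, n))
        channels += n * growth_rate
        if i < len(sublayers_per_block) - 1:
            layers.append(TransitionLayer(channels, compression=0.5))
            channels = int(channels * 0.5)
    return nn.Sequential(*layers), channels
```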
- when training the deep convolutional network, the server may determine the training parameters of the deep convolutional network based on at least one first endoscopic image and at least one second endoscopic image transformed from the at least one first endoscopic image.
- the server first transforms the at least one first endoscopic image to obtain the transformed at least one second endoscopic image, and then inputs the at least one first endoscopic image and the at least one second endoscopic image into the deep convolutional network simultaneously for training, obtaining the training parameters of the deep convolutional network.
- the transformation performed by the server on the at least one first endoscope image may include at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter.
- This transformation operation plays a role in data enhancement.
- the number of transformations made may be determined according to the number of candidate organ categories and / or preset prediction accuracy.
- for example, in the input layer 301, 3011 is the first endoscopic image acquired from the detection device; 3011 is subjected to two transformations: a rotation transformation, yielding the transformed second endoscopic image 3012, and a color transformation, yielding the transformed second endoscopic image 3013. The images 3011, 3012, and 3013 are then input into the processing layer 302 simultaneously for feature extraction.
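- A data-augmentation pipeline of this kind could be sketched with torchvision as below; the operation types (cropping, rotation, brightness/color/contrast jitter) follow the text, while the specific ranges and the use of torchvision are assumptions.

```python
from torchvision import transforms

# Produce transformed "second" endoscopic images from a first endoscopic image.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),                    # cropping
    transforms.RandomRotation(degrees=15),                                  # rotation
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),   # jitter
    transforms.ToTensor(),
])

# e.g. generate m transformed copies of one first endoscopic image (a PIL image):
# second_images = [augment(first_image) for _ in range(m)]
```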
- step 204 the server acquires the current endoscopic image of the user to be checked, uses a deep convolutional network and predicts the current endoscopic image based on the training parameters, and determines the organ type corresponding to the current endoscopic image.
- for example, the current endoscopic image input at the input layer 301 may be predicted to belong to the "esophagus" category; alternatively, the image may be an invalid medical image that does not correspond to any organ and belongs to the "non-organ image" category, so the doctor does not need to refer to it when diagnosing disease.
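- The real-time prediction step might then be sketched as follows; the weight-file name, the preprocessing function, and the model interface are illustrative assumptions, while the category list matches the six candidate categories named above.

```python
import torch

CATEGORIES = ["non-organ image", "pharynx", "esophagus", "stomach", "cardia", "duodenum"]

def predict_current_image(model, preprocess, image, weights_path="training_params.pt"):
    """Load the training parameters produced offline, run the current endoscopic
    image through the deep convolutional network, and map the largest element of
    the 1 x M classification vector to an organ category."""
    model.load_state_dict(torch.load(weights_path, map_location="cpu"))
    model.eval()
    with torch.no_grad():
        probs = model(preprocess(image).unsqueeze(0))   # 1 x M classification vector
    return CATEGORIES[int(probs.argmax(dim=1))]         # e.g. "esophagus" or "non-organ image"
```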
- the training parameters of the deep convolutional network are determined according to at least one first endoscopic image and at least one second endoscopic image transformed from the at least one first endoscopic image, and each transformed second endoscopic image, serving as auxiliary data, can be used for the classification training of each first endoscopic image. Viewed as a whole, the scheme can achieve the following technical effects:
- adding at least one dense connection layer maximizes the information flow between all layers in the network and, to a certain extent, alleviates the problem of gradient vanishing during training; moreover, because a large number of features are reused, a large number of features can be generated with a small number of convolution kernels, so the final model is relatively small and the number of parameters is reduced.
- FIG. 6 is a schematic flowchart of a method for processing an endoscopic image in another embodiment of the present invention. As shown in Figure 6, it includes the following steps:
- Step 501 The server acquires at least one first endoscopic image for a body part.
- Step 502 The server transforms at least one first endoscopic image to obtain at least one second endoscopic image after transformation.
- step 503 the server creates a deep convolutional network for predicting endoscopic images.
- for steps 501-503, reference may be made to the above steps 201, 203, and 202, which will not be repeated here.
- Step 504 The server determines at least one candidate organ category according to the structure of the human body part and the preset diagnosis target.
- the human body part may include multiple organs.
- multiple candidate organ categories need to be determined in advance. Specifically, when dividing a human body part, multiple regions can be delineated with reference to a preset diagnosis target, and the multiple candidate organ categories are then determined. For example, among the currently high-incidence malignant tumor types, gastric cancer and esophageal cancer are the most widespread, so the diagnosis target is set to diagnosing these two organs; the candidate organ categories can then be set to three classes: stomach, esophagus, and other.
- Step 505 The server acquires the label image corresponding to each candidate organ category.
- these tag images may be obtained from a medical image database and manually annotated; or, those images with typical characteristics of candidate organs may be filtered from the collected first endoscopic image.
- FIG. 7 is a schematic diagram of label images in an embodiment of the invention. As shown in FIG. 7, multiple label images of the duodenum, esophagus, stomach, and eye are given.
- Step 506 When training the deep convolutional network, the server uses at least one first endoscopic image and at least one second endoscopic image as input samples, and uses each label image as an ideal output sample (that is, a target output sample) for training, obtaining the training parameters of the deep convolutional network.
- the deep neural network gradually adjusts the weights in an iterative manner according to the input image samples and the ideal output samples during the training process until convergence.
- Step 507 The server obtains the current endoscopic image of the user to be checked, uses a deep convolutional network and predicts the current endoscopic image based on the training parameters, and determines the organ category corresponding to the current endoscopic image.
- This step is the same as the above step 204 and will not be repeated here.
- FIG. 8 is a schematic flowchart of training a deep convolution network in an embodiment of the present invention. As shown in Figure 8, it includes the following steps:
- Step 701 The server acquires at least one first endoscopic image for a human body part.
- Step 702 The server transforms at least one first endoscopic image to obtain at least one second endoscopic image after transformation.
- Step 703 the server creates a deep convolutional network for predicting endoscopic images.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer.
- when training the deep convolutional network, the parameters can be adjusted through the back-propagation algorithm and iterated to convergence.
- the back-propagation algorithm can be divided into four parts: the forward pass, the loss function computation, the backward pass, and the parameter update.
- in the forward pass, the initial sample data, including at least one first endoscopic image and the transformed at least one second endoscopic image, is input and propagated through the processing layer.
- constructing a loss function helps the deep convolutional network update the training parameters until convergence.
- Step 704 the server constructs a loss function for training the deep convolutional network in advance.
- a loss function is constructed according to a preset convergence strategy.
- the convergence strategy is specifically a consistency constraint strategy, that is, the features extracted by the model from the same endoscopic image under different transformations should be very close.
- the convergence strategy is specifically a central aggregation strategy, that is, the distance between first endoscopic images belonging to the same organ category decreases, that is, the intra-class distance decreases, while at the same time the distance between endoscopic images of different organ categories increases, that is, the inter-class distance increases.
- Step 705 The server inputs at least one first endoscopic image and at least one second endoscopic image, and initializes the deep convolution network.
- initializing the deep convolutional network includes two initialization processes: 1) initializing the training parameters w of the deep convolutional network, including the weights of the sublayers in the processing layer and the output layer, for example by random initialization; and 2) initializing the central feature corresponding to the first endoscopic images, for example by taking the average of the label images of each category as the initial value of the central feature.
- at the very beginning of training, the initial training parameters and central features will cause the loss function to take a very high value.
- the purpose of training a deep neural network is for the predicted value to match the real value; to this end, the value of the loss function needs to be minimized, and the smaller the loss value, the closer the prediction. In this process, the training parameters and the central features are adjusted iteratively, the value of the loss function is calculated at each iteration, and finally the loss of the entire network is driven to a minimum.
- the following steps 706 and 707 correspond to the above consistency constraint strategy; the following step 708 corresponds to the above central aggregation strategy.
- Step 706 The server acquires at least one processed feature obtained by processing at least one first endoscopic image by the processing layer.
- Step 707 The server calculates the value of the loss function at this iteration based on at least one processed feature and at least one feature of the second endoscopic image.
- the above consistency constraint strategy is expressed in the loss function by calculating a plurality of first distances between the processed feature of each first endoscopic image and the features of each second endoscopic image, and using these distances to constrain the consistency between the first endoscopic image and the second endoscopic images.
- if the training parameters are w, the feature vector of the i-th input first endoscopic image is x_i, and the feature vector of the i-th label image is y_i, the loss function L(w) can be calculated iteratively according to formula (2), in which:
- n is the number of input first endoscopic images;
- m is the number of second endoscopic images obtained by transforming the first endoscopic images;
- y_i log f(x_i; w) represents the classification cross-entropy loss, and a further term represents L2 regularization of the parameters;
- h_0 is the feature vector of the first endoscopic image output by the processing layer, that is, the processed feature vector;
- h_k is the feature vector of the k-th second endoscopic image;
- r and λ are hyperparameters, both greater than 0;
- i is an integer greater than or equal to 1 and less than or equal to n, and k is an integer greater than or equal to 1 and less than or equal to m;
- the first distance is h_0 - h_k, and the corresponding term of the loss function reflects the consistency constraint between the endoscopic images before and after transformation.
- Step 708 The server calculates a plurality of second distances between the features of each first endoscopic image and the central features corresponding to each first endoscopic image.
- the second distances characterize the center loss, denoted L_C, associated with the first endoscopic images.
- Step 709 The server calculates the value of the loss function according to the plurality of first distances and the plurality of second distances.
- the value of the loss function can be calculated by referring to the above formula (2).
- when the consistency constraint and the central aggregation strategy are considered together, the value of the loss function is calculated from both the first distances and the second distances.
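- Because formula (2) and the combined formula are given only as images in the original document, the exact weighting of the terms is not recoverable here; the sketch below is one plausible reading that combines the ingredients named in the text (classification cross-entropy, L2 regularization of the parameters weighted by r, the first distances between h_0 and each h_k weighted by λ, and the second distances to the category centers). Treat the coefficients and the use of logits with F.cross_entropy as assumptions.

```python
import torch
import torch.nn.functional as F

def training_loss(logits, labels, h0, h_transformed, centers, params, r=1e-4, lam=0.1, gamma=0.1):
    """Hedged reconstruction of the training loss described above.
    logits        : network outputs f(x_i; w) for the n first endoscopic images, shape (n, M)
    labels        : organ-category indices y_i, shape (n,)
    h0            : processed features of the first images, shape (n, d)
    h_transformed : features of the m transformed (second) images per first image, shape (n, m, d)
    centers       : one central feature per organ category, shape (M, d)
    params        : iterable of trainable parameters (for the L2 term)
    """
    ce = F.cross_entropy(logits, labels)                                   # -sum_i y_i log f(x_i; w)
    l2 = r * sum((p ** 2).sum() for p in params)                           # parameter L2 regularization
    first = lam * ((h0.unsqueeze(1) - h_transformed) ** 2).sum(-1).mean()  # consistency: ||h_0 - h_k||
    second = gamma * ((h0 - centers[labels]) ** 2).sum(-1).mean()          # center loss on second distances
    return ce + l2 + first + second
```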
- step 710 the server determines whether the training process ends according to the value of the loss function. If yes, go to step 713; otherwise, go to steps 711 and 712.
- in the successive iterations, the loss function is minimized, namely min L(w); whether to stop iterating is determined by judging whether the value of the loss function has reached an acceptable threshold, and once iteration stops, the entire training process ends.
- Step 711 the server updates the training parameters. Then, step 706 is further executed to perform the next iteration process.
- Step 712 the server updates the center feature. Then, step 708 is further executed to perform the next iteration process.
- Step 713 the server obtains the training parameters of the deep convolutional network after the training.
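- Steps 705 to 713 could be sketched as the loop below. The optimizer, learning rate, stopping threshold, and the model interface (returning both the classification vector and the processed features) are assumptions; `training_loss` is the hedged loss sketch given earlier, and `centers` is a trainable tensor holding one central feature per organ category.

```python
import torch

def train_deep_conv_network(model, centers, loader, max_iters=10000, lr=1e-3, tol=1e-3):
    """Iterate: forward pass, loss computation, convergence check (step 710),
    then update of the training parameters and central features (steps 711/712)."""
    optimizer = torch.optim.SGD(list(model.parameters()) + [centers], lr=lr, momentum=0.9)
    for step, (first, second, labels) in enumerate(loader):
        # `first`: (n, C, H, W); `second`: (n, m, C, H, W) transformed copies
        logits, h0 = model(first, return_features=True)              # assumed model interface
        _, h_k = model(second.flatten(0, 1), return_features=True)
        h_k = h_k.view(second.size(0), second.size(1), -1)
        loss = training_loss(logits, labels, h0, h_k, centers, model.parameters())
        if loss.item() < tol or step >= max_iters:                   # step 710: convergence check
            break
        optimizer.zero_grad()
        loss.backward()                                              # backward pass
        optimizer.step()                                             # steps 711/712: update w and centers
    return model.state_dict()                                        # step 713: training parameters
```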
- a loss function for training the deep convolutional network is constructed in advance, and its value is calculated at each iteration according to the processed feature of the first endoscopic image and the feature of each second endoscopic image; introducing the consistency constraint makes it possible to find more stable features first and accelerates the convergence of the training process until the optimal solution is obtained.
- the central aggregation strategy in the loss function can ensure that the features learned by the model for each category are more stable and cohesive, further improving the generalization ability of the model in the real environment.
- as shown in FIG. 9, the device 800 includes:
- the obtaining module 810 is configured to obtain a first endoscopic image for a human body part through a detection device including an endoscope; obtain a current endoscopic image of the user to be inspected;
- the creating module 820 is used to create a deep convolutional network for predicting endoscopic images, and to determine the training parameters of the deep convolutional network according to the first endoscopic image acquired by the acquiring module 810 and at least one second endoscopic image obtained by transforming the first endoscopic image; and,
- the prediction module 830 is used to predict the current endoscopic image based on the training parameters using the deep convolutional network created by the creation module 820, and determine the organ category corresponding to the current endoscopic image.
- the device 800 further includes:
- the determining module 840 is configured to determine at least one candidate organ category according to the structure of the human body part and the preset diagnosis target; obtain a label image corresponding to each candidate organ category;
- the creation module 820 is configured to use the first endoscopic image and at least one second endoscopic image as input samples, and use each tag image determined by the determination module 840 as an ideal output sample to obtain training parameters.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer.
- the creation module 820 is used to add at least one dense connection layer to the processing layer.
- the dense connection layer includes multiple connection sublayers; each connection sublayer uses the features output by the other connection sublayers preceding it as its input.
- the creation module 820 is used to add a transition layer between two adjacent densely connected layers, and set the value of the characteristic compression ratio of the transition layer according to the preset prediction accuracy.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer
- the device 800 further includes:
- Construction module 850 which is used to construct a loss function for training a deep convolutional network in advance
- the creation module 820 is used to iteratively execute the following processes when training the deep convolutional network: acquiring the processed features obtained by processing the first endoscopic image from the processing layer; according to the processed features and the features of each second endoscopic image Calculate the value of the loss function at this iteration; determine whether the training process ends according to the value of the loss function, where, when the training process is determined to end, the training parameters are obtained.
- the creation module 820 is further configured to initialize the central feature of the organ category to which the first endoscopic image belongs; calculate the first feature between the processed feature and the feature of each second endoscopic image, respectively Distance; calculate the second distance between the feature of the first endoscopic image and the central feature corresponding to the first endoscopic image; calculate the value based on the first distance and the second distance.
- the transformation performed on the first endoscopic image includes at least one of cropping, rotation, brightness jitter, color jitter, and contrast jitter.
- the device 900 includes a processor 910, a memory 920, a port 930, and a bus 940.
- the processor 910 and the memory 920 are interconnected by a bus 940.
- the processor 910 may receive and transmit data through the port 930. Specifically,
- the processor 910 is used to execute a machine-readable instruction module stored in the memory 920.
- the memory 920 stores machine-readable instruction modules executable by the processor 910.
- the instruction modules executable by the processor 910 include: an acquisition module 921, a creation module 922, and a prediction module 923. Specifically,
- when the acquisition module 921 is executed by the processor 910, it may be used to: acquire a first endoscopic image of a human body part via a detection device including an endoscope; and acquire the current endoscopic image of the user to be examined;
- when the creation module 922 is executed by the processor 910, it may be used to: create a deep convolutional network for predicting endoscopic images, and determine the training parameters of the deep convolutional network according to the first endoscopic image acquired by the acquisition module 921 and at least one second endoscopic image obtained by transforming the first endoscopic image;
- when the prediction module 923 is executed by the processor 910, it may be used to: use the deep convolutional network created by the creation module 922 to predict the current endoscopic image based on the training parameters, and determine the organ category corresponding to the current endoscopic image.
- the instruction module executable by the processor 910 further includes: a determination module 924, wherein, when the determination module 924 is executed by the processor 910, it may be: according to the structure of the human body part and the preset diagnosis target, determine At least one candidate organ category; before training the deep convolutional network, obtain the label image corresponding to each candidate organ category;
- the first endoscope image and at least one second endoscope image may be used as input samples, and each label image determined by the determination module 924 may be used as an ideal output sample to obtain training parameters. .
- the instruction module executable by the processor 910 further includes: a construction module 925, where the construction module 925 when executed by the processor 910 may be: pre-construct a loss function for training a deep convolution network;
- the creation module 922 When the creation module 922 is executed by the processor 910, it may be: when training the deep convolutional network, iteratively execute the following processes: acquiring the processed features obtained by processing the first endoscopic image from the processing layer; according to the processed features and each second The characteristics of the endoscopic image are used to calculate the value of the loss function at this iteration; according to the value of the loss function, it is determined whether the training process is ended, where, when the training process is determined to end, the training parameters are obtained.
- each functional module in each embodiment of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules may be integrated into one unit.
- the above integrated unit may be implemented in the form of hardware or software functional unit.
- an endoscopic image processing system is provided, which includes a human body detection device and an endoscopic image processing device, wherein the human body detection device is used to detect a human body part and send at least one detected first endoscopic image to the endoscopic image processing device;
- the endoscopic image processing device is used to acquire the at least one first endoscopic image from the human body detection device; create a deep convolutional network for predicting endoscopic images, and determine the training parameters of the deep convolutional network based on the at least one first endoscopic image and at least one second endoscopic image transformed from the at least one first endoscopic image; and acquire the current endoscopic image of the user to be examined, use the deep convolutional network to predict the current endoscopic image based on the training parameters, and determine the organ category corresponding to the current endoscopic image.
- the endoscopic image processing device is further used to determine at least one candidate organ category based on the structure of the human body part and the preset diagnosis target; obtain the label image corresponding to each candidate organ category
- the at least one first endoscopic image and the at least one second endoscopic image are used as input samples, and each tag image is used as a target output sample for training to obtain the training parameters.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer.
- the endoscopic image processing device is further used to pre-construct a loss function for training the deep convolutional network;
- iteratively executes the following processes: acquiring at least one processed feature obtained by processing the at least one first endoscopic image by the processing layer; based on the at least one processed feature and the at least one second endoscopic image Feature, calculate the value of the loss function at this iteration; determine whether the training process ends according to the value of the loss function, where the training parameters are obtained when the end of the training process is determined.
- the endoscopic image processing device when training the deep convolutional network, is further used to initialize the central features of the organ category to which the at least one first endoscopic image belongs; calculate the processed features and Multiple first distances between features of each second endoscopic image; calculating multiple second distances between features of each first endoscopic image and central features corresponding to each first endoscopic image; The plurality of first distances and the plurality of second distances calculate the value of the loss function.
- a computer device including at least one memory and at least one processor, where the at least one memory stores at least one program code, and the at least one program code is loaded and executed by the at least one processor to implement The following steps:
- the organ type corresponding to the current endoscopic image is determined.
- the at least one processor is used to perform the following steps:
- the at least one processor is used to perform the following steps:
- the at least one first endoscopic image and the at least one second endoscopic image are used as input samples, and each tag image is used as a target output sample for training to obtain the training parameters.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the at least one processor is used to perform the following steps:
- At least one dense connection layer is added to the processing layer, and the dense connection layer includes multiple connection sublayers;
- for each connection sublayer, the features output by the other connection sublayers preceding it are used as the input of that connection sublayer.
- the at least one processor is used to perform the following steps:
- a transition layer is added between two adjacent densely connected layers, and the characteristic compression ratio of the transition layer is set according to the preset prediction accuracy.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the at least one processor is used to perform the following steps:
- the at least one processor when training the deep convolutional network, is used to perform the following steps:
- the value of the loss function is calculated.
- the transformation performed on the at least one first endoscopic image includes at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter.
- each embodiment of the present invention can be realized by a data processing program executed by a data processing device such as a computer.
- the data processing program constitutes the present invention.
- the data processing program is usually stored in a storage medium and is executed either by reading the program directly from the storage medium or by installing or copying the program into a storage device (such as a hard disk and/or memory) of the data processing device; therefore, such a storage medium also constitutes the present invention.
- Storage media can use any type of recording method, such as paper storage media (such as paper tape, etc.), magnetic storage media (such as floppy disk, hard disk, flash memory, etc.), optical storage media (such as CD-ROM, etc.), magneto-optical storage media ( Such as MO, etc.).
- the present invention also discloses a storage medium in which a data processing program is stored, and the data processing program is used to perform any one of the embodiments of the above method of the present invention.
- the storage medium may be a computer-readable storage medium that stores computer-readable instructions.
- when the computer-readable instructions are executed by the at least one processor, the at least one processor is caused to load and execute them to implement the following steps:
- the organ type corresponding to the current endoscopic image is determined.
- the at least one processor is used to perform the following steps:
- the at least one processor is used to perform the following steps:
- the at least one first endoscopic image and the at least one second endoscopic image are used as input samples, and each tag image is used as a target output sample for training to obtain the training parameters.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the at least one processor is used to perform the following steps:
- At least one dense connection layer is added to the processing layer, and the dense connection layer includes multiple connection sublayers;
- for each connection sublayer, the features output by the other connection sublayers preceding it are used as the input of that connection sublayer.
- the at least one processor is used to perform the following steps:
- a transition layer is added between two adjacent densely connected layers, and the characteristic compression ratio of the transition layer is set according to the preset prediction accuracy.
- the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the at least one processor is used to perform the following steps:
- the at least one processor when training the deep convolutional network, is used to perform the following steps:
- the value of the loss function is calculated.
- the transformation performed on the at least one first endoscopic image includes at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter.
Abstract
The present application discloses an endoscopic image processing method and system, and a computer device. The method includes: acquiring a current endoscopic image of a user to be examined; predicting the current endoscopic image with a deep convolutional network based on training parameters, the training parameters being determined according to at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image, the at least one first endoscopic image corresponding to a human body part; and determining the organ category corresponding to the current endoscopic image. This method of the present invention makes the prediction process more intelligent and more robust, and improves the resource utilization of the processing device.
Description
This application claims priority to Chinese Patent Application No. 201811276885.2, entitled "Endoscopic image processing method, apparatus, system and storage medium", filed on October 30, 2018, the entire contents of which are incorporated herein by reference.
The present invention relates to the field of image processing technology, and in particular to an endoscopic image processing method and system, and a computer device.
At present, gastric cancer and esophageal cancer rank among the top five malignant tumor types with high incidence in China and worldwide. Both gastric cancer and esophageal cancer are malignant tumors of the upper digestive tract. In actual clinical practice, the doctor performs an electronic examination with an endoscope: the endoscope is inserted through the oral cavity into the subject's upper digestive tract, and the strong light emitted by the light source is bent through the light-guiding fiber, so that the doctor can observe the health condition of the organs in the upper digestive tract.
However, for medical images captured through an endoscope, differences in the acquisition environment, the detection equipment, and doctors' imaging habits cause the endoscopic images of the same organ to vary widely in visual appearance, while the local appearance of different organs may be very similar, which seriously interferes with the doctor's disease diagnosis.
In the related art, in order to identify different organs in medical images, computer vision techniques are usually used to extract features such as color, texture, gradient, and local binary patterns (LBP), and organ classification and recognition is then performed with a support vector machine (SVM) classifier. However, this technique requires researchers to deeply understand medical images in order to design a usable feature extraction scheme based on the inherent characteristics of the images, so the technical threshold is high. In addition, the extracted features tend to be generic features rather than organ-specific features purposefully extracted for the specific body part to be diagnosed, so the coverage is incomplete and the robustness of the scheme is not good enough.
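As a rough illustration of this related-art pipeline (not part of the present invention), hand-crafted LBP texture features can be fed to an SVM classifier; the sketch below uses scikit-image and scikit-learn, and the radius, number of points, and kernel are assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_image, n_points=24, radius=3):
    """Hand-crafted LBP texture descriptor for one grayscale endoscopic image."""
    lbp = local_binary_pattern(gray_image, n_points, radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=n_points + 2, range=(0, n_points + 2), density=True)
    return hist

# train_images: list of grayscale arrays; train_labels: their organ categories
# clf = SVC(kernel="rbf").fit([lbp_histogram(im) for im in train_images], train_labels)
# predicted = clf.predict([lbp_histogram(test_image)])
```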
Summary of the Invention
In view of this, embodiments of the present invention provide an endoscopic image processing method and system, and a computer device, which make the prediction process more intelligent and more robust and improve the resource utilization of the processing device.
Specifically, the technical solutions of the embodiments of the present invention are implemented as follows.
The present invention provides an endoscopic image processing method, including:
acquiring a current endoscopic image of a user to be examined;
predicting the current endoscopic image with a deep convolutional network based on training parameters, the training parameters being determined according to at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image, the at least one first endoscopic image corresponding to a human body part; and
determining the organ category corresponding to the current endoscopic image.
The present invention further provides an endoscopic image processing system, including a human body detection device and an endoscopic image processing apparatus, wherein
the human body detection device is configured to detect a human body part and send at least one detected first endoscopic image to the endoscopic image processing apparatus; and
the endoscopic image processing apparatus is configured to acquire the at least one first endoscopic image from the human body detection device; create a deep convolutional network for predicting endoscopic images, and determine training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image; and acquire a current endoscopic image of a user to be examined, use the deep convolutional network to predict the current endoscopic image based on the training parameters, and determine the organ category corresponding to the current endoscopic image.
In addition, the present invention further provides a computer-readable storage medium storing computer-readable instructions which, when executed by at least one processor, cause the at least one processor to load and execute them to implement the following steps:
acquiring a current endoscopic image of a user to be examined;
predicting the current endoscopic image with a deep convolutional network based on training parameters, the training parameters being determined according to at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image, the at least one first endoscopic image corresponding to a human body part; and
determining the organ category corresponding to the current endoscopic image.
In addition, the present invention further provides a computer device, including at least one memory and at least one processor, the at least one memory storing at least one piece of program code, and the at least one piece of program code being loaded and executed by the at least one processor to implement the following steps:
acquiring a current endoscopic image of a user to be examined;
predicting the current endoscopic image with a deep convolutional network based on training parameters, the training parameters being determined according to at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image, the at least one first endoscopic image corresponding to a human body part; and
determining the organ category corresponding to the current endoscopic image.
It can be seen from the above technical solutions that the method provided by the embodiments of the present invention enables the feature extraction process to be learned entirely by the deep convolutional network model, without requiring researchers to deeply understand medical images, which reduces the dependence on doctors' professional expertise and makes the entire prediction process more intelligent; at the same time, it reduces the amount of labeled data used during training, speeds up training convergence, provides clean and usable data for the next step of disease diagnosis, and provides reusable integrated modules for disease diagnosis under different organs, improving the resource utilization of the processing device.
FIG. 1 is a schematic structural diagram of an endoscopic image processing system according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of an endoscopic image processing method in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a deep convolutional network in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a deep convolutional network in another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a processing layer in yet another embodiment of the present invention;
FIG. 6 is a schematic flowchart of an endoscopic image processing method in another embodiment of the present invention;
FIG. 7 is a schematic diagram of label images in an embodiment of the present invention;
FIG. 8 is a schematic flowchart of training a deep convolutional network in an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an endoscopic image processing apparatus in an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an endoscopic image processing apparatus in another embodiment of the present invention.
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments.
FIG. 1 is a schematic structural diagram of an endoscopic image processing system according to an embodiment of the present invention. As shown in FIG. 1, the endoscopic image processing system 100 includes a user 101 to be examined, a human body detection device 102 containing an endoscope 1021, an endoscopic image processing apparatus 103, and a doctor 104. The endoscopic image processing apparatus 103 may include a real-time prediction sub-apparatus 1031, an offline training sub-apparatus 1032, and an endoscopic image database 1033.
According to an embodiment of the present invention, the human body detection device 102 detects a certain human body part of the user 101 to be examined through the endoscope 1021. The human body detection device 102 sends the collected endoscopic image to the endoscopic image processing apparatus 103; specifically, the image may be sent to the real-time prediction sub-apparatus 1031 as the current endoscopic image to be predicted, or it may be sent to the endoscopic image database 1033 for storage, and the images stored in the endoscopic image database 1033 are used for offline training.
According to an embodiment of the present invention, when the doctor 104 wants to make a disease diagnosis from the current endoscopic image to be predicted, the real-time prediction sub-apparatus 1031 first needs to obtain training parameters from the offline training sub-apparatus 1032, and then predicts the current endoscopic image based on the training parameters and the created deep convolutional network to determine the organ category corresponding to the current endoscopic image; for example, the organ category may be the duodenum in the upper digestive tract. When generating the training parameters, the offline training sub-apparatus 1032 uses the same deep convolutional network as the real-time prediction sub-apparatus 1031; it obtains the images collected via the endoscope and the annotated label images from the endoscopic image database 1033, performs offline training based on the images collected by the endoscope and each annotated label image, and outputs the training parameters of the deep convolutional network.
Here, the above human body detection device 102 refers to a medical terminal device equipped with the endoscope 1021 and having an image acquisition function, where the endoscope 1021 may include an image sensor, an optical lens, light source illumination, a mechanical device, and the like. The endoscopic image processing apparatus 103 may be a server or a cloud server with image storage and processing functions. Operating systems are installed on these terminal devices, including but not limited to the Android operating system, the Symbian operating system, the Windows Mobile operating system, the Apple iPhone OS operating system, and so on. The human body detection device 102 and the endoscopic image processing apparatus 103 can communicate via a wired or wireless network.
FIG. 2 is a schematic flowchart of an endoscopic image processing method in an embodiment of the present invention. The method is applied to a computer device; the following description takes a server as the computer device by way of example. The embodiment includes the following steps.
Step 201: the server acquires at least one first endoscopic image of a human body part.
That is to say, the at least one first endoscopic image corresponds to a human body part.
Step 202: the server creates a deep convolutional network for predicting endoscopic images.
Step 203: the server determines training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image.
Step 204: the server acquires a current endoscopic image of a user to be examined, uses the deep convolutional network to predict the current endoscopic image based on the training parameters, and determines the organ category corresponding to the current endoscopic image.
That is to say, the above training parameters are determined according to the at least one first endoscopic image and the at least one second endoscopic image obtained by transforming the at least one first endoscopic image.
The above steps are described in detail below on the basis of embodiments.
In the above step 201, a human body part can be detected with a detection device containing an endoscope to obtain at least one first endoscopic image. The human body part includes one or more organs; for example, the human body part is the upper digestive tract, which includes five organs, namely the pharynx, esophagus, stomach, cardia, and duodenum. What the detection device captures may be pictures or video, and the acquired first endoscopic images may be white-light RGB images.
In the above step 202, the deep convolutional network used to classify endoscopic images is a convolutional neural network based on deep learning. Specifically, the deep convolutional network includes an input layer, a processing layer, and a classification layer. FIG. 3 is a schematic structural diagram of a deep convolutional network in an embodiment of the present invention. As shown in FIG. 3, in the deep convolutional network, the input layer 301 determines at least one endoscopic image as input; the processing layer 302 performs feature extraction on each input endoscopic image; and the classification layer 303 outputs the organ category predicted for each input endoscopic image. The at least one endoscopic image may include each first endoscopic image captured by the detection device and, of course, may also include each second endoscopic image obtained by transforming each first endoscopic image, thereby enriching the sample size.
In the processing layer 302, the convolution layer 3021 performs feature extraction on the endoscopic image using a convolution matrix as a filter to obtain a feature image; the pooling layer 3022 is used to simplify the information output by the convolution layer, reduce the data dimension, lower the computational overhead, and control overfitting.
In the classification layer 303, the fully connected layer 3031 is used to detect which organ category the obtained feature image is closest to. The softmax layer 3032 outputs a 1×M-dimensional classification vector and is used for exponential normalization, where M is the number of candidate organ categories; for example, there are six candidate organ categories: non-organ image, pharynx, esophagus, stomach, cardia, and duodenum. Each element of the classification vector takes a value in [0, 1], and the i-th element represents the probability that the input endoscopic image belongs to the i-th candidate organ category.
In an embodiment of the present invention, when creating the deep convolutional network, the server may add at least one dense connection layer to the processing layer, the dense connection layer including multiple connection sublayers; for each connection sublayer, the features output by the other connection sublayers preceding it are used as the input of that connection sublayer.
FIG. 4 is a schematic structural diagram of a deep convolutional network in another embodiment of the present invention. As shown in FIG. 4, the processing layer 312 includes Y dense connection layers 3121 to 312Y. Each dense connection layer includes multiple connection sublayers, shown as the solid circles in the boxes 3121 to 312Y in the figure. The output layer 313 outputs the probabilities of the six categories shown in box 3131.
FIG. 5 is a schematic structural diagram of a processing layer in yet another embodiment of the present invention. As shown in FIG. 5, in the structure of the processing layer 400, there are K dense connection layers 4021 to 402K between the convolution layer 401 and the pooling layer 404; within the same dense connection layer, the features output by each connection sublayer are fed into all subsequent connection sublayers.
Suppose a dense connection layer includes J connection sublayers and the processing function of the j-th connection sublayer is H_j, j = 1, ..., J. Then the feature z_j output by the j-th connection sublayer can be computed according to the following formula:
z_j = H_j([z_0, z_1, ..., z_{j-1}])  (1)
where [z_0, z_1, ..., z_{j-1}] denotes the concatenation of the features output by the connection sublayers numbered 0 to j-1, and H_j can be operations such as batch normalization (Batch Normalization, BN), ReLU activation, and 3×3 convolution. If the number of channels input to the dense connection layer is k_0, then the number of channels at the j-th sublayer is k_0 + (j-1)×k, where k is the growth rate; as the number of connection sublayers increases, the number of channels grows linearly with k.
In an embodiment of the present invention, when at least one dense connection layer is added to the processing layer, a transition layer may also be added between two adjacent dense connection layers in order to further compress the parameters. As shown in FIG. 5, a transition layer 403 is added between the dense connection layer 4021 and the dense connection layer 4022. If there are K dense connection layers, the number of transition layers is K-1. Moreover, the feature compression ratio of the transition layer can be set according to the preset prediction accuracy: since the compression ratio affects both the number of parameters and the prediction accuracy, its value is chosen based on the prediction accuracy preset for endoscopic images, for example 0.5.
In another embodiment of the present invention, the server determines the specific parameters of the processing layer and the classification layer in the deep convolutional network according to the number of endoscopic images to be predicted, the prediction accuracy, and the adjustment of hyperparameters during training. Table 1 gives an example of the structure and parameters of a deep convolutional network, including four dense connection layers and three transition layers in total. The growth rate of each dense connection layer can be set to 24; a 1×1 convolution operation can also be performed before the 3×3 convolution operation, which reduces the number of input feature maps and fuses the features of the various channels; and the 1×1 convolution operation in the transition layer can halve the number of input channels.
Table 1: Example structure and parameters of a deep convolutional network
In step 203, when training the deep convolutional network, the server may determine the training parameters of the deep convolutional network according to the at least one first endoscopic image and the at least one second endoscopic image obtained by transforming the at least one first endoscopic image.
Specifically, the server first transforms the at least one first endoscopic image to obtain the transformed at least one second endoscopic image, and then inputs the at least one first endoscopic image and the at least one second endoscopic image into the deep convolutional network simultaneously for training, obtaining the training parameters of the deep convolutional network.
In an embodiment of the present invention, the transformation performed by the server on the at least one first endoscopic image may include at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter. Such transformation operations serve as data augmentation. In practical applications, the number of transformations performed may be determined according to the number of candidate organ categories and/or the preset prediction accuracy.
For example, as shown in FIG. 3, in the input layer 301, 3011 is a first endoscopic image acquired from the detection device; 3011 is subjected to two transformations: a rotation transformation, yielding the transformed second endoscopic image 3012, and a color transformation, yielding the transformed second endoscopic image 3013. The images 3011, 3012, and 3013 are then input into the processing layer 302 simultaneously for feature extraction.
The training parameters obtained in step 203 and the deep convolutional network created in step 202 are used for subsequent real-time prediction. In step 204, the server acquires the current endoscopic image of the user to be examined, uses the deep convolutional network to predict the current endoscopic image based on the training parameters, and determines the organ category corresponding to the current endoscopic image.
For example, the current endoscopic image input at the input layer 301 may, after prediction, be classified into the "esophagus" category; alternatively, the image may be an invalid medical image that does not correspond to any organ and belongs to the "non-organ image" category, so that the doctor does not need to refer to this image when diagnosing disease.
Through the above embodiments, the training parameters of the deep convolutional network are determined according to the at least one first endoscopic image and the at least one second endoscopic image obtained by transforming the at least one first endoscopic image, and each transformed second endoscopic image, serving as auxiliary data, can be used for the classification training of each first endoscopic image. Viewed as a whole, the scheme can achieve the following technical effects:
1) The feature extraction process is learned entirely by the deep convolutional network model, without requiring researchers to deeply understand medical images, which reduces the dependence on doctors' professional expertise and makes the entire prediction process more intelligent;
2) The amount of labeled data used during training can be reduced, which speeds up training convergence, accelerates image classification, and improves the resource utilization of the processing device;
3) The training parameters obtained after the iterations are more accurate, so the classification results obtained by real-time prediction with these training parameters are more precise, providing clean and usable data for the next step of disease diagnosis;
4) Through this deep convolutional network, both lower-level image features, such as color and texture, and more abstract semantic features, such as whether the mucosa is smooth or whether a large number of folds are present, can be extracted; the network is highly robust and can cope with the interference caused by different hospitals and different doctors photographing the same site from different angles and with different techniques;
5) After accurate classification results are obtained, reusable integrated modules can be provided for disease diagnosis under different organs; for example, for the esophagus, all endoscopic images classified into the esophagus category are used for esophageal cancer screening, and for the stomach, all endoscopic images classified into the stomach category are used for screening diseases such as gastritis and gastric cancer;
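A minimal routing sketch for technical effect 5) is given below; the downstream screening functions are hypothetical placeholders, since this document does not define them.

```python
def screen_esophageal_cancer(frame):   # hypothetical downstream module
    pass

def screen_gastric_disease(frame):     # hypothetical downstream module
    pass

def route_for_screening(frames, predicted_categories):
    """Hand each classified frame to the screening module of its organ;
    frames classified as "non-organ image" are simply dropped."""
    modules = {"esophagus": screen_esophageal_cancer, "stomach": screen_gastric_disease}
    for frame, category in zip(frames, predicted_categories):
        handler = modules.get(category)
        if handler is not None:
            handler(frame)
```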
In addition, when creating the deep convolutional network, adding at least one dense connection layer maximizes the information flow between all layers in the network and, to a certain extent, alleviates the problem of gradient vanishing during training; moreover, because a large number of features are reused, a large number of features can be generated with a small number of convolution kernels, so the final model is relatively small and the number of parameters is reduced.
FIG. 6 is a schematic flowchart of an endoscopic image processing method in another embodiment of the present invention. As shown in FIG. 6, it includes the following steps:
Step 501: the server acquires at least one first endoscopic image of a human body part.
Step 502: the server transforms the at least one first endoscopic image to obtain the transformed at least one second endoscopic image.
Step 503: the server creates a deep convolutional network for predicting endoscopic images.
Here, for steps 501-503, reference may be made to the above steps 201, 203, and 202, which will not be repeated.
Step 504: the server determines at least one candidate organ category according to the structure of the human body part and a preset diagnosis target.
In this step, the human body part may include multiple organs, and when the deep convolutional network is used to predict endoscopic images, multiple candidate organ categories need to be determined in advance. Specifically, when dividing the human body part, multiple regions can be delineated with reference to the preset diagnosis target, and the multiple candidate organ categories are then determined. For example, among the currently high-incidence malignant tumor types, gastric cancer and esophageal cancer are the most widespread, so the diagnosis target is set to diagnosing these two organs, and the candidate organ categories can be set to three classes: stomach, esophagus, and other.
Step 505: the server acquires the label images corresponding to each candidate organ category.
In an embodiment of the present invention, these label images may be obtained from a medical image database and annotated manually; alternatively, images with typical characteristics of the candidate organs may be filtered out of the collected first endoscopic images. FIG. 7 is a schematic diagram of label images in an embodiment of the present invention; as shown in FIG. 7, multiple label images of the duodenum, esophagus, stomach, and eye are given.
Step 506: when training the deep convolutional network, the server uses the at least one first endoscopic image and the at least one second endoscopic image as input samples and uses each label image as an ideal output sample (that is, a target output sample) for training, obtaining the training parameters of the deep convolutional network.
In an embodiment of the present invention, during training the deep neural network gradually adjusts the weights in an iterative manner according to the input image samples and the ideal output samples until convergence.
Step 507: the server acquires the current endoscopic image of the user to be examined, uses the deep convolutional network to predict the current endoscopic image based on the training parameters, and determines the organ category corresponding to the current endoscopic image.
This step is the same as the above step 204 and is not repeated here.
Through the above embodiments, considering that medical images of the same organ may differ greatly, by reasonably designing the candidate organ categories and by including in the input samples the distorted second endoscopic images produced by transforming each first endoscopic image, the number of label images can be greatly reduced, which solves the problem that the number of label images available as labeled data is limited when training the deep convolutional network.
FIG. 8 is a schematic flowchart of training a deep convolutional network in an embodiment of the present invention. As shown in FIG. 8, it includes the following steps:
Step 701: the server acquires at least one first endoscopic image of a human body part.
Step 702: the server transforms the at least one first endoscopic image to obtain the transformed at least one second endoscopic image.
Step 703: the server creates a deep convolutional network for predicting endoscopic images; the deep convolutional network includes an input layer, a processing layer, and a classification layer.
When training the deep convolutional network, the parameters can be adjusted through the back-propagation algorithm and iterated to convergence. The back-propagation algorithm can be divided into four parts: the forward pass, the loss function computation, the backward pass, and the parameter update. In the forward pass, the initial sample data, including the at least one first endoscopic image and the transformed at least one second endoscopic image, is input and propagated through the processing layer. Constructing a loss function helps the deep convolutional network update its training parameters until convergence.
Step 704: the server constructs, in advance, a loss function for training the deep convolutional network.
In this step, the loss function is constructed according to a preset convergence strategy.
In an embodiment of the present invention, for the input first endoscopic image and the transformed second endoscopic images, the convergence strategy is a consistency constraint strategy, that is, the features extracted by the model from the same endoscopic image under different transformations should be very close.
In another embodiment of the present invention, for the input first endoscopic images and the feature centers of the organ categories to which they belong, the convergence strategy is a central aggregation strategy, that is, the distance between first endoscopic images belonging to the same organ category decreases, i.e., the intra-class distance decreases, while at the same time the distance between endoscopic images of different organ categories increases, i.e., the inter-class distance increases.
Step 705: the server inputs the at least one first endoscopic image and the at least one second endoscopic image and initializes the deep convolutional network.
In this step, initializing the deep convolutional network includes two initialization processes:
1) initializing the training parameters w of the deep convolutional network, including the weights of the sublayers in the processing layer and the output layer; for example, random initialization may be adopted, setting the initial values of the training parameters to random values [0.3, 0.1, 0.4, 0.2, 0.3, ...];
2) initializing the central feature corresponding to the first endoscopic images; for example, the average of the label images of each category is taken as the initial value of the central feature.
At the very beginning of training, the initialized training parameters and central features cause the loss function to take a very high value. The purpose of training a deep neural network is for the predicted value to match the real value; to this end, the value of the loss function needs to be minimized, and the smaller the loss value, the closer the prediction. In this process, the training parameters and the central features are adjusted iteratively, the value of the loss function is calculated at each iteration, and finally the loss of the entire network is driven to its minimum.
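The two initialization steps above might be sketched as follows; the feature dimension, the model interface, and reading "the average of the label images of each category" as the average of their processed features are assumptions for illustration.

```python
import torch

def initialize_network_and_centers(model, label_images_by_category, feature_dim=1024):
    """(1) Randomly initialize the training parameters w; (2) set each organ
    category's central feature to the mean feature of that category's label images."""
    for p in model.parameters():
        if p.dim() > 1:
            torch.nn.init.kaiming_normal_(p)             # random initial training parameters
    centers = torch.zeros(len(label_images_by_category), feature_dim)
    with torch.no_grad():
        for c, images in enumerate(label_images_by_category):
            feats = [model(img.unsqueeze(0), return_features=True)[1].squeeze(0) for img in images]
            centers[c] = torch.stack(feats).mean(dim=0)  # initial central feature of category c
    return centers.requires_grad_(True)                  # refined later in step 712
```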
When calculating the loss function, the following steps 706 and 707 correspond to the above consistency constraint strategy, and the following step 708 corresponds to the above central aggregation strategy.
Step 706: the server acquires at least one processed feature obtained by the processing layer processing the at least one first endoscopic image.
Step 707: the server calculates the value of the loss function at the current iteration according to the at least one processed feature and the features of the at least one second endoscopic image.
The above consistency constraint strategy is expressed in the loss function by calculating a plurality of first distances between the processed feature of each first endoscopic image and the features of each second endoscopic image, and using these first distances to constrain the consistency between the first endoscopic image and the second endoscopic images.
Specifically, if the training parameters are w, the feature vector of the i-th input first endoscopic image is x_i, and the feature vector of the i-th label image is y_i, the loss function L(w) can be calculated iteratively according to formula (2), in which: n is the number of input first endoscopic images; m is the number of second endoscopic images obtained by transforming a first endoscopic image; y_i log f(x_i; w) represents the classification cross-entropy loss; a further term represents L2 regularization of the parameters; h_0 is the feature vector of a first endoscopic image output by the processing layer, that is, the processed feature vector; h_k is the feature vector of the k-th second endoscopic image; and r and λ are hyperparameters, both greater than 0. In addition, i is an integer greater than or equal to 1 and less than or equal to n, and k is an integer greater than or equal to 1 and less than or equal to m.
Step 708: the server calculates a plurality of second distances between the features of each first endoscopic image and the central feature corresponding to each first endoscopic image.
Step 709: the server calculates the value of the loss function according to the plurality of first distances and the plurality of second distances.
If only the consistency constraint is considered, the value of the loss function can be calculated with reference to the above formula (2).
When the consistency constraint and the central aggregation strategy are considered together, the value of the loss function is calculated according to both the first distances and the second distances.
Step 710: the server determines whether the training process has ended according to the value of the loss function; if so, step 713 is executed; otherwise, steps 711 and 712 are executed.
In the successive iterations, the loss function is minimized, that is, min L(w). Whether to stop iterating is determined by judging whether the value of the loss function has reached an acceptable threshold; once iteration stops, the entire training process ends.
Step 711: the server updates the training parameters; step 706 is then executed again for the next iteration.
Step 712: the server updates the central features; step 708 is then executed again for the next iteration.
Step 713: the server obtains the training parameters of the deep convolutional network after the training ends.
Through the above embodiments, a loss function for training the deep convolutional network is constructed in advance, and its value is calculated at each iteration according to the processed features of the first endoscopic images and the features of each second endoscopic image; introducing the consistency constraint makes it possible to find more stable features first and accelerates the convergence of the training process until the optimal solution is obtained. In addition, considering the central aggregation strategy in the loss function ensures that the features learned by the model for each category are more stable and cohesive, further improving the generalization ability of the model in real environments.
FIG. 9 is a schematic structural diagram of an endoscopic image processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 9, the apparatus 800 includes:
an obtaining module 810, configured to obtain, via a probing device containing an endoscope, a first endoscopic image of a human body part, and to obtain a current endoscopic image of a user to be examined;
a creation module 820, configured to create a deep convolutional network for predicting endoscopic images, and to determine the training parameters of the deep convolutional network according to the first endoscopic image obtained by the obtaining module 810 and at least one second endoscopic image obtained by transforming the first endoscopic image; and
a prediction module 830, configured to predict the current endoscopic image by using the deep convolutional network created by the creation module 820 and based on the training parameters, and to determine the organ category corresponding to the current endoscopic image.
In an embodiment of the present disclosure, the apparatus 800 further includes:
a determination module 840, configured to determine at least one candidate organ category according to the structure of the human body part and a preset diagnostic target, and to obtain the label images corresponding to each candidate organ category;
where the creation module 820 is configured to use the first endoscopic image and the at least one second endoscopic image as input samples and use the label images determined by the determination module 840 as ideal output samples, to obtain the training parameters.
In an embodiment of the present disclosure, the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the creation module 820 is configured to add at least one densely connected layer to the processing layer, the densely connected layer including multiple connection sub-layers; for each connection sub-layer, the features output by the other connection sub-layers preceding it are used as that sub-layer's input.
In an embodiment of the present disclosure, the creation module 820 is configured to add a transition layer between two adjacent densely connected layers and to set the value of the feature compression ratio of the transition layer according to a preset prediction accuracy.
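A minimal sketch of such a transition layer (a DenseNet-style design is assumed here: batch normalization, a 1×1 convolution, and average pooling; the default compression ratio of 0.5 is only an example value):

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    # Compresses the feature maps between two dense blocks by a configurable ratio.
    def __init__(self, in_channels, compression=0.5):
        super().__init__()
        out_channels = max(1, int(in_channels * compression))  # feature compression ratio
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.bn(x).relu()))
```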
In an embodiment of the present disclosure, the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the apparatus 800 further includes:
a construction module 850, configured to construct, in advance, a loss function for training the deep convolutional network;
where the creation module 820 is configured to iteratively perform the following processing when training the deep convolutional network: obtaining a processed feature obtained by the processing layer processing the first endoscopic image; computing the value of the loss function for this iteration according to the processed feature and the features of each second endoscopic image; and determining, according to the value of the loss function, whether the training process has ended, where the training parameters are obtained when it is determined that the training process has ended.
In an embodiment of the present disclosure, the creation module 820 is further configured to initialize the center feature of the organ category to which the first endoscopic image belongs; separately compute the first distances between the processed feature and the features of each second endoscopic image; compute the second distance between the feature of the first endoscopic image and the center feature corresponding to the first endoscopic image; and compute the value of the loss function according to the first distances and the second distance.
In an embodiment of the present disclosure, the transformation applied to the first endoscopic image includes at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter.
FIG. 10 is a schematic structural diagram of an endoscopic image processing apparatus according to another embodiment of the present disclosure. As shown in FIG. 10, the apparatus 900 includes a processor 910, a memory 920, a port 930, and a bus 940. The processor 910 and the memory 920 are interconnected through the bus 940, and the processor 910 can receive and send data through the port 930, where:
the processor 910 is configured to execute machine-readable instruction modules stored in the memory 920; and
the memory 920 stores machine-readable instruction modules executable by the processor 910, the instruction modules executable by the processor 910 including an obtaining module 921, a creation module 922, and a prediction module 923, where:
when executed by the processor 910, the obtaining module 921 may be configured to obtain, via a probing device containing an endoscope, a first endoscopic image of a human body part, and to obtain a current endoscopic image of a user to be examined;
when executed by the processor 910, the creation module 922 may be configured to create a deep convolutional network for predicting endoscopic images, and to determine the training parameters of the deep convolutional network according to the first endoscopic image obtained by the obtaining module 921 and at least one second endoscopic image obtained by transforming the first endoscopic image; and
when executed by the processor 910, the prediction module 923 may be configured to predict the current endoscopic image by using the deep convolutional network created by the creation module 922 and based on the training parameters, and to determine the organ category corresponding to the current endoscopic image.
In an embodiment of the present disclosure, the instruction modules executable by the processor 910 further include a determination module 924, where, when executed by the processor 910, the determination module 924 may be configured to determine at least one candidate organ category according to the structure of the human body part and a preset diagnostic target, and to obtain, before the deep convolutional network is trained, the label images corresponding to each candidate organ category; and
when executed by the processor 910, the creation module 922 may be configured to use the first endoscopic image and the at least one second endoscopic image as input samples and use the label images determined by the determination module 924 as ideal output samples, to obtain the training parameters.
In an embodiment of the present disclosure, the instruction modules executable by the processor 910 further include a construction module 925, where, when executed by the processor 910, the construction module 925 may be configured to construct, in advance, a loss function for training the deep convolutional network; and
when executed by the processor 910, the creation module 922 may be configured to iteratively perform the following processing when training the deep convolutional network: obtaining a processed feature obtained by the processing layer processing the first endoscopic image; computing the value of the loss function for this iteration according to the processed feature and the features of each second endoscopic image; and determining, according to the value of the loss function, whether the training process has ended, where the training parameters are obtained when it is determined that the training process has ended.
It can thus be seen that, when the instruction modules stored in the memory 920 are executed by the processor 910, the various functions of the obtaining module, the creation module, the prediction module, the determination module, and the construction module in the foregoing embodiments can be implemented.
In the foregoing apparatus embodiments, the specific methods by which each module and unit implements its own functions are described in the method embodiments and are not repeated here.
In addition, the functional modules in the embodiments of the present disclosure may be integrated into one processing unit, or each module may exist physically on its own, or two or more modules may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
In an embodiment, an endoscopic image processing system is provided, including a human body probing device and an endoscopic image processing apparatus, where the human body probing device is configured to probe a human body part and send at least one detected first endoscopic image to the endoscopic image processing apparatus; and
the endoscopic image processing apparatus is configured to obtain the at least one first endoscopic image from the human body probing device; create a deep convolutional network for predicting endoscopic images, and determine the training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image; and obtain a current endoscopic image of a user to be examined, predict the current endoscopic image by using the deep convolutional network based on the training parameters, and determine the organ category corresponding to the current endoscopic image.
In an embodiment, the endoscopic image processing apparatus is further configured to determine at least one candidate organ category according to the structure of the human body part and a preset diagnostic target; obtain the label images corresponding to each candidate organ category; and use the at least one first endoscopic image and the at least one second endoscopic image as input samples and the label images as target output samples for training, to obtain the training parameters.
In an embodiment, the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the endoscopic image processing apparatus is further configured to construct, in advance, a loss function for training the deep convolutional network, and to iteratively perform the following processing when training the deep convolutional network: obtaining at least one processed feature obtained by the processing layer processing the at least one first endoscopic image; computing the value of the loss function for this iteration according to the at least one processed feature and the features of the at least one second endoscopic image; and determining, according to the value of the loss function, whether the training process has ended, where the training parameters are obtained when it is determined that the training process has ended.
In an embodiment, when training the deep convolutional network, the endoscopic image processing apparatus is further configured to initialize the center features of the organ categories to which the at least one first endoscopic image belongs; separately compute multiple first distances between each processed feature and the features of each second endoscopic image; compute multiple second distances between the feature of each first endoscopic image and the center feature corresponding to that first endoscopic image; and compute the value of the loss function according to the multiple first distances and the multiple second distances.
In an embodiment, a computer device is provided, including at least one memory and at least one processor, the at least one memory storing at least one piece of program code, the at least one piece of program code being loaded and executed by the at least one processor to implement the following steps:
obtaining a current endoscopic image of a user to be examined;
predicting the current endoscopic image by using a deep convolutional network based on training parameters, the training parameters being determined according to at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image, the at least one first endoscopic image corresponding to a human body part; and
determining the organ category corresponding to the current endoscopic image.
In an embodiment, the at least one processor is configured to perform the following steps:
obtaining at least one first endoscopic image of a human body part; and
creating a deep convolutional network for predicting endoscopic images, and determining the training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image.
In an embodiment, the at least one processor is configured to perform the following steps:
determining at least one candidate organ category according to the structure of the human body part and a preset diagnostic target;
obtaining the label images corresponding to each candidate organ category; and
using the at least one first endoscopic image and the at least one second endoscopic image as input samples and the label images as target output samples for training, to obtain the training parameters.
In an embodiment, the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the at least one processor is configured to perform the following steps:
adding at least one densely connected layer to the processing layer, the densely connected layer including multiple connection sub-layers; and
for each connection sub-layer, using the features output by the other connection sub-layers preceding that connection sub-layer as the input of that connection sub-layer.
In an embodiment, the at least one processor is configured to perform the following step:
adding a transition layer between two adjacent densely connected layers, and setting the feature compression ratio of the transition layer according to a preset prediction accuracy.
In an embodiment, the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the at least one processor is configured to perform the following steps:
constructing, in advance, a loss function for training the deep convolutional network; and
when training the deep convolutional network, iteratively performing the following processing:
obtaining at least one processed feature obtained by the processing layer processing the at least one first endoscopic image;
computing the value of the loss function for this iteration according to the at least one processed feature and the features of the at least one second endoscopic image; and
determining, according to the value of the loss function, whether the training process has ended, where the training parameters are obtained when it is determined that the training process has ended.
In an embodiment, when training the deep convolutional network, the at least one processor is configured to perform the following steps:
initializing the center features of the organ categories to which the at least one first endoscopic image belongs;
separately computing multiple first distances between each processed feature and the features of each second endoscopic image;
computing multiple second distances between the feature of each first endoscopic image and the center feature corresponding to that first endoscopic image; and
computing the value of the loss function according to the multiple first distances and the multiple second distances.
In an embodiment, the transformation applied to the at least one first endoscopic image includes at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter.
In addition, each embodiment of the present disclosure may be implemented by a data processing program executed by a data processing device such as a computer. Obviously, the data processing program constitutes the present disclosure. Furthermore, a data processing program usually stored in a storage medium is executed by reading the program directly from the storage medium, or by installing or copying the program into a storage device (such as a hard disk and/or a memory) of the data processing device. Therefore, such a storage medium also constitutes the present disclosure. The storage medium may use any type of recording method, for example, a paper storage medium (such as paper tape), a magnetic storage medium (such as a floppy disk, a hard disk, or a flash memory), an optical storage medium (such as a CD-ROM), or a magneto-optical storage medium (such as an MO).
Therefore, the present disclosure further discloses a storage medium storing a data processing program, the data processing program being configured to perform any one of the foregoing method embodiments of the present disclosure.
In some embodiments, the storage medium may be a computer-readable storage medium storing computer-readable instructions that, when executed by at least one processor, cause the at least one processor to load and execute the instructions to implement the following steps:
obtaining a current endoscopic image of a user to be examined;
predicting the current endoscopic image by using a deep convolutional network based on training parameters, the training parameters being determined according to at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image, the at least one first endoscopic image corresponding to a human body part; and
determining the organ category corresponding to the current endoscopic image.
In an embodiment, the at least one processor is configured to perform the following steps:
obtaining at least one first endoscopic image of a human body part; and
creating a deep convolutional network for predicting endoscopic images, and determining the training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image.
In an embodiment, the at least one processor is configured to perform the following steps:
determining at least one candidate organ category according to the structure of the human body part and a preset diagnostic target;
obtaining the label images corresponding to each candidate organ category; and
using the at least one first endoscopic image and the at least one second endoscopic image as input samples and the label images as target output samples for training, to obtain the training parameters.
In an embodiment, the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the at least one processor is configured to perform the following steps:
adding at least one densely connected layer to the processing layer, the densely connected layer including multiple connection sub-layers; and
for each connection sub-layer, using the features output by the other connection sub-layers preceding that connection sub-layer as the input of that connection sub-layer.
In an embodiment, the at least one processor is configured to perform the following step:
adding a transition layer between two adjacent densely connected layers, and setting the feature compression ratio of the transition layer according to a preset prediction accuracy.
In an embodiment, the deep convolutional network includes an input layer, a processing layer, and a classification layer, and the at least one processor is configured to perform the following steps:
constructing, in advance, a loss function for training the deep convolutional network; and
when training the deep convolutional network, iteratively performing the following processing:
obtaining at least one processed feature obtained by the processing layer processing the at least one first endoscopic image;
computing the value of the loss function for this iteration according to the at least one processed feature and the features of the at least one second endoscopic image; and
determining, according to the value of the loss function, whether the training process has ended, where the training parameters are obtained when it is determined that the training process has ended.
In an embodiment, when training the deep convolutional network, the at least one processor is configured to perform the following steps:
initializing the center features of the organ categories to which the at least one first endoscopic image belongs;
separately computing multiple first distances between each processed feature and the features of each second endoscopic image;
computing multiple second distances between the feature of each first endoscopic image and the center feature corresponding to that first endoscopic image; and
computing the value of the loss function according to the multiple first distances and the multiple second distances.
In an embodiment, the transformation applied to the at least one first endoscopic image includes at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter.
The foregoing descriptions are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.
Claims (20)
- An endoscopic image processing method, comprising: obtaining a current endoscopic image of a user to be examined; predicting the current endoscopic image by using a deep convolutional network based on training parameters, the training parameters being determined according to at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image, the at least one first endoscopic image corresponding to a human body part; and determining an organ category corresponding to the current endoscopic image.
- The method according to claim 1, wherein before the predicting the current endoscopic image by using a deep convolutional network based on training parameters, the method further comprises: obtaining at least one first endoscopic image of a human body part; and creating a deep convolutional network for predicting endoscopic images, and determining the training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image.
- The method according to claim 2, further comprising: determining at least one candidate organ category according to a structure of the human body part and a preset diagnostic target; and obtaining label images corresponding to each candidate organ category; wherein the determining the training parameters of the deep convolutional network according to the at least one first endoscopic image and the at least one second endoscopic image obtained by transforming the at least one first endoscopic image comprises: using the at least one first endoscopic image and the at least one second endoscopic image as input samples and the label images as target output samples for training, to obtain the training parameters.
- The method according to claim 2, wherein the deep convolutional network comprises an input layer, a processing layer, and a classification layer, and the creating a deep convolutional network for predicting endoscopic images comprises: adding at least one densely connected layer to the processing layer, the densely connected layer comprising a plurality of connection sub-layers; and for each connection sub-layer, using features output by the other connection sub-layers preceding that connection sub-layer as an input of that connection sub-layer.
- The method according to claim 4, wherein the adding at least one densely connected layer to the processing layer comprises: adding a transition layer between two adjacent densely connected layers, and setting a feature compression ratio of the transition layer according to a preset prediction accuracy.
- The method according to claim 2, wherein the deep convolutional network comprises an input layer, a processing layer, and a classification layer, and the method further comprises: constructing, in advance, a loss function for training the deep convolutional network; wherein the determining the training parameters of the deep convolutional network according to the at least one first endoscopic image and the at least one second endoscopic image obtained by transforming the at least one first endoscopic image comprises: when training the deep convolutional network, iteratively performing the following processing: obtaining at least one processed feature obtained by the processing layer processing the at least one first endoscopic image; computing a value of the loss function for this iteration according to the at least one processed feature and features of the at least one second endoscopic image; and determining, according to the value of the loss function, whether the training process has ended, wherein the training parameters are obtained when it is determined that the training process has ended.
- The method according to claim 6, wherein, when training the deep convolutional network, the method further comprises: initializing center features of organ categories to which the at least one first endoscopic image belongs; and the computing a value of the loss function for this iteration according to the at least one processed feature and features of the at least one second endoscopic image comprises: separately computing a plurality of first distances between each processed feature and the features of each second endoscopic image; computing a plurality of second distances between a feature of each first endoscopic image and the center feature corresponding to that first endoscopic image; and computing the value of the loss function according to the plurality of first distances and the plurality of second distances.
- The method according to claim 1, wherein the transformation applied to the at least one first endoscopic image comprises at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter.
- An endoscopic image processing system, comprising a human body probing device and an endoscopic image processing apparatus, wherein the human body probing device is configured to probe a human body part and send at least one detected first endoscopic image to the endoscopic image processing apparatus; and the endoscopic image processing apparatus is configured to: obtain the at least one first endoscopic image from the human body probing device; create a deep convolutional network for predicting endoscopic images, and determine training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image; and obtain a current endoscopic image of a user to be examined, predict the current endoscopic image by using the deep convolutional network based on the training parameters, and determine an organ category corresponding to the current endoscopic image.
- The system according to claim 9, wherein the endoscopic image processing apparatus is further configured to: determine at least one candidate organ category according to a structure of the human body part and a preset diagnostic target; obtain label images corresponding to each candidate organ category; and use the at least one first endoscopic image and the at least one second endoscopic image as input samples and the label images as target output samples for training, to obtain the training parameters.
- The system according to claim 9, wherein the deep convolutional network comprises an input layer, a processing layer, and a classification layer, and the endoscopic image processing apparatus is further configured to: construct, in advance, a loss function for training the deep convolutional network; and when training the deep convolutional network, iteratively perform the following processing: obtaining at least one processed feature obtained by the processing layer processing the at least one first endoscopic image; computing a value of the loss function for this iteration according to the at least one processed feature and features of the at least one second endoscopic image; and determining, according to the value of the loss function, whether the training process has ended, wherein the training parameters are obtained when it is determined that the training process has ended.
- The system according to claim 11, wherein, when training the deep convolutional network, the endoscopic image processing apparatus is further configured to: initialize center features of organ categories to which the at least one first endoscopic image belongs; separately compute a plurality of first distances between each processed feature and the features of each second endoscopic image; compute a plurality of second distances between a feature of each first endoscopic image and the center feature corresponding to that first endoscopic image; and compute the value of the loss function according to the plurality of first distances and the plurality of second distances.
- A computer device, comprising at least one memory and at least one processor, the at least one memory storing at least one piece of program code, the at least one piece of program code being loaded and executed by the at least one processor to implement the following steps: obtaining a current endoscopic image of a user to be examined; predicting the current endoscopic image by using a deep convolutional network based on training parameters, the training parameters being determined according to at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image, the at least one first endoscopic image corresponding to a human body part; and determining an organ category corresponding to the current endoscopic image.
- The computer device according to claim 13, wherein the at least one processor is configured to perform the following steps: obtaining at least one first endoscopic image of a human body part; and creating a deep convolutional network for predicting endoscopic images, and determining the training parameters of the deep convolutional network according to the at least one first endoscopic image and at least one second endoscopic image obtained by transforming the at least one first endoscopic image.
- The computer device according to claim 14, wherein the at least one processor is configured to perform the following steps: determining at least one candidate organ category according to a structure of the human body part and a preset diagnostic target; obtaining label images corresponding to each candidate organ category; and using the at least one first endoscopic image and the at least one second endoscopic image as input samples and the label images as target output samples for training, to obtain the training parameters.
- The computer device according to claim 14, wherein the deep convolutional network comprises an input layer, a processing layer, and a classification layer, and the at least one processor is configured to perform the following steps: adding at least one densely connected layer to the processing layer, the densely connected layer comprising a plurality of connection sub-layers; and for each connection sub-layer, using features output by the other connection sub-layers preceding that connection sub-layer as an input of that connection sub-layer.
- The computer device according to claim 16, wherein the at least one processor is configured to perform the following step: adding a transition layer between two adjacent densely connected layers, and setting a feature compression ratio of the transition layer according to a preset prediction accuracy.
- The computer device according to claim 14, wherein the deep convolutional network comprises an input layer, a processing layer, and a classification layer, and the at least one processor is configured to perform the following steps: constructing, in advance, a loss function for training the deep convolutional network; and when training the deep convolutional network, iteratively performing the following processing: obtaining at least one processed feature obtained by the processing layer processing the at least one first endoscopic image; computing a value of the loss function for this iteration according to the at least one processed feature and features of the at least one second endoscopic image; and determining, according to the value of the loss function, whether the training process has ended, wherein the training parameters are obtained when it is determined that the training process has ended.
- The computer device according to claim 18, wherein, when training the deep convolutional network, the at least one processor is configured to perform the following steps: initializing center features of organ categories to which the at least one first endoscopic image belongs; separately computing a plurality of first distances between each processed feature and the features of each second endoscopic image; computing a plurality of second distances between a feature of each first endoscopic image and the center feature corresponding to that first endoscopic image; and computing the value of the loss function according to the plurality of first distances and the plurality of second distances.
- The computer device according to claim 13, wherein the transformation applied to the at least one first endoscopic image comprises at least one of cropping, rotation, brightness jitter, color jitter, or contrast jitter.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020560333A JP7214291B2 (ja) | 2018-10-30 | 2019-10-21 | コンピュータデバイスの作動方法、コンピュータデバイス、およびコンピュータプログラム、ならびに、内視鏡画像処理システム |
EP19879131.1A EP3876190B1 (en) | 2018-10-30 | 2019-10-21 | Endoscopic image processing method and system and computer device |
US17/078,826 US11849914B2 (en) | 2018-10-30 | 2020-10-23 | Endoscopic image processing method and system, and computer device |
US18/506,545 US20240081618A1 (en) | 2018-10-30 | 2023-11-10 | Endoscopic image processing |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811276885.2 | 2018-10-30 | ||
CN201811276885.2A CN109523522B (zh) | 2018-10-30 | 2018-10-30 | 内窥镜图像的处理方法、装置、系统及存储介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/078,826 Continuation US11849914B2 (en) | 2018-10-30 | 2020-10-23 | Endoscopic image processing method and system, and computer device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020088288A1 true WO2020088288A1 (zh) | 2020-05-07 |
Family
ID=65774370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/112202 WO2020088288A1 (zh) | 2018-10-30 | 2019-10-21 | 内窥镜图像的处理方法、系统及计算机设备 |
Country Status (5)
Country | Link |
---|---|
US (2) | US11849914B2 (zh) |
EP (1) | EP3876190B1 (zh) |
JP (1) | JP7214291B2 (zh) |
CN (1) | CN109523522B (zh) |
WO (1) | WO2020088288A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111814655A (zh) * | 2020-07-03 | 2020-10-23 | 浙江大华技术股份有限公司 | 目标重识别方法及其网络训练方法、相关装置 |
CN113469959A (zh) * | 2021-06-16 | 2021-10-01 | 北京理工大学 | 基于质量缺陷成像模型的对抗训练优化方法及装置 |
CN113706526A (zh) * | 2021-10-26 | 2021-11-26 | 北京字节跳动网络技术有限公司 | 内窥镜图像特征学习模型、分类模型的训练方法和装置 |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109523522B (zh) * | 2018-10-30 | 2023-05-09 | 腾讯医疗健康(深圳)有限公司 | 内窥镜图像的处理方法、装置、系统及存储介质 |
CN110084279A (zh) * | 2019-03-29 | 2019-08-02 | 广州思德医疗科技有限公司 | 一种确定分类标签的方法及装置 |
CN110097083A (zh) * | 2019-03-29 | 2019-08-06 | 广州思德医疗科技有限公司 | 一种确定分类标签的方法及装置 |
CN110084280B (zh) * | 2019-03-29 | 2021-08-31 | 广州思德医疗科技有限公司 | 一种确定分类标签的方法及装置 |
CN110490856B (zh) * | 2019-05-06 | 2021-01-15 | 腾讯医疗健康(深圳)有限公司 | 医疗内窥镜图像的处理方法、系统、机器设备和介质 |
CN110495847B (zh) * | 2019-08-23 | 2021-10-08 | 重庆天如生物科技有限公司 | 基于深度学习的消化道早癌辅助诊断系统和检查装置 |
EP3786765A1 (en) * | 2019-08-29 | 2021-03-03 | Leica Instruments (Singapore) Pte. Ltd. | Microscope, control circuit, method and computer program for generating information on at least one inspected region of an image |
CN113288007B (zh) * | 2019-12-06 | 2022-08-09 | 腾讯科技(深圳)有限公司 | 内窥镜移动时间确定方法、装置和计算机设备 |
CN110859624A (zh) * | 2019-12-11 | 2020-03-06 | 北京航空航天大学 | 一种基于结构磁共振影像的大脑年龄深度学习预测系统 |
CN110974142B (zh) * | 2019-12-20 | 2020-08-18 | 山东大学齐鲁医院 | 共聚焦激光显微内镜实时同步内镜病变定位系统 |
CN113143168A (zh) * | 2020-01-07 | 2021-07-23 | 日本电气株式会社 | 医疗辅助操作方法、装置、设备和计算机存储介质 |
CN111860542B (zh) * | 2020-07-22 | 2024-06-28 | 海尔优家智能科技(北京)有限公司 | 用于识别物品类别的方法及装置、电子设备 |
CN112907726B (zh) * | 2021-01-25 | 2022-09-20 | 重庆金山医疗技术研究院有限公司 | 一种图像处理方法、装置、设备及计算机可读存储介质 |
CN112906682A (zh) * | 2021-02-07 | 2021-06-04 | 杭州海康慧影科技有限公司 | 控制光源亮度的方法、装置及计算机存储介质 |
CN113486990B (zh) * | 2021-09-06 | 2021-12-21 | 北京字节跳动网络技术有限公司 | 内窥镜图像分类模型的训练方法、图像分类方法和装置 |
CN113822894B (zh) * | 2021-11-25 | 2022-02-08 | 武汉大学 | 十二指肠胰头图像识别方法和十二指肠胰头图像识别装置 |
CN114464316B (zh) * | 2022-04-11 | 2022-07-19 | 武汉大学 | 胃部异常风险等级预测方法、装置、终端及可读存储介质 |
CN114511749B (zh) * | 2022-04-19 | 2022-06-28 | 武汉大学 | 图像处理方法、装置、计算机设备及存储介质 |
CN117974668B (zh) * | 2024-04-02 | 2024-08-13 | 青岛美迪康数字工程有限公司 | 基于ai的新型胃黏膜可视度评分量化方法、装置和设备 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022221A (zh) * | 2016-05-09 | 2016-10-12 | 腾讯科技(深圳)有限公司 | 一种图像处理方法及处理系统 |
CN106097340A (zh) * | 2016-06-12 | 2016-11-09 | 山东大学 | 一种基于卷积分类器的自动检测并勾画肺结节所在位置的方法 |
WO2017175282A1 (ja) * | 2016-04-04 | 2017-10-12 | オリンパス株式会社 | 学習方法、画像認識装置およびプログラム |
CN107730489A (zh) * | 2017-10-09 | 2018-02-23 | 杭州电子科技大学 | 无线胶囊内窥镜小肠病变计算机辅助检测系统及检测方法 |
CN108615037A (zh) * | 2018-05-31 | 2018-10-02 | 武汉大学人民医院(湖北省人民医院) | 基于深度学习的可控胶囊内镜操作实时辅助系统及操作方法 |
CN109523522A (zh) * | 2018-10-30 | 2019-03-26 | 腾讯科技(深圳)有限公司 | 内窥镜图像的处理方法、装置、系统及存储介质 |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6292791B1 (en) * | 1998-02-27 | 2001-09-18 | Industrial Technology Research Institute | Method and apparatus of synthesizing plucked string instruments using recurrent neural networks |
US10861151B2 (en) * | 2015-08-07 | 2020-12-08 | The Arizona Board Of Regents On Behalf Of Arizona State University | Methods, systems, and media for simultaneously monitoring colonoscopic video quality and detecting polyps in colonoscopy |
JP6528608B2 (ja) * | 2015-08-28 | 2019-06-12 | カシオ計算機株式会社 | 診断装置、及び診断装置における学習処理方法、並びにプログラム |
WO2017055412A1 (en) * | 2015-09-30 | 2017-04-06 | Siemens Healthcare Gmbh | Method and system for classification of endoscopic images using deep decision networks |
US10007866B2 (en) * | 2016-04-28 | 2018-06-26 | Microsoft Technology Licensing, Llc | Neural network image classifier |
US10803582B2 (en) * | 2016-07-04 | 2020-10-13 | Nec Corporation | Image diagnosis learning device, image diagnosis device, image diagnosis method, and recording medium for storing program |
CN106920227B (zh) * | 2016-12-27 | 2019-06-07 | 北京工业大学 | 基于深度学习与传统方法相结合的视网膜血管分割方法 |
WO2018225448A1 (ja) * | 2017-06-09 | 2018-12-13 | 智裕 多田 | 消化器官の内視鏡画像による疾患の診断支援方法、診断支援システム、診断支援プログラム及びこの診断支援プログラムを記憶したコンピュータ読み取り可能な記録媒体 |
US20190005377A1 (en) * | 2017-06-30 | 2019-01-03 | Advanced Micro Devices, Inc. | Artificial neural network reduction to reduce inference computation time |
CN108304936B (zh) * | 2017-07-12 | 2021-11-16 | 腾讯科技(深圳)有限公司 | 机器学习模型训练方法和装置、表情图像分类方法和装置 |
CN107977969B (zh) * | 2017-12-11 | 2020-07-21 | 北京数字精准医疗科技有限公司 | 一种内窥镜荧光图像的分割方法、装置及存储介质 |
CN108108807B (zh) * | 2017-12-29 | 2020-06-02 | 北京达佳互联信息技术有限公司 | 学习型图像处理方法、系统及服务器 |
CN108256450A (zh) * | 2018-01-04 | 2018-07-06 | 天津大学 | 一种基于深度学习的人脸识别和人脸验证的监督学习方法 |
CN108596090B (zh) * | 2018-04-24 | 2019-08-27 | 北京达佳互联信息技术有限公司 | 人脸图像关键点检测方法、装置、计算机设备及存储介质 |
EP3853764A1 (en) * | 2018-09-20 | 2021-07-28 | NVIDIA Corporation | Training neural networks for vehicle re-identification |
- 2018
  - 2018-10-30 CN CN201811276885.2A patent/CN109523522B/zh active Active
- 2019
  - 2019-10-21 JP JP2020560333A patent/JP7214291B2/ja active Active
  - 2019-10-21 EP EP19879131.1A patent/EP3876190B1/en active Active
  - 2019-10-21 WO PCT/CN2019/112202 patent/WO2020088288A1/zh unknown
- 2020
  - 2020-10-23 US US17/078,826 patent/US11849914B2/en active Active
- 2023
  - 2023-11-10 US US18/506,545 patent/US20240081618A1/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017175282A1 (ja) * | 2016-04-04 | 2017-10-12 | オリンパス株式会社 | 学習方法、画像認識装置およびプログラム |
CN106022221A (zh) * | 2016-05-09 | 2016-10-12 | 腾讯科技(深圳)有限公司 | 一种图像处理方法及处理系统 |
CN106097340A (zh) * | 2016-06-12 | 2016-11-09 | 山东大学 | 一种基于卷积分类器的自动检测并勾画肺结节所在位置的方法 |
CN107730489A (zh) * | 2017-10-09 | 2018-02-23 | 杭州电子科技大学 | 无线胶囊内窥镜小肠病变计算机辅助检测系统及检测方法 |
CN108615037A (zh) * | 2018-05-31 | 2018-10-02 | 武汉大学人民医院(湖北省人民医院) | 基于深度学习的可控胶囊内镜操作实时辅助系统及操作方法 |
CN109523522A (zh) * | 2018-10-30 | 2019-03-26 | 腾讯科技(深圳)有限公司 | 内窥镜图像的处理方法、装置、系统及存储介质 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3876190A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111814655A (zh) * | 2020-07-03 | 2020-10-23 | 浙江大华技术股份有限公司 | 目标重识别方法及其网络训练方法、相关装置 |
CN111814655B (zh) * | 2020-07-03 | 2023-09-01 | 浙江大华技术股份有限公司 | 目标重识别方法及其网络训练方法、相关装置 |
CN113469959A (zh) * | 2021-06-16 | 2021-10-01 | 北京理工大学 | 基于质量缺陷成像模型的对抗训练优化方法及装置 |
CN113706526A (zh) * | 2021-10-26 | 2021-11-26 | 北京字节跳动网络技术有限公司 | 内窥镜图像特征学习模型、分类模型的训练方法和装置 |
CN113706526B (zh) * | 2021-10-26 | 2022-02-08 | 北京字节跳动网络技术有限公司 | 内窥镜图像特征学习模型、分类模型的训练方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
EP3876190A1 (en) | 2021-09-08 |
JP7214291B2 (ja) | 2023-01-30 |
CN109523522A (zh) | 2019-03-26 |
US20210052135A1 (en) | 2021-02-25 |
CN109523522B (zh) | 2023-05-09 |
US11849914B2 (en) | 2023-12-26 |
US20240081618A1 (en) | 2024-03-14 |
EP3876190A4 (en) | 2021-12-29 |
JP2021519663A (ja) | 2021-08-12 |
EP3876190B1 (en) | 2024-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020088288A1 (zh) | 内窥镜图像的处理方法、系统及计算机设备 | |
WO2020103676A1 (zh) | 图像识别方法、装置、系统及存储介质 | |
Tania et al. | Advances in automated tongue diagnosis techniques | |
Sainju et al. | Automated bleeding detection in capsule endoscopy videos using statistical features and region growing | |
CN113496489B (zh) | 内窥镜图像分类模型的训练方法、图像分类方法和装置 | |
Younas et al. | A deep ensemble learning method for colorectal polyp classification with optimized network parameters | |
WO2018120942A1 (zh) | 一种多模型融合自动检测医学图像中病变的系统及方法 | |
CN110517256B (zh) | 一种基于人工智能的早期癌辅助诊断系统 | |
Jain et al. | Detection of abnormality in wireless capsule endoscopy images using fractal features | |
CN109427060A (zh) | 一种影像识别的方法、装置、终端设备和医疗系统 | |
CN111369501B (zh) | 一种基于视觉特征识别口腔鳞状细胞癌的深度学习方法 | |
KR102407248B1 (ko) | 데이터 증대 및 이미지 분할을 활용한 딥러닝 기반 위 병변 분류시스템 | |
CN108427963B (zh) | 一种基于深度学习的黑色素瘤皮肤病的分类识别方法 | |
WO2020232374A1 (en) | Automated anatomic and regional location of disease features in colonoscopy videos | |
CN113781489B (zh) | 一种息肉影像语义分割方法及装置 | |
CN112001894B (zh) | 一种甲状腺边界平滑度检测装置 | |
CN117689949A (zh) | 一种基于少样本学习的消化道内镜图像分类算法 | |
Du et al. | Improving the classification performance of esophageal disease on small dataset by semi-supervised efficient contrastive learning | |
CN117058467B (zh) | 一种胃肠道病变类型识别方法及系统 | |
WO2022169503A1 (en) | System and method of using right and left eardrum otoscopy images for automated otoscopy image analysis to diagnose ear pathology | |
Xue et al. | A deep clustering method for analyzing uterine cervix images across imaging devices | |
CN112001896B (zh) | 一种甲状腺边界不规则度检测装置 | |
CN114972297A (zh) | 口腔健康监测方法及装置 | |
Batra et al. | A brief overview on deep learning methods for lung cancer detection using medical imaging | |
CN117994596B (zh) | 基于孪生网络的肠造口图像识别与分类系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19879131 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2020560333 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2019879131 Country of ref document: EP Effective date: 20210531 |